[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395858#comment-17395858
 ] 

Apache Spark commented on SPARK-36086:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33686

> The case of the delta table is inconsistent with parquet
> 
>
> Key: SPARK-36086
> URL: https://issues.apache.org/jira/browse/SPARK-36086
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf 
> spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from 
> range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> +----------------------------+--------------------------------------------------------------------------+-------+
> |col_name                    |data_type                                                                 |comment|
> +----------------------------+--------------------------------------------------------------------------+-------+
> |lower_id                    |bigint                                                                    |       |
> |id                          |bigint                                                                    |       |
> |                            |                                                                          |       |
> |# Partitioning              |                                                                          |       |
> |Part 0                      |lower_id                                                                  |       |
> |                            |                                                                          |       |
> |# Detailed Table Information|                                                                          |       |
> |Name                        |default.t2                                                                |       |
> |Location                    |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|       |
> |Provider                    |delta                                                                     |       |
> |Table Properties            |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]          |       |
> +----------------------------+--------------------------------------------------------------------------+-------+
> scala> spark.sql("desc extended t3").show(false)
> +----------------------------+---------+-------+
> |col_name                    |data_type|comment|
> +----------------------------+---------+-------+
> |ID                          |bigint   |null   |
> |LOWER_ID                    |bigint   |null   |
> |# Partition Information     |         |       |
> |# col_name                  |data_type|comment|
> |LOWER_ID                    |bigint   |null   |
> |                            |         |       |
> |# Detailed Table Information|         |       |
> |Database                    |default  |       |
> |Table                       |t3       |       |
> |Owner                       |yumwang 

[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395856#comment-17395856
 ] 

Apache Spark commented on SPARK-36086:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33685

> The case of the delta table is inconsistent with parquet
> 
>
> Key: SPARK-36086
> URL: https://issues.apache.org/jira/browse/SPARK-36086
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf 
> spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from 
> range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> +----------------------------+--------------------------------------------------------------------------+-------+
> |col_name                    |data_type                                                                 |comment|
> +----------------------------+--------------------------------------------------------------------------+-------+
> |lower_id                    |bigint                                                                    |       |
> |id                          |bigint                                                                    |       |
> |                            |                                                                          |       |
> |# Partitioning              |                                                                          |       |
> |Part 0                      |lower_id                                                                  |       |
> |                            |                                                                          |       |
> |# Detailed Table Information|                                                                          |       |
> |Name                        |default.t2                                                                |       |
> |Location                    |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|       |
> |Provider                    |delta                                                                     |       |
> |Table Properties            |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]          |       |
> +----------------------------+--------------------------------------------------------------------------+-------+
> scala> spark.sql("desc extended t3").show(false)
> +----------------------------+---------+-------+
> |col_name                    |data_type|comment|
> +----------------------------+---------+-------+
> |ID                          |bigint   |null   |
> |LOWER_ID                    |bigint   |null   |
> |# Partition Information     |         |       |
> |# col_name                  |data_type|comment|
> |LOWER_ID                    |bigint   |null   |
> |                            |         |       |
> |# Detailed Table Information|         |       |
> |Database                    |default  |       |
> |Table                       |t3       |       |
> |Owner                       |yumwang 

[jira] [Commented] (SPARK-36429) JacksonParser should throw exception when data type unsupported.

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395857#comment-17395857
 ] 

Apache Spark commented on SPARK-36429:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/33684

> JacksonParser should throw exception when data type unsupported.
> 
>
> Key: SPARK-36429
> URL: https://issues.apache.org/jira/browse/SPARK-36429
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, when spark.sql.timestampType=TIMESTAMP_NTZ is set, the behavior 
> differs between from_json and from_csv.
> {code:java}
> -- !query
> select from_json('{"t":"26/October/2015"}', 't Timestamp', 
> map('timestampFormat', 'dd/M/'))
> -- !query schema
> struct>
> -- !query output
> {"t":null}
> {code}
> {code:java}
> -- !query
> select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 
> 'dd/M/'))
> -- !query schema
> struct<>
> -- !query output
> java.lang.Exception
> Unsupported type: timestamp_ntz
> {code}
> We should make from_json throw an exception too.
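For illustration, a minimal spark-shell sketch of the mismatch quoted above. The SQL expressions are copied from the report; the timestampFormat pattern appears truncated there and is reproduced as given, and setting the config at session level is an assumption:

{code:scala}
// Sketch only, not the fix: shows the inconsistent behavior under TIMESTAMP_NTZ.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

// from_json silently yields a null field for the unsupported timestamp_ntz type ...
spark.sql("""select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))""").show(false)

// ... while from_csv fails with "Unsupported type: timestamp_ntz".
spark.sql("""select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/'))""").show(false)
{code}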



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36457) Review and fix issues in API docs

2021-08-08 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395848#comment-17395848
 ] 

Gengliang Wang commented on SPARK-36457:


[~beliefer][~linhongliu-db] are you interested in this one?

> Review and fix issues in API docs
> -
>
> Key: SPARK-36457
> URL: https://issues.apache.org/jira/browse/SPARK-36457
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Blocker
>
> Compare the 3.2.0 API doc with the latest release version 3.1.2. Fix the 
> following issues:
> * Add missing `Since` annotation for new APIs
> * Remove the leaking class/object in API doc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36457) Review and fix issues in API docs

2021-08-08 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395848#comment-17395848
 ] 

Gengliang Wang edited comment on SPARK-36457 at 8/9/21, 6:21 AM:
-

[~beliefer] [~linhongliu-db] are you interested in this one?


was (Author: gengliang.wang):
[~beliefer][~linhongliu-db] are you interested in this one?

> Review and fix issues in API docs
> -
>
> Key: SPARK-36457
> URL: https://issues.apache.org/jira/browse/SPARK-36457
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Blocker
>
> Compare the 3.2.0 API doc with the latest release version 3.1.2. Fix the 
> following issues:
> * Add missing `Since` annotation for new APIs
> * Remove the leaking class/object in API doc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36457) Review and fix issues in API docs

2021-08-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-36457:

Target Version/s: 3.2.0

> Review and fix issues in API docs
> -
>
> Key: SPARK-36457
> URL: https://issues.apache.org/jira/browse/SPARK-36457
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Blocker
>
> Compare the 3.2.0 API doc with the latest release version 3.1.2. Fix the 
> following issues:
> * Add missing `Since` annotation for new APIs
> * Remove the leaking class/object in API doc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36457) Review and fix issues in API docs

2021-08-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-36457:

Priority: Blocker  (was: Major)

> Review and fix issues in API docs
> -
>
> Key: SPARK-36457
> URL: https://issues.apache.org/jira/browse/SPARK-36457
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Blocker
>
> Compare the 3.2.0 API doc with the latest release version 3.1.2. Fix the 
> following issues:
> * Add missing `Since` annotation for new APIs
> * Remove the leaking class/object in API doc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395841#comment-17395841
 ] 

Apache Spark commented on SPARK-36041:
--

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33683

> Introduce the RocksDBStateStoreProvider in the programming guide
> 
>
> Key: SPARK-36041
> URL: https://issues.apache.org/jira/browse/SPARK-36041
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36041:


Assignee: Apache Spark

> Introduce the RocksDBStateStoreProvider in the programming guide
> 
>
> Key: SPARK-36041
> URL: https://issues.apache.org/jira/browse/SPARK-36041
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Apache Spark
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395840#comment-17395840
 ] 

Apache Spark commented on SPARK-36041:
--

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33683

> Introduce the RocksDBStateStoreProvider in the programming guide
> 
>
> Key: SPARK-36041
> URL: https://issues.apache.org/jira/browse/SPARK-36041
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35050) Deprecate Apache Mesos as resource manager

2021-08-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-35050:

Labels: release-notes  (was: )

> Deprecate Apache Mesos as resource manager
> --
>
> Key: SPARK-35050
> URL: https://issues.apache.org/jira/browse/SPARK-35050
> Project: Spark
>  Issue Type: Task
>  Components: Mesos, Spark Core
>Affects Versions: 3.2.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Major
>  Labels: release-notes
> Fix For: 3.2.0
>
>
> As highlighted in 
> https://lists.apache.org/thread.html/rab2a820507f7c846e54a847398ab20f47698ec5bce0c8e182bfe51ba%40%3Cdev.mesos.apache.org%3E
>  , Apache Mesos is moving to the attic and ceasing development.
> We can/should maintain support for some time, but we can probably go ahead and 
> deprecate it now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36041:


Assignee: (was: Apache Spark)

> Introduce the RocksDBStateStoreProvider in the programming guide
> 
>
> Key: SPARK-36041
> URL: https://issues.apache.org/jira/browse/SPARK-36041
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29330) Allow users to chose the name of Spark Shuffle service

2021-08-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29330.
-
Fix Version/s: 3.2.0
   Resolution: Duplicate

> Allow users to chose the name of Spark Shuffle service
> --
>
> Key: SPARK-29330
> URL: https://issues.apache.org/jira/browse/SPARK-29330
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Alexander Bessonov
>Priority: Minor
> Fix For: 3.2.0
>
>
> As of now, Spark uses hardcoded value {{spark_shuffle}} as the name of the 
> Shuffle Service.
> HDP distribution of Spark, on the other hand, uses 
> [{{spark2_shuffle}}|https://github.com/hortonworks/spark2-release/blob/HDP-3.1.0.0-78-tag/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L117].
>  This is done to be able to run both Spark 1.6 and Spark 2.x on the same 
> Hadoop cluster.
> Running vanilla Spark on an HDP cluster with only the Spark 2.x shuffle service 
> (HDP flavor) running becomes impossible due to the shuffle service name mismatch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread Yuanjian Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395839#comment-17395839
 ] 

Yuanjian Li commented on SPARK-36041:
-

[~Gengliang.Wang] Thanks for reminding, PR submitted.

> Introduce the RocksDBStateStoreProvider in the programming guide
> 
>
> Key: SPARK-36041
> URL: https://issues.apache.org/jira/browse/SPARK-36041
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34828) YARN Shuffle Service: Support configurability of aux service name and service-specific config overrides

2021-08-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-34828:

Labels: release-notes  (was: )

> YARN Shuffle Service: Support configurability of aux service name and 
> service-specific config overrides
> ---
>
> Key: SPARK-34828
> URL: https://issues.apache.org/jira/browse/SPARK-34828
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>  Labels: release-notes
> Fix For: 3.2.0
>
>
> In some cases it may be desirable to run multiple instances of the Spark 
> Shuffle Service which are using different versions of Spark. This can be 
> helpful, for example, when running a YARN cluster with a mixed workload of 
> applications running multiple Spark versions, since a given version of the 
> shuffle service is not always compatible with other versions of Spark. (See 
> SPARK-27780 for more detail on this)
> YARN versions since 2.9.0 support the ability to run shuffle services within 
> an isolated classloader (see YARN-4577), meaning multiple Spark versions can 
> coexist within a single NodeManager.
> To support this from the Spark side, we need to make two enhancements:
> * Make the name of the shuffle service configurable. Currently it is 
> hard-coded to be {{spark_shuffle}} on both the client and server side. The 
> server-side name is not actually used anywhere, as it is the value within 
> {{yarn.nodemanager.aux-services}} that the NodeManager considers to be the 
> definitive name. However, if you change this in the configs, the hard-coded 
> name within the client will no longer match. So, this needs to be 
> configurable.
> * Add a way to separately configure the two shuffle service instances. Since 
> the configurations such as the port number are taken from the NodeManager 
> config, they will both try to use the same port, which obviously won't work. 
> So, we need to provide a way to selectively configure the two shuffle service 
> instances. I will go into details on my proposal for how to achieve this 
> within the PR.
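For illustration, a hedged sketch of what the first enhancement above could look like from the client side. The property name spark.shuffle.service.name below is assumed for illustration; the definitive configuration is described in the PR:

{code:scala}
import org.apache.spark.SparkConf

// Sketch only: point the Spark client at a renamed YARN aux service instead of
// the hard-coded "spark_shuffle". The chosen name must match the entry
// configured in yarn.nodemanager.aux-services on the NodeManagers.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.shuffle.service.name", "spark_shuffle_3_2") // assumed property name
{code}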



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34105) In addition to killing excluded/flaky executors which should support decommissioning

2021-08-08 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395833#comment-17395833
 ] 

Gengliang Wang commented on SPARK-34105:


[~holden] [~hyukjin.kwon] Shall we mark this as done?

> In addition to killing excluded/flaky executors which should support 
> decommissioning
> -
>
> Key: SPARK-34105
> URL: https://issues.apache.org/jira/browse/SPARK-34105
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Decommissioning will give the executor a chance to migrate its files to a 
> more stable node.
>  
> Note: we want SPARK-34104 to be integrated as well so that flaky executors 
> which cannot decommission are eventually killed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34104) Allow users to specify a maximum decommissioning time

2021-08-08 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395832#comment-17395832
 ] 

Gengliang Wang commented on SPARK-34104:


[~holden] [~hyukjin.kwon] Shall we mark this as done?

> Allow users to specify a maximum decommissioning time
> -
>
> Key: SPARK-34104
> URL: https://issues.apache.org/jira/browse/SPARK-34104
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> We currently have the ability for users to set the predicted time at which the 
> cluster manager or cloud provider will terminate a decommissioning executor, 
> but for nodes where Spark itself is triggering decommissioning, we should add 
> the ability for users to specify a maximum time we want to allow the executor 
> to decommission.
>  
> This is especially important if we start triggering decommissioning in more 
> places (like with excluded executors that are found to be flaky, which may or 
> may not be able to decommission successfully).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36457) Review and fix issues in API docs

2021-08-08 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-36457:
--

 Summary: Review and fix issues in API docs
 Key: SPARK-36457
 URL: https://issues.apache.org/jira/browse/SPARK-36457
 Project: Spark
  Issue Type: Improvement
  Components: docs
Affects Versions: 3.2.0
Reporter: Gengliang Wang


Compare the 3.2.0 API doc with the latest release version 3.1.2. Fix the 
following issues:

* Add missing `Since` annotation for new APIs
* Remove the leaking class/object in API doc
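
For reference, a minimal sketch of the `Since` annotation mentioned above, assuming the spark-tags annotation org.apache.spark.annotation.Since used throughout the codebase (the class and method names are made up for illustration):

{code:scala}
import org.apache.spark.annotation.Since

class ExampleService {
  /** A newly added API; the annotation records the release in which it first appeared. */
  @Since("3.2.0")
  def newOperation(): Unit = ()
}
{code}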



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34198) Add RocksDB StateStore implementation

2021-08-08 Thread Yuanjian Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395822#comment-17395822
 ] 

Yuanjian Li commented on SPARK-34198:
-

[~Gengliang.Wang] Thanks for reminding. I'll submit the document PR now.

> Add RocksDB StateStore implementation
> -
>
> Key: SPARK-34198
> URL: https://issues.apache.org/jira/browse/SPARK-34198
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Priority: Major
>
> Currently Spark SS only has one built-in StateStore implementation, 
> HDFSBackedStateStore, which uses an in-memory map to store state rows. As 
> there are more and more streaming applications, some of them require large 
> state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state 
> management, so it is a proven choice for large-state usage. But Spark SS 
> still lacks a built-in state store for this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore 
> to Spark SS.
>  
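For context, a sketch of how such a provider would be switched on once available, assuming the existing spark.sql.streaming.stateStore.providerClass config; the provider class name below is an assumption for illustration:

{code:scala}
// Sketch only: replace the default HDFSBackedStateStoreProvider with a
// RocksDB-based implementation for Structured Streaming state. Must be set
// before the streaming query is started.
spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
{code}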



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395737#comment-17395737
 ] 

Apache Spark commented on SPARK-36456:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33682

> Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
> 
>
> Key: SPARK-36456
> URL: https://issues.apache.org/jira/browse/SPARK-36456
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Yang Jie
>Priority: Minor
>
> Compilation warnings related to `method closeQuietly in class IOUtils is 
> deprecated` are as follows:
> {code:java}
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344:
>  [deprecation @ 
> org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307:
>  [deprecation @ 
> org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383:
>  [deprecation @ 
> org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66:
>  [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is depreca

[jira] [Assigned] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36456:


Assignee: (was: Apache Spark)

> Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
> 
>
> Key: SPARK-36456
> URL: https://issues.apache.org/jira/browse/SPARK-36456
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Yang Jie
>Priority: Minor
>
> Compilation warnings related to `method closeQuietly in class IOUtils is 
> deprecated` are as follows:
> {code:java}
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344:
>  [deprecation @ 
> org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307:
>  [deprecation @ 
> org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383:
>  [deprecation @ 
> org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66:
>  [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileMan

[jira] [Assigned] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36456:


Assignee: Apache Spark

> Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
> 
>
> Key: SPARK-36456
> URL: https://issues.apache.org/jira/browse/SPARK-36456
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Compilation warnings related to `method closeQuietly in class IOUtils is 
> deprecated` are as follows:
> {code:java}
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344:
>  [deprecation @ 
> org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307:
>  [deprecation @ 
> org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383:
>  [deprecation @ 
> org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66:
>  [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/strea

[jira] [Assigned] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36455:


Assignee: (was: Apache Spark)

> Provide an example of complex session window via flatMapGroupsWithState
> ---
>
> Key: SPARK-36455
> URL: https://issues.apache.org/jira/browse/SPARK-36455
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Now that we have replaced the sessionization example with native support for 
> session windows, we may want to provide another example of a session window 
> that can only be handled with flatMapGroupsWithState.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395736#comment-17395736
 ] 

Apache Spark commented on SPARK-36455:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/33681

> Provide an example of complex session window via flatMapGroupsWithState
> ---
>
> Key: SPARK-36455
> URL: https://issues.apache.org/jira/browse/SPARK-36455
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Now that we have replaced the sessionization example with native support for 
> session windows, we may want to provide another example of a session window 
> that can only be handled with flatMapGroupsWithState.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36455:


Assignee: Apache Spark

> Provide an example of complex session window via flatMapGroupsWithState
> ---
>
> Key: SPARK-36455
> URL: https://issues.apache.org/jira/browse/SPARK-36455
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> Now that we have replaced the sessionization example with native support for 
> session windows, we may want to provide another example of a session window 
> that can only be handled with flatMapGroupsWithState.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly

2021-08-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-36456:
-
Summary: Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly  
(was: Clean up the deprecated use of o.a.c.io.IOUtils.closeQuietly)

> Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
> 
>
> Key: SPARK-36456
> URL: https://issues.apache.org/jira/browse/SPARK-36456
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Yang Jie
>Priority: Minor
>
> Compilation warnings related to `method closeQuietly in class IOUtils is 
> deprecated` are as follows:
> {code:java}
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344:
>  [deprecation @ 
> org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307:
>  [deprecation @ 
> org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: 
> [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98:
>  [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383:
>  [deprecation @ 
> org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: 
> [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
> origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66:
>  [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read 
> | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461:
>  [deprecation @ 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile
>  | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
> closeQuietly in class IOUtils is deprecated
> [WARNING] 
> /s

[jira] [Created] (SPARK-36456) Clean up the deprecated use of o.a.c.io.IOUtils.closeQuietly

2021-08-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-36456:


 Summary: Clean up the deprecated use of o.a.c.io.IOUtils.closeQuietly
 Key: SPARK-36456
 URL: https://issues.apache.org/jira/browse/SPARK-36456
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 3.1.2
Reporter: Yang Jie


Compilation warnings related to `method closeQuietly in class IOUtils is 
deprecated` are as follows:
{code:java}
[WARNING] 
/spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344:
 [deprecation @ 
org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore
 | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307:
 [deprecation @ 
org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: 
[deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: 
[deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97:
 [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98:
 [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383:
 [deprecation @ 
org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: 
[deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: 
[deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150:
 [deprecation @ 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream 
| origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66:
 [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545:
 [deprecation @ 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile
 | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461:
 [deprecation @ 
org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile 
| origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
[WARNING] 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:462:
 [deprecation @ 
org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile 
| origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method 
closeQuietly in class IOUtils is deprecated
{code}
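
For illustration, one possible shape of the cleanup at the call sites listed above: a small local helper that keeps the same "swallow failures on close" semantics (a sketch under stated assumptions, not necessarily the approach taken in the linked PRs):

{code:scala}
import java.io.Closeable
import scala.util.control.NonFatal

// Sketch only: a drop-in replacement for org.apache.commons.io.IOUtils.closeQuietly.
def closeQuietly(closeable: Closeable): Unit = {
  if (closeable != null) {
    try {
      closeable.close()
    } catch {
      case NonFatal(_) => // ignore failures while closing, as closeQuietly did
    }
  }
}
{code}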



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.

[jira] [Updated] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-36455:
-
Summary: Provide an example of complex session window via 
flatMapGroupsWithState  (was: Provide an example of complex session window)

> Provide an example of complex session window via flatMapGroupsWithState
> ---
>
> Key: SPARK-36455
> URL: https://issues.apache.org/jira/browse/SPARK-36455
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Now that we have replaced the sessionization example with native support for 
> session windows, we may want to provide another example of a session window 
> that can only be handled with flatMapGroupsWithState.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36455) Provide an example of complex session window

2021-08-08 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-36455:


 Summary: Provide an example of complex session window
 Key: SPARK-36455
 URL: https://issues.apache.org/jira/browse/SPARK-36455
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.2.0
Reporter: Jungtaek Lim


Now that we have replaced the sessionization example with native support for 
session windows, we may want to provide another example of a session window that 
can only be handled with flatMapGroupsWithState.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36369) Fix Index.union to follow pandas 1.3

2021-08-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36369.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33634
[https://github.com/apache/spark/pull/33634]

> Fix Index.union to follow pandas 1.3
> 
>
> Key: SPARK-36369
> URL: https://issues.apache.org/jira/browse/SPARK-36369
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36369) Fix Index.union to follow pandas 1.3

2021-08-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36369:


Assignee: Haejoon Lee

> Fix Index.union to follow pandas 1.3
> 
>
> Key: SPARK-36369
> URL: https://issues.apache.org/jira/browse/SPARK-36369
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32953) Lower memory usage in toPandas with Arrow self_destruct

2021-08-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-32953:

Description: 
As described on the mailing list:
 
[http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Reducing-memory-usage-of-toPandas-with-Arrow-quot-self-destruct-quot-option-td30149.html]
 
[https://lists.apache.org/thread.html/r581d7c82ada1c2ac3f0584615785cc60cf5ac231e1f29737d3a6569f%40%3Cdev.spark.apache.org%3E]

toPandas() can as much as double memory usage, since both Arrow and Pandas retain 
a copy of the DataFrame in memory during the conversion. Arrow >= 0.16 offers a 
self_destruct mode that avoids this, with some caveats.

  was:
As described on the mailing list:
[http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Reducing-memory-usage-of-toPandas-with-Arrow-quot-self-destruct-quot-option-td30149.html]

toPandas() can as much as double memory usage as both Arrow and Pandas retain a 
copy of a dataframe in memory during the conversion. Arrow >= 0.16 offers a 
self_destruct mode that avoids this with some caveats.


> Lower memory usage in toPandas with Arrow self_destruct
> ---
>
> Key: SPARK-32953
> URL: https://issues.apache.org/jira/browse/SPARK-32953
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.1
>Reporter: David Li
>Assignee: David Li
>Priority: Major
> Fix For: 3.2.0
>
>
> As described on the mailing list:
>  
> [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Reducing-memory-usage-of-toPandas-with-Arrow-quot-self-destruct-quot-option-td30149.html]
>  
> [https://lists.apache.org/thread.html/r581d7c82ada1c2ac3f0584615785cc60cf5ac231e1f29737d3a6569f%40%3Cdev.spark.apache.org%3E]
> toPandas() can as much as double memory usage, since both Arrow and Pandas retain 
> a copy of the DataFrame in memory during the conversion. Arrow >= 0.16 offers a 
> self_destruct mode that avoids this, with some caveats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36454) Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36454:


Assignee: Apache Spark

> Not push down partition filter to ORCScan for DSv2
> --
>
> Key: SPARK-36454
> URL: https://issues.apache.org/jira/browse/SPARK-36454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Minor
>
> It seems to me that the partition filter is only used for partition pruning and 
> shouldn't be pushed down to ORCScan. We don't push down the partition filter to 
> ORCScan in DSv1, and we don't push down the partition filter for Parquet in 
> either DSv1 or DSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36454) Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395683#comment-17395683
 ] 

Apache Spark commented on SPARK-36454:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/33680

> Not push down partition filter to ORCScan for DSv2
> --
>
> Key: SPARK-36454
> URL: https://issues.apache.org/jira/browse/SPARK-36454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> It seems to me that the partition filter is only used for partition pruning and 
> shouldn't be pushed down to ORCScan. We don't push down the partition filter to 
> ORCScan in DSv1, and we don't push down the partition filter for Parquet in 
> either DSv1 or DSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36454) Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395684#comment-17395684
 ] 

Apache Spark commented on SPARK-36454:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/33680

> Not push down partition filter to ORCScan for DSv2
> --
>
> Key: SPARK-36454
> URL: https://issues.apache.org/jira/browse/SPARK-36454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> It seems to me that the partition filter is only used for partition pruning and 
> shouldn't be pushed down to ORCScan. We don't push down the partition filter to 
> ORCScan in DSv1, and we don't push down the partition filter for Parquet in 
> either DSv1 or DSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36454) Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36454:


Assignee: Apache Spark

> Not push down partition filter to ORCScan for DSv2
> --
>
> Key: SPARK-36454
> URL: https://issues.apache.org/jira/browse/SPARK-36454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Minor
>
> It seems to me that the partition filter is only used for partition pruning and 
> shouldn't be pushed down to ORCScan. We don't push down the partition filter to 
> ORCScan in DSv1, and we don't push down the partition filter for Parquet in 
> either DSv1 or DSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36454) Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36454:


Assignee: (was: Apache Spark)

> Not push down partition filter to ORCScan for DSv2
> --
>
> Key: SPARK-36454
> URL: https://issues.apache.org/jira/browse/SPARK-36454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> It seems to me that the partition filter is only used for partition pruning and 
> shouldn't be pushed down to ORCScan. We don't push down the partition filter to 
> ORCScan in DSv1, and we don't push down the partition filter for Parquet in 
> either DSv1 or DSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36432) Upgrade Jetty version to 9.4.43

2021-08-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36432:


Assignee: Sajith A

> Upgrade Jetty version to 9.4.43
> ---
>
> Key: SPARK-36432
> URL: https://issues.apache.org/jira/browse/SPARK-36432
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Sajith A
>Assignee: Sajith A
>Priority: Minor
> Fix For: 3.2.0
>
>
> Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to 
> fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36432) Upgrade Jetty version to 9.4.43

2021-08-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36432.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33656
[https://github.com/apache/spark/pull/33656]

> Upgrade Jetty version to 9.4.43
> ---
>
> Key: SPARK-36432
> URL: https://issues.apache.org/jira/browse/SPARK-36432
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Sajith A
>Priority: Minor
> Fix For: 3.2.0
>
>
> Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to 
> fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36454) Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-36454:
--

 Summary: Not push down partition filter to ORCScan for DSv2
 Key: SPARK-36454
 URL: https://issues.apache.org/jira/browse/SPARK-36454
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao


It seems to me that the partition filter is only used for partition pruning and 
shouldn't be pushed down to ORCScan. We don't push down the partition filter to 
ORCScan in DSv1, and we don't push down the partition filter for Parquet in 
either DSv1 or DSv2.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36425) PySpark: support CrossValidatorModel get standard deviation of metrics for each paramMap

2021-08-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36425.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33652
[https://github.com/apache/spark/pull/33652]

> PySpark: support CrossValidatorModel get standard deviation of metrics for 
> each paramMap 
> -
>
> Key: SPARK-36425
> URL: https://issues.apache.org/jira/browse/SPARK-36425
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.2.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
> Fix For: 3.3.0
>
>
> PySpark: support CrossValidatorModel get standard deviation of metrics for 
> each paramMap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36453) Improve consistency processing floating point special literals

2021-08-08 Thread Pablo Langa Blanco (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395658#comment-17395658
 ] 

Pablo Langa Blanco commented on SPARK-36453:


I'm working on it

> Improve consistency processing floating point special literals
> --
>
> Key: SPARK-36453
> URL: https://issues.apache.org/jira/browse/SPARK-36453
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Pablo Langa Blanco
>Priority: Minor
>
> Special floating point literals are not handled consistently between cast and 
> JSON expressions.
>  
> {code:java}
> scala> spark.sql("SELECT CAST('+Inf' as Double)").show
> +--------------------+
> |CAST(+Inf AS DOUBLE)|
> +--------------------+
> |            Infinity|
> +--------------------+
> {code}
>  
> {code:java}
> scala> val schema =  StructType(StructField("a", DoubleType) :: Nil)
> scala> Seq("""{"a" : 
> "+Inf"}""").toDF("col1").select(from_json(col("col1"),schema)).show
> +---------------+
> |from_json(col1)|
> +---------------+
> |         {null}|
> +---------------+
> scala> Seq("""{"a" : "+Inf"}""").toDF("col").withColumn("col", 
> from_json(col("col"), StructType.fromDDL("a 
> DOUBLE"))).write.json("/tmp/jsontests12345")
> scala> spark.read.schema(StructType(Seq(StructField("col", schema)))).json("/tmp/jsontests12345").show
> +------+
> |   col|
> +------+
> |{null}|
> +------+
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36453) Improve consistency processing floating point special literals

2021-08-08 Thread Pablo Langa Blanco (Jira)
Pablo Langa Blanco created SPARK-36453:
--

 Summary: Improve consistency processing floating point special 
literals
 Key: SPARK-36453
 URL: https://issues.apache.org/jira/browse/SPARK-36453
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Pablo Langa Blanco


Special floating point literals are not handled consistently between cast and 
JSON expressions.

 
{code:java}
scala> spark.sql("SELECT CAST('+Inf' as Double)").show
+--------------------+
|CAST(+Inf AS DOUBLE)|
+--------------------+
|            Infinity|
+--------------------+
{code}
 
{code:java}
scala> val schema =  StructType(StructField("a", DoubleType) :: Nil)

scala> Seq("""{"a" : 
"+Inf"}""").toDF("col1").select(from_json(col("col1"),schema)).show

+---------------+
|from_json(col1)|
+---------------+
|         {null}|
+---------------+

scala> Seq("""{"a" : "+Inf"}""").toDF("col").withColumn("col", 
from_json(col("col"), StructType.fromDDL("a 
DOUBLE"))).write.json("/tmp/jsontests12345")
scala> spark.read.schema(StructType(Seq(StructField("col", schema)))).json("/tmp/jsontests12345").show

+------+
|   col|
+------+
|{null}|
+------+

{code}
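
Until this is addressed, one hedged workaround (a sketch only, reusing the cast
behaviour shown in the first snippet above) is to parse the field as a string and
cast it to double afterwards, so the value goes through the cast path that accepts
'+Inf':
{code:java}
scala> import org.apache.spark.sql.functions.{col, from_json}
scala> import org.apache.spark.sql.types.StructType

// Parse "a" as STRING first, then cast; expected to yield Infinity via the
// cast path shown above rather than null from the JSON parser.
scala> Seq("""{"a" : "+Inf"}""").toDF("col1")
     |   .select(from_json(col("col1"), StructType.fromDDL("a STRING"))
     |     .getField("a").cast("double").as("a"))
     |   .show
{code}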
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Saurabh Chawla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Chawla updated SPARK-36452:
---
Description: 
Add the support in Spark for having group by map datatype column for the 
scenario that works in Hive.

In hive the below scenario works 
{code:java}
describe extended complex2;
OK
id                  string 
c1                  map   
Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, 
createTime:1627994412, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
FieldSchema(name:c1, type:map, comment:null)], 
location:/user/hive/warehouse/complex2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null,
 serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1})

select * from complex2;
OK
1 {1:"u"}
2 {1:"u",2:"uo"}
1 {1:"u",2:"uo"}
Time taken: 0.363 seconds, Fetched: 3 row(s)

Working Scenario in Hive -: 

select id, c1, count(*) from complex2 group by id, c1;
OK
1 {1:"u"} 1
1 {1:"u",2:"uo"} 1
2 {1:"u",2:"uo"} 1
Time taken: 1.621 seconds, Fetched: 3 row(s)

Failed Scenario in Hive -: failed when map type is present in aggregated 
expression 

select id, max(c1), count(*) from complex2 group by id, c1;

FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
complex type containing map<>.
{code}
But in Spark this scenario fails: grouping by a map column when the map column is 
used in the select without any aggregation
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
org.apache.spark.sql.AnalysisException: expression 
spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
because its data type map is not an orderable data type.;
Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
+- SubqueryAlias spark_catalog.default.complex2
 +- HiveTableRelation [`default`.`complex2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
Partition Cols: []]
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
{code}
There is a need to support this scenario, where a grouping expression can have map 
type as long as no aggregated expression references that map type. This helps 
users migrating from Hive to Spark.

After the code change 
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
+---+-++                                                
| id|               c1|count(1)|
+---+-++
|  1|         {1 -> u}|       1|
|  2|{1 -> u, 2 -> uo}|       1|
|  1|{1 -> u, 2 -> uo}|       1|
+---+-++
 {code}

  was:
Add the support in Spark for having group by map datatype column for the 
scenario that works in Hive.

In hive the below scenario works 
{code:java}
describe extended complex2;
OK
id                  string 
c1                  map   
Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, 
createTime:1627994412, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
FieldSchema(name:c1, type:map, comment:null)], 
location:/user/hive/warehouse/complex2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null,
 serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1})

select * from complex2;
OK
1 {1:"u"}
2 {1:"u",2:"uo"}
1 {1:"u",2:"uo"}
Time taken: 0.363 seconds, Fetched: 3 row(s)

select id, c1, count(*) from complex2 group by id, c1;
OK
1 {1:"u"} 1
1 {1:"u",2:"uo"} 1
2 {1:"u",2:"uo"} 1
Time taken: 1.621 seconds, Fetched: 3 row(s)

failed when map type is present in aggregated expression 
select id, max(c1), count(*) from complex2 group by id, c1; 
FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
complex type containing map<>.
{code}
But in spark this scenario where the group by map column failed for this 
scenario where the map column is used in the select without any aggregation
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
org.apache.spark.sql.AnalysisException: expression 
spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
because its data type map is not an orderable data type.;
Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
+- SubqueryAlias spark_catalog.default.complex2
 +- HiveTableRelation [`default`.`complex2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
Partition Cols: []]
at 
org.apache.spark.s

[jira] [Commented] (SPARK-34198) Add RocksDB StateStore implementation

2021-08-08 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395526#comment-17395526
 ] 

Gengliang Wang commented on SPARK-34198:


[~XuanYuan][~kabhwan][~viirya][~vkorukanti] Thanks for the great work.  I will 
cut 3.2.0 RC1 next week. Please help add documentation for the feature and 
check if there is any remaining work before Spark 3.2.0.  Thanks!

> Add RocksDB StateStore implementation
> -
>
> Key: SPARK-34198
> URL: https://issues.apache.org/jira/browse/SPARK-34198
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Priority: Major
>
> Currently Spark SS only has one built-in StateStore implementation, 
> HDFSBackedStateStore, which uses an in-memory map to store state rows. As 
> there are more and more streaming applications, some of them require large 
> state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state 
> management, so it is proven to be a good choice for large state. But 
> Spark SS still lacks a built-in state store for this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore 
> to Spark SS.
>  
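
For reference, switching to the new provider is expected to be a one-line
configuration change; a minimal sketch, assuming the provider class name used in
this work (to be confirmed against the released 3.2.0 documentation):
{code:java}
import org.apache.spark.sql.SparkSession

// Sketch: point Structured Streaming state management at the RocksDB-based
// provider instead of the default HDFS-backed one.
val spark = SparkSession.builder()
  .appName("rocksdb-state-store-sketch")
  .config("spark.sql.streaming.stateStore.providerClass",
    "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
  .getOrCreate()
{code}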



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395525#comment-17395525
 ] 

Gengliang Wang commented on SPARK-36041:


[~XuanYuan] [~kabhwan] What is the status of this one?

> Introduce the RocksDBStateStoreProvider in the programming guide
> 
>
> Key: SPARK-36041
> URL: https://issues.apache.org/jira/browse/SPARK-36041
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395524#comment-17395524
 ] 

Apache Spark commented on SPARK-36452:
--

User 'SaurabhChawla100' has created a pull request for this issue:
https://github.com/apache/spark/pull/33679

> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive
> 
>
> Key: SPARK-36452
> URL: https://issues.apache.org/jira/browse/SPARK-36452
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive.
> In hive the below scenario works 
> {code:java}
> describe extended complex2;
> OK
> id                  string 
> c1                  map   
> Detailed Table Information Table(tableName:complex2, dbName:default, 
> owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
> FieldSchema(name:c1, type:map, comment:null)], 
> location:/user/hive/warehouse/complex2, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null,
>  serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1})
> select * from complex2;
> OK
> 1 {1:"u"}
> 2 {1:"u",2:"uo"}
> 1 {1:"u",2:"uo"}
> Time taken: 0.363 seconds, Fetched: 3 row(s)
> select id, c1, count(*) from complex2 group by id, c1;
> OK
> 1 {1:"u"} 1
> 1 {1:"u",2:"uo"} 1
> 2 {1:"u",2:"uo"} 1
> Time taken: 1.621 seconds, Fetched: 3 row(s)
> failed when map type is present in aggregated expression 
> select id, max(c1), count(*) from complex2 group by id, c1; 
> FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
> complex type containing map<>.
> {code}
> But in Spark this scenario fails: grouping by a map column when the map column 
> is used in the select without any aggregation
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> org.apache.spark.sql.AnalysisException: expression 
> spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
> because its data type map is not an orderable data type.;
> Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
> +- SubqueryAlias spark_catalog.default.complex2
>  +- HiveTableRelation [`default`.`complex2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
> Partition Cols: []]
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
> {code}
> There is a need to support this scenario, where a grouping expression can have 
> map type as long as no aggregated expression references that map type. This 
> helps users migrating from Hive to Spark.
> After the code change 
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> +---+-++                                              
>   
> | id|               c1|count(1)|
> +---+-++
> |  1|         {1 -> u}|       1|
> |  2|{1 -> u, 2 -> uo}|       1|
> |  1|{1 -> u, 2 -> uo}|       1|
> +---+-++
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36452:


Assignee: Apache Spark

> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive
> 
>
> Key: SPARK-36452
> URL: https://issues.apache.org/jira/browse/SPARK-36452
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Saurabh Chawla
>Assignee: Apache Spark
>Priority: Major
>
> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive.
> In hive the below scenario works 
> {code:java}
> describe extended complex2;
> OK
> id                  string 
> c1                  map   
> Detailed Table Information Table(tableName:complex2, dbName:default, 
> owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
> FieldSchema(name:c1, type:map, comment:null)], 
> location:/user/hive/warehouse/complex2, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null,
>  serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1})
> select * from complex2;
> OK
> 1 {1:"u"}
> 2 {1:"u",2:"uo"}
> 1 {1:"u",2:"uo"}
> Time taken: 0.363 seconds, Fetched: 3 row(s)
> select id, c1, count(*) from complex2 group by id, c1;
> OK
> 1 {1:"u"} 1
> 1 {1:"u",2:"uo"} 1
> 2 {1:"u",2:"uo"} 1
> Time taken: 1.621 seconds, Fetched: 3 row(s)
> failed when map type is present in aggregated expression 
> select id, max(c1), count(*) from complex2 group by id, c1; 
> FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
> complex type containing map<>.
> {code}
> But in Spark this scenario fails: grouping by a map column when the map column 
> is used in the select without any aggregation
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> org.apache.spark.sql.AnalysisException: expression 
> spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
> because its data type map is not an orderable data type.;
> Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
> +- SubqueryAlias spark_catalog.default.complex2
>  +- HiveTableRelation [`default`.`complex2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
> Partition Cols: []]
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
> {code}
> There is a need to support this scenario, where a grouping expression can have 
> map type as long as no aggregated expression references that map type. This 
> helps users migrating from Hive to Spark.
> After the code change 
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> +---+-++                                              
>   
> | id|               c1|count(1)|
> +---+-++
> |  1|         {1 -> u}|       1|
> |  2|{1 -> u, 2 -> uo}|       1|
> |  1|{1 -> u, 2 -> uo}|       1|
> +---+-++
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395523#comment-17395523
 ] 

Apache Spark commented on SPARK-36452:
--

User 'SaurabhChawla100' has created a pull request for this issue:
https://github.com/apache/spark/pull/33679

> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive
> 
>
> Key: SPARK-36452
> URL: https://issues.apache.org/jira/browse/SPARK-36452
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive.
> In hive the below scenario works 
> {code:java}
> describe extended complex2;
> OK
> id                  string 
> c1                  map   
> Detailed Table Information Table(tableName:complex2, dbName:default, 
> owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
> FieldSchema(name:c1, type:map, comment:null)], 
> location:/user/hive/warehouse/complex2, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null,
>  serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1})
> select * from complex2;
> OK
> 1 {1:"u"}
> 2 {1:"u",2:"uo"}
> 1 {1:"u",2:"uo"}
> Time taken: 0.363 seconds, Fetched: 3 row(s)
> select id, c1, count(*) from complex2 group by id, c1;
> OK
> 1 {1:"u"} 1
> 1 {1:"u",2:"uo"} 1
> 2 {1:"u",2:"uo"} 1
> Time taken: 1.621 seconds, Fetched: 3 row(s)
> failed when map type is present in aggregated expression 
> select id, max(c1), count(*) from complex2 group by id, c1; 
> FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
> complex type containing map<>.
> {code}
> But in Spark this scenario fails: grouping by a map column when the map column 
> is used in the select without any aggregation
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> org.apache.spark.sql.AnalysisException: expression 
> spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
> because its data type map is not an orderable data type.;
> Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
> +- SubqueryAlias spark_catalog.default.complex2
>  +- HiveTableRelation [`default`.`complex2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
> Partition Cols: []]
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
> {code}
> There is a need to support this scenario, where a grouping expression can have 
> map type as long as no aggregated expression references that map type. This 
> helps users migrating from Hive to Spark.
> After the code change 
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> +---+-++                                              
>   
> | id|               c1|count(1)|
> +---+-++
> |  1|         {1 -> u}|       1|
> |  2|{1 -> u, 2 -> uo}|       1|
> |  1|{1 -> u, 2 -> uo}|       1|
> +---+-++
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36452:


Assignee: (was: Apache Spark)

> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive
> 
>
> Key: SPARK-36452
> URL: https://issues.apache.org/jira/browse/SPARK-36452
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Add the support in Spark for having group by map datatype column for the 
> scenario that works in Hive.
> In hive the below scenario works 
> {code:java}
> describe extended complex2;
> OK
> id                  string 
> c1                  map   
> Detailed Table Information Table(tableName:complex2, dbName:default, 
> owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
> FieldSchema(name:c1, type:map, comment:null)], 
> location:/user/hive/warehouse/complex2, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null,
>  serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1})
> select * from complex2;
> OK
> 1 {1:"u"}
> 2 {1:"u",2:"uo"}
> 1 {1:"u",2:"uo"}
> Time taken: 0.363 seconds, Fetched: 3 row(s)
> select id, c1, count(*) from complex2 group by id, c1;
> OK
> 1 {1:"u"} 1
> 1 {1:"u",2:"uo"} 1
> 2 {1:"u",2:"uo"} 1
> Time taken: 1.621 seconds, Fetched: 3 row(s)
> failed when map type is present in aggregated expression 
> select id, max(c1), count(*) from complex2 group by id, c1; 
> FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
> complex type containing map<>.
> {code}
> But in Spark this scenario fails: grouping by a map column when the map column 
> is used in the select without any aggregation
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> org.apache.spark.sql.AnalysisException: expression 
> spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
> because its data type map is not an orderable data type.;
> Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
> +- SubqueryAlias spark_catalog.default.complex2
>  +- HiveTableRelation [`default`.`complex2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
> Partition Cols: []]
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
> {code}
> There is a need to support this scenario, where a grouping expression can have 
> map type as long as no aggregated expression references that map type. This 
> helps users migrating from Hive to Spark.
> After the code change 
> {code:java}
> scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
> +---+-++                                              
>   
> | id|               c1|count(1)|
> +---+-++
> |  1|         {1 -> u}|       1|
> |  2|{1 -> u, 2 -> uo}|       1|
> |  1|{1 -> u, 2 -> uo}|       1|
> +---+-++
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Saurabh Chawla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Chawla updated SPARK-36452:
---
Description: 
Add the support in Spark for having group by map datatype column for the 
scenario that works in Hive.

In hive the below scenario works 
{code:java}
describe extended complex2;
OK
id                  string 
c1                  map   
Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, 
createTime:1627994412, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
FieldSchema(name:c1, type:map, comment:null)], 
location:/user/hive/warehouse/complex2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null,
 serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1})

select * from complex2;
OK
1 {1:"u"}
2 {1:"u",2:"uo"}
1 {1:"u",2:"uo"}
Time taken: 0.363 seconds, Fetched: 3 row(s)

select id, c1, count(*) from complex2 group by id, c1;
OK
1 {1:"u"} 1
1 {1:"u",2:"uo"} 1
2 {1:"u",2:"uo"} 1
Time taken: 1.621 seconds, Fetched: 3 row(s)

failed when map type is present in aggregated expression 
select id, max(c1), count(*) from complex2 group by id, c1; 
FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
complex type containing map<>.
{code}
But in Spark this scenario fails: grouping by a map column when the map column is 
used in the select without any aggregation
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
org.apache.spark.sql.AnalysisException: expression 
spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
because its data type map is not an orderable data type.;
Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
+- SubqueryAlias spark_catalog.default.complex2
 +- HiveTableRelation [`default`.`complex2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
Partition Cols: []]
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
{code}
There is a need to support this scenario, where a grouping expression can have map 
type as long as no aggregated expression references that map type. This helps 
users migrating from Hive to Spark.

After the code change 
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
+---+-++                                                
| id|               c1|count(1)|
+---+-++
|  1|         {1 -> u}|       1|
|  2|{1 -> u, 2 -> uo}|       1|
|  1|{1 -> u, 2 -> uo}|       1|
+---+-++
 {code}

  was:
Add the support in Spark for having group by map datatype column for the 
scenario that works in Hive.

In hive the below scenario works 
{code:java}
describe extended complex2;
OK
id                  string 
c1                  map   
Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, 
createTime:1627994412, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
FieldSchema(name:c1, type:map, comment:null)], 
location:/user/hive/warehouse/complex2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1})

select * from complex2;
OK
1 {1:"u"}
2 {1:"u",2:"uo"}
1 {1:"u",2:"uo"}
Time taken: 0.363 seconds, Fetched: 3 row(s)

select id, c1, count(*) from complex2 group by id, c1;
OK
1 {1:"u"} 1
1 {1:"u",2:"uo"} 1
2 {1:"u",2:"uo"} 1
Time taken: 1.621 seconds, Fetched: 3 row(s)

failed when map type is present in aggregated expression 
select id, max(c1), count(*) from complex2 group by id, c1; 
FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
complex type containing map<>.
{code}
But in spark this scenario where the group by map column failed for this 
scenario where the map column is used in the select without any aggregation
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
org.apache.spark.sql.AnalysisException: expression 
spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
because its data type map is not an orderable data type.;
Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
+- SubqueryAlias spark_catalog.default.complex2
 +- HiveTableRelation [`default`.`complex2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
Partition Cols: []]
at 
org.apache.spark.sql.catalyst.analysis.Che

[jira] [Updated] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Saurabh Chawla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Chawla updated SPARK-36452:
---
Description: 
Add the support in Spark for having group by map datatype column for the 
scenario that works in Hive.

In hive the below scenario works 
{code:java}
describe extended complex2;
OK
id                  string 
c1                  map   
Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, 
createTime:1627994412, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
FieldSchema(name:c1, type:map, comment:null)], 
location:/user/hive/warehouse/complex2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1})

select * from complex2;
OK
1 {1:"u"}
2 {1:"u",2:"uo"}
1 {1:"u",2:"uo"}
Time taken: 0.363 seconds, Fetched: 3 row(s)

select id, c1, count(*) from complex2 group by id, c1;
OK
1 {1:"u"} 1
1 {1:"u",2:"uo"} 1
2 {1:"u",2:"uo"} 1
Time taken: 1.621 seconds, Fetched: 3 row(s)

failed when map type is present in aggregated expression 
select id, max(c1), count(*) from complex2 group by id, c1; 
FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
complex type containing map<>.
{code}
But in Spark this scenario fails: grouping by a map column when the map column is 
used in the select without any aggregation
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
org.apache.spark.sql.AnalysisException: expression 
spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
because its data type map is not an orderable data type.;
Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
+- SubqueryAlias spark_catalog.default.complex2
 +- HiveTableRelation [`default`.`complex2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
Partition Cols: []]
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
{code}
There is a need to support this scenario, where a grouping expression can have map 
type as long as no aggregated expression references that map type. This helps 
users migrating from Hive to Spark.

After the code change 
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
+---+-++                                                
| id|               c1|count(1)|
+---+-++
|  1|         {1 -> u}|       1|
|  2|{1 -> u, 2 -> uo}|       1|
|  1|{1 -> u, 2 -> uo}|       1|
+---+-++
 {code}

  was:
Add the support in Spark for having group by map datatype column for the 
scenario that works in Hive.

In hive the below scenario works 

 
{code:java}
describe extended complex2;
OK
id                  string 
c1                  map   
Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, 
createTime:1627994412, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
FieldSchema(name:c1, type:map, comment:null)], 
location:/user/hive/warehouse/complex2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1})
select * from complex2;
OK
1 {1:"u"}
2 {1:"u",2:"uo"}
1 {1:"u",2:"uo"}
Time taken: 0.363 seconds, Fetched: 3 row(s)
select id, c1, count(*) from complex2 group by id, c1;
OK
1 {1:"u"} 1
1 {1:"u",2:"uo"} 1
2 {1:"u",2:"uo"} 1
Time taken: 1.621 seconds, Fetched: 3 row(s)
failed when map type is present in aggregated expression 
select id, max(c1), count(*) from complex2 group by id, c1; 
FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
complex type containing map<>.
{code}
 

But in spark this scenario where the group by map column failed for this 
scenario where the map column is used in the select without any aggregation

 
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
org.apache.spark.sql.AnalysisException: expression 
spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
because its data type map is not an orderable data type.;
Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
+- SubqueryAlias spark_catalog.default.complex2
 +- HiveTableRelation [`default`.`complex2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
Partition Cols: []]
at 
o

[jira] [Created] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread Saurabh Chawla (Jira)
Saurabh Chawla created SPARK-36452:
--

 Summary: Add the support in Spark for having group by map datatype 
column for the scenario that works in Hive
 Key: SPARK-36452
 URL: https://issues.apache.org/jira/browse/SPARK-36452
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2, 3.0.3, 3.2.0
Reporter: Saurabh Chawla


Add the support in Spark for having group by map datatype column for the 
scenario that works in Hive.

In hive the below scenario works 

 
{code:java}
describe extended complex2;
OK
id                  string 
c1                  map   
Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, 
createTime:1627994412, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), 
FieldSchema(name:c1, type:map, comment:null)], 
location:/user/hive/warehouse/complex2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1})
select * from complex2;
OK
1 {1:"u"}
2 {1:"u",2:"uo"}
1 {1:"u",2:"uo"}
Time taken: 0.363 seconds, Fetched: 3 row(s)
select id, c1, count(*) from complex2 group by id, c1;
OK
1 {1:"u"} 1
1 {1:"u",2:"uo"} 1
2 {1:"u",2:"uo"} 1
Time taken: 1.621 seconds, Fetched: 3 row(s)
failed when map type is present in aggregated expression 
select id, max(c1), count(*) from complex2 group by id, c1; 
FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or 
complex type containing map<>.
{code}
 

But in Spark this scenario fails: grouping by a map column when the map column is 
used in the select without any aggregation

 
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
org.apache.spark.sql.AnalysisException: expression 
spark_catalog.default.complex2.`c1` cannot be used as a grouping expression 
because its data type map is not an orderable data type.;
Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L]
+- SubqueryAlias spark_catalog.default.complex2
 +- HiveTableRelation [`default`.`complex2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], 
Partition Cols: []]
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
{code}
There is a need to support this scenario, where a grouping expression can have map 
type as long as no aggregated expression references that map type. This helps 
users migrating from Hive to Spark.

After the code change 

 
{code:java}
scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show
+---+-++                                                
| id|               c1|count(1)|
+---+-++
|  1|         {1 -> u}|       1|
|  2|{1 -> u, 2 -> uo}|       1|
|  1|{1 -> u, 2 -> uo}|       1|
+---+-++
{code}
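
For Spark versions without this change, one possible workaround (a sketch only; it
assumes that sort_array(map_entries(...)) yields an orderable, canonical grouping
key for the map and that first() is allowed on a map column) is to group by the
sorted map entries and recover the map with first():
{code:java}
scala> spark.sql("SELECT id, first(c1) AS c1, count(*) FROM complex2 GROUP BY id, sort_array(map_entries(c1))").show
{code}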
 

 

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36451) Ivy skips looking for source and doc pom

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36451:


Assignee: (was: Apache Spark)

> Ivy skips looking for source and doc pom
> 
>
> Key: SPARK-36451
> URL: https://issues.apache.org/jira/browse/SPARK-36451
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Trivial
>
> Since SPARK-35863 upgraded Ivy to 2.5.0, it is possible to skip searching for 
> the source and doc poms, but at present the remote repo is still queried.
>  
> org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent
> {code:java}
> boolean sourcesLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.sources"));
> boolean javadocLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc"));
> if (!sourcesLookup && !javadocLookup) {
> Message.debug("Sources and javadocs lookup disabled");
> return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36451) Ivy skips looking for source and doc pom

2021-08-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36451:


Assignee: Apache Spark

> Ivy skips looking for source and doc pom
> 
>
> Key: SPARK-36451
> URL: https://issues.apache.org/jira/browse/SPARK-36451
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Trivial
>
> Since SPARK-35863 upgraded Ivy to 2.5.0, it is possible to skip searching for 
> the source and doc poms, but at present the remote repo is still queried.
>  
> org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent
> {code:java}
> boolean sourcesLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.sources"));
> boolean javadocLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc"));
> if (!sourcesLookup && !javadocLookup) {
> Message.debug("Sources and javadocs lookup disabled");
> return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36451) Ivy skips looking for source and doc pom

2021-08-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395516#comment-17395516
 ] 

Apache Spark commented on SPARK-36451:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/33678

> Ivy skips looking for source and doc pom
> 
>
> Key: SPARK-36451
> URL: https://issues.apache.org/jira/browse/SPARK-36451
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Trivial
>
> Since SPARK-35863 upgraded Ivy to 2.5.0, it is possible to skip searching for 
> the source and doc poms, but at present the remote repo is still queried.
>  
> org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent
> {code:java}
> boolean sourcesLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.sources"));
> boolean javadocLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc"));
> if (!sourcesLookup && !javadocLookup) {
> Message.debug("Sources and javadocs lookup disabled");
> return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36451) Ivy skips looking for source and doc pom

2021-08-08 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-36451:
---
Description: 
Since SPARK-35863 upgraded Ivy to 2.5.0, it is possible to skip searching for the 
source and doc poms, but at present the remote repo is still queried.

 

org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent
{code:java}
boolean sourcesLookup = !"false"
.equals(ivySettings.getVariable("ivy.maven.lookup.sources"));
boolean javadocLookup = !"false"
.equals(ivySettings.getVariable("ivy.maven.lookup.javadoc"));
if (!sourcesLookup && !javadocLookup) {
Message.debug("Sources and javadocs lookup disabled");
return;
}
{code}

  was:Because SPARK-35863 Upgrade Ivy to 2.5.0, it supports skip searching the 
source and doc pom, but the remote repo will still be queried at present.


> Ivy skips looking for source and doc pom
> 
>
> Key: SPARK-36451
> URL: https://issues.apache.org/jira/browse/SPARK-36451
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Trivial
>
> Since SPARK-35863 upgraded Ivy to 2.5.0, it is possible to skip searching for 
> the source and doc poms, but at present the remote repo is still queried.
>  
> org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent
> {code:java}
> boolean sourcesLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.sources"));
> boolean javadocLookup = !"false"
> .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc"));
> if (!sourcesLookup && !javadocLookup) {
> Message.debug("Sources and javadocs lookup disabled");
> return;
> }
> {code}
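
For reference, a minimal sketch of what disabling the extra lookups amounts to on
the Ivy side (assuming Ivy 2.5.0's IvySettings API; how Spark wires this into
spark-submit is up to the fix itself):
{code:java}
import org.apache.ivy.core.settings.IvySettings

// Set the two Ivy variables checked by PomModuleDescriptorParser above so the
// resolver does not issue extra remote lookups for -sources and -javadoc poms.
val ivySettings = new IvySettings()
ivySettings.setVariable("ivy.maven.lookup.sources", "false")
ivySettings.setVariable("ivy.maven.lookup.javadoc", "false")
{code}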



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36451) Ivy skips looking for source and doc pom

2021-08-08 Thread dzcxzl (Jira)
dzcxzl created SPARK-36451:
--

 Summary: Ivy skips looking for source and doc pom
 Key: SPARK-36451
 URL: https://issues.apache.org/jira/browse/SPARK-36451
 Project: Spark
  Issue Type: Improvement
  Components: Spark Submit
Affects Versions: 3.2.0
Reporter: dzcxzl


Since SPARK-35863 upgraded Ivy to 2.5.0, it is possible to skip searching for the 
source and doc poms, but at present the remote repo is still queried.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org