[jira] [Resolved] (SPARK-38140) Desc column stats (min, max) for timestamp type is not consistent with the value due to time zone difference

2022-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-38140.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35440
[https://github.com/apache/spark/pull/35440]

> Desc column stats (min, max) for timestamp type is not consistent with the 
> value due to time zone difference
> 
>
> Key: SPARK-38140
> URL: https://issues.apache.org/jira/browse/SPARK-38140
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.1
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Minor
> Fix For: 3.3.0
>
>
> Currently, a timestamp column's stats (min/max) are stored in UTC in the 
> metastore, and when the column is described, its min/max stats are also 
> shown in UTC.
> As a result, for users whose session time zone is not UTC, the column stats 
> shown are not consistent with the actual values, which causes confusion.
> For example:
> {noformat}
> spark-sql> create table tab_ts_master (ts timestamp) using parquet;
> spark-sql> insert into tab_ts_master values make_timestamp(2022, 1, 1, 0, 0, 
> 1.123456), make_timestamp(2022, 1, 3, 0, 0, 2.987654);
> spark-sql> select * from tab_ts_master;
> 2022-01-01 00:00:01.123456
> 2022-01-03 00:00:02.987654
> spark-sql> set spark.sql.session.timeZone;
> spark.sql.session.timeZone Asia/Shanghai
> spark-sql> analyze table tab_ts_master compute statistics for all columns;
> spark-sql> desc formatted tab_ts_master ts;
> col_name        ts
> data_type       timestamp
> comment         NULL
> min             2021-12-31 16:00:01.123456
> max             2022-01-02 16:00:02.987654
> num_nulls       0
> distinct_count  2
> avg_col_len     8
> max_col_len     8
> histogram       NULL
> {noformat}
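
The eight-hour shift in the example above can be reproduced outside Spark. The following is a minimal sketch using only `java.time`, not Spark's implementation; the class and method names (`TimestampStatsDemo`, `render`) are made up for illustration. It re-renders the inserted wall-clock value from the session time zone (Asia/Shanghai) in UTC, which gives the `min` value that `desc formatted` printed.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class TimestampStatsDemo {
    // Microsecond-precision pattern matching the spark-sql output in the report.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");

    // Interpret `text` as a wall-clock time in zone `from` and render the
    // same instant as a wall-clock time in zone `to`.
    static String render(String text, ZoneId from, ZoneId to) {
        LocalDateTime local = LocalDateTime.parse(text, FMT);
        return local.atZone(from)
                    .withZoneSameInstant(to)
                    .toLocalDateTime()
                    .format(FMT);
    }

    public static void main(String[] args) {
        ZoneId session = ZoneId.of("Asia/Shanghai");
        // Value as inserted in the session time zone, shown in UTC.
        String inUtc = render("2022-01-01 00:00:01.123456", session, ZoneOffset.UTC);
        System.out.println(inUtc);
        // Stored UTC value, shown back in the session time zone.
        System.out.println(render(inUtc, ZoneOffset.UTC, session));
    }
}
```

Rendering the UTC value back in the session time zone recovers 2022-01-01 00:00:01.123456, which is the value a user in Asia/Shanghai would expect the stats to show.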



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38140) Desc column stats (min, max) for timestamp type is not consistent with the value due to time zone difference

2022-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-38140:
---

Assignee: Zhenhua Wang

> Desc column stats (min, max) for timestamp type is not consistent with the 
> value due to time zone difference
> 
>
> Key: SPARK-38140
> URL: https://issues.apache.org/jira/browse/SPARK-38140
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.1
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Minor
>
> Currently, a timestamp column's stats (min/max) are stored in UTC in the 
> metastore, and when the column is described, its min/max stats are also 
> shown in UTC.
> As a result, for users whose session time zone is not UTC, the column stats 
> shown are not consistent with the actual values, which causes confusion.
> For example:
> {noformat}
> spark-sql> create table tab_ts_master (ts timestamp) using parquet;
> spark-sql> insert into tab_ts_master values make_timestamp(2022, 1, 1, 0, 0, 
> 1.123456), make_timestamp(2022, 1, 3, 0, 0, 2.987654);
> spark-sql> select * from tab_ts_master;
> 2022-01-01 00:00:01.123456
> 2022-01-03 00:00:02.987654
> spark-sql> set spark.sql.session.timeZone;
> spark.sql.session.timeZone Asia/Shanghai
> spark-sql> analyze table tab_ts_master compute statistics for all columns;
> spark-sql> desc formatted tab_ts_master ts;
> col_name        ts
> data_type       timestamp
> comment         NULL
> min             2021-12-31 16:00:01.123456
> max             2022-01-02 16:00:02.987654
> num_nulls       0
> distinct_count  2
> avg_col_len     8
> max_col_len     8
> histogram       NULL
> {noformat}






[jira] [Updated] (SPARK-38270) SQL CLI AM should keep same exitcode with client

2022-02-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38270:
--
Description: 
Currently, the SQL CLI always uses a shutdown hook to stop the SparkContext (SC):

{code:java}
// Clean up after we exit
ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() }
{code}
This causes the YARN AM to always report success, even when the client exits with a non-zero code.
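
One way to picture the problem: a shutdown hook runs on every exit path and is not handed the exit status, so by itself it cannot tell a clean exit from a failed one. The sketch below is a hypothetical illustration in plain Java, not Spark's code and not necessarily how the issue will be fixed; `ExitCodeAwareShutdown`, `recordExitCode`, `finalStatus`, and `installHook` are invented names, and the status strings are only labels. It assumes the client records its intended exit code before exiting, so that the cleanup hook can report a matching status.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ExitCodeAwareShutdown {
    // Hypothetical holder for the exit code the client is about to return.
    private static final AtomicInteger EXIT_CODE = new AtomicInteger(0);

    // Called by the client right before it exits.
    static void recordExitCode(int code) {
        EXIT_CODE.set(code);
    }

    // Status a cleanup hook could report; the labels are illustrative only.
    static String finalStatus() {
        return EXIT_CODE.get() == 0 ? "SUCCEEDED" : "FAILED";
    }

    // Registers a hook that runs the cleanup and then reports a status
    // derived from the recorded exit code instead of always "success".
    static void installHook(Runnable cleanup) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            cleanup.run();
            System.out.println("final status: " + finalStatus());
        }));
    }
}
```

In this sketch a client would call `recordExitCode(code)` immediately before `System.exit(code)`, so the hook sees the same code the process returns.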


> SQL CLI AM should keep same exitcode with client
> 
>
> Key: SPARK-38270
> URL: https://issues.apache.org/jira/browse/SPARK-38270
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Major
>
> Currently, the SQL CLI always uses a shutdown hook to stop the SparkContext (SC):
> {code:java}
> // Clean up after we exit
> ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() }
> {code}
> This causes the YARN AM to always report success, even when the client exits with a non-zero code.






[jira] [Created] (SPARK-38270) SQL CLI AM should keep same exitcode with client

2022-02-20 Thread angerszhu (Jira)
angerszhu created SPARK-38270:
-

 Summary: SQL CLI AM should keep same exitcode with client
 Key: SPARK-38270
 URL: https://issues.apache.org/jira/browse/SPARK-38270
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu









[jira] [Assigned] (SPARK-37475) Add Scale Parameter to Floor and Ceil functions

2022-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37475:
---

Assignee: Sathiya Kumar

> Add Scale Parameter to Floor and Ceil functions
> ---
>
> Key: SPARK-37475
> URL: https://issues.apache.org/jira/browse/SPARK-37475
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Sathiya Kumar
>Assignee: Sathiya Kumar
>Priority: Minor
> Fix For: 3.3.0
>
>
> This feature is proposed in the PR: 
> https://github.com/apache/spark/pull/34593
> Currently we support the Decimal RoundingModes HALF_UP (round) and HALF_EVEN 
> (bround), but we have use cases that need RoundingMode.UP and 
> RoundingMode.DOWN:
> [https://stackoverflow.com/questions/34888419/round-down-double-in-spark/40476117]
> [https://stackoverflow.com/questions/54683066/is-there-a-rounddown-function-in-sql-as-there-is-in-excel]
> [https://stackoverflow.com/questions/48279641/oracle-sql-round-half]
>  
> The floor and ceil functions help with this, but they do not support choosing 
> the position of the rounding. Adding a scale parameter to these functions 
> would let us control the rounding position. 
>  
> Snowflake supports a `scale` parameter for `floor`/`ceil`:
> {code:java}
> FLOOR(  [,  ] ){code}
> REF:
> [https://docs.snowflake.com/en/sql-reference/functions/floor.html]
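
To make the requested behaviour concrete, here is a minimal sketch of floor and ceil with a scale argument using `java.math.BigDecimal`. It is an illustration only, not the Spark implementation from the pull request; `ScaledRounding` and its method names are hypothetical. Note that it uses `RoundingMode.FLOOR` and `RoundingMode.CEILING`, which differ from the `UP`/`DOWN` modes mentioned above when the input is negative.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ScaledRounding {
    // Largest value with `scale` fractional digits that is <= value.
    static BigDecimal floor(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.FLOOR);
    }

    // Smallest value with `scale` fractional digits that is >= value.
    static BigDecimal ceil(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.CEILING);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("3.14159");
        System.out.println(floor(x, 2).toPlainString());
        System.out.println(ceil(x, 2).toPlainString());
        // A negative scale rounds to the left of the decimal point.
        System.out.println(ceil(new BigDecimal("1234.5"), -2).toPlainString());
    }
}
```

With these definitions, `floor(3.14159, 2)` is 3.14, `ceil(3.14159, 2)` is 3.15, and `floor(-3.14159, 2)` is -3.15, since `FLOOR` always rounds toward negative infinity.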






[jira] [Resolved] (SPARK-37475) Add Scale Parameter to Floor and Ceil functions

2022-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37475.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34729
[https://github.com/apache/spark/pull/34729]

> Add Scale Parameter to Floor and Ceil functions
> ---
>
> Key: SPARK-37475
> URL: https://issues.apache.org/jira/browse/SPARK-37475
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Sathiya Kumar
>Priority: Minor
> Fix For: 3.3.0
>
>
> This feature is proposed in the PR: 
> https://github.com/apache/spark/pull/34593
> Currently we support the Decimal RoundingModes HALF_UP (round) and HALF_EVEN 
> (bround), but we have use cases that need RoundingMode.UP and 
> RoundingMode.DOWN:
> [https://stackoverflow.com/questions/34888419/round-down-double-in-spark/40476117]
> [https://stackoverflow.com/questions/54683066/is-there-a-rounddown-function-in-sql-as-there-is-in-excel]
> [https://stackoverflow.com/questions/48279641/oracle-sql-round-half]
>  
> The floor and ceil functions help with this, but they do not support choosing 
> the position of the rounding. Adding a scale parameter to these functions 
> would let us control the rounding position. 
>  
> Snowflake supports a `scale` parameter for `floor`/`ceil`:
> {code:java}
> FLOOR(  [,  ] ){code}
> REF:
> [https://docs.snowflake.com/en/sql-reference/functions/floor.html]






[jira] [Resolved] (SPARK-38227) Apply strict nullability of nested column in time window / session window

2022-02-20 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-38227.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35543
[https://github.com/apache/spark/pull/35543]

> Apply strict nullability of nested column in time window / session window
> -
>
> Key: SPARK-38227
> URL: https://issues.apache.org/jira/browse/SPARK-38227
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.3.0
>
>
> In TimeWindow and SessionWindow, we define the dataType of these function 
> expressions as a StructType with two nested columns, "start" and "end", which 
> are "nullable".
> We replace these expressions in the analyzer via the corresponding rules: 
> TimeWindowing for TimeWindow, and SessionWindowing for SessionWindow.
> The rules replace the function expressions with an Alias referring to a 
> CreateNamedStruct. For the value side of CreateNamedStruct, we don't specify 
> anything about nullability, which risks the value side being interpreted (or 
> optimized) as non-nullable and so becoming inconsistent with the dataType.
> We should make sure the nullability of the columns in CreateNamedStruct stays 
> the same as in the dataType definition of these function expressions.






[jira] [Assigned] (SPARK-38227) Apply strict nullability of nested column in time window / session window

2022-02-20 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-38227:
---

Assignee: Jungtaek Lim

> Apply strict nullability of nested column in time window / session window
> -
>
> Key: SPARK-38227
> URL: https://issues.apache.org/jira/browse/SPARK-38227
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> In TimeWindow and SessionWindow, we define the dataType of these function 
> expressions as a StructType with two nested columns, "start" and "end", which 
> are "nullable".
> We replace these expressions in the analyzer via the corresponding rules: 
> TimeWindowing for TimeWindow, and SessionWindowing for SessionWindow.
> The rules replace the function expressions with an Alias referring to a 
> CreateNamedStruct. For the value side of CreateNamedStruct, we don't specify 
> anything about nullability, which risks the value side being interpreted (or 
> optimized) as non-nullable and so becoming inconsistent with the dataType.
> We should make sure the nullability of the columns in CreateNamedStruct stays 
> the same as in the dataType definition of these function expressions.






[jira] [Commented] (SPARK-38269) Clean up redundant type cast

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495325#comment-17495325
 ] 

Apache Spark commented on SPARK-38269:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35592

> Clean up redundant type cast
> 
>
> Key: SPARK-38269
> URL: https://issues.apache.org/jira/browse/SPARK-38269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-38269) Clean up redundant type cast

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38269:


Assignee: (was: Apache Spark)

> Clean up redundant type cast
> 
>
> Key: SPARK-38269
> URL: https://issues.apache.org/jira/browse/SPARK-38269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-38269) Clean up redundant type cast

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38269:


Assignee: Apache Spark

> Clean up redundant type cast
> 
>
> Key: SPARK-38269
> URL: https://issues.apache.org/jira/browse/SPARK-38269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Created] (SPARK-38269) Clean up redundant type cast

2022-02-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-38269:


 Summary: Clean up redundant type cast
 Key: SPARK-38269
 URL: https://issues.apache.org/jira/browse/SPARK-38269
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 3.3.0
Reporter: Yang Jie









[jira] [Comment Edited] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace

2022-02-20 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495322#comment-17495322
 ] 

melin edited comment on SPARK-38200 at 2/21/22, 5:17 AM:
-

[~beliefer]  

oracle: 
[https://docs.oracle.com/en/database/other-databases/nosql-database/21.1/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html]


was (Author: melin):
[~beliefer]  

> [SQL] Spark JDBC Savemode Supports replace
> --
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both mysql and postgres support upsert syntax.
> mysql:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> pg:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}
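
For illustration, the PostgreSQL form above could be generated from a column list. The following is a minimal sketch, not part of Spark's JDBC data source; `UpsertSql` and `postgresUpsert` are hypothetical names, and the sketch assumes that identifiers need no quoting and that every column, including the key columns, is updated on conflict, as in the statement quoted above.

```java
import java.util.List;
import java.util.stream.Collectors;

final class UpsertSql {
    // Builds an "INSERT ... ON CONFLICT ... DO UPDATE" statement with one
    // JDBC placeholder per column. `keys` are the conflict-target columns.
    static String postgresUpsert(String table, List<String> columns, List<String> keys) {
        String cols = String.join(",", columns);
        String placeholders = columns.stream()
            .map(c -> "?")
            .collect(Collectors.joining(","));
        String updates = columns.stream()
            .map(c -> c + "=excluded." + c)
            .collect(Collectors.joining(","));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")"
            + " ON CONFLICT (" + String.join(",", keys) + ") DO UPDATE SET " + updates;
    }
}
```

A real implementation would also need dialect-specific variants (for example MySQL's `replace into`) and proper identifier quoting, which this sketch leaves out.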






[jira] [Comment Edited] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace

2022-02-20 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495322#comment-17495322
 ] 

melin edited comment on SPARK-38200 at 2/21/22, 5:17 AM:
-

[~beliefer]  

oracle: 
[https://docs.oracle.com/en/database/other-databases/nosql-database/21.1/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html]

db2 or sqlserver
{code:java}
MERGE INTO mytable AS mt USING (
SELECT * FROM TABLE (
VALUES 
(123, 'text')
)
) AS vt(id, val) ON (mt.id = vt.id)
WHEN MATCHED THEN
UPDATE SET val = vt.val
WHEN NOT MATCHED THEN
INSERT (id, val) VALUES (vt.id, vt.val)
; {code}


was (Author: melin):
[~beliefer]  

oracle: 
[https://docs.oracle.com/en/database/other-databases/nosql-database/21.1/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html]

> [SQL] Spark JDBC Savemode Supports replace
> --
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both mysql and postgres support upsert syntax.
> mysql:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> pg:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}






[jira] [Commented] (SPARK-38236) Absolute file paths specified in create/alter table are treated as relative

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495324#comment-17495324
 ] 

Apache Spark commented on SPARK-38236:
--

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/35591

> Absolute file paths specified in create/alter table are treated as relative
> ---
>
> Key: SPARK-38236
> URL: https://issues.apache.org/jira/browse/SPARK-38236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: Bo Zhang
>Priority: Major
>
> After https://github.com/apache/spark/pull/28527, tables are created under 
> the database location when the specified table location is relative. 
> However, the criterion used to decide whether a table location is relative or 
> absolute is URI.isAbsolute, which only checks whether the table location URI 
> has a scheme defined. So table URIs like /table/path are treated as relative, 
> and the scheme and authority of the database location URI are used to create 
> the table. For example, when the database location URI is s3a://bucket/db, 
> the table will be created at s3a://bucket/table/path, while it should be 
> created under the file system defined in SessionCatalog.hadoopConf instead.
> This also applies to alter table.
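
The behaviour of `URI.isAbsolute` described above can be checked with plain `java.net.URI`. This is a minimal sketch for illustration, not the code path Spark uses to build table locations; `TableLocationDemo` and its methods are hypothetical names. It shows that a path such as /table/path has no scheme and, when resolved against the database location, inherits that location's scheme and authority.

```java
import java.net.URI;

final class TableLocationDemo {
    // URI.isAbsolute is true only when the URI has a scheme.
    static boolean hasScheme(String location) {
        return URI.create(location).isAbsolute();
    }

    // Resolves a table location against the database location, the way a
    // scheme-less URI would be treated if it is assumed to be relative.
    static String resolveAgainstDatabase(String dbLocation, String tableLocation) {
        return URI.create(dbLocation).resolve(URI.create(tableLocation)).toString();
    }

    public static void main(String[] args) {
        System.out.println(hasScheme("/table/path"));
        System.out.println(hasScheme("s3a://bucket/table/path"));
        System.out.println(resolveAgainstDatabase("s3a://bucket/db", "/table/path"));
    }
}
```

Here `hasScheme("/table/path")` is false even though the path itself is absolute, and resolving it against s3a://bucket/db gives s3a://bucket/table/path, matching the example in the report.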






[jira] [Commented] (SPARK-38236) Absolute file paths specified in create/alter table are treated as relative

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495323#comment-17495323
 ] 

Apache Spark commented on SPARK-38236:
--

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/35591

> Absolute file paths specified in create/alter table are treated as relative
> ---
>
> Key: SPARK-38236
> URL: https://issues.apache.org/jira/browse/SPARK-38236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: Bo Zhang
>Priority: Major
>
> After https://github.com/apache/spark/pull/28527, tables are created under 
> the database location when the specified table location is relative. 
> However, the criterion used to decide whether a table location is relative or 
> absolute is URI.isAbsolute, which only checks whether the table location URI 
> has a scheme defined. So table URIs like /table/path are treated as relative, 
> and the scheme and authority of the database location URI are used to create 
> the table. For example, when the database location URI is s3a://bucket/db, 
> the table will be created at s3a://bucket/table/path, while it should be 
> created under the file system defined in SessionCatalog.hadoopConf instead.
> This also applies to alter table.






[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace

2022-02-20 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495322#comment-17495322
 ] 

melin commented on SPARK-38200:
---

[~beliefer]  

> [SQL] Spark JDBC Savemode Supports replace
> --
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both mysql and postgres support upsert syntax.
> mysql:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> pg:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}






[jira] [Assigned] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38268:


Assignee: Gengliang Wang  (was: Apache Spark)

> Hide the "failOnError" field in the toString method of Abs/CheckOverflow
> 
>
> Key: SPARK-38268
> URL: https://issues.apache.org/jira/browse/SPARK-38268
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> To fix most of the test failures of *PlanStabilitySuite under ANSI mode.






[jira] [Commented] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495318#comment-17495318
 ] 

Apache Spark commented on SPARK-38268:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35590

> Hide the "failOnError" field in the toString method of Abs/CheckOverflow
> 
>
> Key: SPARK-38268
> URL: https://issues.apache.org/jira/browse/SPARK-38268
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> To fix most of the test failures of *PlanStabilitySuite under ANSI mode.






[jira] [Commented] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495319#comment-17495319
 ] 

Apache Spark commented on SPARK-38268:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35590

> Hide the "failOnError" field in the toString method of Abs/CheckOverflow
> 
>
> Key: SPARK-38268
> URL: https://issues.apache.org/jira/browse/SPARK-38268
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> To fix most of the test failures of *PlanStabilitySuite under ANSI mode.






[jira] [Assigned] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38268:


Assignee: Apache Spark  (was: Gengliang Wang)

> Hide the "failOnError" field in the toString method of Abs/CheckOverflow
> 
>
> Key: SPARK-38268
> URL: https://issues.apache.org/jira/browse/SPARK-38268
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> To fix most of the test failures of *PlanStabilitySuite under ANSI mode.






[jira] [Created] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow

2022-02-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-38268:
--

 Summary: Hide the "failOnError" field in the toString method of 
Abs/CheckOverflow
 Key: SPARK-38268
 URL: https://issues.apache.org/jira/browse/SPARK-38268
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


To fix most of the test failures of *PlanStabilitySuite under ANSI mode.
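
The idea in the summary can be pictured with a toy expression class. The following is a hypothetical sketch in plain Java, not Spark's `Abs` or `CheckOverflow`; `AbsSketch` and its fields are invented for illustration. It assumes that plan text is built from `toString`, so leaving the flag out of `toString` makes the text identical whether or not the flag is set.

```java
final class AbsSketch {
    private final String child;
    // Behavioural flag, assumed here to differ between ANSI and non-ANSI mode.
    private final boolean failOnError;

    AbsSketch(String child, boolean failOnError) {
        this.child = child;
        this.failOnError = failOnError;
    }

    boolean failOnError() {
        return failOnError;
    }

    // The flag is deliberately left out, so the rendered text stays the
    // same regardless of its value.
    @Override
    public String toString() {
        return "abs(" + child + ")";
    }
}
```

Both `new AbsSketch("x", true)` and `new AbsSketch("x", false)` render as `abs(x)`, while the flag remains available through `failOnError()`.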






[jira] [Commented] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495311#comment-17495311
 ] 

Apache Spark commented on SPARK-38267:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35589

> Replace pattern matches on boolean expressions with conditional statements
> --
>
> Key: SPARK-38267
> URL: https://issues.apache.org/jira/browse/SPARK-38267
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Before
> {code:java}
> val bool: Boolean
> bool match {
>   case true => // do something when bool is true
>   case false => // do something when bool is false
> } {code}
> After
> {code:java}
> val bool: Boolean
> if (bool) {
>   // do something when bool is true
> } else {
>   // do something when bool is false
> } {code}






[jira] [Assigned] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38267:


Assignee: Apache Spark

> Replace pattern matches on boolean expressions with conditional statements
> --
>
> Key: SPARK-38267
> URL: https://issues.apache.org/jira/browse/SPARK-38267
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Before
> {code:java}
> val bool: Boolean
> bool match {
>   case true => // do something when bool is true
>   case false => // do something when bool is false
> } {code}
> After
> {code:java}
> val bool: Boolean
> if (bool) {
>   // do something when bool is true
> } else {
>   // do something when bool is false
> } {code}






[jira] [Assigned] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38267:


Assignee: (was: Apache Spark)

> Replace pattern matches on boolean expressions with conditional statements
> --
>
> Key: SPARK-38267
> URL: https://issues.apache.org/jira/browse/SPARK-38267
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Before
> {code:java}
> val bool: Boolean
> bool match {
>   case true => // do something when bool is true
>   case false => // do something when bool is false
> } {code}
> After
> {code:java}
> val bool: Boolean
> if (bool) {
>   // do something when bool is true
> } else {
>   // do something when bool is false
> } {code}






[jira] [Commented] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495310#comment-17495310
 ] 

Apache Spark commented on SPARK-38267:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35589

> Replace pattern matches on boolean expressions with conditional statements
> --
>
> Key: SPARK-38267
> URL: https://issues.apache.org/jira/browse/SPARK-38267
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Before
>  
> {code:java}
> val bool: Boolean
>   bool match {
>     case true => do something when bool is true
>     case false => do something when bool is false
>   } {code}
>  
>  
> After
>  
> {code:java}
> val bool: Boolean
>   if (bool) {
>     do something when bool is true
>   } else {
>     do something when bool is false
>   } {code}
>  
>  






[jira] [Created] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements

2022-02-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-38267:


 Summary: Replace pattern matches on boolean expressions with 
conditional statements
 Key: SPARK-38267
 URL: https://issues.apache.org/jira/browse/SPARK-38267
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Yang Jie


Before

 
{code:java}
val bool: Boolean
  bool match {
    case true => do something when bool is true
    case false => do something when bool is false
  } {code}
 

 

After

 
{code:java}
val bool: Boolean
  if (bool) {
    do something when bool is true
  } else {
    do something when bool is false
  } {code}
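
As a minimal sketch of the Before/After shapes above, the two forms can be
compared side by side. The object and method names here are hypothetical and
are not taken from the Spark code base; the point is only that both forms
return the same result for each Boolean input.

```scala
object BooleanBranching {
  // "Before" shape: pattern match on a Boolean value.
  def describeWithMatch(flag: Boolean): String = flag match {
    case true  => "enabled"
    case false => "disabled"
  }

  // "After" shape: a plain conditional expression with the same result.
  def describeWithIf(flag: Boolean): String =
    if (flag) "enabled" else "disabled"

  def main(args: Array[String]): Unit = {
    // Check that both forms agree for every possible Boolean input.
    for (flag <- Seq(true, false)) {
      assert(describeWithMatch(flag) == describeWithIf(flag))
      println(s"$flag -> ${describeWithIf(flag)}")
    }
  }
}
```

Because a Boolean has only two values, the rewrite does not change behaviour;
it only replaces the match with the more conventional conditional.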
 

 






[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running

2022-02-20 Thread gabrywu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gabrywu updated SPARK-38258:

Description: 
As we all know, table & column statistics are very important to the Spark SQL 
optimizer; however, we have to collect & update them using 
{code:java}
analyze table tableName compute statistics{code}
 

That's a little inconvenient, so why can't we collect & update statistics when 
a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
the corresponding table's statistics using SQL metrics. Then, in subsequent 
queries, the Spark SQL optimizer can use these statistics.

As we all know, it's common to run daily batches of Spark SQL statements, so 
the same statement may run every day while the statement and the data in its 
tables change slowly. That means we can use the statistics updated yesterday 
to optimize the current SQL statements.

So we'd better add a mechanism that stores every stage's statistics somewhere 
and uses them in new SQL statements, rather than only collecting statistics 
after a stage finishes.

 

  was:
As we all know, table & column statistics are very important to the Spark SQL 
optimizer; however, we have to collect & update them using 
{code:java}
analyze table tableName compute statistics{code}
 

That's a little inconvenient, so why can't we collect & update statistics when 
a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
the corresponding table's statistics using a SQL metric. Then, in subsequent 
queries, the Spark SQL optimizer can use these statistics.

As we all know, it's common to run daily batches of Spark SQL statements, so 
the same statement may run every day while the statement and the data in its 
tables change slowly. That means we can use the statistics updated yesterday 
to optimize the current SQL statement.

So we'd better add a mechanism that stores every stage's statistics somewhere 
and uses them in new SQL statements, rather than only collecting statistics 
after a stage finishes.

 


> [proposal] collect & update statistics automatically when spark SQL is running
> --
>
> Key: SPARK-38258
> URL: https://issues.apache.org/jira/browse/SPARK-38258
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0
>Reporter: gabrywu
>Priority: Minor
>
> As we all know, table & column statistics are very important to the Spark SQL 
> optimizer; however, we have to collect & update them using 
> {code:java}
> analyze table tableName compute statistics{code}
>  
> That's a little inconvenient, so why can't we collect & update statistics 
> when a Spark stage runs and finishes?
> For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
> the corresponding table's statistics using SQL metrics. Then, in subsequent 
> queries, the Spark SQL optimizer can use these statistics.
> As we all know, it's common to run daily batches of Spark SQL statements, so 
> the same statement may run every day while the statement and the data in its 
> tables change slowly. That means we can use the statistics updated yesterday 
> to optimize the current SQL statements.
> So we'd better add a mechanism that stores every stage's statistics somewhere 
> and uses them in new SQL statements, rather than only collecting statistics 
> after a stage finishes.
>  
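
To make the proposal above a little more concrete, here is a rough sketch of
what "store statistics when a write finishes and reuse them later" could look
like. Everything in it is an assumption for illustration only: TableStats,
RuntimeStatsStore, recordWrite and lookup are hypothetical names, none of them
exist in Spark, and the sketch keeps the data in memory and ignores thread
safety and persistence.

```scala
import scala.collection.mutable

// Hypothetical record of what a finished write reported; not a Spark class.
final case class TableStats(rowCount: Long, sizeInBytes: Long, collectedAtMillis: Long)

// Hypothetical in-memory store; a real design would need persistence and locking.
class RuntimeStatsStore(maxAgeMillis: Long) {
  private val statsByTable = mutable.Map.empty[String, TableStats]

  // Called when a write (e.g. INSERT OVERWRITE) finishes, using that write's metrics.
  def recordWrite(table: String, rowsWritten: Long, bytesWritten: Long, nowMillis: Long): Unit =
    statsByTable(table) = TableStats(rowsWritten, bytesWritten, nowMillis)

  // Called when planning a later query; entries older than maxAgeMillis are ignored.
  def lookup(table: String, nowMillis: Long): Option[TableStats] =
    statsByTable.get(table).filter(s => nowMillis - s.collectedAtMillis <= maxAgeMillis)
}
```

The age check reflects the "data changes slowly" argument: statistics from
yesterday's run are treated as usable, while much older ones are not. The
right cut-off is an open design question, not something the proposal specifies.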






[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running

2022-02-20 Thread gabrywu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gabrywu updated SPARK-38258:

Description: 
As we all know, table & column statistics are very important to the Spark SQL 
optimizer; however, we have to collect & update them using 
{code:java}
analyze table tableName compute statistics{code}
 

That's a little inconvenient, so why can't we collect & update statistics when 
a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
the corresponding table's statistics using a SQL metric. Then, in subsequent 
queries, the Spark SQL optimizer can use these statistics.

As we all know, it's common to run daily batches of Spark SQL statements, so 
the same statement may run every day while the statement and the data in its 
tables change slowly. That means we can use the statistics updated yesterday 
to optimize the current SQL statement.

So we'd better add a mechanism that stores every stage's statistics somewhere 
and uses them in new SQL statements, rather than only collecting statistics 
after a stage finishes.

 

  was:
As we all know, table & column statistics are very important to the Spark SQL 
optimizer; however, we have to collect & update them using 
{code:java}
analyze table tableName compute statistics{code}
 

That's a little inconvenient, so why can't we collect & update statistics when 
a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
the corresponding table's statistics using a SQL metric. Then, in subsequent 
queries, the Spark SQL optimizer can use these statistics.

As we all know, it's common to run daily batches of Spark SQL statements, so 
the same statement may run every day while the statement and the data in its 
tables change slowly. That means we can use sta

 


> [proposal] collect & update statistics automatically when spark SQL is running
> --
>
> Key: SPARK-38258
> URL: https://issues.apache.org/jira/browse/SPARK-38258
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0
>Reporter: gabrywu
>Priority: Minor
>
> As we all know, table & column statistics are very important to the Spark SQL 
> optimizer; however, we have to collect & update them using 
> {code:java}
> analyze table tableName compute statistics{code}
>  
> That's a little inconvenient, so why can't we collect & update statistics 
> when a Spark stage runs and finishes?
> For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
> the corresponding table's statistics using a SQL metric. Then, in subsequent 
> queries, the Spark SQL optimizer can use these statistics.
> As we all know, it's common to run daily batches of Spark SQL statements, so 
> the same statement may run every day while the statement and the data in its 
> tables change slowly. That means we can use the statistics updated yesterday 
> to optimize the current SQL statement.
> So we'd better add a mechanism that stores every stage's statistics somewhere 
> and uses them in new SQL statements, rather than only collecting statistics 
> after a stage finishes.
>  






[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace

2022-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495305#comment-17495305
 ] 

jiaan.geng commented on SPARK-38200:


[~melin] OK. Can other SQL statements accomplish the same thing as the upsert 
SQL?

> [SQL] Spark JDBC Savemode Supports replace
> --
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both MySQL and PostgreSQL support upsert syntax.
> MySQL:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> PostgreSQL:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}
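
For illustration, the two statement shapes quoted above could be generated
from a table name and column list roughly as follows. This is only a sketch
under stated assumptions: UpsertSql, mysqlReplace and postgresUpsert are
hypothetical helper names, not part of Spark's JDBC data source, and the
sketch does no identifier quoting or validation.

```scala
object UpsertSql {
  // Hypothetical helper: MySQL "REPLACE INTO" with one JDBC placeholder per column.
  def mysqlReplace(table: String, columns: Seq[String]): String = {
    val cols = columns.mkString(", ")
    val params = columns.map(_ => "?").mkString(", ")
    s"REPLACE INTO $table ($cols) VALUES ($params)"
  }

  // Hypothetical helper: PostgreSQL "INSERT ... ON CONFLICT ... DO UPDATE".
  // Like the statement in the issue, it updates every column from "excluded".
  def postgresUpsert(table: String, columns: Seq[String], keyColumns: Seq[String]): String = {
    val cols = columns.mkString(", ")
    val params = columns.map(_ => "?").mkString(", ")
    val keys = keyColumns.mkString(", ")
    val updates = columns.map(c => s"$c = excluded.$c").mkString(", ")
    s"INSERT INTO $table ($cols) VALUES ($params) ON CONFLICT ($keys) DO UPDATE SET $updates"
  }

  def main(args: Array[String]): Unit = {
    println(mysqlReplace("t", Seq("id", "update_time")))
    println(postgresUpsert("t", Seq("id", "name", "data_time", "remark"), Seq("id", "name")))
  }
}
```

The two dialects need different text for the same intent, which is why a
generic "replace" save mode would presumably have to be dialect-specific.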






[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running

2022-02-20 Thread gabrywu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gabrywu updated SPARK-38258:

Description: 
As we all know, table & column statistics are very important to the Spark SQL 
optimizer; however, we have to collect & update them using 
{code:java}
analyze table tableName compute statistics{code}
 

That's a little inconvenient, so why can't we collect & update statistics when 
a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
the corresponding table's statistics using a SQL metric. Then, in subsequent 
queries, the Spark SQL optimizer can use these statistics.

As we all know, it's common to run daily batches of Spark SQL statements, so 
the same statement may run every day while the statement and the data in its 
tables change slowly. That means we can use sta

 

  was:
As we all know, table & column statistics are very important to the Spark SQL 
optimizer; however, we have to collect & update them using 
{code:java}
analyze table tableName compute statistics{code}
 

That's a little inconvenient, so why can't we collect & update statistics when 
a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
the corresponding table's statistics using a SQL metric. Then, in subsequent 
queries, the Spark SQL optimizer can use these statistics.

So what do you think of it, [~yumwang]? Is it reasonable?


> [proposal] collect & update statistics automatically when spark SQL is running
> --
>
> Key: SPARK-38258
> URL: https://issues.apache.org/jira/browse/SPARK-38258
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0
>Reporter: gabrywu
>Priority: Minor
>
> As we all know, table & column statistics are very important to the Spark SQL 
> optimizer; however, we have to collect & update them using 
> {code:java}
> analyze table tableName compute statistics{code}
>  
> That's a little inconvenient, so why can't we collect & update statistics 
> when a Spark stage runs and finishes?
> For example, when an INSERT OVERWRITE TABLE statement finishes, we can update 
> the corresponding table's statistics using a SQL metric. Then, in subsequent 
> queries, the Spark SQL optimizer can use these statistics.
> As we all know, it's common to run daily batches of Spark SQL statements, so 
> the same statement may run every day while the statement and the data in its 
> tables change slowly. That means we can use sta
>  






[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495295#comment-17495295
 ] 

Apache Spark commented on SPARK-37090:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35587

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.






[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495294#comment-17495294
 ] 

Apache Spark commented on SPARK-37090:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35588

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.






[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495293#comment-17495293
 ] 

Apache Spark commented on SPARK-37090:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35587

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.






[jira] [Commented] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495286#comment-17495286
 ] 

Apache Spark commented on SPARK-38266:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/35568

> UnresolvedException: Invalid call to dataType on unresolved object caused by 
> GetDateFieldOperations
> ---
>
> Key: SPARK-38266
> URL: https://issues.apache.org/jira/browse/SPARK-38266
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.2
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> {code:java}
> test("GetDateFieldOperations should skip unresolved nodes") {
>   withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
>     val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr")
>     val df1 = df.select(df("tsStr").cast("timestamp")).as("df1")
>     val df2 = df.select(df("tsStr").cast("timestamp")).as("df2")
>     df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>     val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>       .select($"df1.tsStr".as("timeStr")).as("df3")
>     // This throws "UnresolvedException: Invalid call to
>     // dataType on unresolved object" instead of
>     // "AnalysisException: Column 'df1.timeStr' does not exist."
>     df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr"))
>   }
> } {code}






[jira] [Resolved] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations

2022-02-20 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi resolved SPARK-38266.
--
Resolution: Fixed

> UnresolvedException: Invalid call to dataType on unresolved object caused by 
> GetDateFieldOperations
> ---
>
> Key: SPARK-38266
> URL: https://issues.apache.org/jira/browse/SPARK-38266
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.2
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> {code:java}
> test("GetDateFieldOperations should skip unresolved nodes") {
>   withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
>     val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr")
>     val df1 = df.select(df("tsStr").cast("timestamp")).as("df1")
>     val df2 = df.select(df("tsStr").cast("timestamp")).as("df2")
>     df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>     val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>       .select($"df1.tsStr".as("timeStr")).as("df3")
>     // This throws "UnresolvedException: Invalid call to
>     // dataType on unresolved object" instead of
>     // "AnalysisException: Column 'df1.timeStr' does not exist."
>     df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr"))
>   }
> } {code}






[jira] [Assigned] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations

2022-02-20 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi reassigned SPARK-38266:


Assignee: wuyi

> UnresolvedException: Invalid call to dataType on unresolved object caused by 
> GetDateFieldOperations
> ---
>
> Key: SPARK-38266
> URL: https://issues.apache.org/jira/browse/SPARK-38266
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.2
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> {code:java}
> test("GetDateFieldOperations should skip unresolved nodes") {
>   withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
>     val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr")
>     val df1 = df.select(df("tsStr").cast("timestamp")).as("df1")
>     val df2 = df.select(df("tsStr").cast("timestamp")).as("df2")
>     df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>     val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>       .select($"df1.tsStr".as("timeStr")).as("df3")
>     // This throws "UnresolvedException: Invalid call to
>     // dataType on unresolved object" instead of
>     // "AnalysisException: Column 'df1.timeStr' does not exist."
>     df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr"))
>   }
> } {code}






[jira] [Commented] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations

2022-02-20 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495285#comment-17495285
 ] 

wuyi commented on SPARK-38266:
--

Issue resolved by https://github.com/apache/spark/pull/35568

> UnresolvedException: Invalid call to dataType on unresolved object caused by 
> GetDateFieldOperations
> ---
>
> Key: SPARK-38266
> URL: https://issues.apache.org/jira/browse/SPARK-38266
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.2
>Reporter: wuyi
>Priority: Major
>
> {code:java}
> test("GetDateFieldOperations should skip unresolved nodes") {
>   withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
>     val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr")
>     val df1 = df.select(df("tsStr").cast("timestamp")).as("df1")
>     val df2 = df.select(df("tsStr").cast("timestamp")).as("df2")
>     df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>     val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
>       .select($"df1.tsStr".as("timeStr")).as("df3")
>     // This throws "UnresolvedException: Invalid call to
>     // dataType on unresolved object" instead of
>     // "AnalysisException: Column 'df1.timeStr' does not exist."
>     df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr"))
>   }
> } {code}






[jira] [Created] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations

2022-02-20 Thread wuyi (Jira)
wuyi created SPARK-38266:


 Summary: UnresolvedException: Invalid call to dataType on 
unresolved object caused by GetDateFieldOperations
 Key: SPARK-38266
 URL: https://issues.apache.org/jira/browse/SPARK-38266
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2, 3.3.0
Reporter: wuyi


{code:java}
test("GetDateFieldOperations should skip unresolved nodes") {
  withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
    val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr")
    val df1 = df.select(df("tsStr").cast("timestamp")).as("df1")
    val df2 = df.select(df("tsStr").cast("timestamp")).as("df2")
    df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
    val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer")
      .select($"df1.tsStr".as("timeStr")).as("df3")
    // This throws "UnresolvedException: Invalid call to
    // dataType on unresolved object" instead of
    // "AnalysisException: Column 'df1.timeStr' does not exist."
    df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr"))
  }
} {code}






[jira] [Resolved] (SPARK-38261) Sync missing R packages with CI

2022-02-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38261.
--
  Assignee: Khalid Mammadov
Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/35583

> Sync missing R packages with CI
> ---
>
> Key: SPARK-38261
> URL: https://issues.apache.org/jira/browse/SPARK-38261
> Project: Spark
>  Issue Type: Github Integration
>  Components: Build
>Affects Versions: 3.2.1
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Minor
>
> The current GitHub workflow job *Linters, licenses, dependencies and 
> documentation generation* is missing R packages needed to complete the 
> documentation and API build.
> *Build and test* is not failing because these packages are installed in the 
> base image.
> IMO, we need to keep them in sync with the base image so that switching back 
> to the Ubuntu runner is easy when ready.
> These R packages are missing: *markdown* and *e1071*
> Reference:
> Base image - 
> https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore






[jira] [Updated] (SPARK-38261) Sync missing R packages with CI

2022-02-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38261:
-
Fix Version/s: 3.3.0

> Sync missing R packages with CI
> ---
>
> Key: SPARK-38261
> URL: https://issues.apache.org/jira/browse/SPARK-38261
> Project: Spark
>  Issue Type: Github Integration
>  Components: Build
>Affects Versions: 3.2.1
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Minor
> Fix For: 3.3.0
>
>
> The current GitHub workflow job *Linters, licenses, dependencies and 
> documentation generation* is missing R packages needed to complete the 
> documentation and API build.
> *Build and test* is not failing because these packages are installed in the 
> base image.
> IMO, we need to keep them in sync with the base image so that switching back 
> to the Ubuntu runner is easy when ready.
> These R packages are missing: *markdown* and *e1071*
> Reference:
> Base image - 
> https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore






[jira] [Commented] (SPARK-38265) Update comments of ExecutorAllocationClient

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495271#comment-17495271
 ] 

Apache Spark commented on SPARK-38265:
--

User 'Shockang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35586

> Update comments of ExecutorAllocationClient
> ---
>
> Key: SPARK-38265
> URL: https://issues.apache.org/jira/browse/SPARK-38265
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Shockang
>Priority: Trivial
> Fix For: 3.3.0
>
>
> The class comment of ExecutorAllocationClient is out of date.
> {code:java}
> This is currently supported only in YARN mode. {code}
> Nowadays, this is supported in the following modes: Spark's Standalone, 
> YARN-Client, YARN-Cluster, Mesos, Kubernetes.
>  
> In my opinion, this comment should be updated.






[jira] [Commented] (SPARK-38265) Update comments of ExecutorAllocationClient

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495270#comment-17495270
 ] 

Apache Spark commented on SPARK-38265:
--

User 'Shockang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35586

> Update comments of ExecutorAllocationClient
> ---
>
> Key: SPARK-38265
> URL: https://issues.apache.org/jira/browse/SPARK-38265
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Shockang
>Priority: Trivial
> Fix For: 3.3.0
>
>
> The class comment of ExecutorAllocationClient is out of date.
> {code:java}
> This is currently supported only in YARN mode. {code}
> Nowadays, this is supported in the following modes: Spark's Standalone, 
> YARN-Client, YARN-Cluster, Mesos, Kubernetes.
>  
> In my opinion, this comment should be updated.






[jira] [Assigned] (SPARK-38265) Update comments of ExecutorAllocationClient

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38265:


Assignee: Apache Spark

> Update comments of ExecutorAllocationClient
> ---
>
> Key: SPARK-38265
> URL: https://issues.apache.org/jira/browse/SPARK-38265
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Shockang
>Assignee: Apache Spark
>Priority: Trivial
> Fix For: 3.3.0
>
>
> The class comment of ExecutorAllocationClient is out of date.
> {code:java}
> This is currently supported only in YARN mode. {code}
> Nowadays, this is supported in the following modes: Spark's Standalone, 
> YARN-Client, YARN-Cluster, Mesos, Kubernetes.
>  
> In my opinion, this comment should be updated.






[jira] [Assigned] (SPARK-38265) Update comments of ExecutorAllocationClient

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38265:


Assignee: (was: Apache Spark)

> Update comments of ExecutorAllocationClient
> ---
>
> Key: SPARK-38265
> URL: https://issues.apache.org/jira/browse/SPARK-38265
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Shockang
>Priority: Trivial
> Fix For: 3.3.0
>
>
> The class comment of ExecutorAllocationClient is out of date.
> {code:java}
> This is currently supported only in YARN mode. {code}
> Nowadays, this is supported in the following modes: Spark's Standalone, 
> YARN-Client, YARN-Cluster, Mesos, Kubernetes.
>  
> In my opinion, this comment should be updated.






[jira] [Commented] (SPARK-38265) Update comments of ExecutorAllocationClient

2022-02-20 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495269#comment-17495269
 ] 

Shockang commented on SPARK-38265:
--

Working on this.

> Update comments of ExecutorAllocationClient
> ---
>
> Key: SPARK-38265
> URL: https://issues.apache.org/jira/browse/SPARK-38265
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Shockang
>Priority: Trivial
> Fix For: 3.3.0
>
>
> The class comment of ExecutorAllocationClient is out of date.
> {code:java}
> This is currently supported only in YARN mode. {code}
> Nowadays, this is supported in the following modes: Spark's Standalone, 
> YARN-Client, YARN-Cluster, Mesos, Kubernetes.
>  
> In my opinion, this comment should be updated.






[jira] [Created] (SPARK-38265) Update comments of ExecutorAllocationClient

2022-02-20 Thread Shockang (Jira)
Shockang created SPARK-38265:


 Summary: Update comments of ExecutorAllocationClient
 Key: SPARK-38265
 URL: https://issues.apache.org/jira/browse/SPARK-38265
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: Shockang
 Fix For: 3.3.0


The class comment of ExecutorAllocationClient is out of date.
{code:java}
This is currently supported only in YARN mode. {code}
Nowadays, this is supported in the following modes: Spark's Standalone, 
YARN-Client, YARN-Cluster, Mesos, Kubernetes.

 

In my opinion, this comment should be updated.






[jira] [Created] (SPARK-38264) Add `DataFrame.resample` for pandas API on Spark.

2022-02-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-38264:
---

 Summary: Add `DataFrame.resample` for pandas API on Spark.
 Key: SPARK-38264
 URL: https://issues.apache.org/jira/browse/SPARK-38264
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Haejoon Lee


Implement the function DataFrame.resample for pandas API on Spark to follow the 
behavior of pandas 
(https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html).






[jira] [Commented] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495257#comment-17495257
 ] 

Apache Spark commented on SPARK-37426:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/35585

> Inline type hints for python/pyspark/mllib/regression.py
> 
>
> Key: SPARK-37426
> URL: https://issues.apache.org/jira/browse/SPARK-37426
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/regression.pyi to 
> python/pyspark/mllib/regression.py
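
For readers unfamiliar with this kind of sub-task, "inlining" type hints means moving annotations out of a separate `.pyi` stub file and into the `.py` source itself. The following is a minimal sketch of that idea only; `LinearModel` is a hypothetical stand-in and does not reproduce the actual contents of `regression.py` or `regression.pyi`.

```python
from typing import List

# Before inlining, the .py file would carry no annotations and a separate
# stub file (.pyi) would declare them, roughly like this:
#
#     class LinearModel:
#         def __init__(self, weights: List[float], intercept: float) -> None: ...
#         def predict(self, x: List[float]) -> float: ...
#
# After inlining, the annotations live directly in the .py source, as below.
# NOTE: LinearModel is a hypothetical example class, not the pyspark.mllib one.


class LinearModel:
    def __init__(self, weights: List[float], intercept: float) -> None:
        self.weights = weights
        self.intercept = intercept

    def predict(self, x: List[float]) -> float:
        # Dot product of weights and features, plus the intercept.
        return sum(w * v for w, v in zip(self.weights, x)) + self.intercept


model = LinearModel([2.0, 0.5], 1.0)
prediction = model.predict([3.0, 4.0])
```

Once the annotations are inline, a type checker reads them from the source file and the stub can be deleted, which removes the risk of the two files drifting apart.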






[jira] [Assigned] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37426:


Assignee: Apache Spark

> Inline type hints for python/pyspark/mllib/regression.py
> 
>
> Key: SPARK-37426
> URL: https://issues.apache.org/jira/browse/SPARK-37426
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/regression.pyi to 
> python/pyspark/mllib/regression.py






[jira] [Commented] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495256#comment-17495256
 ] 

Apache Spark commented on SPARK-37426:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/35585

> Inline type hints for python/pyspark/mllib/regression.py
> 
>
> Key: SPARK-37426
> URL: https://issues.apache.org/jira/browse/SPARK-37426
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/regression.pyi to 
> python/pyspark/mllib/regression.py






[jira] [Assigned] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37426:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/mllib/regression.py
> 
>
> Key: SPARK-37426
> URL: https://issues.apache.org/jira/browse/SPARK-37426
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/regression.pyi to 
> python/pyspark/mllib/regression.py






[jira] [Assigned] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37400:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/mllib/classification.py
> 
>
> Key: SPARK-37400
> URL: https://issues.apache.org/jira/browse/SPARK-37400
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/classification.pyi to 
> python/pyspark/mllib/classification.py.






[jira] [Assigned] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37400:


Assignee: Apache Spark

> Inline type hints for python/pyspark/mllib/classification.py
> 
>
> Key: SPARK-37400
> URL: https://issues.apache.org/jira/browse/SPARK-37400
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/classification.pyi to 
> python/pyspark/mllib/classification.py.






[jira] [Commented] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495255#comment-17495255
 ] 

Apache Spark commented on SPARK-37400:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/35585

> Inline type hints for python/pyspark/mllib/classification.py
> 
>
> Key: SPARK-37400
> URL: https://issues.apache.org/jira/browse/SPARK-37400
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/classification.pyi to 
> python/pyspark/mllib/classification.py.






[jira] [Resolved] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-02-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-37090.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/34362

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.






[jira] [Assigned] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-02-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-37090:


Assignee: Yuming Wang

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.






[jira] [Resolved] (SPARK-36994) Upgrade Apache Thrift

2022-02-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-36994.
--
Resolution: Duplicate

> Upgrade Apache Thrift
> -
>
> Key: SPARK-36994
> URL: https://issues.apache.org/jira/browse/SPARK-36994
> Project: Spark
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 3.0.1
>Reporter: kaja girish
>Priority: Major
>
> *Image:*
>  * spark:3.0.1
> *Components Affected:* 
>  * Apache Thrift
> *Recommendation:*
>  * upgrade Apache Thrift 
> *CVE:*
>  
> |Component Name|Component Version Name|Vulnerability|Fixed version|
> |Apache Thrift|0.11.0-4.|CVE-2019-0205|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2019-0210|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2020-13949|0.14.1|






[jira] [Reopened] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-02-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reopened SPARK-37090:
--

I'm going to 'reverse' the direction of the Duplicate, as the PRs ended up being 
filed against this JIRA.

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.






[jira] [Commented] (SPARK-38262) Upgrade Google guava to version 30.0-jre

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495220#comment-17495220
 ] 

Apache Spark commented on SPARK-38262:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/35584

> Upgrade Google guava to version 30.0-jre
> 
>
> Key: SPARK-38262
> URL: https://issues.apache.org/jira/browse/SPARK-38262
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Apache Spark is using com.google.guava:guava version 14.0.1 which has two 
> security issues.
> [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 
> [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 
> We should upgrade to [version 
> 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 






[jira] [Assigned] (SPARK-38262) Upgrade Google guava to version 30.0-jre

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38262:


Assignee: (was: Apache Spark)

> Upgrade Google guava to version 30.0-jre
> 
>
> Key: SPARK-38262
> URL: https://issues.apache.org/jira/browse/SPARK-38262
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Apache Spark is using com.google.guava:guava version 14.0.1 which has two 
> security issues.
> [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 
> [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 
> We should upgrade to [version 
> 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 






[jira] [Assigned] (SPARK-38262) Upgrade Google guava to version 30.0-jre

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38262:


Assignee: Apache Spark

> Upgrade Google guava to version 30.0-jre
> 
>
> Key: SPARK-38262
> URL: https://issues.apache.org/jira/browse/SPARK-38262
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Assignee: Apache Spark
>Priority: Major
>
> Apache Spark is using com.google.guava:guava version 14.0.1 which has two 
> security issues.
> [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 
> [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 
> We should upgrade to [version 
> 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 






[jira] [Commented] (SPARK-38262) Upgrade Google guava to version 30.0-jre

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495219#comment-17495219
 ] 

Apache Spark commented on SPARK-38262:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/35584

> Upgrade Google guava to version 30.0-jre
> 
>
> Key: SPARK-38262
> URL: https://issues.apache.org/jira/browse/SPARK-38262
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Apache Spark is using com.google.guava:guava version 14.0.1 which has two 
> security issues.
> [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 
> [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 
> We should upgrade to [version 
> 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 






[jira] [Updated] (SPARK-38262) Upgrade Google guava to version 30.0-jre

2022-02-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-38262:

Description: 
Apache Spark is using com.google.guava:guava version 14.0.1 which has two 
security issues.

[CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 

[CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 

We should upgrade to [version 
30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 

  was:
Apache Spark are using com.google.guava:guava version 14.0.1 which has two 
security issues.

[CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 

[CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 

We should upgrade to [version 
30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 


> Upgrade Google guava to version 30.0-jre
> 
>
> Key: SPARK-38262
> URL: https://issues.apache.org/jira/browse/SPARK-38262
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Apache Spark is using com.google.guava:guava version 14.0.1 which has two 
> security issues.
> [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 
> [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 
> We should upgrade to [version 
> 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 






[jira] [Created] (SPARK-38263) StructType explode

2022-02-20 Thread Sayed Mohammad Hossein Torabi (Jira)
Sayed Mohammad Hossein Torabi created SPARK-38263:
-

 Summary: StructType explode
 Key: SPARK-38263
 URL: https://issues.apache.org/jira/browse/SPARK-38263
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.1
Reporter: Sayed Mohammad Hossein Torabi


Currently, the explode function supports only Array and Map data types, not 
StructType. Supporting StructType would help Spark users flatten nested 
datasets, which is useful when dealing with semi-structured and unstructured 
data.
The idea is to support StructType first, and also to add `prefix` and 
`postfix` options to it.
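
To make the request concrete, here is a plain-Python sketch of what flattening a struct with `prefix` and `postfix` options could look like. It is not Spark code: nested dicts stand in for StructType columns, `flatten_struct` is a hypothetical helper, and the assumption that `prefix` and `postfix` wrap each promoted field name is mine, since the issue does not define their semantics.

```python
from typing import Any, Dict


def flatten_struct(row: Dict[str, Any], prefix: str = "", postfix: str = "") -> Dict[str, Any]:
    """Promote the fields of each nested dict (a stand-in for a struct) to top level.

    Assumption: `prefix` and `postfix` are wrapped around each promoted field
    name. Only one level of nesting is flattened per call.
    """
    flat: Dict[str, Any] = {}
    for name, value in row.items():
        if isinstance(value, dict):
            # Struct-like value: each of its fields becomes its own column.
            for field, inner in value.items():
                flat[f"{prefix}{field}{postfix}"] = inner
        else:
            # Non-struct values are carried over unchanged.
            flat[name] = value
    return flat


row = {"id": 1, "address": {"city": "Oslo", "zip": "0150"}}
flat = flatten_struct(row, prefix="address_")
```

A prefix or postfix option of this kind would mainly help avoid name collisions when two structs in the same row share field names.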






[jira] [Updated] (SPARK-38262) Upgrade Google guava to version 30.0-jre

2022-02-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-38262:

Description: 
Apache Spark are using com.google.guava:guava version 14.0.1 which has two 
security issues.

[CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 

[CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 

We should upgrade to [version 
30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 

  was:
Apache Spark are using com.google.guava:guava version 14.0 which has two 
security issues.

[CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 

[CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 

We should upgrade to [version 
30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 


> Upgrade Google guava to version 30.0-jre
> 
>
> Key: SPARK-38262
> URL: https://issues.apache.org/jira/browse/SPARK-38262
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Apache Spark are using com.google.guava:guava version 14.0.1 which has two 
> security issues.
> [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 
> [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 
> We should upgrade to [version 
> 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 






[jira] [Created] (SPARK-38262) Upgrade Google guava to version 30.0-jre

2022-02-20 Thread Jira
Bjørn Jørgensen created SPARK-38262:
---

 Summary: Upgrade Google guava to version 30.0-jre
 Key: SPARK-38262
 URL: https://issues.apache.org/jira/browse/SPARK-38262
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.3.0
Reporter: Bjørn Jørgensen


Apache Spark are using com.google.guava:guava version 14.0 which has two 
security issues.

[CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 

[CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] 

We should upgrade to [version 
30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] 






[jira] [Assigned] (SPARK-38261) Sync missing R packages with CI

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38261:


Assignee: (was: Apache Spark)

> Sync missing R packages with CI
> ---
>
> Key: SPARK-38261
> URL: https://issues.apache.org/jira/browse/SPARK-38261
> Project: Spark
>  Issue Type: Github Integration
>  Components: Build
>Affects Versions: 3.2.1
>Reporter: Khalid Mammadov
>Priority: Minor
>
> The current GitHub workflow job *Linters, licenses, dependencies and 
> documentation generation* is missing the R packages needed to complete the 
> documentation and API build.
> *Build and test* is not failing because these packages are installed in the 
> base image.
> IMO we need to keep them in sync with the base image so that it is easy to 
> switch back to the Ubuntu runner when ready.
> These R packages are missing: *markdown* and *e1071*
> Reference:
> Base image - 
> https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore






[jira] [Commented] (SPARK-38261) Sync missing R packages with CI

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495179#comment-17495179
 ] 

Apache Spark commented on SPARK-38261:
--

User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/35583

> Sync missing R packages with CI
> ---
>
> Key: SPARK-38261
> URL: https://issues.apache.org/jira/browse/SPARK-38261
> Project: Spark
>  Issue Type: Github Integration
>  Components: Build
>Affects Versions: 3.2.1
>Reporter: Khalid Mammadov
>Priority: Minor
>
> The current GitHub workflow job *Linters, licenses, dependencies and 
> documentation generation* is missing the R packages needed to complete the 
> documentation and API build.
> *Build and test* is not failing because these packages are installed in the 
> base image.
> IMO we need to keep them in sync with the base image so that it is easy to 
> switch back to the Ubuntu runner when ready.
> These R packages are missing: *markdown* and *e1071*
> Reference:
> Base image - 
> https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore






[jira] [Assigned] (SPARK-38261) Sync missing R packages with CI

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38261:


Assignee: Apache Spark

> Sync missing R packages with CI
> ---
>
> Key: SPARK-38261
> URL: https://issues.apache.org/jira/browse/SPARK-38261
> Project: Spark
>  Issue Type: Github Integration
>  Components: Build
>Affects Versions: 3.2.1
>Reporter: Khalid Mammadov
>Assignee: Apache Spark
>Priority: Minor
>
> The current GitHub workflow job *Linters, licenses, dependencies and 
> documentation generation* is missing the R packages needed to complete the 
> documentation and API build.
> *Build and test* is not failing because these packages are installed in the 
> base image.
> IMO we need to keep them in sync with the base image so that it is easy to 
> switch back to the Ubuntu runner when ready.
> These R packages are missing: *markdown* and *e1071*
> Reference:
> Base image - 
> https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore






[jira] [Created] (SPARK-38261) Sync missing R packages with CI

2022-02-20 Thread Khalid Mammadov (Jira)
Khalid Mammadov created SPARK-38261:
---

 Summary: Sync missing R packages with CI
 Key: SPARK-38261
 URL: https://issues.apache.org/jira/browse/SPARK-38261
 Project: Spark
  Issue Type: Github Integration
  Components: Build
Affects Versions: 3.2.1
Reporter: Khalid Mammadov


The current GitHub workflow job *Linters, licenses, dependencies and 
documentation generation* is missing the R packages needed to complete the 
documentation and API build.

*Build and test* is not failing because these packages are installed in the 
base image.

IMO we need to keep them in sync with the base image so that it is easy to 
switch back to the Ubuntu runner when ready.

These R packages are missing: *markdown* and *e1071*

Reference:

Base image - 
https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore






[jira] [Commented] (SPARK-37954) old columns should not be available after select or drop

2022-02-20 Thread Varun Shah (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495177#comment-17495177
 ] 

Varun Shah commented on SPARK-37954:


Hi [~andrewfmurphy], there are 2 reasons you don't see any error in the drop 
function:
 # As you rightly pointed out, the function does not throw an error when 
the column does not exist. I think this is debatable, considering that 
functions like select/filter throw an error if the column is missing.
 # The other factor is Spark Catalyst, which optimizes your query/DAG by 
applying optimizations such as predicate pushdown. In the example you 
mentioned, it tries to push the filter downward and, while doing so, recognizes 
the column rename and uses oldCol from the original dataframe/RDD.

Run the following 2 examples to see how the plans are created for the 
different scenarios:

 
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName('available_columns').getOrCreate()
df = spark.createDataFrame([{"oldcol": 1}, {"oldcol": 2}])
# Rename the column, then filter on the old name; no error is raised.
df = df.withColumnRenamed('oldcol', 'newcol')
df = df.filter(col("oldcol") != 2)
df.count()
df.explain("extended")
{code}
 

 
{noformat}
== Parsed Logical Plan ==
'Filter NOT ('oldcol = 2)
+- Project [oldcol#168L AS newcol#170L]
   +- LogicalRDD [oldcol#168L], false

== Analyzed Logical Plan ==
newcol: bigint
Project [newcol#170L]
+- Filter NOT (oldcol#168L = cast(2 as bigint))
   +- Project [oldcol#168L AS newcol#170L, oldcol#168L]
      +- LogicalRDD [oldcol#168L], false

== Optimized Logical Plan ==
Project [oldcol#168L AS newcol#170L]
+- Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- LogicalRDD [oldcol#168L], false

== Physical Plan ==
*(1) Project [oldcol#168L AS newcol#170L]
+- *(1) Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- *(1) Scan ExistingRDD[oldcol#168L]{noformat}
 



 
{code:python}
# Second example: select only the renamed column, then filter on the old name.
df2 = df.select(col("newcol"))
df2 = df2.filter(col("oldcol") != 2)
df2.count()
df2.explain("extended")
{code}
 
{noformat}
== Parsed Logical Plan ==
'Filter NOT ('oldcol = 2)
+- Project [newcol#170L]
   +- Project [newcol#170L]
      +- Filter NOT (oldcol#168L = cast(2 as bigint))
         +- Project [oldcol#168L AS newcol#170L, oldcol#168L]
            +- LogicalRDD [oldcol#168L], false

== Analyzed Logical Plan ==
newcol: bigint
Project [newcol#170L]
+- Filter NOT (oldcol#168L = cast(2 as bigint))
   +- Project [newcol#170L, oldcol#168L]
      +- Project [newcol#170L, oldcol#168L]
         +- Filter NOT (oldcol#168L = cast(2 as bigint))
            +- Project [oldcol#168L AS newcol#170L, oldcol#168L]
               +- LogicalRDD [oldcol#168L], false

== Optimized Logical Plan ==
Project [oldcol#168L AS newcol#170L]
+- Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- LogicalRDD [oldcol#168L], false

== Physical Plan ==
*(1) Project [oldcol#168L AS newcol#170L]
+- *(1) Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- *(1) Scan ExistingRDD[oldcol#168L]
{noformat}
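
The analyzed plans above show the same pattern in both cases: the projection below the filter is widened to carry oldcol, and an outer projection restores the visible schema. A rough, pure-Python toy model of that shape is sketched below. It is not Spark's analyzer code; every class and function name in it (Relation, Project, Filter, resolve_filter) is hypothetical and exists only for illustration.

```python
# Toy model only: it mimics the shape of the analyzed plans shown above,
# not Spark's actual analyzer. All names here are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Relation:
    columns: List[str]

    def output(self) -> List[str]:
        return list(self.columns)


@dataclass
class Project:
    # Each entry is (source_column, output_name); a rename has differing names.
    exprs: List[Tuple[str, str]]
    child: object

    def output(self) -> List[str]:
        return [name for _, name in self.exprs]


@dataclass
class Filter:
    column: str
    child: object

    def output(self) -> List[str]:
        return self.child.output()


def resolve_filter(column: str, plan: Project):
    """Resolve a filter on `column` placed on top of the projection `plan`.

    If the column is missing from the projection's output but present in the
    projection's child, widen the projection to carry it, apply the filter,
    and project back so the visible schema stays unchanged.
    """
    if column in plan.output():
        return Filter(column, plan)
    if column not in plan.child.output():
        raise ValueError(f"cannot resolve column: {column}")
    widened = Project(plan.exprs + [(column, column)], plan.child)
    original_output = [(name, name) for name in plan.output()]
    return Project(original_output, Filter(column, widened))


base = Relation(["oldcol"])
renamed = Project([("oldcol", "newcol")], base)
resolved = resolve_filter("oldcol", renamed)

print(resolved.output())              # visible schema after the filter
print(resolved.child.child.output())  # columns carried below the filter
```

In this toy, the filter on oldcol succeeds even though only newcol is visible, which mirrors the behaviour the reporter finds surprising.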
 

 

> old columns should not be available after select or drop
> 
>
> Key: SPARK-37954
> URL: https://issues.apache.org/jira/browse/SPARK-37954
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.1
>Reporter: Jean Bon
>Priority: Major
>
>  
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import col
> spark = SparkSession.builder.appName('available_columns').getOrCreate()
> df = spark.range(5).select((col("id")+10).alias("id2"))
> assert df.columns==["id2"] #OK
> try:
>     df.select("id")
>     error_raise = False
> except Exception:
>     error_raise = True
> assert error_raise #OK
> df = df.drop("id") #should raise an error
> df.filter(col("id")!=2).count() #returns 4, should raise an error
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38260) Remove dependence on commons-net

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495156#comment-17495156
 ] 

Apache Spark commented on SPARK-38260:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35582

> Remove dependence on commons-net
> 
>
> Key: SPARK-38260
> URL: https://issues.apache.org/jira/browse/SPARK-38260
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
> Environment: Spark doesn't rely on commons-net directly
>  
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-38260) Remove dependence on commons-net

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38260:


Assignee: Apache Spark

> Remove dependence on commons-net
> 
>
> Key: SPARK-38260
> URL: https://issues.apache.org/jira/browse/SPARK-38260
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
> Environment: Spark doesn't rely on commons-net directly
>  
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-38260) Remove dependence on commons-net

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38260:


Assignee: (was: Apache Spark)

> Remove dependence on commons-net
> 
>
> Key: SPARK-38260
> URL: https://issues.apache.org/jira/browse/SPARK-38260
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
> Environment: Spark doesn't rely on commons-net directly
>  
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-38260) Remove dependence on commons-net

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495155#comment-17495155
 ] 

Apache Spark commented on SPARK-38260:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35582

> Remove dependence on commons-net
> 
>
> Key: SPARK-38260
> URL: https://issues.apache.org/jira/browse/SPARK-38260
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
> Environment: Spark doesn't rely on commons-net directly
>  
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-38260) Remove dependence on commons-net

2022-02-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-38260:


 Summary: Remove dependence on commons-net
 Key: SPARK-38260
 URL: https://issues.apache.org/jira/browse/SPARK-38260
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.3.0
 Environment: Spark doesn't rely on commons-net directly

 
Reporter: Yang Jie









[jira] [Commented] (SPARK-38259) Upgrade netty to 4.1.74

2022-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495149#comment-17495149
 ] 

Apache Spark commented on SPARK-38259:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35581

> Upgrade netty to 4.1.74
> ---
>
> Key: SPARK-38259
> URL: https://issues.apache.org/jira/browse/SPARK-38259
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://netty.io/news/2022/02/08/4-1-74-Final.html






[jira] [Assigned] (SPARK-38259) Upgrade netty to 4.1.74

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38259:


Assignee: Apache Spark

> Upgrade netty to 4.1.74
> ---
>
> Key: SPARK-38259
> URL: https://issues.apache.org/jira/browse/SPARK-38259
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> https://netty.io/news/2022/02/08/4-1-74-Final.html






[jira] [Assigned] (SPARK-38259) Upgrade netty to 4.1.74

2022-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38259:


Assignee: (was: Apache Spark)

> Upgrade netty to 4.1.74
> ---
>
> Key: SPARK-38259
> URL: https://issues.apache.org/jira/browse/SPARK-38259
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://netty.io/news/2022/02/08/4-1-74-Final.html






[jira] [Created] (SPARK-38259) Upgrade netty to 4.1.74

2022-02-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-38259:


 Summary: Upgrade netty to 4.1.74
 Key: SPARK-38259
 URL: https://issues.apache.org/jira/browse/SPARK-38259
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.3.0
Reporter: Yang Jie


https://netty.io/news/2022/02/08/4-1-74-Final.html






[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running

2022-02-20 Thread gabrywu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gabrywu updated SPARK-38258:

Affects Version/s: 2.4.0

> [proposal] collect & update statistics automatically when spark SQL is running
> --
>
> Key: SPARK-38258
> URL: https://issues.apache.org/jira/browse/SPARK-38258
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0
>Reporter: gabrywu
>Priority: Minor
>
> As we all know, table & column statistics are very important to the Spark SQL 
> optimizer. However, we have to collect & update them using 
> {code:sql}
> analyze table tableName compute statistics{code}
>  
> This is a little inconvenient, so why can't we collect & update statistics when 
> a Spark stage runs and finishes?
> For example, when an insert overwrite table statement finishes, we can update 
> the corresponding table statistics using SQL metrics. In the following queries, 
> the Spark SQL optimizer can then use these statistics.
> So what do you think of it, [~yumwang]? Is it reasonable?






[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running

2022-02-20 Thread gabrywu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gabrywu updated SPARK-38258:

Description: 
As we all know, table & column statistics are very important to the Spark SQL 
optimizer. However, we have to collect & update them using 
{code:sql}
analyze table tableName compute statistics{code}
 

This is a little inconvenient, so why can't we collect & update statistics when a 
Spark stage runs and finishes?

For example, when an insert overwrite table statement finishes, we can update the 
corresponding table statistics using SQL metrics. In the following queries, the 
Spark SQL optimizer can then use these statistics.

So what do you think of it, [~yumwang]? Is it reasonable?

  was:
As we all know, table & column statistics are very important to the Spark SQL 
optimizer. However, we have to collect & update them using 
{code:sql}
analyze table tableName compute statistics{code}
 

This is a little inconvenient, so why can't we collect & update statistics when a 
Spark stage runs and finishes?

For example, when an insert overwrite table statement finishes, we can update the 
corresponding table statistics using SQL metrics. In the next queries, the Spark 
SQL optimizer can then use these statistics.

So what do you think of it, [~yumwang]?


> [proposal] collect & update statistics automatically when spark SQL is running
> --
>
> Key: SPARK-38258
> URL: https://issues.apache.org/jira/browse/SPARK-38258
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
>Reporter: gabrywu
>Priority: Minor
>
> As we all know, table & column statistics are very important to the Spark SQL 
> optimizer. However, we have to collect & update them using 
> {code:sql}
> analyze table tableName compute statistics{code}
>  
> This is a little inconvenient, so why can't we collect & update statistics when 
> a Spark stage runs and finishes?
> For example, when an insert overwrite table statement finishes, we can update 
> the corresponding table statistics using SQL metrics. In the following queries, 
> the Spark SQL optimizer can then use these statistics.
> So what do you think of it, [~yumwang]? Is it reasonable?






[jira] [Created] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running

2022-02-20 Thread gabrywu (Jira)
gabrywu created SPARK-38258:
---

 Summary: [proposal] collect & update statistics automatically when 
spark SQL is running
 Key: SPARK-38258
 URL: https://issues.apache.org/jira/browse/SPARK-38258
 Project: Spark
  Issue Type: Wish
  Components: Spark Core, SQL
Affects Versions: 3.2.0, 3.1.0, 3.0.0
Reporter: gabrywu


As we all know, table & column statistics are very important to the Spark SQL 
optimizer. However, we have to collect & update them using 
{code:sql}
analyze table tableName compute statistics{code}
 

This is a little inconvenient, so why can't we collect & update statistics when a 
Spark stage runs and finishes?

For example, when an insert overwrite table statement finishes, we can update the 
corresponding table statistics using SQL metrics. In the next queries, the Spark 
SQL optimizer can then use these statistics.

So what do you think of it, [~yumwang]?
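
To make the idea concrete, here is a minimal pure-Python sketch of the proposed flow. It assumes the write path can report row and byte counts when a statement finishes. None of these names (TableStats, StatsCatalog, on_write_finished) exist in Spark; they are hypothetical stand-ins for the metastore statistics and for a hook fired at the end of a write.

```python
# Hypothetical sketch of the proposal: none of these names exist in Spark.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TableStats:
    row_count: int = 0
    size_in_bytes: int = 0


class StatsCatalog:
    """In-memory stand-in for the table statistics kept in a metastore."""

    def __init__(self) -> None:
        self._stats: Dict[str, TableStats] = {}

    def on_write_finished(self, table: str, rows_written: int,
                          bytes_written: int, overwrite: bool) -> None:
        """Update statistics from the metrics of a finished write."""
        if overwrite:
            # An overwrite replaces the table contents, so the write metrics
            # describe the whole table.
            self._stats[table] = TableStats(rows_written, bytes_written)
        elif table in self._stats:
            # An append adds to statistics that are already known.
            current = self._stats[table]
            current.row_count += rows_written
            current.size_in_bytes += bytes_written
        # Otherwise the prior contents are unknown, so the statistics stay
        # unset until a full analyze (or an overwrite) provides a baseline.

    def stats(self, table: str) -> Optional[TableStats]:
        return self._stats.get(table)


catalog = StatsCatalog()
catalog.on_write_finished("tableName", rows_written=100,
                          bytes_written=4096, overwrite=True)
catalog.on_write_finished("tableName", rows_written=20,
                          bytes_written=512, overwrite=False)
print(catalog.stats("tableName"))
```

The sketch only covers table-level counts. Column statistics such as min, max, or distinct count could not be derived from write metrics in this simple way.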


