[jira] [Resolved] (SPARK-38140) Desc column stats (min, max) for timestamp type is not consistent with the value due to time zone difference
[ https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-38140. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35440 [https://github.com/apache/spark/pull/35440] > Desc column stats (min, max) for timestamp type is not consistent with the > value due to time zone difference > > > Key: SPARK-38140 > URL: https://issues.apache.org/jira/browse/SPARK-38140 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: Zhenhua Wang >Assignee: Zhenhua Wang >Priority: Minor > Fix For: 3.3.0 > > > Currently, a timestamp column's stats (min/max) are stored in UTC in the > metastore, and when describing its min/max column stats, they are also shown > in UTC. > As a result, for users not in UTC, the column stats shown to them are not > consistent with the actual values, which causes confusion. > For example: > {noformat} > spark-sql> create table tab_ts_master (ts timestamp) using parquet; > spark-sql> insert into tab_ts_master values make_timestamp(2022, 1, 1, 0, 0, > 1.123456), make_timestamp(2022, 1, 3, 0, 0, 2.987654); > spark-sql> select * from tab_ts_master; > 2022-01-01 00:00:01.123456 > 2022-01-03 00:00:02.987654 > spark-sql> set spark.sql.session.timeZone; > spark.sql.session.timeZone Asia/Shanghai > spark-sql> analyze table tab_ts_master compute statistics for all columns; > spark-sql> desc formatted tab_ts_master ts; > col_name ts > data_type timestamp > comment NULL > min 2021-12-31 16:00:01.123456 > max 2022-01-02 16:00:02.987654 > num_nulls 0 > distinct_count 2 > avg_col_len 8 > max_col_len 8 > histogram NULL > {noformat}
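[Editor's note] The resolution (PR 35440) presumably makes DESC render these stats in the session time zone so they line up with SELECT output. A sketch of the expected behavior after the fix, under that assumption (output formatting is illustrative):

{code:java}
// Assumes the tab_ts_master table from the report and Asia/Shanghai as session time zone.
spark.conf.set("spark.sql.session.timeZone", "Asia/Shanghai")
spark.sql("DESC FORMATTED tab_ts_master ts").show(truncate = false)
// Expected after the fix: min/max agree with what SELECT shows, e.g.
//   min  2022-01-01 00:00:01.123456
//   max  2022-01-03 00:00:02.987654
{code}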
[jira] [Assigned] (SPARK-38140) Desc column stats (min, max) for timestamp type is not consistent with the value due to time zone difference
[ https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-38140: --- Assignee: Zhenhua Wang > Desc column stats (min, max) for timestamp type is not consistent with the > value due to time zone difference > > > Key: SPARK-38140 > URL: https://issues.apache.org/jira/browse/SPARK-38140 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: Zhenhua Wang >Assignee: Zhenhua Wang >Priority: Minor > > > Currently, a timestamp column's stats (min/max) are stored in UTC in the > metastore, and when describing its min/max column stats, they are also shown > in UTC. > As a result, for users not in UTC, the column stats shown to them are not > consistent with the actual values, which causes confusion. > For example: > {noformat} > spark-sql> create table tab_ts_master (ts timestamp) using parquet; > spark-sql> insert into tab_ts_master values make_timestamp(2022, 1, 1, 0, 0, > 1.123456), make_timestamp(2022, 1, 3, 0, 0, 2.987654); > spark-sql> select * from tab_ts_master; > 2022-01-01 00:00:01.123456 > 2022-01-03 00:00:02.987654 > spark-sql> set spark.sql.session.timeZone; > spark.sql.session.timeZone Asia/Shanghai > spark-sql> analyze table tab_ts_master compute statistics for all columns; > spark-sql> desc formatted tab_ts_master ts; > col_name ts > data_type timestamp > comment NULL > min 2021-12-31 16:00:01.123456 > max 2022-01-02 16:00:02.987654 > num_nulls 0 > distinct_count 2 > avg_col_len 8 > max_col_len 8 > histogram NULL > {noformat}
[jira] [Updated] (SPARK-38270) SQL CLI AM should keep same exitcode with client
[ https://issues.apache.org/jira/browse/SPARK-38270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38270: -- Description: Currently, the SQL CLI always uses a shutdown hook to stop the SparkContext: {code:java} // Clean up after we exit ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() } {code} This causes the YARN AM to always report success, even when the client exits with a non-zero code. > SQL CLI AM should keep same exitcode with client > > > Key: SPARK-38270 > URL: https://issues.apache.org/jira/browse/SPARK-38270 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Major > > Currently, the SQL CLI always uses a shutdown hook to stop the SparkContext: > {code:java} > // Clean up after we exit > ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() } > {code} > This causes the YARN AM to always report success, even when the client exits > with a non-zero code.
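[Editor's note] The hook above stops the SparkContext cleanly no matter how the CLI exited, so YARN records the application as SUCCEEDED. A minimal sketch of one possible direction, assuming a hypothetical exitCode field and a stop(exitCode) overload (illustrative, not the merged patch):

{code:java}
// Record the CLI's exit status and hand it to the shutdown hook, so a
// non-zero exit can surface to the YARN AM instead of a clean stop.
@volatile var exitCode = 0

ShutdownHookManager.addShutdownHook { () =>
  SparkSQLEnv.stop(exitCode) // hypothetical overload propagating the status
}
{code}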
[jira] [Created] (SPARK-38270) SQL CLI AM should keep same exitcode with client
angerszhu created SPARK-38270: - Summary: SQL CLI AM should keep same exitcode with client Key: SPARK-38270 URL: https://issues.apache.org/jira/browse/SPARK-38270 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu
[jira] [Assigned] (SPARK-37475) Add Scale Parameter to Floor and Ceil functions
[ https://issues.apache.org/jira/browse/SPARK-37475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37475: --- Assignee: Sathiya Kumar > Add Scale Parameter to Floor and Ceil functions > --- > > Key: SPARK-37475 > URL: https://issues.apache.org/jira/browse/SPARK-37475 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Sathiya Kumar >Assignee: Sathiya Kumar >Priority: Minor > Fix For: 3.3.0 > > > This feature is proposed in the PR: > https://github.com/apache/spark/pull/34593 > Currently we support the decimal rounding modes HALF_UP (round) and HALF_EVEN > (bround), but we have use cases that need RoundingMode.UP and > RoundingMode.DOWN. > [https://stackoverflow.com/questions/34888419/round-down-double-in-spark/40476117] > [https://stackoverflow.com/questions/54683066/is-there-a-rounddown-function-in-sql-as-there-is-in-excel] > [https://stackoverflow.com/questions/48279641/oracle-sql-round-half] > > The floor and ceil functions help with this, but they don't support specifying > the position of the rounding. Adding a scale parameter to these functions > would let us control the rounding position. > > Snowflake supports a `scale` parameter for `floor`/`ceil`: > {code:java} > FLOOR( <input_expr> [, <scale_expr> ] ){code} > REF: > [https://docs.snowflake.com/en/sql-reference/functions/floor.html] >
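[Editor's note] For illustration, here is how the proposed scale parameter would look in use, assuming Snowflake-like semantics (a sketch, not the final Spark signature):

{code:java}
// scale > 0 rounds at the given decimal place; floor goes toward negative
// infinity, ceil toward positive infinity.
spark.sql("SELECT floor(3.1411, 3)").show()   // expected 3.141
spark.sql("SELECT ceil(3.1411, 3)").show()    // expected 3.142
// A negative scale would round to the left of the decimal point.
spark.sql("SELECT floor(1234.56, -2)").show() // expected 1200
{code}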
[jira] [Resolved] (SPARK-37475) Add Scale Parameter to Floor and Ceil functions
[ https://issues.apache.org/jira/browse/SPARK-37475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37475. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34729 [https://github.com/apache/spark/pull/34729] > Add Scale Parameter to Floor and Ceil functions > --- > > Key: SPARK-37475 > URL: https://issues.apache.org/jira/browse/SPARK-37475 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Sathiya Kumar >Priority: Minor > Fix For: 3.3.0 > > > This feature is proposed in the PR: > https://github.com/apache/spark/pull/34593 > Currently we support the decimal rounding modes HALF_UP (round) and HALF_EVEN > (bround), but we have use cases that need RoundingMode.UP and > RoundingMode.DOWN. > [https://stackoverflow.com/questions/34888419/round-down-double-in-spark/40476117] > [https://stackoverflow.com/questions/54683066/is-there-a-rounddown-function-in-sql-as-there-is-in-excel] > [https://stackoverflow.com/questions/48279641/oracle-sql-round-half] > > The floor and ceil functions help with this, but they don't support specifying > the position of the rounding. Adding a scale parameter to these functions > would let us control the rounding position. > > Snowflake supports a `scale` parameter for `floor`/`ceil`: > {code:java} > FLOOR( <input_expr> [, <scale_expr> ] ){code} > REF: > [https://docs.snowflake.com/en/sql-reference/functions/floor.html] >
[jira] [Resolved] (SPARK-38227) Apply strict nullability of nested column in time window / session window
[ https://issues.apache.org/jira/browse/SPARK-38227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-38227. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35543 [https://github.com/apache/spark/pull/35543] > Apply strict nullability of nested column in time window / session window > - > > Key: SPARK-38227 > URL: https://issues.apache.org/jira/browse/SPARK-38227 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1, 3.3.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.3.0 > > > In TimeWindow and SessionWindow, we define the dataType of these function > expressions as a StructType with two nested columns, "start" and "end", which > are nullable. > We replace these expressions in the analyzer via the corresponding rules: > TimeWindowing for TimeWindow and SessionWindowing for SessionWindow. > The rules replace the function expressions with an Alias referring to a > CreateNamedStruct. For the value side of CreateNamedStruct, we don't specify > anything about nullability, which risks the value side being interpreted (or > optimized) as non-nullable, which would create an inconsistency. > We should make sure the nullability of the columns in CreateNamedStruct > remains consistent with the dataType definition of these function expressions.
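[Editor's note] For reference, the declared result type at issue looks like the following; the struct built by the analyzer rules must keep the same per-field nullability (a sketch of the schema only, not the analyzer code):

{code:java}
import org.apache.spark.sql.types._

// StructField is nullable by default, so the declared window type promises
// nullable "start"/"end" fields; the CreateNamedStruct replacement must not
// silently be treated as non-nullable.
val windowType = new StructType()
  .add(StructField("start", TimestampType))
  .add(StructField("end", TimestampType))
{code}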
[jira] [Assigned] (SPARK-38227) Apply strict nullability of nested column in time window / session window
[ https://issues.apache.org/jira/browse/SPARK-38227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-38227: --- Assignee: Jungtaek Lim > Apply strict nullability of nested column in time window / session window > - > > Key: SPARK-38227 > URL: https://issues.apache.org/jira/browse/SPARK-38227 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1, 3.3.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > In TimeWindow and SessionWindow, we define the dataType of these function > expressions as a StructType with two nested columns, "start" and "end", which > are nullable. > We replace these expressions in the analyzer via the corresponding rules: > TimeWindowing for TimeWindow and SessionWindowing for SessionWindow. > The rules replace the function expressions with an Alias referring to a > CreateNamedStruct. For the value side of CreateNamedStruct, we don't specify > anything about nullability, which risks the value side being interpreted (or > optimized) as non-nullable, which would create an inconsistency. > We should make sure the nullability of the columns in CreateNamedStruct > remains consistent with the dataType definition of these function expressions.
[jira] [Commented] (SPARK-38269) Clean up redundant type cast
[ https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495325#comment-17495325 ] Apache Spark commented on SPARK-38269: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35592 > Clean up redundant type cast > > > Key: SPARK-38269 > URL: https://issues.apache.org/jira/browse/SPARK-38269 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Assigned] (SPARK-38269) Clean up redundant type cast
[ https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38269: Assignee: (was: Apache Spark) > Clean up redundant type cast > > > Key: SPARK-38269 > URL: https://issues.apache.org/jira/browse/SPARK-38269 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Assigned] (SPARK-38269) Clean up redundant type cast
[ https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38269: Assignee: Apache Spark > Clean up redundant type cast > > > Key: SPARK-38269 > URL: https://issues.apache.org/jira/browse/SPARK-38269 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor >
[jira] [Created] (SPARK-38269) Clean up redundant type cast
Yang Jie created SPARK-38269: Summary: Clean up redundant type cast Key: SPARK-38269 URL: https://issues.apache.org/jira/browse/SPARK-38269 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.3.0 Reporter: Yang Jie
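[Editor's note] The ticket body doesn't include an example, so here is an illustrative one (constructed for this digest, not taken from the PR) of the kind of redundant cast such a cleanup removes:

{code:java}
val s: String = "spark"
val t = s.asInstanceOf[String] // redundant: s is statically known to be a String
val u = s                      // equivalent after the cleanup
{code}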
[jira] [Comment Edited] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495322#comment-17495322 ] melin edited comment on SPARK-38200 at 2/21/22, 5:17 AM: - [~beliefer] Oracle: [https://docs.oracle.com/en/database/other-databases/nosql-database/21.1/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html] was (Author: melin): [~beliefer] > [SQL] Spark JDBC Savemode Supports replace > -- > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both MySQL and PostgreSQL support upsert syntax. > MySQL: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > PostgreSQL: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code}
[jira] [Comment Edited] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495322#comment-17495322 ] melin edited comment on SPARK-38200 at 2/21/22, 5:17 AM: - [~beliefer] Oracle: [https://docs.oracle.com/en/database/other-databases/nosql-database/21.1/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html] DB2 or SQL Server: {code:java} MERGE INTO mytable AS mt USING ( SELECT * FROM TABLE ( VALUES (123, 'text') ) ) AS vt(id, val) ON (mt.id = vt.id) WHEN MATCHED THEN UPDATE SET val = vt.val WHEN NOT MATCHED THEN INSERT (id, val) VALUES (vt.id, vt.val) ; {code} was (Author: melin): [~beliefer] Oracle: [https://docs.oracle.com/en/database/other-databases/nosql-database/21.1/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html] > [SQL] Spark JDBC Savemode Supports replace > -- > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both MySQL and PostgreSQL support upsert syntax. > MySQL: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > PostgreSQL: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code}
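[Editor's note] Taken together, the request is for a JDBC save mode that emits these dialect-specific upsert statements. A hypothetical user-facing shape, given a DataFrame df (the option name is an illustrative assumption, not an existing Spark API):

{code:java}
// Sketch: a replace-style write that would translate to REPLACE INTO on MySQL,
// INSERT ... ON CONFLICT on PostgreSQL, and MERGE INTO on DB2/SQL Server.
df.write
  .format("jdbc")
  .option("url", "jdbc:mysql://host:3306/db")
  .option("dbtable", "t")
  .option("upsertKeyColumns", "id") // hypothetical option naming the conflict key
  .mode("append")
  .save()
{code}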
[jira] [Commented] (SPARK-38236) Absolute file paths specified in create/alter table are treated as relative
[ https://issues.apache.org/jira/browse/SPARK-38236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495324#comment-17495324 ] Apache Spark commented on SPARK-38236: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/35591 > Absolute file paths specified in create/alter table are treated as relative > --- > > Key: SPARK-38236 > URL: https://issues.apache.org/jira/browse/SPARK-38236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: Bo Zhang >Priority: Major > > After https://github.com/apache/spark/pull/28527 we changed to creating tables > under the database location when the specified table location is relative. > However, the criterion for determining whether a table location is relative or > absolute is URI.isAbsolute, which basically checks whether the table location > URI has a scheme defined. So table URIs like /table/path are treated as > relative, and the scheme and authority of the database location URI are used > to create the table. For example, when the database location URI is > s3a://bucket/db, the table will be created at s3a://bucket/table/path, while > it should be created under the file system defined in > SessionCatalog.hadoopConf instead. > This also applies to ALTER TABLE.
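[Editor's note] The behavior hinges on java.net.URI semantics: isAbsolute is true only when a scheme is present, so a scheme-less absolute filesystem path is classified as "relative". A quick demonstration:

{code:java}
import java.net.URI

new URI("/table/path").isAbsolute     // false: no scheme, so treated as relative
new URI("s3a://bucket/db").isAbsolute // true
// Resolving the "relative" path against the database location then yields
// s3a://bucket/table/path rather than a path on the default file system.
{code}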
[jira] [Commented] (SPARK-38236) Absolute file paths specified in create/alter table are treated as relative
[ https://issues.apache.org/jira/browse/SPARK-38236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495323#comment-17495323 ] Apache Spark commented on SPARK-38236: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/35591 > Absolute file paths specified in create/alter table are treated as relative > --- > > Key: SPARK-38236 > URL: https://issues.apache.org/jira/browse/SPARK-38236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: Bo Zhang >Priority: Major > > After https://github.com/apache/spark/pull/28527 we changed to creating tables > under the database location when the specified table location is relative. > However, the criterion for determining whether a table location is relative or > absolute is URI.isAbsolute, which basically checks whether the table location > URI has a scheme defined. So table URIs like /table/path are treated as > relative, and the scheme and authority of the database location URI are used > to create the table. For example, when the database location URI is > s3a://bucket/db, the table will be created at s3a://bucket/table/path, while > it should be created under the file system defined in > SessionCatalog.hadoopConf instead. > This also applies to ALTER TABLE.
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495322#comment-17495322 ] melin commented on SPARK-38200: --- [~beliefer] > [SQL] Spark JDBC Savemode Supports replace > -- > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both MySQL and PostgreSQL support upsert syntax. > MySQL: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > PostgreSQL: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code}
[jira] [Assigned] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow
[ https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38268: Assignee: Gengliang Wang (was: Apache Spark) > Hide the "failOnError" field in the toString method of Abs/CheckOverflow > > > Key: SPARK-38268 > URL: https://issues.apache.org/jira/browse/SPARK-38268 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > To fix most of the test failures of *PlanStabilitySuite under ANSI mode.
[jira] [Commented] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow
[ https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495318#comment-17495318 ] Apache Spark commented on SPARK-38268: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/35590 > Hide the "failOnError" field in the toString method of Abs/CheckOverflow > > > Key: SPARK-38268 > URL: https://issues.apache.org/jira/browse/SPARK-38268 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > To fix most of the test failures of *PlanStabilitySuite under ANSI mode.
[jira] [Commented] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow
[ https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495319#comment-17495319 ] Apache Spark commented on SPARK-38268: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/35590 > Hide the "failOnError" field in the toString method of Abs/CheckOverflow > > > Key: SPARK-38268 > URL: https://issues.apache.org/jira/browse/SPARK-38268 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > To fix most of the test failures of *PlanStabilitySuite under ANSI mode.
[jira] [Assigned] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow
[ https://issues.apache.org/jira/browse/SPARK-38268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38268: Assignee: Apache Spark (was: Gengliang Wang) > Hide the "failOnError" field in the toString method of Abs/CheckOverflow > > > Key: SPARK-38268 > URL: https://issues.apache.org/jira/browse/SPARK-38268 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > To fix most of the test failures of *PlanStabilitySuite under ANSI mode.
[jira] [Created] (SPARK-38268) Hide the "failOnError" field in the toString method of Abs/CheckOverflow
Gengliang Wang created SPARK-38268: -- Summary: Hide the "failOnError" field in the toString method of Abs/CheckOverflow Key: SPARK-38268 URL: https://issues.apache.org/jira/browse/SPARK-38268 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Gengliang Wang Assignee: Gengliang Wang To fix most of the test failures of *PlanStabilitySuite under ANSI mode.
[jira] [Commented] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements
[ https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495311#comment-17495311 ] Apache Spark commented on SPARK-38267: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35589 > Replace pattern matches on boolean expressions with conditional statements > -- > > Key: SPARK-38267 > URL: https://issues.apache.org/jira/browse/SPARK-38267 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > Before > > {code:java} > val bool: Boolean > bool match { > case true => // do something when bool is true > case false => // do something when bool is false > } {code} > > > After > > {code:java} > val bool: Boolean > if (bool) { > // do something when bool is true > } else { > // do something when bool is false > } {code} > >
[jira] [Assigned] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements
[ https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38267: Assignee: Apache Spark > Replace pattern matches on boolean expressions with conditional statements > -- > > Key: SPARK-38267 > URL: https://issues.apache.org/jira/browse/SPARK-38267 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Before > > {code:java} > val bool: Boolean > bool match { > case true => // do something when bool is true > case false => // do something when bool is false > } {code} > > > After > > {code:java} > val bool: Boolean > if (bool) { > // do something when bool is true > } else { > // do something when bool is false > } {code} > >
[jira] [Assigned] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements
[ https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38267: Assignee: (was: Apache Spark) > Replace pattern matches on boolean expressions with conditional statements > -- > > Key: SPARK-38267 > URL: https://issues.apache.org/jira/browse/SPARK-38267 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > Before > > {code:java} > val bool: Boolean > bool match { > case true => // do something when bool is true > case false => // do something when bool is false > } {code} > > > After > > {code:java} > val bool: Boolean > if (bool) { > // do something when bool is true > } else { > // do something when bool is false > } {code} > >
[jira] [Commented] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements
[ https://issues.apache.org/jira/browse/SPARK-38267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495310#comment-17495310 ] Apache Spark commented on SPARK-38267: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35589 > Replace pattern matches on boolean expressions with conditional statements > -- > > Key: SPARK-38267 > URL: https://issues.apache.org/jira/browse/SPARK-38267 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > Before > > {code:java} > val bool: Boolean > bool match { > case true => // do something when bool is true > case false => // do something when bool is false > } {code} > > > After > > {code:java} > val bool: Boolean > if (bool) { > // do something when bool is true > } else { > // do something when bool is false > } {code} > >
[jira] [Created] (SPARK-38267) Replace pattern matches on boolean expressions with conditional statements
Yang Jie created SPARK-38267: Summary: Replace pattern matches on boolean expressions with conditional statements Key: SPARK-38267 URL: https://issues.apache.org/jira/browse/SPARK-38267 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yang Jie Before {code:java} val bool: Boolean bool match { case true => // do something when bool is true case false => // do something when bool is false } {code} After {code:java} val bool: Boolean if (bool) { // do something when bool is true } else { // do something when bool is false } {code}
[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38258: Description: As we all know, table & column statistics are very important to the Spark SQL optimizer; however, we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a Spark stage runs and finishes? For example, when an insert overwrite table statement finishes, we can update the corresponding table statistics using SQL metrics. And in the following queries, the Spark SQL optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so the same SQL can run every day, and the SQL and its corresponding tables' data change slowly. That means we can use statistics updated yesterday to optimize current SQLs. So we'd better add a mechanism to store every stage's statistics somewhere and use them in new SQLs, not just collect statistics after a stage finishes. was: As we all know, table & column statistics are very important to spark SQL optimizer, however we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a spark stage runs and finishes? For example, when a insert overwrite table statement finishes, we can update a corresponding table statistics using SQL metric. And in following queries, spark sql optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so a same SQL can run every day, and the SQL and its corresponding tables data change slowly. That means we can use statistics updated on yesterday to optimize current SQL. So we'd better add a mechanism to store every stage's statistics somewhere, and use it in new SQLs. Not just collect statistics after a stage finishes. > [proposal] collect & update statistics automatically when spark SQL is running > -- > > Key: SPARK-38258 > URL: https://issues.apache.org/jira/browse/SPARK-38258 > Project: Spark > Issue Type: Wish > Components: Spark Core, SQL >Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0 >Reporter: gabrywu >Priority: Minor > > As we all know, table & column statistics are very important to the Spark SQL > optimizer; however, we have to collect & update them using > {code:java} > analyze table tableName compute statistics{code} > > It's a little inconvenient, so why can't we collect & update statistics when > a Spark stage runs and finishes? > For example, when an insert overwrite table statement finishes, we can update > the corresponding table statistics using SQL metrics. And in the following > queries, the Spark SQL optimizer can use these statistics. > As we all know, it's a common case that we run daily batches using Spark > SQLs, so the same SQL can run every day, and the SQL and its corresponding > tables' data change slowly. That means we can use statistics updated > yesterday to optimize current SQLs. > So we'd better add a mechanism to store every stage's statistics somewhere > and use them in new SQLs, not just collect statistics after a stage finishes. >
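[Editor's note] In other words, the proposal folds the effect of today's manual flow into the write itself. For contrast, the manual flow it would replace (standard Spark SQL; a sketch assuming a table t and a source src):

{code:java}
// Today: an explicit ANALYZE pass, i.e. an extra table scan, after the write.
spark.sql("INSERT OVERWRITE TABLE t SELECT * FROM src")
spark.sql("ANALYZE TABLE t COMPUTE STATISTICS FOR ALL COLUMNS")
// Proposal: derive equivalent table/column stats from the write job's
// SQL metrics and persist them automatically when the stage finishes.
{code}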
[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38258: Description: As we all know, table & column statistics are very important to the Spark SQL optimizer; however, we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a Spark stage runs and finishes? For example, when an insert overwrite table statement finishes, we can update the corresponding table statistics using SQL metrics. And in the following queries, the Spark SQL optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so the same SQL can run every day, and the SQL and its corresponding tables' data change slowly. That means we can use statistics updated yesterday to optimize the current SQL. So we'd better add a mechanism to store every stage's statistics somewhere and use them in new SQLs, not just collect statistics after a stage finishes. was: As we all know, table & column statistics are very important to spark SQL optimizer, however we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a spark stage runs and finishes? For example, when a insert overwrite table statement finishes, we can update a corresponding table statistics using SQL metric. And in following queries, spark sql optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so a same SQL can run every day, and the SQL and its corresponding tables data change slowly. That means we can use sta > [proposal] collect & update statistics automatically when spark SQL is running > -- > > Key: SPARK-38258 > URL: https://issues.apache.org/jira/browse/SPARK-38258 > Project: Spark > Issue Type: Wish > Components: Spark Core, SQL >Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0 >Reporter: gabrywu >Priority: Minor > > As we all know, table & column statistics are very important to the Spark SQL > optimizer; however, we have to collect & update them using > {code:java} > analyze table tableName compute statistics{code} > > It's a little inconvenient, so why can't we collect & update statistics when > a Spark stage runs and finishes? > For example, when an insert overwrite table statement finishes, we can update > the corresponding table statistics using SQL metrics. And in the following > queries, the Spark SQL optimizer can use these statistics. > As we all know, it's a common case that we run daily batches using Spark > SQLs, so the same SQL can run every day, and the SQL and its corresponding > tables' data change slowly. That means we can use statistics updated > yesterday to optimize the current SQL. > So we'd better add a mechanism to store every stage's statistics somewhere > and use them in new SQLs, not just collect statistics after a stage finishes. >
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495305#comment-17495305 ] jiaan.geng commented on SPARK-38200: [~melin] OK. Can other SQL dialects accomplish the same work as upsert SQL? > [SQL] Spark JDBC Savemode Supports replace > -- > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both MySQL and PostgreSQL support upsert syntax. > MySQL: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > PostgreSQL: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code}
[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38258: Description: As we all know, table & column statistics are very important to the Spark SQL optimizer; however, we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a Spark stage runs and finishes? For example, when an insert overwrite table statement finishes, we can update the corresponding table statistics using SQL metrics. And in the following queries, the Spark SQL optimizer can use these statistics. As we all know, it's a common case that we run daily batches using Spark SQLs, so the same SQL can run every day, and the SQL and its corresponding tables' data change slowly. That means we can use sta was: As we all know, table & column statistics are very important to spark SQL optimizer, however we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a spark stage runs and finishes? For example, when a insert overwrite table statement finishes, we can update a corresponding table statistics using SQL metric. And in following queries, spark sql optimizer can use these statistics. So what do you think of it?[~yumwang] , it it reasonable? > [proposal] collect & update statistics automatically when spark SQL is running > -- > > Key: SPARK-38258 > URL: https://issues.apache.org/jira/browse/SPARK-38258 > Project: Spark > Issue Type: Wish > Components: Spark Core, SQL >Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0 >Reporter: gabrywu >Priority: Minor > > As we all know, table & column statistics are very important to the Spark SQL > optimizer; however, we have to collect & update them using > {code:java} > analyze table tableName compute statistics{code} > > It's a little inconvenient, so why can't we collect & update statistics when > a Spark stage runs and finishes? > For example, when an insert overwrite table statement finishes, we can update > the corresponding table statistics using SQL metrics. And in the following > queries, the Spark SQL optimizer can use these statistics. > As we all know, it's a common case that we run daily batches using Spark > SQLs, so the same SQL can run every day, and the SQL and its corresponding > tables' data change slowly. That means we can use sta >
[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495295#comment-17495295 ] Apache Spark commented on SPARK-37090: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/35587 > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark uses libthrift 0.12, which has reported high-severity > security vulnerabilities: > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of these vulnerabilities.
[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495294#comment-17495294 ] Apache Spark commented on SPARK-37090: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/35588 > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark uses libthrift 0.12, which has reported high-severity > security vulnerabilities: > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of these vulnerabilities.
[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495293#comment-17495293 ] Apache Spark commented on SPARK-37090: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/35587 > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark uses libthrift 0.12, which has reported high-severity > security vulnerabilities: > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of these vulnerabilities.
[jira] [Commented] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations
[ https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495286#comment-17495286 ] Apache Spark commented on SPARK-38266: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/35568 > UnresolvedException: Invalid call to dataType on unresolved object caused by > GetDateFieldOperations > --- > > Key: SPARK-38266 > URL: https://issues.apache.org/jira/browse/SPARK-38266 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > {code:java} > test("GetDateFieldOperations should skip unresolved nodes") { > withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") { > val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr") > val df1 = df.select(df("tsStr").cast("timestamp")).as("df1") > val df2 = df.select(df("tsStr").cast("timestamp")).as("df2") > df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > .select($"df1.tsStr".as("timeStr")).as("df3") > // This throws "UnresolvedException: Invalid call to > // dataType on unresolved object" instead of "AnalysisException: Column > 'df1.timeStr' does not exist." > df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr")) > } > } {code}
[jira] [Resolved] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations
[ https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi resolved SPARK-38266. -- Resolution: Fixed > UnresolvedException: Invalid call to dataType on unresolved object caused by > GetDateFieldOperations > --- > > Key: SPARK-38266 > URL: https://issues.apache.org/jira/browse/SPARK-38266 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > {code:java} > test("GetDateFieldOperations should skip unresolved nodes") { > withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") { > val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr") > val df1 = df.select(df("tsStr").cast("timestamp")).as("df1") > val df2 = df.select(df("tsStr").cast("timestamp")).as("df2") > df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > .select($"df1.tsStr".as("timeStr")).as("df3") > // This throws "UnresolvedException: Invalid call to > // dataType on unresolved object" instead of "AnalysisException: Column > 'df1.timeStr' does not exist." > df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr")) > } > } {code}
[jira] [Assigned] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations
[ https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi reassigned SPARK-38266: Assignee: wuyi > UnresolvedException: Invalid call to dataType on unresolved object caused by > GetDateFieldOperations > --- > > Key: SPARK-38266 > URL: https://issues.apache.org/jira/browse/SPARK-38266 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > {code:java} > test("GetDateFieldOperations should skip unresolved nodes") { > withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") { > val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr") > val df1 = df.select(df("tsStr").cast("timestamp")).as("df1") > val df2 = df.select(df("tsStr").cast("timestamp")).as("df2") > df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > .select($"df1.tsStr".as("timeStr")).as("df3") > // This throws "UnresolvedException: Invalid call to > // dataType on unresolved object" instead of "AnalysisException: Column > 'df1.timeStr' does not exist." > df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr")) > } > } {code}
[jira] [Commented] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations
[ https://issues.apache.org/jira/browse/SPARK-38266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495285#comment-17495285 ] wuyi commented on SPARK-38266: -- Issue resolved by https://github.com/apache/spark/pull/35568 > UnresolvedException: Invalid call to dataType on unresolved object caused by > GetDateFieldOperations > --- > > Key: SPARK-38266 > URL: https://issues.apache.org/jira/browse/SPARK-38266 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2 >Reporter: wuyi >Priority: Major > > {code:java} > test("GetDateFieldOperations should skip unresolved nodes") { > withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") { > val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr") > val df1 = df.select(df("tsStr").cast("timestamp")).as("df1") > val df2 = df.select(df("tsStr").cast("timestamp")).as("df2") > df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") > .select($"df1.tsStr".as("timeStr")).as("df3") > // This throws "UnresolvedException: Invalid call to > // dataType on unresolved object" instead of "AnalysisException: Column > 'df1.timeStr' does not exist." > df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr")) > } > } {code}
[jira] [Created] (SPARK-38266) UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations
wuyi created SPARK-38266: Summary: UnresolvedException: Invalid call to dataType on unresolved object caused by GetDateFieldOperations Key: SPARK-38266 URL: https://issues.apache.org/jira/browse/SPARK-38266 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2, 3.3.0 Reporter: wuyi {code:java} test("GetDateFieldOperations should skip unresolved nodes") { withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") { val df = Seq("1644821603").map(i => (i.toInt, i)).toDF("tsInt", "tsStr") val df1 = df.select(df("tsStr").cast("timestamp")).as("df1") val df2 = df.select(df("tsStr").cast("timestamp")).as("df2") df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") val df3 = df1.join(df2, $"df1.tsStr" === $"df2.tsStr", "left_outer") .select($"df1.tsStr".as("timeStr")).as("df3") // This throws "UnresolvedException: Invalid call to // dataType on unresolved object" instead of "AnalysisException: Column 'df1.timeStr' does not exist." df3.join(df1, year($"df1.timeStr") === year($"df3.tsStr")) } } {code}
[jira] [Resolved] (SPARK-38261) Sync missing R packages with CI
[ https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38261. -- Assignee: Khalid Mammadov Resolution: Fixed Fixed in https://github.com/apache/spark/pull/35583 > Sync missing R packages with CI > --- > > Key: SPARK-38261 > URL: https://issues.apache.org/jira/browse/SPARK-38261 > Project: Spark > Issue Type: Github Integration > Components: Build >Affects Versions: 3.2.1 >Reporter: Khalid Mammadov >Assignee: Khalid Mammadov >Priority: Minor > > The current GitHub workflow job *Linters, licenses, dependencies and > documentation generation* is missing R packages needed to complete the > documentation and API build. > *Build and test* is not failing, as these packages are installed in the base > image. > IMO we need to keep them in sync with the base image for an easy switch back > to the Ubuntu runner when ready. > These R packages are missing: *markdown* and *e1071* > Reference: > Base image - > https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore
[jira] [Updated] (SPARK-38261) Sync missing R packages with CI
[ https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-38261: - Fix Version/s: 3.3.0 > Sync missing R packages with CI > --- > > Key: SPARK-38261 > URL: https://issues.apache.org/jira/browse/SPARK-38261 > Project: Spark > Issue Type: Github Integration > Components: Build >Affects Versions: 3.2.1 >Reporter: Khalid Mammadov >Assignee: Khalid Mammadov >Priority: Minor > Fix For: 3.3.0 > > > The current GitHub workflow job *Linters, licenses, dependencies and > documentation generation* is missing R packages needed to complete the > documentation and API build. > *Build and test* is not failing, as these packages are installed in the base > image. > IMO we need to keep them in sync with the base image for an easy switch back > to the Ubuntu runner when ready. > These R packages are missing: *markdown* and *e1071* > Reference: > Base image - > https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore
[jira] [Commented] (SPARK-38265) Update comments of ExecutorAllocationClient
[ https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495271#comment-17495271 ] Apache Spark commented on SPARK-38265: -- User 'Shockang' has created a pull request for this issue: https://github.com/apache/spark/pull/35586 > Update comments of ExecutorAllocationClient > --- > > Key: SPARK-38265 > URL: https://issues.apache.org/jira/browse/SPARK-38265 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Shockang >Priority: Trivial > Fix For: 3.3.0 > > > The class comment of ExecutorAllocationClient is out of date. > {code:java} > This is currently supported only in YARN mode. {code} > Nowadays, this is supported in the following modes: Spark's Standalone, > YARN-Client, YARN-Cluster, Mesos, Kubernetes. > > In my opinion, this comment should be updated. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38265) Update comments of ExecutorAllocationClient
[ https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495270#comment-17495270 ] Apache Spark commented on SPARK-38265: -- User 'Shockang' has created a pull request for this issue: https://github.com/apache/spark/pull/35586 > Update comments of ExecutorAllocationClient > --- > > Key: SPARK-38265 > URL: https://issues.apache.org/jira/browse/SPARK-38265 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Shockang >Priority: Trivial > Fix For: 3.3.0 > > > The class comment of ExecutorAllocationClient is out of date. > {code:java} > This is currently supported only in YARN mode. {code} > Nowadays, this is supported in the following modes: Spark's Standalone, > YARN-Client, YARN-Cluster, Mesos, Kubernetes. > > In my opinion, this comment should be updated. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38265) Update comments of ExecutorAllocationClient
[ https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38265: Assignee: Apache Spark > Update comments of ExecutorAllocationClient > --- > > Key: SPARK-38265 > URL: https://issues.apache.org/jira/browse/SPARK-38265 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Shockang >Assignee: Apache Spark >Priority: Trivial > Fix For: 3.3.0 > > > The class comment of ExecutorAllocationClient is out of date. > {code:java} > This is currently supported only in YARN mode. {code} > Nowadays, this is supported in the following modes: Spark's Standalone, > YARN-Client, YARN-Cluster, Mesos, Kubernetes. > > In my opinion, this comment should be updated. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38265) Update comments of ExecutorAllocationClient
[ https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38265: Assignee: (was: Apache Spark) > Update comments of ExecutorAllocationClient > --- > > Key: SPARK-38265 > URL: https://issues.apache.org/jira/browse/SPARK-38265 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Shockang >Priority: Trivial > Fix For: 3.3.0 > > > The class comment of ExecutorAllocationClient is out of date. > {code:java} > This is currently supported only in YARN mode. {code} > Nowadays, this is supported in the following modes: Spark's Standalone, > YARN-Client, YARN-Cluster, Mesos, Kubernetes. > > In my opinion, this comment should be updated. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38265) Update comments of ExecutorAllocationClient
[ https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495269#comment-17495269 ] Shockang commented on SPARK-38265: -- Working on this. > Update comments of ExecutorAllocationClient > --- > > Key: SPARK-38265 > URL: https://issues.apache.org/jira/browse/SPARK-38265 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Shockang >Priority: Trivial > Fix For: 3.3.0 > > > The class comment of ExecutorAllocationClient is out of date. > {code:java} > This is currently supported only in YARN mode. {code} > Nowadays, this is supported in the following modes: Spark's Standalone, > YARN-Client, YARN-Cluster, Mesos, Kubernetes. > > In my opinion, this comment should be updated. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38265) Update comments of ExecutorAllocationClient
Shockang created SPARK-38265: Summary: Update comments of ExecutorAllocationClient Key: SPARK-38265 URL: https://issues.apache.org/jira/browse/SPARK-38265 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.1 Reporter: Shockang Fix For: 3.3.0 The class comment of ExecutorAllocationClient is out of date. {code:java} This is currently supported only in YARN mode. {code} Nowadays, this is supported in the following modes: Spark's Standalone, YARN-Client, YARN-Cluster, Mesos, Kubernetes. In my opinion, this comment should be updated. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
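For context on the dynamic-allocation support the ticket above describes, here is a minimal PySpark sketch of enabling it outside YARN; the master URL, app name, and executor bounds are illustrative assumptions, not taken from the ticket:
{code:python}
# Minimal sketch (illustrative settings): dynamic allocation, the feature
# ExecutorAllocationClient backs, enabled from PySpark. With shuffle tracking
# it works without an external shuffle service, e.g. on Kubernetes, which is
# why the "YARN only" class comment is out of date.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[2]")  # local master only so the sketch runs; real jobs get it from spark-submit
    .appName("dynamic-allocation-demo")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "10")
    .getOrCreate()
)
{code}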
[jira] [Created] (SPARK-38264) Add `DataFrame.resample` for pandas API on Spark.
Haejoon Lee created SPARK-38264: --- Summary: Add `DataFrame.resample` for pandas API on Spark. Key: SPARK-38264 URL: https://issues.apache.org/jira/browse/SPARK-38264 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.3.0 Reporter: Haejoon Lee Implement the function DataFrame.resample for pandas API on Spark to follow the behavior of pandas (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
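For readers unfamiliar with the pandas API being mirrored, a minimal pandas sketch of the behavior `DataFrame.resample` is expected to follow; the data and frequency are illustrative:
{code:python}
# Minimal pandas sketch of the behavior pyspark.pandas is asked to follow:
# downsample a 12-hourly series into daily buckets and average each bucket.
import pandas as pd

idx = pd.date_range("2022-01-01", periods=6, freq="12H")
pdf = pd.DataFrame({"v": range(6)}, index=idx)

print(pdf.resample("D").mean())  # one averaged row per day
{code}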
[jira] [Commented] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495257#comment-17495257 ] Apache Spark commented on SPARK-37426: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/35585 > Inline type hints for python/pyspark/mllib/regression.py > > > Key: SPARK-37426 > URL: https://issues.apache.org/jira/browse/SPARK-37426 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/mlib/regression.pyi to > python/pyspark/mllib/regression.py -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37426: Assignee: Apache Spark > Inline type hints for python/pyspark/mllib/regression.py > > > Key: SPARK-37426 > URL: https://issues.apache.org/jira/browse/SPARK-37426 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > > Inline type hints from python/pyspark/mlib/regression.pyi to > python/pyspark/mllib/regression.py -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495256#comment-17495256 ] Apache Spark commented on SPARK-37426: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/35585 > Inline type hints for python/pyspark/mllib/regression.py > > > Key: SPARK-37426 > URL: https://issues.apache.org/jira/browse/SPARK-37426 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/mlib/regression.pyi to > python/pyspark/mllib/regression.py -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37426) Inline type hints for python/pyspark/mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-37426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37426: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/mllib/regression.py > > > Key: SPARK-37426 > URL: https://issues.apache.org/jira/browse/SPARK-37426 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/mlib/regression.pyi to > python/pyspark/mllib/regression.py -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py
[ https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37400: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/mllib/classification.py > > > Key: SPARK-37400 > URL: https://issues.apache.org/jira/browse/SPARK-37400 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/mlib/classification.pyi to > python/pyspark/mllib/classification.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py
[ https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37400: Assignee: Apache Spark > Inline type hints for python/pyspark/mllib/classification.py > > > Key: SPARK-37400 > URL: https://issues.apache.org/jira/browse/SPARK-37400 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > > Inline type hints from python/pyspark/mlib/classification.pyi to > python/pyspark/mllib/classification.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37400) Inline type hints for python/pyspark/mllib/classification.py
[ https://issues.apache.org/jira/browse/SPARK-37400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495255#comment-17495255 ] Apache Spark commented on SPARK-37400: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/35585 > Inline type hints for python/pyspark/mllib/classification.py > > > Key: SPARK-37400 > URL: https://issues.apache.org/jira/browse/SPARK-37400 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/mlib/classification.pyi to > python/pyspark/mllib/classification.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
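Both SPARK-37426 and SPARK-37400 above move annotations from .pyi stub files into the modules themselves; a hedged before/after sketch with a deliberately simplified LabeledPoint (the real regression.py signatures are richer):
{code:python}
# Before: hints lived in a separate stub file (regression.pyi), e.g.
#   class LabeledPoint:
#       def __init__(self, label: float, features: Iterable[float]) -> None: ...
#
# After: the same annotations are written inline in regression.py itself.
from typing import Iterable

class LabeledPoint:
    def __init__(self, label: float, features: Iterable[float]) -> None:
        self.label = float(label)
        self.features = list(features)
{code}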
[jira] [Resolved] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-37090. -- Fix Version/s: 3.3.0 Resolution: Fixed Resolved by https://github.com/apache/spark/pull/34362 > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-37090: Assignee: Yuming Wang > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36994) Upgrade Apache Thrift
[ https://issues.apache.org/jira/browse/SPARK-36994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-36994. -- Resolution: Duplicate
> Upgrade Apache Thrift
> -
>
> Key: SPARK-36994
> URL: https://issues.apache.org/jira/browse/SPARK-36994
> Project: Spark
> Issue Type: Bug
> Components: Security
> Affects Versions: 3.0.1
> Reporter: kaja girish
> Priority: Major
>
> *Image:*
> * spark:3.0.1
> *Components Affected:*
> * Apache Thrift
> *Recommendation:*
> * upgrade Apache Thrift
> *CVE:*
>
> |Component Name|Component Version Name|Vulnerability|Fixed version|
> |Apache Thrift|0.11.0-4.|CVE-2019-0205|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2019-0210|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2020-13949|0.14.1|
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reopened SPARK-37090: -- I'm going to 'reverse' the direction of the Duplicate, as the PRs ended up being filed against this JIRA > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38262) Upgrade Google guava to version 30.0-jre
[ https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495220#comment-17495220 ] Apache Spark commented on SPARK-38262: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/35584 > Upgrade Google guava to version 30.0-jre > > > Key: SPARK-38262 > URL: https://issues.apache.org/jira/browse/SPARK-38262 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Apache Spark is using com.google.guava:guava version 14.0.1 which has two > security issues. > [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] > [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] > We should upgrade to [version > 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38262) Upgrade Google guava to version 30.0-jre
[ https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38262: Assignee: (was: Apache Spark) > Upgrade Google guava to version 30.0-jre > > > Key: SPARK-38262 > URL: https://issues.apache.org/jira/browse/SPARK-38262 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Apache Spark is using com.google.guava:guava version 14.0.1 which has two > security issues. > [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] > [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] > We should upgrade to [version > 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38262) Upgrade Google guava to version 30.0-jre
[ https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38262: Assignee: Apache Spark > Upgrade Google guava to version 30.0-jre > > > Key: SPARK-38262 > URL: https://issues.apache.org/jira/browse/SPARK-38262 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Assignee: Apache Spark >Priority: Major > > Apache Spark is using com.google.guava:guava version 14.0.1 which has two > security issues. > [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] > [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] > We should upgrade to [version > 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38262) Upgrade Google guava to version 30.0-jre
[ https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495219#comment-17495219 ] Apache Spark commented on SPARK-38262: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/35584 > Upgrade Google guava to version 30.0-jre > > > Key: SPARK-38262 > URL: https://issues.apache.org/jira/browse/SPARK-38262 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Apache Spark is using com.google.guava:guava version 14.0.1 which has two > security issues. > [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] > [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] > We should upgrade to [version > 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38262) Upgrade Google guava to version 30.0-jre
[ https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørn Jørgensen updated SPARK-38262: Description: Apache Spark is using com.google.guava:guava version 14.0.1 which has two security issues. [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] We should upgrade to [version 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] was: Apache Spark are using com.google.guava:guava version 14.0.1 which has two security issues. [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] We should upgrade to [version 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] > Upgrade Google guava to version 30.0-jre > > > Key: SPARK-38262 > URL: https://issues.apache.org/jira/browse/SPARK-38262 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Apache Spark is using com.google.guava:guava version 14.0.1 which has two > security issues. > [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] > [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] > We should upgrade to [version > 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38263) StructType explode
Sayed Mohammad Hossein Torabi created SPARK-38263: - Summary: StructType explode Key: SPARK-38263 URL: https://issues.apache.org/jira/browse/SPARK-38263 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.1 Reporter: Sayed Mohammad Hossein Torabi Currently the explode function only supports Array and Map data types, not StructType. Supporting StructType would help Spark users flatten their datasets, which is useful when dealing with semi-structured and unstructured data. The idea is to support StructType first, and also add `prefix` and `postfix` options to it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
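A hedged sketch of how the flattening described above is done today with select, with a manual prefix via alias; the proposed explode-with-`prefix`/`postfix` API does not exist yet, so this only approximates the intended result:
{code:python}
# Today's workaround: flatten a StructType column by selecting its fields,
# renaming each one to emulate the proposed `prefix` option.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[2]").appName("struct-flatten").getOrCreate()
df = spark.createDataFrame(
    [(1, ("a", "b"))], "id INT, s STRUCT<x: STRING, y: STRING>"
)

flat = df.select(
    "id",
    *[F.col(f"s.{name}").alias(f"s_{name}") for name in df.schema["s"].dataType.names],
)
flat.show()  # columns: id, s_x, s_y
{code}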
[jira] [Updated] (SPARK-38262) Upgrade Google guava to version 30.0-jre
[ https://issues.apache.org/jira/browse/SPARK-38262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørn Jørgensen updated SPARK-38262: Description: Apache Spark are using com.google.guava:guava version 14.0.1 which has two security issues. [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] We should upgrade to [version 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] was: Apache Spark are using com.google.guava:guava version 14.0 which has two security issues. [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] We should upgrade to [version 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] > Upgrade Google guava to version 30.0-jre > > > Key: SPARK-38262 > URL: https://issues.apache.org/jira/browse/SPARK-38262 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Apache Spark are using com.google.guava:guava version 14.0.1 which has two > security issues. > [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] > [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] > We should upgrade to [version > 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38262) Upgrade Google guava to version 30.0-jre
Bjørn Jørgensen created SPARK-38262: --- Summary: Upgrade Google guava to version 30.0-jre Key: SPARK-38262 URL: https://issues.apache.org/jira/browse/SPARK-38262 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Bjørn Jørgensen Apache Spark are using com.google.guava:guava version 14.0 which has two security issues. [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] [CVE-2020-8908|https://nvd.nist.gov/vuln/detail/CVE-2020-8908] We should upgrade to [version 30.0|https://mvnrepository.com/artifact/com.google.guava/guava/30.0-jre] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38261) Sync missing R packages with CI
[ https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38261: Assignee: (was: Apache Spark) > Sync missing R packages with CI > --- > > Key: SPARK-38261 > URL: https://issues.apache.org/jira/browse/SPARK-38261 > Project: Spark > Issue Type: Github Integration > Components: Build >Affects Versions: 3.2.1 >Reporter: Khalid Mammadov >Priority: Minor > > Current GitHub workflow job *Linters, licenses, dependencies and > documentation generation* is missing R packages to complete Documentation and > API build. > *Build and test* - is not failing as these packages are installed in the > base image. > We need to keep them in-sync IMO with the base image for easy switch back to > ubuntu runner when ready. > These R packages are missing: *markdown* and *e1071* > Reference: > Base image - > https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38261) Sync missing R packages with CI
[ https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495179#comment-17495179 ] Apache Spark commented on SPARK-38261: -- User 'khalidmammadov' has created a pull request for this issue: https://github.com/apache/spark/pull/35583 > Sync missing R packages with CI > --- > > Key: SPARK-38261 > URL: https://issues.apache.org/jira/browse/SPARK-38261 > Project: Spark > Issue Type: Github Integration > Components: Build >Affects Versions: 3.2.1 >Reporter: Khalid Mammadov >Priority: Minor > > Current GitHub workflow job *Linters, licenses, dependencies and > documentation generation* is missing R packages to complete Documentation and > API build. > *Build and test* - is not failing as these packages are installed in the > base image. > We need to keep them in-sync IMO with the base image for easy switch back to > ubuntu runner when ready. > These R packages are missing: *markdown* and *e1071* > Reference: > Base image - > https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38261) Sync missing R packages with CI
[ https://issues.apache.org/jira/browse/SPARK-38261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38261: Assignee: Apache Spark > Sync missing R packages with CI > --- > > Key: SPARK-38261 > URL: https://issues.apache.org/jira/browse/SPARK-38261 > Project: Spark > Issue Type: Github Integration > Components: Build >Affects Versions: 3.2.1 >Reporter: Khalid Mammadov >Assignee: Apache Spark >Priority: Minor > > Current GitHub workflow job *Linters, licenses, dependencies and > documentation generation* is missing R packages to complete Documentation and > API build. > *Build and test* - is not failing as these packages are installed in the > base image. > We need to keep them in-sync IMO with the base image for easy switch back to > ubuntu runner when ready. > These R packages are missing: *markdown* and *e1071* > Reference: > Base image - > https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38261) Sync missing R packages with CI
Khalid Mammadov created SPARK-38261: --- Summary: Sync missing R packages with CI Key: SPARK-38261 URL: https://issues.apache.org/jira/browse/SPARK-38261 Project: Spark Issue Type: Github Integration Components: Build Affects Versions: 3.2.1 Reporter: Khalid Mammadov The current GitHub workflow job *Linters, licenses, dependencies and documentation generation* is missing the R packages needed to complete the Documentation and API build. *Build and test* is not failing, as these packages are installed in the base image. IMO we need to keep them in sync with the base image for an easy switch back to the ubuntu runner when ready. The missing R packages are *markdown* and *e1071*. Reference: Base image - https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37954) old columns should not be available after select or drop
[ https://issues.apache.org/jira/browse/SPARK-37954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495177#comment-17495177 ] Varun Shah commented on SPARK-37954: Hi [~andrewfmurphy], there are 2 reasons you don't see any error in the drop function:
# As you rightly pointed out, the function will not throw any error in case the column does not exist. I think this is debatable, considering the fact that functions like select/filter throw an error if a column is missing.
# The other factor is the Spark Catalyst optimizer, which optimizes your query/DAG by performing optimizations like predicate pushdown; in the example you mentioned, it tries pushing the filter down and, in doing so, recognizes the column rename and uses oldCol from the original dataframe/rdd.

Run the following 2 examples to see how the plans get created for different scenarios:
{code:java}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col as col

spark = SparkSession.builder.appName('available_columns').getOrCreate()
df = spark.createDataFrame([{"oldcol": 1}, {"oldcol": 2}])
df = df.withColumnRenamed('oldcol', 'newcol')
df = df.filter(col("oldcol")!=2)
df.count()
df.explain("extended")
{code}
{noformat}
== Parsed Logical Plan ==
'Filter NOT ('oldcol = 2)
+- Project [oldcol#168L AS newcol#170L]
   +- LogicalRDD [oldcol#168L], false

== Analyzed Logical Plan ==
newcol: bigint
Project [newcol#170L]
+- Filter NOT (oldcol#168L = cast(2 as bigint))
   +- Project [oldcol#168L AS newcol#170L, oldcol#168L]
      +- LogicalRDD [oldcol#168L], false

== Optimized Logical Plan ==
Project [oldcol#168L AS newcol#170L]
+- Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- LogicalRDD [oldcol#168L], false

== Physical Plan ==
*(1) Project [oldcol#168L AS newcol#170L]
+- *(1) Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- *(1) Scan ExistingRDD[oldcol#168L]
{noformat}
{code:java}
# action -2
df2 = df.select(col("newcol"))
df2 = df2.filter(col("oldcol")!=2)
df2.count()
df2.explain("extended")
{code}
{code:java}
== Parsed Logical Plan ==
'Filter NOT ('oldcol = 2)
+- Project [newcol#170L]
   +- Project [newcol#170L]
      +- Filter NOT (oldcol#168L = cast(2 as bigint))
         +- Project [oldcol#168L AS newcol#170L, oldcol#168L]
            +- LogicalRDD [oldcol#168L], false

== Analyzed Logical Plan ==
newcol: bigint
Project [newcol#170L]
+- Filter NOT (oldcol#168L = cast(2 as bigint))
   +- Project [newcol#170L, oldcol#168L]
      +- Project [newcol#170L, oldcol#168L]
         +- Filter NOT (oldcol#168L = cast(2 as bigint))
            +- Project [oldcol#168L AS newcol#170L, oldcol#168L]
               +- LogicalRDD [oldcol#168L], false

== Optimized Logical Plan ==
Project [oldcol#168L AS newcol#170L]
+- Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- LogicalRDD [oldcol#168L], false

== Physical Plan ==
*(1) Project [oldcol#168L AS newcol#170L]
+- *(1) Filter (isnotnull(oldcol#168L) AND NOT (oldcol#168L = 2))
   +- *(1) Scan ExistingRDD[oldcol#168L]
{code}
> old columns should not be available after select or drop > > > Key: SPARK-37954 > URL: https://issues.apache.org/jira/browse/SPARK-37954 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.0.1 >Reporter: Jean Bon >Priority: Major > > > {code:java} > from pyspark.sql import SparkSession > from pyspark.sql.functions import col as col > spark = SparkSession.builder.appName('available_columns').getOrCreate() > df = spark.range(5).select((col("id")+10).alias("id2")) > assert df.columns==["id2"] #OK > try: > df.select("id") > error_raise = False > except: > error_raise = True > assert error_raise #OK > df = df.drop("id") #should raise an error > df.filter(col("id")!=2).count() #returns 4, should raise an error > {code} > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38260) Remove dependence on commons-net
[ https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495156#comment-17495156 ] Apache Spark commented on SPARK-38260: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35582 > Remove dependence on commons-net > > > Key: SPARK-38260 > URL: https://issues.apache.org/jira/browse/SPARK-38260 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Environment: Spark doesn't rely on commons-net directly > >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38260) Remove dependence on commons-net
[ https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38260: Assignee: Apache Spark > Remove dependence on commons-net > > > Key: SPARK-38260 > URL: https://issues.apache.org/jira/browse/SPARK-38260 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Environment: Spark doesn't rely on commons-net directly > >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38260) Remove dependence on commons-net
[ https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38260: Assignee: (was: Apache Spark) > Remove dependence on commons-net > > > Key: SPARK-38260 > URL: https://issues.apache.org/jira/browse/SPARK-38260 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Environment: Spark doesn't rely on commons-net directly > >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38260) Remove dependence on commons-net
[ https://issues.apache.org/jira/browse/SPARK-38260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495155#comment-17495155 ] Apache Spark commented on SPARK-38260: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35582 > Remove dependence on commons-net > > > Key: SPARK-38260 > URL: https://issues.apache.org/jira/browse/SPARK-38260 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Environment: Spark doesn't rely on commons-net directly > >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38260) Remove dependence on commons-net
Yang Jie created SPARK-38260: Summary: Remove dependence on commons-net Key: SPARK-38260 URL: https://issues.apache.org/jira/browse/SPARK-38260 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Environment: Spark doesn't rely on commons-net directly Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38259) Upgrade netty to 4.1.74
[ https://issues.apache.org/jira/browse/SPARK-38259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495149#comment-17495149 ] Apache Spark commented on SPARK-38259: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35581 > Upgrade netty to 4.1.74 > --- > > Key: SPARK-38259 > URL: https://issues.apache.org/jira/browse/SPARK-38259 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > https://netty.io/news/2022/02/08/4-1-74-Final.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38259) Upgrade netty to 4.1.74
[ https://issues.apache.org/jira/browse/SPARK-38259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38259: Assignee: Apache Spark > Upgrade netty to 4.1.74 > --- > > Key: SPARK-38259 > URL: https://issues.apache.org/jira/browse/SPARK-38259 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://netty.io/news/2022/02/08/4-1-74-Final.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38259) Upgrade netty to 4.1.74
[ https://issues.apache.org/jira/browse/SPARK-38259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38259: Assignee: (was: Apache Spark) > Upgrade netty to 4.1.74 > --- > > Key: SPARK-38259 > URL: https://issues.apache.org/jira/browse/SPARK-38259 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > https://netty.io/news/2022/02/08/4-1-74-Final.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38259) Upgrade netty to 4.1.74
Yang Jie created SPARK-38259: Summary: Upgrade netty to 4.1.74 Key: SPARK-38259 URL: https://issues.apache.org/jira/browse/SPARK-38259 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Yang Jie https://netty.io/news/2022/02/08/4-1-74-Final.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38258: Affects Version/s: 2.4.0 > [proposal] collect & update statistics automatically when spark SQL is running > -- > > Key: SPARK-38258 > URL: https://issues.apache.org/jira/browse/SPARK-38258 > Project: Spark > Issue Type: Wish > Components: Spark Core, SQL >Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0 >Reporter: gabrywu >Priority: Minor > > As we all know, table & column statistics are very important to the Spark SQL > optimizer; however, we have to collect & update them using > {code:java} > analyze table tableName compute statistics{code} > > It's a little inconvenient, so why can't we collect & update statistics when > a Spark stage runs and finishes? > For example, when an insert overwrite table statement finishes, we can update > the corresponding table statistics using SQL metrics. And in following queries, > the Spark SQL optimizer can use these statistics. > So what do you think of it, [~yumwang]? Is it reasonable? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
[ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38258: Description: As we all know, table & column statistics are very important to the Spark SQL optimizer; however, we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a Spark stage runs and finishes? For example, when an insert overwrite table statement finishes, we can update the corresponding table statistics using SQL metrics. And in following queries, the Spark SQL optimizer can use these statistics. So what do you think of it, [~yumwang]? Is it reasonable? was: As we all know, table & column statistics are very important to the Spark SQL optimizer; however, we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a Spark stage runs and finishes? For example, when an insert overwrite table statement finishes, we can update the corresponding table statistics using SQL metrics. And in next queries, the Spark SQL optimizer can use these statistics. So what do you think of it, [~yumwang]? > [proposal] collect & update statistics automatically when spark SQL is running > -- > > Key: SPARK-38258 > URL: https://issues.apache.org/jira/browse/SPARK-38258 > Project: Spark > Issue Type: Wish > Components: Spark Core, SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0 >Reporter: gabrywu >Priority: Minor > > As we all know, table & column statistics are very important to the Spark SQL > optimizer; however, we have to collect & update them using > {code:java} > analyze table tableName compute statistics{code} > > It's a little inconvenient, so why can't we collect & update statistics when > a Spark stage runs and finishes? > For example, when an insert overwrite table statement finishes, we can update > the corresponding table statistics using SQL metrics. And in following queries, > the Spark SQL optimizer can use these statistics. > So what do you think of it, [~yumwang]? Is it reasonable? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38258) [proposal] collect & update statistics automatically when spark SQL is running
gabrywu created SPARK-38258: --- Summary: [proposal] collect & update statistics automatically when spark SQL is running Key: SPARK-38258 URL: https://issues.apache.org/jira/browse/SPARK-38258 Project: Spark Issue Type: Wish Components: Spark Core, SQL Affects Versions: 3.2.0, 3.1.0, 3.0.0 Reporter: gabrywu As we all know, table & column statistics are very important to the Spark SQL optimizer; however, we have to collect & update them using {code:java} analyze table tableName compute statistics{code} It's a little inconvenient, so why can't we collect & update statistics when a Spark stage runs and finishes? For example, when an insert overwrite table statement finishes, we can update the corresponding table statistics using SQL metrics. And in next queries, the Spark SQL optimizer can use these statistics. So what do you think of it, [~yumwang]? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
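For reference, a minimal sketch of the manual flow the proposal above would automate; the table name and data are illustrative:
{code:python}
# Today statistics refresh only when ANALYZE TABLE is run by hand; the
# proposal is to derive the update from SQL metrics after writes finish.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("stats-demo").getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS t (id INT) USING parquet")
spark.range(100).selectExpr("CAST(id AS INT) AS id").write.insertInto("t", overwrite=True)

# The manual step the ticket wants to make automatic:
spark.sql("ANALYZE TABLE t COMPUTE STATISTICS")
spark.sql("DESCRIBE EXTENDED t").show(truncate=False)
{code}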