[jira] [Updated] (SPARK-37528) Schedule Tasks By Input Size
[ https://issues.apache.org/jira/browse/SPARK-37528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-37528: -- Affects Version/s: 3.4.0 (was: 3.3.0) > Schedule Tasks By Input Size > > > Key: SPARK-37528 > URL: https://issues.apache.org/jira/browse/SPARK-37528 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > In general, a larger input data size means a longer running time. So ideally, > we can let the DAGScheduler submit the tasks with the largest input sizes first, which can reduce > the whole stage running time. For example, we have one stage with 4 tasks and > the defaultParallelism is 2 and the 4 tasks have different running times [1s, > 3s, 2s, 4s]. > - in the default order, the running time of the stage is: 7s > - with the biggest tasks first, the running time of the stage is: 5s -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
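A minimal, standalone sketch of the arithmetic behind the example above (plain Python, not DAGScheduler code): with 2 slots, each free slot picks the next task in submission order, and ordering the largest tasks first reproduces the 7s vs 5s makespans from the description.
{code:python}
import heapq

def stage_time(task_durations, parallelism=2):
    # Each free slot picks the next task in submission order.
    slots = [0] * parallelism                 # time at which each slot becomes free
    heapq.heapify(slots)
    for duration in task_durations:
        start = heapq.heappop(slots)          # earliest available slot
        heapq.heappush(slots, start + duration)
    return max(slots)                         # the stage ends when the last slot does

tasks = [1, 3, 2, 4]                          # running times from the example, in seconds
print(stage_time(tasks))                           # submission order as-is -> 7
print(stage_time(sorted(tasks, reverse=True)))     # biggest tasks first    -> 5
{code}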
[jira] [Created] (SPARK-38757) Update the Oracle docker image
Luca Canali created SPARK-38757: --- Summary: Update the Oracle docker image Key: SPARK-38757 URL: https://issues.apache.org/jira/browse/SPARK-38757 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.1 Reporter: Luca Canali This proposes to update the Docker image used for integration tests and builds from Oracle XE version 18.4.0 to Oracle XE version 21.3.0. Currently Oracle XE version 18.4.0 is being used; Oracle 18c support ended in 2021, and Oracle 21c is the latest release of the Oracle RDBMS. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38757) Update the Oracle docker image version used for test and integration
[ https://issues.apache.org/jira/browse/SPARK-38757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-38757: Summary: Update the Oracle docker image version used for test and integration (was: Update the Oracle docker image) > Update the Oracle docker image version used for test and integration > > > Key: SPARK-38757 > URL: https://issues.apache.org/jira/browse/SPARK-38757 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.1 >Reporter: Luca Canali >Priority: Minor > > This proposes to update the Docker image used for integration tests and > builds from Oracle XE version 18.4.0 to Oracle XE version 21.3.0. > Currently Oracle XE version 18.4.0 is being used. Oracle 18c support has > ended in 2021. Oracle 21c is the latest release of the Oracle RDBMS. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38757) Update the Oracle docker image version used for test and integration
[ https://issues.apache.org/jira/browse/SPARK-38757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515807#comment-17515807 ] Apache Spark commented on SPARK-38757: -- User 'LucaCanali' has created a pull request for this issue: https://github.com/apache/spark/pull/36036 > Update the Oracle docker image version used for test and integration > > > Key: SPARK-38757 > URL: https://issues.apache.org/jira/browse/SPARK-38757 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.1 >Reporter: Luca Canali >Priority: Minor > > This proposes to update the Docker image used for integration tests and > builds from Oracle XE version 18.4.0 to Oracle XE version 21.3.0. > Currently Oracle XE version 18.4.0 is being used. Oracle 18c support has > ended in 2021. Oracle 21c is the latest release of the Oracle RDBMS. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38757) Update the Oracle docker image version used for test and integration
[ https://issues.apache.org/jira/browse/SPARK-38757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38757: Assignee: (was: Apache Spark) > Update the Oracle docker image version used for test and integration > > > Key: SPARK-38757 > URL: https://issues.apache.org/jira/browse/SPARK-38757 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.1 >Reporter: Luca Canali >Priority: Minor > > This proposes to update the Docker image used for integration tests and > builds from Oracle XE version 18.4.0 to Oracle XE version 21.3.0. > Currently Oracle XE version 18.4.0 is being used. Oracle 18c support has > ended in 2021. Oracle 21c is the latest release of the Oracle RDBMS. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38757) Update the Oracle docker image version used for test and integration
[ https://issues.apache.org/jira/browse/SPARK-38757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38757: Assignee: Apache Spark > Update the Oracle docker image version used for test and integration > > > Key: SPARK-38757 > URL: https://issues.apache.org/jira/browse/SPARK-38757 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.1 >Reporter: Luca Canali >Assignee: Apache Spark >Priority: Minor > > This proposes to update the Docker image used for integration tests and > builds from Oracle XE version 18.4.0 to Oracle XE version 21.3.0. > Currently Oracle XE version 18.4.0 is being used. Oracle 18c support has > ended in 2021. Oracle 21c is the latest release of the Oracle RDBMS. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38758) Web UI add heap dump
Jinpeng Chi created SPARK-38758: --- Summary: Web UI add heap dump Key: SPARK-38758 URL: https://issues.apache.org/jira/browse/SPARK-38758 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.2.1, 3.1.2 Reporter: Jinpeng Chi The current Web UI can dump threads, so I want to add a heap (memory) dump as well. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38358) Add migration guide for spark.sql.hive.convertMetastoreInsertDir and spark.sql.hive.convertMetastoreCtas
[ https://issues.apache.org/jira/browse/SPARK-38358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515836#comment-17515836 ] Apache Spark commented on SPARK-38358: -- User 'cutiechi' has created a pull request for this issue: https://github.com/apache/spark/pull/36037 > Add migration guide for spark.sql.hive.convertMetastoreInsertDir and > spark.sql.hive.convertMetastoreCtas > > > Key: SPARK-38358 > URL: https://issues.apache.org/jira/browse/SPARK-38358 > Project: Spark > Issue Type: Task > Components: Documentation, SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.1 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > > After we migrated to Spark 3, many jobs threw exceptions because the data source > API does not support overwriting a partitioned table while reading from the same > table. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38759) Add StreamingQueryListener support in PySpark
Hyukjin Kwon created SPARK-38759: Summary: Add StreamingQueryListener support in PySpark Key: SPARK-38759 URL: https://issues.apache.org/jira/browse/SPARK-38759 Project: Spark Issue Type: Improvement Components: PySpark, Structured Streaming Affects Versions: 3.4.0 Reporter: Hyukjin Kwon PySpark currently does not support {{StreamingQueryListener}}, whereas DStream has it. This feature is especially important together with {{Dataset.observe}}, so users can monitor what's going on in their queries. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
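For readers unfamiliar with the JVM-side API: a listener is an object with three callbacks that gets registered on {{spark.streams}}. A Python counterpart could look roughly like the sketch below; the import path, class name, and callback signatures are assumptions modeled on the existing Scala/Java {{StreamingQueryListener}}, not the final PySpark API.
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.streaming import StreamingQueryListener  # assumed import path for the proposed API


class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"query started: {event.id}")

    def onQueryProgress(self, event):
        # event.progress would carry per-micro-batch metrics,
        # including any metrics attached via Dataset.observe.
        print(f"progress: {event.progress}")

    def onQueryTerminated(self, event):
        print(f"query terminated: {event.id}")


spark = SparkSession.builder.getOrCreate()
spark.streams.addListener(MyListener())  # mirrors StreamingQueryManager.addListener on the JVM side
{code}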
[jira] [Created] (SPARK-38760) Implement DataFrame.observe in PySpark
Hyukjin Kwon created SPARK-38760: Summary: Implement DataFrame.observe in PySpark Key: SPARK-38760 URL: https://issues.apache.org/jira/browse/SPARK-38760 Project: Spark Issue Type: Improvement Components: PySpark, Structured Streaming Affects Versions: 3.4.0 Reporter: Hyukjin Kwon This JIRA is blocked by SPARK-38759. We should support DataFrame.observe for PySpark Structured Streaming users so they can monitor their queries. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
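A rough sketch of how this could look once implemented. The {{observe}} signature below is an assumption modeled on the existing Scala {{Dataset.observe(name, expr, ...)}} API, and the rate source / console sink are only stand-ins; the named metrics would then surface in the query progress events consumed by a listener.
{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

stream = spark.readStream.format("rate").load()

# Attach named metrics to the streaming DataFrame (signature assumed to
# mirror Scala's Dataset.observe); they would be reported per micro-batch.
observed = stream.observe(
    "my_metrics",
    F.count(F.lit(1)).alias("rows"),
    F.max("value").alias("max_value"),
)

query = observed.writeStream.format("console").start()
{code}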
[jira] [Resolved] (SPARK-38684) Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
[ https://issues.apache.org/jira/browse/SPARK-38684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-38684. -- Fix Version/s: 3.3.0 3.2.2 Resolution: Fixed Issue resolved by pull request 36002 [https://github.com/apache/spark/pull/36002] > Stream-stream outer join has a possible correctness issue due to weakly read > consistent on outer iterators > -- > > Key: SPARK-38684 > URL: https://issues.apache.org/jira/browse/SPARK-38684 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1, 3.3.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Blocker > Labels: correctness > Fix For: 3.3.0, 3.2.2 > > > We figured out that stream-stream join has the same issue as SPARK-38320 on the > appended iterators. Since the root cause is the same as SPARK-38320, this is only > reproducible with the RocksDB state store provider, but even with the HDFS-backed > state store provider correctness is not guaranteed by the interface contract and hence may > depend on the JVM vendor, version, etc. > I can easily construct a “data loss” scenario in the state store. > The conditions are: > * Use a stream-stream time interval outer join > ** left outer join has an issue on the left side, right outer join has an issue > on the right side, full outer join has an issue on both sides > * At batch N, produce row(s) on the problematic side which are non-late > * In the same batch (batch N), some row(s) on the problematic side should be > evicted by the watermark condition > When these conditions are fulfilled, keyToNumValues goes out of sync > between the state and the iterator in the evict phase. If eviction happens > for the grouping key (updating keyToNumValues), the eviction phase > “overwrites” keyToNumValues in the state with the value it calculates. > Since the eviction phase “does not know” about the new rows > (keyToNumValues is out of sync), this effectively discards all rows added to the > state in batch N. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
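To make the trigger condition above more concrete, this is the general shape of a stream-stream time interval left outer join in PySpark. It is illustrative only, with rate sources and made-up column names; it is not a reproducer for the correctness issue.
{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Two rate sources stand in for real streams; for a left outer join the
# "problematic side" in the description is the left one.
left = (spark.readStream.format("rate").load()
        .select(F.col("value").alias("left_id"), F.col("timestamp").alias("left_time"))
        .withWatermark("left_time", "10 seconds"))

right = (spark.readStream.format("rate").load()
         .select(F.col("value").alias("right_id"), F.col("timestamp").alias("right_time"))
         .withWatermark("right_time", "10 seconds"))

# Time interval join condition: the bounded time range is what allows the
# state store to evict rows once the watermark passes them.
joined = left.join(
    right,
    F.expr("""
        left_id = right_id AND
        right_time BETWEEN left_time AND left_time + INTERVAL 20 seconds
    """),
    "leftOuter",
)

query = joined.writeStream.format("console").outputMode("append").start()
{code}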
[jira] [Assigned] (SPARK-38684) Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
[ https://issues.apache.org/jira/browse/SPARK-38684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-38684: Assignee: Jungtaek Lim > Stream-stream outer join has a possible correctness issue due to weakly read > consistent on outer iterators > -- > > Key: SPARK-38684 > URL: https://issues.apache.org/jira/browse/SPARK-38684 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1, 3.3.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Blocker > Labels: correctness > > We figured out stream-stream join has the same issue with SPARK-38320 on the > appended iterators. Since the root cause is same as SPARK-38320, this is only > reproducible with RocksDB state store provider, but even with HDFS backed > state store provider, it is not guaranteed by interface contract hence may > depend on the JVM vendor, version, etc. > I can easily construct the scenario of “data loss” in state store. > Condition follows: > * Use stream-stream time interval outer join > ** left outer join has an issue on left side, right outer join has an issue > on right side, full outer join has an issue on both sides > * At batch N, produce row(s) on the problematic side which are non-late > * At the same batch (batch N), some row(s) on the problematic side should be > evicted by watermark condition > When the condition is fulfilled, out of sync happens with keyToNumValues > between state and the iterator in evict phase. If eviction of the row happens > for the grouping key (updating keyToNumValues), the eviction phase > “overwrites” keyToNumValues in the state as the value it calculates. > Given that the eviction phase “do not know” about the new rows > (keyToNumValues is out of sync), effectively discarding all rows from the > state being added in the batch N. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
jiaan.geng created SPARK-38761: -- Summary: DS V2 supports push down misc non-aggregate functions Key: SPARK-38761 URL: https://issues.apache.org/jira/browse/SPARK-38761 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.0 Reporter: jiaan.geng Currently, Spark has a lot of misc non-aggregate functions from the ANSI standard: abs, coalesce, nullif, when. DS V2 should support pushing down these misc non-aggregate functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
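For illustration, this is the kind of query that would benefit: filters over a DS V2 source (e.g. a JDBC table) built from these functions could then be compiled into the pushed-down SQL instead of being evaluated by Spark. The URL, table, and column names below are placeholders, and the exact connector/catalog configuration needed for V2 pushdown is omitted.
{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder JDBC source; with this feature, predicates using ABS / COALESCE /
# NULLIF / CASE WHEN could be translated and pushed to the database.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/sales")  # placeholder URL
      .option("dbtable", "orders")                            # placeholder table
      .load())

pushed = (df
          .where(F.abs(F.col("amount")) > 100)
          .where(F.coalesce(F.col("discount"), F.lit(0)) < 10))

# The scan node in the plan would show which predicates were pushed down.
pushed.explain()
{code}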
[jira] [Commented] (SPARK-38759) Add StreamingQueryListener support in PySpark
[ https://issues.apache.org/jira/browse/SPARK-38759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515862#comment-17515862 ] Apache Spark commented on SPARK-38759: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36038 > Add StreamingQueryListener support in PySpark > - > > Key: SPARK-38759 > URL: https://issues.apache.org/jira/browse/SPARK-38759 > Project: Spark > Issue Type: Improvement > Components: PySpark, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > PySpark currently does not have the support of {{StreamingQueryListener}} in > PySpark whereas DStream has it. This feature is important especially with > {{Dataset.observe}} so users can monitor what's going on in their queries. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
[ https://issues.apache.org/jira/browse/SPARK-38761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515861#comment-17515861 ] Apache Spark commented on SPARK-38761: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36039 > DS V2 supports push down misc non-aggregate functions > - > > Key: SPARK-38761 > URL: https://issues.apache.org/jira/browse/SPARK-38761 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark have a lot misc non-aggregate functions of ANSI standard. > abs, > coalesce, > nullif, > when > DS V2 should supports push down these misc non-aggregate functions -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38759) Add StreamingQueryListener support in PySpark
[ https://issues.apache.org/jira/browse/SPARK-38759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38759: Assignee: Apache Spark > Add StreamingQueryListener support in PySpark > - > > Key: SPARK-38759 > URL: https://issues.apache.org/jira/browse/SPARK-38759 > Project: Spark > Issue Type: Improvement > Components: PySpark, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > PySpark currently does not have the support of {{StreamingQueryListener}} in > PySpark whereas DStream has it. This feature is important especially with > {{Dataset.observe}} so users can monitor what's going on in their queries. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
[ https://issues.apache.org/jira/browse/SPARK-38761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38761: Assignee: (was: Apache Spark) > DS V2 supports push down misc non-aggregate functions > - > > Key: SPARK-38761 > URL: https://issues.apache.org/jira/browse/SPARK-38761 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark have a lot misc non-aggregate functions of ANSI standard. > abs, > coalesce, > nullif, > when > DS V2 should supports push down these misc non-aggregate functions -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38759) Add StreamingQueryListener support in PySpark
[ https://issues.apache.org/jira/browse/SPARK-38759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38759: Assignee: (was: Apache Spark) > Add StreamingQueryListener support in PySpark > - > > Key: SPARK-38759 > URL: https://issues.apache.org/jira/browse/SPARK-38759 > Project: Spark > Issue Type: Improvement > Components: PySpark, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > PySpark currently does not have the support of {{StreamingQueryListener}} in > PySpark whereas DStream has it. This feature is important especially with > {{Dataset.observe}} so users can monitor what's going on in their queries. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
[ https://issues.apache.org/jira/browse/SPARK-38761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38761: Assignee: Apache Spark > DS V2 supports push down misc non-aggregate functions > - > > Key: SPARK-38761 > URL: https://issues.apache.org/jira/browse/SPARK-38761 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Currently, Spark have a lot misc non-aggregate functions of ANSI standard. > abs, > coalesce, > nullif, > when > DS V2 should supports push down these misc non-aggregate functions -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38758) Web UI add heap dump
[ https://issues.apache.org/jira/browse/SPARK-38758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38758: Assignee: (was: Apache Spark) > Web UI add heap dump > - > > Key: SPARK-38758 > URL: https://issues.apache.org/jira/browse/SPARK-38758 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.1.2, 3.2.1 >Reporter: Jinpeng Chi >Priority: Major > > The current Web UI can dump threads, so I want to add memory dump -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38758) Web UI add heap dump
[ https://issues.apache.org/jira/browse/SPARK-38758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38758: Assignee: Apache Spark > Web UI add heap dump > - > > Key: SPARK-38758 > URL: https://issues.apache.org/jira/browse/SPARK-38758 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.1.2, 3.2.1 >Reporter: Jinpeng Chi >Assignee: Apache Spark >Priority: Major > > The current Web UI can dump threads, so I want to add memory dump -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38758) Web UI add heap dump
[ https://issues.apache.org/jira/browse/SPARK-38758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515865#comment-17515865 ] Apache Spark commented on SPARK-38758: -- User 'cutiechi' has created a pull request for this issue: https://github.com/apache/spark/pull/36037 > Web UI add heap dump > - > > Key: SPARK-38758 > URL: https://issues.apache.org/jira/browse/SPARK-38758 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.1.2, 3.2.1 >Reporter: Jinpeng Chi >Priority: Major > > The current Web UI can dump threads, so I want to add memory dump -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38762) Provide query context in Decimal overflow errors
Gengliang Wang created SPARK-38762: -- Summary: Provide query context in Decimal overflow errors Key: SPARK-38762 URL: https://issues.apache.org/jira/browse/SPARK-38762 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38762) Provide query context in Decimal overflow errors
[ https://issues.apache.org/jira/browse/SPARK-38762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38762: Assignee: Apache Spark (was: Gengliang Wang) > Provide query context in Decimal overflow errors > > > Key: SPARK-38762 > URL: https://issues.apache.org/jira/browse/SPARK-38762 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38762) Provide query context in Decimal overflow errors
[ https://issues.apache.org/jira/browse/SPARK-38762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515890#comment-17515890 ] Apache Spark commented on SPARK-38762: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/36040 > Provide query context in Decimal overflow errors > > > Key: SPARK-38762 > URL: https://issues.apache.org/jira/browse/SPARK-38762 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38762) Provide query context in Decimal overflow errors
[ https://issues.apache.org/jira/browse/SPARK-38762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515889#comment-17515889 ] Apache Spark commented on SPARK-38762: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/36040 > Provide query context in Decimal overflow errors > > > Key: SPARK-38762 > URL: https://issues.apache.org/jira/browse/SPARK-38762 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38762) Provide query context in Decimal overflow errors
[ https://issues.apache.org/jira/browse/SPARK-38762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38762: Assignee: Gengliang Wang (was: Apache Spark) > Provide query context in Decimal overflow errors > > > Key: SPARK-38762 > URL: https://issues.apache.org/jira/browse/SPARK-38762 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38703) High GC and memory footprint after switch to ZSTD
[ https://issues.apache.org/jira/browse/SPARK-38703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515907#comment-17515907 ] Cheng Pan commented on SPARK-38703: --- SPARK-34390 may helps, our benchmark of 1T TPC-DS shows the benefits. (compression using zstd in shuffle, not parquet) {code:bash} +---+---+---+-+ |lz4| sum(task_cpu_time_s) | sum(task_run_time_s) | sum(gc_time_s) | +---+---+---+-+ | lz4 | 1871242.5 | 3861923.8 | 197151.5 | | zstd | 1989641.6 | 3326399.8 | 244333.2 | | zstd_buffer_pool | 1912032.0 | 3342339.4 | 187262.3 | +---+---+---+-+ {code} > High GC and memory footprint after switch to ZSTD > - > > Key: SPARK-38703 > URL: https://issues.apache.org/jira/browse/SPARK-38703 > Project: Spark > Issue Type: Question > Components: Input/Output >Affects Versions: 3.1.2 >Reporter: Michael Taranov >Priority: Major > > Hi All, > We started to switch our Spark pipelines to read parquet with ZSTD > compression. > After the switch we see that memory footprint is much larger than previously > with SNAPPY. > Additionally GC stats of the jobs are much higher comparing to SNAPPY with > the same workload as previously. > Is there any configurations that may be relevant to read path, that may help > in such cases ? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38703) High GC and memory footprint after switch to ZSTD
[ https://issues.apache.org/jira/browse/SPARK-38703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515907#comment-17515907 ] Cheng Pan edited comment on SPARK-38703 at 4/1/22 12:55 PM: SPARK-34390 may help, our benchmark of 1T TPC-DS shows the benefits. (compression using zstd in shuffle, not parquet) {code:bash} +---+---+---+-+ |lz4| sum(task_cpu_time_s) | sum(task_run_time_s) | sum(gc_time_s) | +---+---+---+-+ | lz4 | 1871242.5 | 3861923.8 | 197151.5 | | zstd | 1989641.6 | 3326399.8 | 244333.2 | | zstd_buffer_pool | 1912032.0 | 3342339.4 | 187262.3 | +---+---+---+-+ {code} was (Author: pan3793): SPARK-34390 may helps, our benchmark of 1T TPC-DS shows the benefits. (compression using zstd in shuffle, not parquet) {code:bash} +---+---+---+-+ |lz4| sum(task_cpu_time_s) | sum(task_run_time_s) | sum(gc_time_s) | +---+---+---+-+ | lz4 | 1871242.5 | 3861923.8 | 197151.5 | | zstd | 1989641.6 | 3326399.8 | 244333.2 | | zstd_buffer_pool | 1912032.0 | 3342339.4 | 187262.3 | +---+---+---+-+ {code} > High GC and memory footprint after switch to ZSTD > - > > Key: SPARK-38703 > URL: https://issues.apache.org/jira/browse/SPARK-38703 > Project: Spark > Issue Type: Question > Components: Input/Output >Affects Versions: 3.1.2 >Reporter: Michael Taranov >Priority: Major > > Hi All, > We started to switch our Spark pipelines to read parquet with ZSTD > compression. > After the switch we see that memory footprint is much larger than previously > with SNAPPY. > Additionally GC stats of the jobs are much higher comparing to SNAPPY with > the same workload as previously. > Is there any configurations that may be relevant to read path, that may help > in such cases ? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38703) High GC and memory footprint after switch to ZSTD
[ https://issues.apache.org/jira/browse/SPARK-38703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515907#comment-17515907 ] Cheng Pan edited comment on SPARK-38703 at 4/1/22 12:56 PM: SPARK-34390 may help, our benchmark of 1T TPC-DS shows the benefits. (compression using zstd in shuffle, not parquet) {code:bash} +---+---+---+-+ |compression| sum(task_cpu_time_s) | sum(task_run_time_s) | sum(gc_time_s) | +---+---+---+-+ | lz4 | 1871242.5 | 3861923.8 | 197151.5 | | zstd | 1989641.6 | 3326399.8 | 244333.2 | | zstd_buffer_pool | 1912032.0 | 3342339.4 | 187262.3 | +---+---+---+-+ {code} was (Author: pan3793): SPARK-34390 may help, our benchmark of 1T TPC-DS shows the benefits. (compression using zstd in shuffle, not parquet) {code:bash} +---+---+---+-+ |lz4| sum(task_cpu_time_s) | sum(task_run_time_s) | sum(gc_time_s) | +---+---+---+-+ | lz4 | 1871242.5 | 3861923.8 | 197151.5 | | zstd | 1989641.6 | 3326399.8 | 244333.2 | | zstd_buffer_pool | 1912032.0 | 3342339.4 | 187262.3 | +---+---+---+-+ {code} > High GC and memory footprint after switch to ZSTD > - > > Key: SPARK-38703 > URL: https://issues.apache.org/jira/browse/SPARK-38703 > Project: Spark > Issue Type: Question > Components: Input/Output >Affects Versions: 3.1.2 >Reporter: Michael Taranov >Priority: Major > > Hi All, > We started to switch our Spark pipelines to read parquet with ZSTD > compression. > After the switch we see that memory footprint is much larger than previously > with SNAPPY. > Additionally GC stats of the jobs are much higher comparing to SNAPPY with > the same workload as previously. > Is there any configurations that may be relevant to read path, that may help > in such cases ? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
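For anyone who wants to try the shuffle-side zstd setup referenced above (note that SPARK-34390 concerns shuffle/IO compression, not Parquet), the settings can be supplied when building the session. The property names below are assumptions taken from the Spark configuration documentation; please double-check them for your Spark version.
{code:python}
from pyspark.sql import SparkSession

# Use zstd for shuffle/spill block compression (independent of
# spark.sql.parquet.compression.codec) and enable the zstd buffer pool
# discussed in SPARK-34390.
spark = (SparkSession.builder
         .config("spark.io.compression.codec", "zstd")
         .config("spark.io.compression.zstd.bufferPool.enabled", "true")
         .getOrCreate())
{code}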
[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZygD updated SPARK-38614: - Component/s: SQL > After Spark update, df.show() shows incorrect F.percent_rank results > > > Key: SPARK-38614 > URL: https://issues.apache.org/jira/browse/SPARK-38614 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.2.0, 3.2.1 >Reporter: ZygD >Priority: Major > Labels: correctness > > Expected result is obtained using Spark 3.1.2, but not 3.2.0 or 3.2.1 > *Minimal reproducible example* > {code:java} > from pyspark.sql import SparkSession, functions as F, Window as W > spark = SparkSession.builder.getOrCreate() > > df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) > df.show(3) > df.show(5) {code} > *Expected result* > {code:java} > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > +---++ > only showing top 3 rows > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > | 3|0.03| > | 4|0.04| > +---++ > only showing top 5 rows{code} > *Actual result* > {code:java} > +---+--+ > | id|pr| > +---+--+ > | 0| 0.0| > | 1|0.| > | 2|0.| > +---+--+ > only showing top 3 rows > +---+---+ > | id| pr| > +---+---+ > | 0|0.0| > | 1|0.2| > | 2|0.4| > | 3|0.6| > | 4|0.8| > +---+---+ > only showing top 5 rows{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38763) Pandas API on spark Can`t apply lamda to columns.
Bjørn Jørgensen created SPARK-38763: --- Summary: Pandas API on spark Can`t apply lamda to columns. Key: SPARK-38763 URL: https://issues.apache.org/jira/browse/SPARK-38763 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0, 3.4.0 Reporter: Bjørn Jørgensen When I use a spark master build from 08 November 21 I can use this code to rename columns {code:java} pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) {code} But now after I get this error when I use this code. --- ValueErrorTraceback (most recent call last) Input In [5], in () > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) File /opt/spark/python/pyspark/pandas/frame.py:10636, in DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = gen_mapper_fn( 10633 index 10634 ) 10635 if columns: > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) 10638 if not index and not columns: 10639 raise ValueError("Either `index` or `columns` should be provided.") File /opt/spark/python/pyspark/pandas/frame.py:10603, in DataFrame.rename..gen_mapper_fn(mapper) 10601 elif callable(mapper): 10602 mapper_callable = cast(Callable, mapper) > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) 10604 dtype = return_type.dtype 10605 spark_return_type = return_type.spark_type File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in infer_return_type(f) 560 tpe = get_type_hints(f).get("return", None) 562 if tpe is None: --> 563 raise ValueError("A return value is required for the input function") 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, SeriesType): 566 tpe = tpe.__args__[0] ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38763) Pandas API on spark Can`t apply lamda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515991#comment-17515991 ] Bjørn Jørgensen commented on SPARK-38763: - [~XinrongM] > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-38763) Pandas API on spark Can`t apply lamda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763 ] Bjørn Jørgensen deleted comment on SPARK-38763: - was (Author: bjornjorgensen): [~XinrongM] > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37903) Replace string_typehints with get_type_hints.
[ https://issues.apache.org/jira/browse/SPARK-37903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516012#comment-17516012 ] Bjørn Jørgensen commented on SPARK-37903: - I do have some problems now when I use lamba on columns [SPARK-38763|https://issues.apache.org/jira/projects/SPARK/issues/SPARK-38763] > Replace string_typehints with get_type_hints. > - > > Key: SPARK-37903 > URL: https://issues.apache.org/jira/browse/SPARK-37903 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > Currently we have a hacky way to resolve type hints written as strings, but > we can use {{get_type_hints}} instead. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38764) spark thrift server issue: Length field is empty for varchar fields
Ayan Ray created SPARK-38764: Summary: spark thrift server issue: Length field is empty for varchar fields Key: SPARK-38764 URL: https://issues.apache.org/jira/browse/SPARK-38764 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: Ayan Ray I am trying to read data from Spark Thrift Server using SAS. In the table definition through DBeaver, I see that the *Length* field is empty only for fields with the *VARCHAR* data type. I can see the length in the Data Type field as {*}varchar(32){*}, but that doesn't suffice for my purpose as the SAS application taps into the Length field. Since this field is not populated, SAS defaults to the max size and as a result becomes extremely slow. The Length field is populated when the table is accessed through Hive. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38765) Implement `inplace` parameter of `Series.clip`
Xinrong Meng created SPARK-38765: Summary: Implement `inplace` parameter of `Series.clip` Key: SPARK-38765 URL: https://issues.apache.org/jira/browse/SPARK-38765 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng Implement `inplace` parameter of `Series.clip` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
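For illustration, the parameter would presumably mirror pandas, where {{clip(lower, upper, inplace=True)}} mutates the Series instead of returning a new one. The {{inplace=True}} call below shows the proposed behavior and is not expected to work before this ticket is resolved.
{code:python}
import pyspark.pandas as ps

s = ps.Series([-3, 0, 2, 5])

# Existing behavior: returns a new, clipped Series.
print(s.clip(-1, 3).to_list())   # [-1, 0, 2, 3]

# Proposed behavior under this ticket: clip the Series in place.
s.clip(-1, 3, inplace=True)
print(s.to_list())               # [-1, 0, 2, 3]
{code}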
[jira] [Commented] (SPARK-38765) Implement `inplace` parameter of `Series.clip`
[ https://issues.apache.org/jira/browse/SPARK-38765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516074#comment-17516074 ] Apache Spark commented on SPARK-38765: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36041 > Implement `inplace` parameter of `Series.clip` > -- > > Key: SPARK-38765 > URL: https://issues.apache.org/jira/browse/SPARK-38765 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `inplace` parameter of `Series.clip` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38765) Implement `inplace` parameter of `Series.clip`
[ https://issues.apache.org/jira/browse/SPARK-38765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38765: Assignee: (was: Apache Spark) > Implement `inplace` parameter of `Series.clip` > -- > > Key: SPARK-38765 > URL: https://issues.apache.org/jira/browse/SPARK-38765 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `inplace` parameter of `Series.clip` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38765) Implement `inplace` parameter of `Series.clip`
[ https://issues.apache.org/jira/browse/SPARK-38765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38765: Assignee: Apache Spark > Implement `inplace` parameter of `Series.clip` > -- > > Key: SPARK-38765 > URL: https://issues.apache.org/jira/browse/SPARK-38765 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement `inplace` parameter of `Series.clip` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38765) Implement `inplace` parameter of `Series.clip`
[ https://issues.apache.org/jira/browse/SPARK-38765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516075#comment-17516075 ] Apache Spark commented on SPARK-38765: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36041 > Implement `inplace` parameter of `Series.clip` > -- > > Key: SPARK-38765 > URL: https://issues.apache.org/jira/browse/SPARK-38765 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `inplace` parameter of `Series.clip` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38763) Pandas API on spark Can`t apply lamda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516087#comment-17516087 ] Bjørn Jørgensen commented on SPARK-38763: - This error is https://github.com/apache/spark/pull/35236 in typehints.py line 562 {code:java} if tpe is None: raise ValueError("A return value is required for the input function") {code} > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38763) Pandas API on spark Can`t apply lamda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516172#comment-17516172 ] Xinrong Meng commented on SPARK-38763: -- Hi [~bjornjorgensen], thanks for raising that! The workaround is to use a function with a return type rather than a lambda. I am fixing this in https://issues.apache.org/jira/browse/SPARK-38766. > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
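For anyone hitting this before a fix lands, the workaround mentioned above looks like the sketch below: replace the lambda with a plain function that carries a return type annotation, so that infer_return_type can resolve it. The toy DataFrame is just a stand-in for the real data.
{code:python}
import re
import pyspark.pandas as ps


def strip_prefixes(name: str) -> str:  # the "-> str" annotation is what makes this work
    for prefix in ("DOFFIN_ESENDERS:", "FORM_SECTION:", "F05_2014:"):
        name = re.sub(prefix, "", name)
    return name


pf05 = ps.DataFrame({"DOFFIN_ESENDERS:a": [1], "FORM_SECTION:b": [2]})  # toy stand-in
pf05 = pf05.rename(columns=strip_prefixes)
print(list(pf05.columns))  # ['a', 'b']
{code}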
[jira] [Commented] (SPARK-38766) Support lambda `column` parameter of `DataFrame.rename`
[ https://issues.apache.org/jira/browse/SPARK-38766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516173#comment-17516173 ] Xinrong Meng commented on SPARK-38766: -- I am working on that. > Support lambda `column` parameter of `DataFrame.rename` > --- > > Key: SPARK-38766 > URL: https://issues.apache.org/jira/browse/SPARK-38766 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Support lambda `column` parameter of `DataFrame.rename`. > The issue was detected in https://issues.apache.org/jira/browse/SPARK-38763. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38766) Support lambda `column` parameter of `DataFrame.rename`
Xinrong Meng created SPARK-38766: Summary: Support lambda `column` parameter of `DataFrame.rename` Key: SPARK-38766 URL: https://issues.apache.org/jira/browse/SPARK-38766 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng Support lambda `column` parameter of `DataFrame.rename`. The issue was detected in https://issues.apache.org/jira/browse/SPARK-38763. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38766) Support lambda `column` parameter of `DataFrame.rename`
[ https://issues.apache.org/jira/browse/SPARK-38766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-38766. -- Resolution: Duplicate > Support lambda `column` parameter of `DataFrame.rename` > --- > > Key: SPARK-38766 > URL: https://issues.apache.org/jira/browse/SPARK-38766 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Support lambda `column` parameter of `DataFrame.rename`. > The issue was detected in https://issues.apache.org/jira/browse/SPARK-38763. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38763) Pandas API on spark Can`t apply lamda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516172#comment-17516172 ] Xinrong Meng edited comment on SPARK-38763 at 4/1/22 11:56 PM: --- Hi [~bjornjorgensen], thanks for raising that! The workaround is to use a function with a return type rather than a lambda. I am fixing this now. was (Author: xinrongm): Hi [~bjornjorgensen], thanks for raising that! The workaround is to use a function with a return type rather than a lambda. I am fixing this in https://issues.apache.org/jira/browse/SPARK-38766. > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38763) Pandas API on Spark can't apply lambda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516175#comment-17516175 ] Apache Spark commented on SPARK-38763: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36042 > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38763) Pandas API on Spark can't apply lambda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38763: Assignee: Apache Spark > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Apache Spark >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38763) Pandas API on Spark can't apply lambda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38763: Assignee: (was: Apache Spark) > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38763) Pandas API on Spark can't apply lambda to columns.
[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516177#comment-17516177 ] Xinrong Meng commented on SPARK-38763: -- I will backport the fix after approved and merged. > Pandas API on spark Can`t apply lamda to columns. > --- > > Key: SPARK-38763 > URL: https://issues.apache.org/jira/browse/SPARK-38763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > When I use a spark master build from 08 November 21 I can use this code to > rename columns > {code:java} > pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > {code} > But now after I get this error when I use this code. > --- > ValueErrorTraceback (most recent call last) > Input In [5], in () > > 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', > x)) > 2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x)) > 3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x)) > File /opt/spark/python/pyspark/pandas/frame.py:10636, in > DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors) > 10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = > gen_mapper_fn( > 10633 index > 10634 ) > 10635 if columns: > > 10636 columns_mapper_fn, _, _ = gen_mapper_fn(columns) > 10638 if not index and not columns: > 10639 raise ValueError("Either `index` or `columns` should be > provided.") > File /opt/spark/python/pyspark/pandas/frame.py:10603, in > DataFrame.rename..gen_mapper_fn(mapper) > 10601 elif callable(mapper): > 10602 mapper_callable = cast(Callable, mapper) > > 10603 return_type = cast(ScalarType, infer_return_type(mapper)) > 10604 dtype = return_type.dtype > 10605 spark_return_type = return_type.spark_type > File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in > infer_return_type(f) > 560 tpe = get_type_hints(f).get("return", None) > 562 if tpe is None: > --> 563 raise ValueError("A return value is required for the input > function") > 565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, > SeriesType): > 566 tpe = tpe.__args__[0] > ValueError: A return value is required for the input function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38767) Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options
Yaohua Zhao created SPARK-38767: --- Summary: Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options Key: SPARK-38767 URL: https://issues.apache.org/jira/browse/SPARK-38767 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.1 Reporter: Yaohua Zhao -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
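No description is attached yet; the title matches the existing session-wide configurations spark.sql.files.ignoreCorruptFiles and spark.sql.files.ignoreMissingFiles, so presumably the goal is to expose the same behavior as per-read data source options. A sketch of how that could look (the per-read option names and the path are assumptions based on this ticket, not a confirmed API):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Today: session-wide settings that affect every file-based read.
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")
df_all = spark.read.parquet("/data/events")  # illustrative path

# Proposed: per-read data source options (option spelling assumed from this ticket).
df_one = (
    spark.read
    .option("ignoreCorruptFiles", "true")
    .option("ignoreMissingFiles", "true")
    .parquet("/data/events")
)
{code}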
[jira] [Resolved] (SPARK-38620) Replace `value.formatted(formatString)` with `formatString.format(value)` to clean up compilation warning
[ https://issues.apache.org/jira/browse/SPARK-38620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-38620. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 35930 [https://github.com/apache/spark/pull/35930] > Replace `value.formatted(formatString)` with `formatString.format(value)` to > clean up compilation warning > - > > Key: SPARK-38620 > URL: https://issues.apache.org/jira/browse/SPARK-38620 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > There are compile warnings as follows: > {code:java} > [WARNING] > /spark-source/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala:67: > [deprecation @ > org.apache.spark.streaming.ui.RecordRateUIData.formattedAvg.$anonfun | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala:201: > [deprecation @ > org.apache.spark.sql.streaming.ui.StreamingQueryPagedTable.row | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala:202: > [deprecation @ > org.apache.spark.sql.streaming.ui.StreamingQueryPagedTable.row | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38620) Replace `value.formatted(formatString)` with `formatString.format(value)` to clean up compilation warning
[ https://issues.apache.org/jira/browse/SPARK-38620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-38620: - Priority: Trivial (was: Minor) > Replace `value.formatted(formatString)` with `formatString.format(value)` to > clean up compilation warning > - > > Key: SPARK-38620 > URL: https://issues.apache.org/jira/browse/SPARK-38620 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.4.0 > > > There are compile warnings as follows: > {code:java} > [WARNING] > /spark-source/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala:67: > [deprecation @ > org.apache.spark.streaming.ui.RecordRateUIData.formattedAvg.$anonfun | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala:201: > [deprecation @ > org.apache.spark.sql.streaming.ui.StreamingQueryPagedTable.row | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala:202: > [deprecation @ > org.apache.spark.sql.streaming.ui.StreamingQueryPagedTable.row | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38620) Replace `value.formatted(formatString)` with `formatString.format(value)` to clean up compilation warning
[ https://issues.apache.org/jira/browse/SPARK-38620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-38620: Assignee: Yang Jie > Replace `value.formatted(formatString)` with `formatString.format(value)` to > clean up compilation warning > - > > Key: SPARK-38620 > URL: https://issues.apache.org/jira/browse/SPARK-38620 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > There are compile warnings as follows: > {code:java} > [WARNING] > /spark-source/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala:67: > [deprecation @ > org.apache.spark.streaming.ui.RecordRateUIData.formattedAvg.$anonfun | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala:201: > [deprecation @ > org.apache.spark.sql.streaming.ui.StreamingQueryPagedTable.row | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala:202: > [deprecation @ > org.apache.spark.sql.streaming.ui.StreamingQueryPagedTable.row | > origin=scala.Predef.StringFormat.formatted | version=2.12.16] method > formatted in class StringFormat is deprecated (since 2.12.16): Use > `formatString.format(value)` instead of `value.formatted(formatString)`, > or use the `f""` string interpolator. In Java 15 and later, `formatted` > resolves to the new method in String which has reversed parameters. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34863) Support nested column in Spark Parquet vectorized readers
[ https://issues.apache.org/jira/browse/SPARK-34863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-34863. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34659 [https://github.com/apache/spark/pull/34659] > Support nested column in Spark Parquet vectorized readers > - > > Key: SPARK-34863 > URL: https://issues.apache.org/jira/browse/SPARK-34863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Apache Spark >Priority: Minor > Fix For: 3.3.0 > > > The task is to support nested column type in Spark Parquet vectorized reader. > Currently Parquet vectorized reader does not support nested column type > (struct, array and map). We implemented nested column vectorized reader for > FB-ORC in our internal fork of Spark. We are seeing performance improvement > compared to non-vectorized reader when reading nested columns. In addition, > this can also help improve the non-nested column performance when reading > non-nested and nested columns together in one query. > > Parquet: > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L173] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
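A rough illustration of the kind of scan this change targets. The configuration name below is my assumption of the flag added for the nested-column vectorized Parquet reader, so verify it against the 3.3.0 documentation before relying on it:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed flag for the nested-column vectorized Parquet reader (off by default).
spark.conf.set("spark.sql.parquet.enableNestedColumnVectorizedReader", "true")

# A table with nested types: struct, array, and map columns.
df = spark.sql("""
    SELECT named_struct('a', 1, 'b', 'x') AS s,
           array(1, 2, 3)                 AS arr,
           map('k', 1)                    AS m
""")
df.write.mode("overwrite").parquet("/tmp/nested_demo")

# With the flag enabled, this scan can use the vectorized reader for nested columns.
spark.read.parquet("/tmp/nested_demo").select("s.a", "arr", "m").show()
{code}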
[jira] [Created] (SPARK-38768) If limit can be pushed down and the data source has only one partition, DS V2 should not apply the limit again
jiaan.geng created SPARK-38768: -- Summary: If limit could pushed down and Data source only have one partition, DS V2 should not do limit again Key: SPARK-38768 URL: https://issues.apache.org/jira/browse/SPARK-38768 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
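Reading between the lines of the terse summary: when a query's LIMIT is pushed into a DS V2 source (for example a JDBC relation) and the resulting scan has only one partition, the source already returns at most that many rows, so Spark's extra limit on top of the scan is redundant work. An illustrative sketch with placeholder connection details:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder JDBC connection details; any single-partition DS V2 source works.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/shop")
    .option("dbtable", "orders")
    .option("user", "spark")
    .option("password", "secret")
    .load()
)

# If the LIMIT is pushed into the source query and the scan has one partition,
# re-applying the limit on the Spark side adds no value.
df.limit(10).explain()
{code}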
[jira] [Updated] (SPARK-38769) [SQL] behavior of schema_of_json is not the same as in 2.4.0
[ https://issues.apache.org/jira/browse/SPARK-38769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu updated SPARK-38769: Summary: [SQL] behavior of schema_of_json not same with 2.4.0 (was: [SQL] behavior schema_of_json not same with 2.4.0) > [SQL] behavior of schema_of_json not same with 2.4.0 > > > Key: SPARK-38769 > URL: https://issues.apache.org/jira/browse/SPARK-38769 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: gabrywu >Priority: Minor > > When I switch to spark 3.1.1 from spark 2.4.0, I found a built-in function > throw errors: > |== Physical Plan == org.apache.spark.sql.AnalysisException: cannot resolve > 'schema_of_json(get_json_object(`adtnl_info_txt`, '$.all_model_scores'))' due > to data type mismatch: The input json should be a foldable string expression > and not null; however, got get_json_object(`adtnl_info_txt`, > '$.all_model_scores').; line 3 pos 2; | > But schema_of_json worked well in 2.4.0, So, is it a bug, or a new feature, > which doesn't support non-Literal expressions? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38769) [SQL] behavior of schema_of_json is not the same as in 2.4.0
gabrywu created SPARK-38769: --- Summary: [SQL] behavior schema_of_json not same with 2.4.0 Key: SPARK-38769 URL: https://issues.apache.org/jira/browse/SPARK-38769 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: gabrywu When I switch to spark 3.1.1 from spark 2.4.0, I found a built-in function throw errors: |== Physical Plan == org.apache.spark.sql.AnalysisException: cannot resolve 'schema_of_json(get_json_object(`adtnl_info_txt`, '$.all_model_scores'))' due to data type mismatch: The input json should be a foldable string expression and not null; however, got get_json_object(`adtnl_info_txt`, '$.all_model_scores').; line 3 pos 2; | But schema_of_json worked well in 2.4.0, So, is it a bug, or a new feature, which doesn't support non-Literal expressions? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
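The error quoted above states what 3.x expects: the argument to schema_of_json must be a foldable string, i.e. a literal JSON example, not an expression evaluated per row. A small sketch of the working and failing shapes (the column name and JSON values are made up):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, get_json_object, lit, schema_of_json

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"all_model_scores": {"m1": 0.9}}',)], ["adtnl_info_txt"])

# Works: the argument is a foldable string literal.
df.select(schema_of_json(lit('{"m1": 0.9}'))).show(truncate=False)

# Fails on 3.x: the argument is a per-row expression, not a literal.
try:
    df.select(schema_of_json(get_json_object(col("adtnl_info_txt"), "$.all_model_scores")))
except Exception as e:  # AnalysisException: input json should be a foldable string
    print(e)
{code}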
[jira] [Assigned] (SPARK-38768) If limit can be pushed down and the data source has only one partition, DS V2 should not apply the limit again
[ https://issues.apache.org/jira/browse/SPARK-38768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38768: Assignee: Apache Spark > If limit could pushed down and Data source only have one partition, DS V2 > should not do limit again > --- > > Key: SPARK-38768 > URL: https://issues.apache.org/jira/browse/SPARK-38768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38768) If limit can be pushed down and the data source has only one partition, DS V2 should not apply the limit again
[ https://issues.apache.org/jira/browse/SPARK-38768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516210#comment-17516210 ] Apache Spark commented on SPARK-38768: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36043 > If limit could pushed down and Data source only have one partition, DS V2 > should not do limit again > --- > > Key: SPARK-38768 > URL: https://issues.apache.org/jira/browse/SPARK-38768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38768) If limit can be pushed down and the data source has only one partition, DS V2 should not apply the limit again
[ https://issues.apache.org/jira/browse/SPARK-38768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38768: Assignee: (was: Apache Spark) > If limit could pushed down and Data source only have one partition, DS V2 > should not do limit again > --- > > Key: SPARK-38768 > URL: https://issues.apache.org/jira/browse/SPARK-38768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38769) [SQL] behavior of schema_of_json not same with 2.4.0
[ https://issues.apache.org/jira/browse/SPARK-38769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516219#comment-17516219 ] gabrywu commented on SPARK-38769: - [~maxgekk] > [SQL] behavior of schema_of_json not same with 2.4.0 > > > Key: SPARK-38769 > URL: https://issues.apache.org/jira/browse/SPARK-38769 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: gabrywu >Priority: Minor > > When I switch to spark 3.1.1 from spark 2.4.0, I found a built-in function > throw errors: > |== Physical Plan == org.apache.spark.sql.AnalysisException: cannot resolve > 'schema_of_json(get_json_object(`adtnl_info_txt`, '$.all_model_scores'))' due > to data type mismatch: The input json should be a foldable string expression > and not null; however, got get_json_object(`adtnl_info_txt`, > '$.all_model_scores').; line 3 pos 2; | > But schema_of_json worked well in 2.4.0, So, is it a bug, or a new feature, > which doesn't support non-Literal expressions? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38770) Simplify steps to rewrite primary resource in K8s Spark application
qian created SPARK-38770: Summary: Simply steps to re write primary resource in k8s spark application Key: SPARK-38770 URL: https://issues.apache.org/jira/browse/SPARK-38770 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0 Reporter: qian Fix For: 3.3.0 re-write primary resource actions use renameMainAppResource method twice and second usage has no effect. So, Simply steps to re write primary resource in k8s spark application -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38770) Simplify steps to rewrite primary resource in K8s Spark application
[ https://issues.apache.org/jira/browse/SPARK-38770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qian updated SPARK-38770: - Fix Version/s: 3.4.0 (was: 3.3.0) > Simply steps to re write primary resource in k8s spark application > -- > > Key: SPARK-38770 > URL: https://issues.apache.org/jira/browse/SPARK-38770 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: qian >Priority: Major > Fix For: 3.4.0 > > > re-write primary resource actions use renameMainAppResource method twice and > second usage has no effect. So, Simply steps to re write primary resource in > k8s spark application -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38770) Simplify steps to rewrite primary resource in K8s Spark application
[ https://issues.apache.org/jira/browse/SPARK-38770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516228#comment-17516228 ] Apache Spark commented on SPARK-38770: -- User 'dcoliversun' has created a pull request for this issue: https://github.com/apache/spark/pull/36044 > Simply steps to re write primary resource in k8s spark application > -- > > Key: SPARK-38770 > URL: https://issues.apache.org/jira/browse/SPARK-38770 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: qian >Priority: Major > Fix For: 3.4.0 > > > re-write primary resource actions use renameMainAppResource method twice and > second usage has no effect. So, Simply steps to re write primary resource in > k8s spark application -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38770) Simplify steps to rewrite primary resource in K8s Spark application
[ https://issues.apache.org/jira/browse/SPARK-38770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38770: Assignee: Apache Spark > Simply steps to re write primary resource in k8s spark application > -- > > Key: SPARK-38770 > URL: https://issues.apache.org/jira/browse/SPARK-38770 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: qian >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > > re-write primary resource actions use renameMainAppResource method twice and > second usage has no effect. So, Simply steps to re write primary resource in > k8s spark application -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38770) Simplify steps to rewrite primary resource in K8s Spark application
[ https://issues.apache.org/jira/browse/SPARK-38770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38770: Assignee: (was: Apache Spark) > Simply steps to re write primary resource in k8s spark application > -- > > Key: SPARK-38770 > URL: https://issues.apache.org/jira/browse/SPARK-38770 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: qian >Priority: Major > Fix For: 3.4.0 > > > re-write primary resource actions use renameMainAppResource method twice and > second usage has no effect. So, Simply steps to re write primary resource in > k8s spark application -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org