[jira] [Assigned] (SPARK-41102) Merge SparkConnectPlanner and SparkConnectCommandPlanner
[ https://issues.apache.org/jira/browse/SPARK-41102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-41102:
Assignee: Rui Wang

> Merge SparkConnectPlanner and SparkConnectCommandPlanner
>
> Key: SPARK-41102
> URL: https://issues.apache.org/jira/browse/SPARK-41102
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major

--
This message was sent by Atlassian Jira (v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41102) Merge SparkConnectPlanner and SparkConnectCommandPlanner
[ https://issues.apache.org/jira/browse/SPARK-41102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-41102.
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38604
[https://github.com/apache/spark/pull/38604]

> Merge SparkConnectPlanner and SparkConnectCommandPlanner
>
> Key: SPARK-41102
> URL: https://issues.apache.org/jira/browse/SPARK-41102
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.4.0
[jira] [Commented] (SPARK-41113) Upgrade sbt to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-41113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632133#comment-17632133 ]

Apache Spark commented on SPARK-41113:

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38620

> Upgrade sbt to 1.8.0
>
> Key: SPARK-41113
> URL: https://issues.apache.org/jira/browse/SPARK-41113
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Major
>
> https://github.com/sbt/sbt/releases/tag/v1.8.0
[jira] [Assigned] (SPARK-41113) Upgrade sbt to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-41113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41113:
Assignee: (was: Apache Spark)

> Upgrade sbt to 1.8.0
>
> Key: SPARK-41113
> URL: https://issues.apache.org/jira/browse/SPARK-41113
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Major
>
> https://github.com/sbt/sbt/releases/tag/v1.8.0
[jira] [Assigned] (SPARK-41113) Upgrade sbt to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-41113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41113:
Assignee: Apache Spark

> Upgrade sbt to 1.8.0
>
> Key: SPARK-41113
> URL: https://issues.apache.org/jira/browse/SPARK-41113
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Assignee: Apache Spark
> Priority: Major
>
> https://github.com/sbt/sbt/releases/tag/v1.8.0
[jira] [Created] (SPARK-41113) Upgrade sbt to 1.8.0
Yang Jie created SPARK-41113:

Summary: Upgrade sbt to 1.8.0
Key: SPARK-41113
URL: https://issues.apache.org/jira/browse/SPARK-41113
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie

https://github.com/sbt/sbt/releases/tag/v1.8.0
[jira] [Assigned] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
[ https://issues.apache.org/jira/browse/SPARK-41112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41112:
Assignee: Apache Spark

> RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
>
> Key: SPARK-41112
> URL: https://issues.apache.org/jira/browse/SPARK-41112
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: Apache Spark
> Priority: Major
>
> The inferred in-subquery filter should apply ColumnPruning before getting plan statistics and checking whether the plan can be broadcast. Otherwise, the final physical plan will differ from the expected one.
[jira] [Commented] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
[ https://issues.apache.org/jira/browse/SPARK-41112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632128#comment-17632128 ]

Apache Spark commented on SPARK-41112:

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/38619

> RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
>
> Key: SPARK-41112
> URL: https://issues.apache.org/jira/browse/SPARK-41112
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Priority: Major
>
> The inferred in-subquery filter should apply ColumnPruning before getting plan statistics and checking whether the plan can be broadcast. Otherwise, the final physical plan will differ from the expected one.
[jira] [Assigned] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
[ https://issues.apache.org/jira/browse/SPARK-41112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41112:
Assignee: (was: Apache Spark)

> RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
>
> Key: SPARK-41112
> URL: https://issues.apache.org/jira/browse/SPARK-41112
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Priority: Major
>
> The inferred in-subquery filter should apply ColumnPruning before getting plan statistics and checking whether the plan can be broadcast. Otherwise, the final physical plan will differ from the expected one.
[jira] [Created] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
XiDuo You created SPARK-41112:

Summary: RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
Key: SPARK-41112
URL: https://issues.apache.org/jira/browse/SPARK-41112
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You

The inferred in-subquery filter should apply ColumnPruning before getting plan statistics and checking whether the plan can be broadcast. Otherwise, the final physical plan will differ from the expected one.
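The motivation above can be sketched with a toy size model. This is an illustrative sketch only, not Spark's optimizer API: the names `estimated_size`, `can_broadcast`, and the column-width map are all invented here to show why pruning must precede the size estimate that gates a broadcast decision.

```python
# Illustrative only: why ColumnPruning must run before the plan-statistics
# size estimate that decides whether a side can be broadcast.
BROADCAST_THRESHOLD = 10 * 1024 * 1024  # assume a 10 MiB broadcast threshold

def estimated_size(row_count, column_widths):
    """Toy plan statistic: rows x total byte width of retained columns."""
    return row_count * sum(column_widths.values())

def can_broadcast(row_count, column_widths, needed_columns):
    # Prune to the columns the in-subquery filter actually needs FIRST,
    # then estimate; estimating on the unpruned plan inflates the size.
    pruned = {c: w for c, w in column_widths.items() if c in needed_columns}
    return estimated_size(row_count, pruned) <= BROADCAST_THRESHOLD

# A wide table: 1M rows with fat columns, but the filter needs only `id`.
widths = {"id": 8, "payload": 200, "blob": 4000}
print(can_broadcast(1_000_000, widths, {"id"}))            # pruned estimate: 8 MB
print(can_broadcast(1_000_000, widths, set(widths)))       # unpruned: ~4.2 GB
```

With pruning applied eagerly, the 8 MB pruned side qualifies for broadcast; estimated on the unpruned plan it would not, which is how the final physical plan ends up different from the expected one.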
[jira] [Commented] (SPARK-41005) Arrow based collect
[ https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632119#comment-17632119 ]

Apache Spark commented on SPARK-41005:

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38618

> Arrow based collect
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.4.0
[jira] [Commented] (SPARK-41108) Control the max size of arrow batch
[ https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632115#comment-17632115 ]

Apache Spark commented on SPARK-41108:

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38618

> Control the max size of arrow batch
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.4.0
[jira] [Created] (SPARK-41111) Implement `DataFrame.show`
Ruifeng Zheng created SPARK-41111:

Summary: Implement `DataFrame.show`
Key: SPARK-41111
URL: https://issues.apache.org/jira/browse/SPARK-41111
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
[jira] [Assigned] (SPARK-41111) Implement `DataFrame.show`
[ https://issues.apache.org/jira/browse/SPARK-41111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-41111:
Assignee: Ruifeng Zheng

> Implement `DataFrame.show`
>
> Key: SPARK-41111
> URL: https://issues.apache.org/jira/browse/SPARK-41111
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
[jira] [Commented] (SPARK-40798) Alter partition should verify value
[ https://issues.apache.org/jira/browse/SPARK-40798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632106#comment-17632106 ]

Ranga Reddy commented on SPARK-40798:

The issue below is addressed by the current Jira. Can I add a test case for it in InsertSuite.scala?
https://issues.apache.org/jira/browse/SPARK-40988

> Alter partition should verify value
>
> Key: SPARK-40798
> URL: https://issues.apache.org/jira/browse/SPARK-40798
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.4.0
>
> {code:java}
> CREATE TABLE t (c int) USING PARQUET PARTITIONED BY(p int);
> -- This DDL should fail but worked:
> ALTER TABLE t ADD PARTITION(p='aaa');
> {code}
[jira] [Resolved] (SPARK-41095) Convert unresolved operators to internal errors
[ https://issues.apache.org/jira/browse/SPARK-41095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-41095.
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38582
[https://github.com/apache/spark/pull/38582]

> Convert unresolved operators to internal errors
>
> Key: SPARK-41095
> URL: https://issues.apache.org/jira/browse/SPARK-41095
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Fix For: 3.4.0
>
> The 'unresolved operator' error is considered a bug in most cases. We need to convert any such errors to internal errors.
[jira] [Commented] (SPARK-40988) Spark3 partition column value is not validated with user provided schema.
[ https://issues.apache.org/jira/browse/SPARK-40988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632102#comment-17632102 ]

Ranga Reddy commented on SPARK-40988:

The following Jira will resolve the issue by throwing the *CAST_INVALID_INPUT* error.
https://issues.apache.org/jira/browse/SPARK-40798

> Spark3 partition column value is not validated with user provided schema.
>
> Key: SPARK-40988
> URL: https://issues.apache.org/jira/browse/SPARK-40988
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
> Reporter: Ranga Reddy
> Priority: Major
>
> Spark3 does not validate the partition column type while inserting data, but on the Hive side an exception is thrown when inserting values of a different type.
>
> *Spark Code:*
> {code:java}
> scala> val tableName = "test_partition_table"
> tableName: String = test_partition_table
>
> scala> spark.sql(s"DROP TABLE IF EXISTS $tableName")
> res0: org.apache.spark.sql.DataFrame = []
>
> scala> spark.sql(s"CREATE EXTERNAL TABLE $tableName ( id INT, name STRING ) PARTITIONED BY (age INT) LOCATION 'file:/tmp/spark-warehouse/$tableName'")
> res1: org.apache.spark.sql.DataFrame = []
>
> scala> spark.sql("SHOW tables").show(truncate=false)
> +---------+--------------------+-----------+
> |namespace|tableName           |isTemporary|
> +---------+--------------------+-----------+
> |default  |test_partition_table|false      |
> +---------+--------------------+-----------+
>
> scala> spark.sql("SET spark.sql.sources.validatePartitionColumns").show(50, false)
> +------------------------------------------+-----+
> |key                                       |value|
> +------------------------------------------+-----+
> |spark.sql.sources.validatePartitionColumns|true |
> +------------------------------------------+-----+
>
> scala> spark.sql(s"""INSERT INTO $tableName partition (age=25) VALUES (1, 'Ranga')""")
> res4: org.apache.spark.sql.DataFrame = []
>
> scala> spark.sql(s"show partitions $tableName").show(50, false)
> +---------+
> |partition|
> +---------+
> |age=25   |
> +---------+
>
> scala> spark.sql(s"select * from $tableName").show(50, false)
> +---+-----+---+
> |id |name |age|
> +---+-----+---+
> |1  |Ranga|25 |
> +---+-----+---+
>
> scala> spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") VALUES (2, 'Nishanth')""")
> res7: org.apache.spark.sql.DataFrame = []
>
> scala> spark.sql(s"show partitions $tableName").show(50, false)
> +------------+
> |partition   |
> +------------+
> |age=25      |
> |age=test_age|
> +------------+
>
> scala> spark.sql(s"select * from $tableName").show(50, false)
> +---+--------+----+
> |id |name    |age |
> +---+--------+----+
> |1  |Ranga   |25  |
> |2  |Nishanth|null|
> +---+--------+----+
> {code}
> *Hive Code:*
> {code:java}
> > INSERT INTO test_partition_table partition (age="test_age2") VALUES (3, 'Nishanth');
> Error: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column age of type string as it cannot be converted to type int (state=42000,code=10248)
> {code}
> *Expected Result:*
> When *spark.sql.sources.validatePartitionColumns=true*, the datatype of the partition value needs to be validated and an exception thrown if we provide a wrong data type value.
> *Reference:*
> [https://spark.apache.org/docs/3.3.1/sql-migration-guide.html#data-sources]
[jira] [Resolved] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
[ https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-41059.
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38572
[https://github.com/apache/spark/pull/38572]

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Apache Spark
> Priority: Major
> Fix For: 3.4.0
>
> We should rename all _LEGACY errors to properly named error classes.
[jira] [Commented] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails
[ https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632091#comment-17632091 ]

Apache Spark commented on SPARK-40096:

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/38617

> Finalize shuffle merge slow due to connection creation fails
>
> Key: SPARK-40096
> URL: https://issues.apache.org/jira/browse/SPARK-40096
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Wan Kun
> Assignee: Wan Kun
> Priority: Major
> Fix For: 3.4.0
>
> *How to reproduce this issue*
> * Enable push based shuffle
> * Remove some merger nodes before sending finalize RPCs
> * The driver tries to connect to those merger shuffle services and sends finalize RPCs one by one; each connection creation will time out after SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)
>
> We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool and handle the connection creation exception
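The fix proposed in the issue description can be sketched as follows. This is a hedged sketch, not the actual Spark patch: `finalize_all` and `send_finalize_rpc` are hypothetical names standing in for the driver's finalize logic, and the point is only the structure, fanning the RPCs out over a thread pool so one unreachable merger node costs one timeout rather than a serial 120s per node.

```python
# Sketch: send finalize RPCs concurrently and collect per-node failures
# instead of letting one dead merger node stall the whole finalization.
from concurrent.futures import ThreadPoolExecutor

def finalize_all(mergers, send_finalize_rpc, max_threads=8):
    """Fan finalize RPCs out over a pool; return (successes, failures)."""
    results, failures = {}, {}

    def attempt(node):
        try:
            results[node] = send_finalize_rpc(node)
        except OSError as exc:  # e.g. a connection-creation timeout
            failures[node] = exc

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        for node in mergers:
            pool.submit(attempt, node)
    return results, failures

def fake_rpc(node):
    # Stand-in for the real RPC: one merger node was removed and times out.
    if node == "removed-node":
        raise OSError("connection creation timed out")
    return "finalized"

ok, bad = finalize_all(["m1", "m2", "removed-node"], fake_rpc)
print(sorted(ok), sorted(bad))
```

With serial sends, N unreachable nodes cost N x 120 s; with the pool, the timeouts overlap and the healthy nodes finalize immediately.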
[jira] [Commented] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client
[ https://issues.apache.org/jira/browse/SPARK-41110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632084#comment-17632084 ]

Apache Spark commented on SPARK-41110:

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38616

> Implement `DataFrame.sparkSession` in Python client
>
> Key: SPARK-41110
> URL: https://issues.apache.org/jira/browse/SPARK-41110
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Priority: Major
[jira] [Assigned] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client
[ https://issues.apache.org/jira/browse/SPARK-41110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41110:
Assignee: Apache Spark

> Implement `DataFrame.sparkSession` in Python client
>
> Key: SPARK-41110
> URL: https://issues.apache.org/jira/browse/SPARK-41110
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client
[ https://issues.apache.org/jira/browse/SPARK-41110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41110:
Assignee: (was: Apache Spark)

> Implement `DataFrame.sparkSession` in Python client
>
> Key: SPARK-41110
> URL: https://issues.apache.org/jira/browse/SPARK-41110
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Priority: Major
[jira] [Created] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client
Rui Wang created SPARK-41110:

Summary: Implement `DataFrame.sparkSession` in Python client
Key: SPARK-41110
URL: https://issues.apache.org/jira/browse/SPARK-41110
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang
[jira] [Assigned] (SPARK-41108) Control the max size of arrow batch
[ https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-41108:
Assignee: Ruifeng Zheng

> Control the max size of arrow batch
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
[jira] [Resolved] (SPARK-41108) Control the max size of arrow batch
[ https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-41108.
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38612
[https://github.com/apache/spark/pull/38612]

> Control the max size of arrow batch
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
[ https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41109:
Assignee: Apache Spark

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Assignee: Apache Spark
> Priority: Minor
[jira] [Assigned] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
[ https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41109:
Assignee: (was: Apache Spark)

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Commented] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
[ https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632078#comment-17632078 ]

Apache Spark commented on SPARK-41109:

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38615

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Created] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
BingKun Pan created SPARK-41109:

Summary: Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
Key: SPARK-41109
URL: https://issues.apache.org/jira/browse/SPARK-41109
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.4.0
Reporter: BingKun Pan
[jira] [Commented] (SPARK-40988) Spark3 partition column value is not validated with user provided schema.
[ https://issues.apache.org/jira/browse/SPARK-40988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632073#comment-17632073 ] Ranga Reddy commented on SPARK-40988: - In Spark 3.4, if we run the following code we can see the *CAST_INVALID_INPUT* exception.
{code:java}
spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") VALUES (2, 'Nishanth')"""){code}
*Exception:*
{code:java}
org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value 'AGE_34' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 1) ==
INSERT INTO TABLE partition_table PARTITION(age="AGE_34") VALUES (1, 'ABC')
^
 at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:161)
 at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
 at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
 at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2(Cast.scala:927)
 at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2$adapted(Cast.scala:927)
 at org.apache.spark.sql.catalyst.expressions.Cast.buildCast(Cast.scala:588)
 at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$1(Cast.scala:927)
 at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:1285)
 at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:526)
 at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:522)
 at org.apache.spark.sql.util.PartitioningUtils$.normalizePartitionStringValue(PartitioningUtils.scala:56)
 at org.apache.spark.sql.util.PartitioningUtils$.$anonfun$normalizePartitionSpec$1(PartitioningUtils.scala:100)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
 at scala.collection.Iterator.foreach(Iterator.scala:943)
 at scala.collection.Iterator.foreach$(Iterator.scala:943)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
 at scala.collection.IterableLike.foreach(IterableLike.scala:74)
 at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
 at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
 at scala.collection.TraversableLike.map(TraversableLike.scala:286)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
 at scala.collection.AbstractTraversable.map(Traversable.scala:108)
 at org.apache.spark.sql.util.PartitioningUtils$.normalizePartitionSpec(PartitioningUtils.scala:76)
 at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:382)
 at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:426)
 at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:420)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
 at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)
 at
{code}
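The error message above offers two escape hatches: `try_cast`, which returns NULL for malformed input, and disabling `spark.sql.ansi.enabled`. As a hedged, Spark-free sketch of the difference in behavior (plain Python stands in for Spark's cast semantics; the function names are illustrative, not a Spark API):

```python
def ansi_cast_int(value: str) -> int:
    # Strict cast: raise on malformed input, analogous to Spark's CAST
    # under spark.sql.ansi.enabled=true -- the mode that produced the
    # CAST_INVALID_INPUT error quoted above.
    return int(value)

def try_cast_int(value: str):
    # Lenient cast: return None on malformed input, analogous to Spark's
    # try_cast, which the error message suggests as a workaround.
    try:
        return int(value)
    except ValueError:
        return None
```

Under this analogy, a partition value like 'AGE_34' fails the strict path but quietly yields NULL on the lenient one, which is exactly the trade-off the error message describes.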
[jira] [Commented] (SPARK-41005) Arrow based collect
[ https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632069#comment-17632069 ] Apache Spark commented on SPARK-41005: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38614 > Arrow based collect > --- > > Key: SPARK-41005 > URL: https://issues.apache.org/jira/browse/SPARK-41005 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41005) Arrow based collect
[ https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632067#comment-17632067 ] Apache Spark commented on SPARK-41005: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/38613 > Arrow based collect > --- > > Key: SPARK-41005 > URL: https://issues.apache.org/jira/browse/SPARK-41005 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41108) Control the max size of arrow batch
[ https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632058#comment-17632058 ] Apache Spark commented on SPARK-41108: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38612 > Control the max size of arrow batch > --- > > Key: SPARK-41108 > URL: https://issues.apache.org/jira/browse/SPARK-41108 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41108) Control the max size of arrow batch
[ https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41108: Assignee: Apache Spark > Control the max size of arrow batch > --- > > Key: SPARK-41108 > URL: https://issues.apache.org/jira/browse/SPARK-41108 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41108) Control the max size of arrow batch
[ https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41108: Assignee: (was: Apache Spark) > Control the max size of arrow batch > --- > > Key: SPARK-41108 > URL: https://issues.apache.org/jira/browse/SPARK-41108 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41108) Control the max size of arrow batch
Ruifeng Zheng created SPARK-41108: - Summary: Control the max size of arrow batch Key: SPARK-41108 URL: https://issues.apache.org/jira/browse/SPARK-41108 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
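The SPARK-41108 ticket only names the goal, but the underlying technique is simple: greedily pack rows into batches and cut a new batch whenever an estimated size cap would be exceeded. A minimal pure-Python sketch under stated assumptions (the size estimator and cap are illustrative; this is not Spark Connect's actual Arrow batching code):

```python
def split_into_batches(rows, max_batch_bytes, size_of):
    # Greedily pack rows into batches whose estimated total size stays at
    # or under max_batch_bytes. A single row larger than the cap still
    # forms its own batch, so no row is ever dropped.
    batches, current, current_size = [], [], 0
    for row in rows:
        s = size_of(row)
        if current and current_size + s > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += s
    if current:
        batches.append(current)
    return batches
```

With `size_of=len` and a 4-byte cap, `["aa", "bbb", "c", "dddd"]` splits into `[["aa"], ["bbb", "c"], ["dddd"]]`: each cut happens just before the cap would be exceeded.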
[jira] [Assigned] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41063: Assignee: (was: Apache Spark) > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41107) Install memory-profiler in the CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41107: Assignee: (was: Apache Spark) > Install memory-profiler in the CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > PySpark memory profiler depends on > [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket > proposes to install memory-profiler in the CI to enable related tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41107) Install memory-profiler in the CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632054#comment-17632054 ] Apache Spark commented on SPARK-41107: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/38611 > Install memory-profiler in the CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > PySpark memory profiler depends on > [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket > proposes to install memory-profiler in the CI to enable related tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-41107: - Description: PySpark memory profiler depends on [memory-profiler|https://pypi.org/project/memory-profiler/] . The ticket proposes to install memory-profiler in the CI to enable related tests. (was: PySpark memory profiler depends on [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket proposes to install memory-profiler in the CI to enable related tests.) > Install memory-profiler in the CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > PySpark memory profiler depends on > [memory-profiler|https://pypi.org/project/memory-profiler/] . The ticket > proposes to install memory-profiler in the CI to enable related tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41063: Assignee: Apache Spark > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-41107: - Description: PySpark memory profiler depends on [memory-profiler|https://pypi.org/project/memory-profiler/]. The ticket proposes to install memory-profiler in the CI to enable related tests. was:PySpark memory profiler depends on [memory-profiler|https://pypi.org/project/memory-profiler/] . The ticket proposes to install memory-profiler in the CI to enable related tests. > Install memory-profiler in the CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > PySpark memory profiler depends on > [memory-profiler|https://pypi.org/project/memory-profiler/]. > The ticket proposes to install memory-profiler in the CI to enable related > tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41107) Install memory-profiler in the CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41107: Assignee: Apache Spark > Install memory-profiler in the CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > PySpark memory profiler depends on > [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket > proposes to install memory-profiler in the CI to enable related tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-41107: - Description: PySpark memory profiler depends on [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket proposes to install memory-profiler in the CI to enable related tests. (was: We shall install the Memory Profiler in CI in order to enable memory profiling tests.) > Install memory-profiler in the CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > PySpark memory profiler depends on > [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket > proposes to install memory-profiler in the CI to enable related tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-41107: - Summary: Install memory-profiler in the CI (was: Install memory-profiler in CI) > Install memory-profiler in the CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > We shall install the Memory Profiler in CI in order to enable memory > profiling tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41107) Install memory-profiler in CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-41107: - Summary: Install memory-profiler in CI (was: Install the Memory Profiler in CI) > Install memory-profiler in CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > We shall install the Memory Profiler in CI in order to enable memory > profiling tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-41063: -- Assignee: (was: Hyukjin Kwon) > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41063: - Fix Version/s: (was: 3.4.0) > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632053#comment-17632053 ] Hyukjin Kwon commented on SPARK-41063: -- Reverted at https://github.com/apache/spark/commit/73bca6e5cace0c2c46938e82fa12ab518faa2248 > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41036) `columns` API should use `schema` API to avoid data fetching
[ https://issues.apache.org/jira/browse/SPARK-41036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41036. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38546 [https://github.com/apache/spark/pull/38546] > `columns` API should use `schema` API to avoid data fetching > > > Key: SPARK-41036 > URL: https://issues.apache.org/jira/browse/SPARK-41036 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41036) `columns` API should use `schema` API to avoid data fetching
[ https://issues.apache.org/jira/browse/SPARK-41036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41036: - Assignee: Rui Wang > `columns` API should use `schema` API to avoid data fetching > > > Key: SPARK-41036 > URL: https://issues.apache.org/jira/browse/SPARK-41036 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41107) Install the Memory Profiler in CI
[ https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-41107: - Component/s: Tests > Install the Memory Profiler in CI > - > > Key: SPARK-41107 > URL: https://issues.apache.org/jira/browse/SPARK-41107 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > We shall install the Memory Profiler in CI in order to enable memory > profiling tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41107) Install the Memory Profiler in CI
Xinrong Meng created SPARK-41107: Summary: Install the Memory Profiler in CI Key: SPARK-41107 URL: https://issues.apache.org/jira/browse/SPARK-41107 Project: Spark Issue Type: Sub-task Components: Build, PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng We shall install the Memory Profiler in CI in order to enable memory profiling tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
[ https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Zhang closed SPARK-41099. > Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException > > > Key: SPARK-41099 > URL: https://issues.apache.org/jira/browse/SPARK-41099 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Bo Zhang >Priority: Major > > This is similar to https://issues.apache.org/jira/browse/SPARK-40488. > Exceptions thrown in SparkHadoopWriter.write are wrapped with > SparkException("Job aborted."). > This wrapping provides little extra information, but generates a long > stacktrace, which hinders debugging when error happens. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
[ https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632048#comment-17632048 ] Bo Zhang edited comment on SPARK-41099 at 11/11/22 3:08 AM: To keep the exceptions exposed to users who use the RDD APIs, we will not change this. See https://github.com/apache/spark/pull/38602#issuecomment-1310755154 was (Author: bozhang): To keep the exceptions exposed to users who use the RDD APIs, we will not change this. > Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException > > > Key: SPARK-41099 > URL: https://issues.apache.org/jira/browse/SPARK-41099 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Bo Zhang >Priority: Major > > This is similar to https://issues.apache.org/jira/browse/SPARK-40488. > Exceptions thrown in SparkHadoopWriter.write are wrapped with > SparkException("Job aborted."). > This wrapping provides little extra information, but generates a long > stacktrace, which hinders debugging when error happens. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset
[ https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41105. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38606 [https://github.com/apache/spark/pull/38606] > Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate > if a field is set or unset > --- > > Key: SPARK-41105 > URL: https://issues.apache.org/jira/browse/SPARK-41105 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
[ https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Zhang resolved SPARK-41099. -- Resolution: Won't Fix To keep the exceptions exposed to users who use the RDD APIs, we will not change this. > Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException > > > Key: SPARK-41099 > URL: https://issues.apache.org/jira/browse/SPARK-41099 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Bo Zhang >Priority: Major > > This is similar to https://issues.apache.org/jira/browse/SPARK-40488. > Exceptions thrown in SparkHadoopWriter.write are wrapped with > SparkException("Job aborted."). > This wrapping provides little extra information, but generates a long > stacktrace, which hinders debugging when error happens. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
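The "Won't Fix" rationale turns on a trade-off worth spelling out: wrapping re-raises the root cause behind a generic message, which preserves the exception type RDD-API callers already depend on but adds a stack layer that hinders debugging. A hedged Python sketch of the wrapping pattern under discussion (names are illustrative; this is not Spark's code):

```python
def run_task(fail: bool = False) -> str:
    # Stand-in for the actual write work; raises on a bad record.
    if fail:
        raise ValueError("malformed record")
    return "ok"

def write_with_wrapping(fail: bool = False) -> str:
    # Wrap any task failure in a generic "Job aborted." error, in the
    # spirit of SparkHadoopWriter.write wrapping with SparkException.
    # The root cause survives as __cause__, but callers see the generic
    # wrapper first, plus the extra stack frames the ticket complains about.
    try:
        return run_task(fail)
    except Exception as exc:
        raise RuntimeError("Job aborted.") from exc
```

Removing the wrapper would surface `ValueError` directly, which is cleaner for debugging but changes the exception type existing callers catch, which is why the issue was closed as Won't Fix.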
[jira] [Assigned] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset
[ https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41105: Assignee: Rui Wang > Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate > if a field is set or unset > --- > > Key: SPARK-41105 > URL: https://issues.apache.org/jira/browse/SPARK-41105 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40281) Memory Profiler on Executors
[ https://issues.apache.org/jira/browse/SPARK-40281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40281. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38584 [https://github.com/apache/spark/pull/38584] > Memory Profiler on Executors > > > Key: SPARK-40281 > URL: https://issues.apache.org/jira/browse/SPARK-40281 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > The ticket proposes to implement PySpark memory profiling on executors. See > more > [design|https://docs.google.com/document/d/e/2PACX-1vR2K4TdrM1eAjNDC1bsflCNRH67UWLoC-lCv6TSUVXD91Ruksm99pYTnCeIm7Ui3RgrrRNcQU_D8-oh/pub]. > There are many factors in a PySpark program’s performance. Memory, as one of > the key factors of a program’s performance, has been missing in PySpark > profiling. A PySpark program on the Spark driver can be profiled with [Memory > Profiler|https://pypi.org/project/memory-profiler/] > as a normal Python process, but there was not an easy way to profile memory > on Spark executors. > PySpark UDFs, one of the most popular Python APIs, enable users to run custom > code on top of the Apache Spark™ engine. However, it is difficult to optimize > UDFs without understanding memory consumption. > The ticket proposes to introduce the PySpark memory profiler, which profiles > memory on executors. It provides information about total memory usage and > pinpoints which lines of code in a UDF contribute the most to memory usage. > That will help optimize PySpark UDFs and reduce the likelihood of > out-of-memory errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40281) Memory Profiler on Executors
[ https://issues.apache.org/jira/browse/SPARK-40281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40281: Assignee: Xinrong Meng > Memory Profiler on Executors > > > Key: SPARK-40281 > URL: https://issues.apache.org/jira/browse/SPARK-40281 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > The ticket proposes to implement PySpark memory profiling on executors. See > more > [design|https://docs.google.com/document/d/e/2PACX-1vR2K4TdrM1eAjNDC1bsflCNRH67UWLoC-lCv6TSUVXD91Ruksm99pYTnCeIm7Ui3RgrrRNcQU_D8-oh/pub]. > There are many factors in a PySpark program’s performance. Memory, as one of > the key factors of a program’s performance, has been missing in PySpark > profiling. A PySpark program on the Spark driver can be profiled with [Memory > Profiler|https://pypi.org/project/memory-profiler/] > as a normal Python process, but there was not an easy way to profile memory > on Spark executors. > PySpark UDFs, one of the most popular Python APIs, enable users to run custom > code on top of the Apache Spark™ engine. However, it is difficult to optimize > UDFs without understanding memory consumption. > The ticket proposes to introduce the PySpark memory profiler, which profiles > memory on executors. It provides information about total memory usage and > pinpoints which lines of code in a UDF contribute the most to memory usage. > That will help optimize PySpark UDFs and reduce the likelihood of > out-of-memory errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
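The profiler described in SPARK-40281 reports per-line memory use of UDF code on executors. As a rough stand-in for the underlying idea, the stdlib `tracemalloc` module can capture the peak allocation of a function; note the actual feature builds on the third-party memory-profiler package and PySpark's UDF machinery, neither of which is shown here.

```python
# Minimal stand-in for the idea of measuring a function's memory use,
# using only stdlib tracemalloc (the real feature wires the third-party
# memory-profiler package into PySpark UDF execution on executors).

import tracemalloc

def udf_like(n):
    # Allocate a list large enough to register in the trace.
    data = [i * 2 for i in range(n)]
    return sum(data)

tracemalloc.start()
result = udf_like(100_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert result == sum(i * 2 for i in range(100_000))
assert peak >= current >= 0  # peak usage bounds what is still allocated
```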
[jira] [Commented] (SPARK-41106) Reduce collection conversion when create AttributeMap
[ https://issues.apache.org/jira/browse/SPARK-41106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632041#comment-17632041 ] Apache Spark commented on SPARK-41106: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38610 > Reduce collection conversion when create AttributeMap > - > > Key: SPARK-41106 > URL: https://issues.apache.org/jira/browse/SPARK-41106 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41106) Reduce collection conversion when create AttributeMap
[ https://issues.apache.org/jira/browse/SPARK-41106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41106: Assignee: (was: Apache Spark) > Reduce collection conversion when create AttributeMap > - > > Key: SPARK-41106 > URL: https://issues.apache.org/jira/browse/SPARK-41106 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41106) Reduce collection conversion when create AttributeMap
[ https://issues.apache.org/jira/browse/SPARK-41106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41106: Assignee: Apache Spark > Reduce collection conversion when create AttributeMap > - > > Key: SPARK-41106 > URL: https://issues.apache.org/jira/browse/SPARK-41106 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41106) Reduce collection conversion when create AttributeMap
Yang Jie created SPARK-41106: Summary: Reduce collection conversion when create AttributeMap Key: SPARK-41106 URL: https://issues.apache.org/jira/browse/SPARK-41106 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14
[ https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632030#comment-17632030 ] Apache Spark commented on SPARK-40593: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38609 > protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 > --- > > Key: SPARK-40593 > URL: https://issues.apache.org/jira/browse/SPARK-40593 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Compile Connect module on CentOS release 6.3, the default glibc version is > 2.12, this will cause compilation to fail as follows: > {code:java} > [ERROR] PROTOC FAILED: > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /lib64/libc.so.6: version `GLIBC_2.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14
[ https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632028#comment-17632028 ] Apache Spark commented on SPARK-40593: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38609 > protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 > --- > > Key: SPARK-40593 > URL: https://issues.apache.org/jira/browse/SPARK-40593 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Compile Connect module on CentOS release 6.3, the default glibc version is > 2.12, this will cause compilation to fail as follows: > {code:java} > [ERROR] PROTOC FAILED: > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /lib64/libc.so.6: version `GLIBC_2.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14
[ https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40593: Assignee: Apache Spark > protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 > --- > > Key: SPARK-40593 > URL: https://issues.apache.org/jira/browse/SPARK-40593 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Compile Connect module on CentOS release 6.3, the default glibc version is > 2.12, this will cause compilation to fail as follows: > {code:java} > [ERROR] PROTOC FAILED: > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /lib64/libc.so.6: version `GLIBC_2.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14
[ https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40593: Assignee: (was: Apache Spark) > protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 > --- > > Key: SPARK-40593 > URL: https://issues.apache.org/jira/browse/SPARK-40593 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Compile Connect module on CentOS release 6.3, the default glibc version is > 2.12, this will cause compilation to fail as follows: > {code:java} > [ERROR] PROTOC FAILED: > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /lib64/libc.so.6: version `GLIBC_2.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe: > /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by > /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41077) Rename `ColumnRef` to `Column` in Python client implementation
[ https://issues.apache.org/jira/browse/SPARK-41077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41077: Assignee: Rui Wang > Rename `ColumnRef` to `Column` in Python client implementation > --- > > Key: SPARK-41077 > URL: https://issues.apache.org/jira/browse/SPARK-41077 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41077) Rename `ColumnRef` to `Column` in Python client implementation
[ https://issues.apache.org/jira/browse/SPARK-41077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41077. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38586 [https://github.com/apache/spark/pull/38586] > Rename `ColumnRef` to `Column` in Python client implementation > --- > > Key: SPARK-41077 > URL: https://issues.apache.org/jira/browse/SPARK-41077 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41063. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38599 [https://github.com/apache/spark/pull/38599] > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41063: Assignee: Hyukjin Kwon > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Hyukjin Kwon >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41005) Arrow based collect
[ https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41005: - Assignee: Ruifeng Zheng > Arrow based collect > --- > > Key: SPARK-41005 > URL: https://issues.apache.org/jira/browse/SPARK-41005 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41005) Arrow based collect
[ https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41005. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38468 [https://github.com/apache/spark/pull/38468] > Arrow based collect > --- > > Key: SPARK-41005 > URL: https://issues.apache.org/jira/browse/SPARK-41005 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41080) Support Bit manipulation function SETBIT
[ https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631991#comment-17631991 ] Apache Spark commented on SPARK-41080: -- User 'vinodkc' has created a pull request for this issue: https://github.com/apache/spark/pull/38608 > Support Bit manipulation function SETBIT > > > Key: SPARK-41080 > URL: https://issues.apache.org/jira/browse/SPARK-41080 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.4 >Reporter: Vinod KC >Priority: Minor > > Support a function to change a bit at a specified position. It shall change the > bit at the specified position to 1. If the optional third argument is set to > zero, the specified bit shall be set to 0 instead. > SETBIT(integer_type a, INT position [, INT zero_or_one]) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41080) Support Bit manipulation function SETBIT
[ https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41080: Assignee: (was: Apache Spark) > Support Bit manipulation function SETBIT > > > Key: SPARK-41080 > URL: https://issues.apache.org/jira/browse/SPARK-41080 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.4 >Reporter: Vinod KC >Priority: Minor > > Support a function to change a bit at a specified position. It shall change the > bit at the specified position to 1. If the optional third argument is set to > zero, the specified bit shall be set to 0 instead. > SETBIT(integer_type a, INT position [, INT zero_or_one]) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41080) Support Bit manipulation function SETBIT
[ https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631990#comment-17631990 ] Apache Spark commented on SPARK-41080: -- User 'vinodkc' has created a pull request for this issue: https://github.com/apache/spark/pull/38608 > Support Bit manipulation function SETBIT > > > Key: SPARK-41080 > URL: https://issues.apache.org/jira/browse/SPARK-41080 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.4 >Reporter: Vinod KC >Priority: Minor > > Support a function to change a bit at a specified position. It shall change the > bit at the specified position to 1. If the optional third argument is set to > zero, the specified bit shall be set to 0 instead. > SETBIT(integer_type a, INT position [, INT zero_or_one]) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41080) Support Bit manipulation function SETBIT
[ https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41080: Assignee: Apache Spark > Support Bit manipulation function SETBIT > > > Key: SPARK-41080 > URL: https://issues.apache.org/jira/browse/SPARK-41080 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.4 >Reporter: Vinod KC >Assignee: Apache Spark >Priority: Minor > > Support a function to change a bit at a specified position. It shall change the > bit at the specified position to 1. If the optional third argument is set to > zero, the specified bit shall be set to 0 instead. > SETBIT(integer_type a, INT position [, INT zero_or_one]) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
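The SETBIT semantics described in SPARK-41080 can be sketched in a few lines of Python. Bit positions are assumed 0-indexed from the least-significant bit here; that is an assumption of this sketch, not something the ticket specifies, and the sketch is an illustration rather than Spark's implementation.

```python
# Illustrative sketch of the described SETBIT(value, position [, zero_or_one])
# semantics: set the bit at `position` to 1, or to 0 when the optional third
# argument is zero. Positions are assumed 0-indexed from the LSB.

def setbit(value: int, position: int, zero_or_one: int = 1) -> int:
    mask = 1 << position
    # OR sets the bit; AND with the complement clears it.
    return value | mask if zero_or_one else value & ~mask

assert setbit(0b1000, 1) == 0b1010      # turn bit 1 on
assert setbit(0b1010, 3, 0) == 0b0010   # turn bit 3 off
assert setbit(0, 0) == 1                # lowest bit of zero becomes 1
```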
[jira] [Commented] (SPARK-40938) Support Alias for every Relation
[ https://issues.apache.org/jira/browse/SPARK-40938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631982#comment-17631982 ] Apache Spark commented on SPARK-40938: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38607 > Support Alias for every Relation > > > Key: SPARK-40938 > URL: https://issues.apache.org/jira/browse/SPARK-40938 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40938) Support Alias for every Relation
[ https://issues.apache.org/jira/browse/SPARK-40938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631983#comment-17631983 ] Apache Spark commented on SPARK-40938: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38607 > Support Alias for every Relation > > > Key: SPARK-40938 > URL: https://issues.apache.org/jira/browse/SPARK-40938 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40901) Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path
[ https://issues.apache.org/jira/browse/SPARK-40901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-40901. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38377 [https://github.com/apache/spark/pull/38377] > Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path > > > Key: SPARK-40901 > URL: https://issues.apache.org/jira/browse/SPARK-40901 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0, 3.2.2 >Reporter: Swaminathan Balachandran >Assignee: Swaminathan Balachandran >Priority: Major > Fix For: 3.4.0 > > > Spark Config: spark.driver.log.dfsDir doesn't support an absolute Hadoop-based > URI path. It currently only supports a URI path and writes only to > fs.defaultFS, and does not write logs to any other configured filesystem. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40901) Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path
[ https://issues.apache.org/jira/browse/SPARK-40901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-40901: --- Assignee: Swaminathan Balachandran > Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path > > > Key: SPARK-40901 > URL: https://issues.apache.org/jira/browse/SPARK-40901 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0, 3.2.2 >Reporter: Swaminathan Balachandran >Assignee: Swaminathan Balachandran >Priority: Major > > Spark Config: spark.driver.log.dfsDir doesn't support an absolute Hadoop-based > URI path. It currently only supports a URI path and writes only to > fs.defaultFS, and does not write logs to any other configured filesystem. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Attachment: Screenshot 2022-11-09 015432.png > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.2, 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > Attachments: Screenshot 2022-11-09 015432.png > > > *Description of the issue:* > There's a problem with submitting spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a > driver and an executor. However, the library creates only one driver config > map for all jobs (in some cases it generates only one executor map for all > jobs in the same manner). So, if I run 5 jobs, then only one driver config > map will be generated and used for every job. During those runs we > experience issues when deleting pods from the cluster: executor pods are > endlessly created and immediately terminated, overloading cluster resources. > > *The reason of the issue:* > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). > > *Steps to reproduce the issue:* > > # Create a *KubernetesClientApplication* object. > # Submit at least 2 jobs (sequentially or using *Thread* for running in > parallel). > > *The results of my observations according to the steps are as follows:* > # Spark 3.1.2 - The same config map in K8S will be overwritten, which means > all the jobs will point to the same config map.
> # Spark 3.3.* - For the first job a new config map will be created. For > other jobs an exception will be thrown (the K8S Fabric library does not allow > creating a new config map with an existing name). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631972#comment-17631972 ] Serhii Nesterov commented on SPARK-41060: - After applying the fixes from the pull request config maps are created correctly: !Screenshot 2022-11-09 015432.png! > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.2, 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > Attachments: Screenshot 2022-11-09 015432.png > > > *Description of the issue:* > There's a problem with submitting spark jobs to K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a > driver and an executor. However, the library creates only one driver config > map for all jobs (in some cases it generates only one executor map for all > jobs in the same manner). So, if I run 5 jobs, then only one driver config > map will be generated and used for every job. During those runs we > experience issues when deleting pods from the cluster: executors pods are > endlessly created and immediately terminated overloading cluster resources. > > *The reason of the issue:* > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). > > *Steps to reproduce the issue:* > > # Create a *KubernetesClientApplication* object. > # Submit at least 2 jobs (sequentially or using *Thread* for running in > parallel). 
> > *The results of my observations according to the steps are as follows:* > # Spark 3.1.2 - The same config map in K8S is overwritten, which means all the jobs point to the same config map. > # Spark 3.3.* - A new config map is created for the first job; for subsequent jobs an exception is thrown (the Fabric8 Kubernetes client library does not allow creating a new config map with an existing name). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
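The collision described above stems from a fixed constant name. One way to remove it is to derive config map names from a unique per-submission prefix. The sketch below is a hypothetical plain-Python helper (the function names and the `app_name` parameter are illustrative, not the actual Spark fix or API), showing how a fresh prefix per job keeps two submissions from clashing:

```python
import uuid

# Hypothetical sketch (not the actual Spark fix): derive config map names
# from a per-submission resource prefix instead of a shared constant,
# so two jobs never collide on the same name.

def new_resource_prefix(app_name: str) -> str:
    """A per-submission prefix: the app name plus a random suffix."""
    return f"{app_name}-{uuid.uuid4().hex[:8]}"

def driver_config_map_name(prefix: str) -> str:
    """One driver config map per submission."""
    return f"{prefix}-driver-conf-map"

def executor_config_map_name(prefix: str) -> str:
    """One executor config map per submission."""
    return f"{prefix}-exec-conf-map"
```

With a fresh prefix generated for each submission, every job gets its own driver and executor config maps, so a second job neither overwrites the first job's map (the 3.1.2 behavior) nor fails on a name collision (the 3.3.* behavior).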
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Description: *Description of the issue:* There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, two config maps should be created for each job: one for the driver and one for the executor. However, the library creates only one driver config map for all jobs (and in some cases only one executor config map, in the same manner). So if I run 5 jobs, only one driver config map is generated and used for every job. During those runs we see issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. *The reason for the issue:* This problem occurs because the *KubernetesClientUtils* class defines *configMapNameExecutor* and *configMapNameDriver* as constants. This seems incorrect and should be fixed urgently. I've prepared some changes for review to fix the issue (tested in our project's cluster). *Steps to reproduce the issue:* # Create a *KubernetesClientApplication* object. # Submit at least 2 jobs (sequentially or using *Thread* for running in parallel). *The results of my observations according to the steps are as follows:* # Spark 3.1.2 - The same config map in K8S is overwritten, which means all the jobs point to the same config map. # Spark 3.3.* - A new config map is created for the first job; for subsequent jobs an exception is thrown (the Fabric8 Kubernetes client library does not allow creating a new config map with an existing name). was: There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, two config maps should be created for each job: one for the driver and one for the executor. 
However, the library creates only one driver config map for all jobs (and in some cases only one executor config map, in the same manner). So if I run 5 jobs, only one driver config map is generated and used for every job. During those runs we see issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. This problem occurs because the *KubernetesClientUtils* class defines *configMapNameExecutor* and *configMapNameDriver* as constants. This seems incorrect and should be fixed urgently. I've prepared some changes for review to fix the issue (tested in our project's cluster). Steps to reproduce the issue: # Create a *KubernetesClientApplication* object. # Submit at least 2 jobs (sequentially or using *Thread* for running in parallel). The results of my observations according to the steps are as follows: # Spark 3.1.2 - The same config map in K8S is overwritten, which means all the jobs point to the same config map. # Spark 3.3.* - A new config map is created for the first job; for subsequent jobs an exception is thrown (the Fabric8 Kubernetes client library does not allow creating a new config map with an existing name). > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes > Affects Versions: 3.1.2, 3.3.0, 3.3.1 > Reporter: Serhii Nesterov > Priority: Major > > > *Description of the issue:* > There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, two config maps should be created for each job: one for the driver and one for the executor. However, the library creates only one driver config map for all jobs (and in some cases only one executor config map, in the same manner). 
So if I run 5 jobs, only one driver config map is generated and used for every job. During those runs we see issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. > > *The reason for the issue:* > This problem occurs because the *KubernetesClientUtils* class defines *configMapNameExecutor* and *configMapNameDriver* as constants. This seems incorrect and should be fixed urgently. I've prepared some changes for review to fix the issue (tested in our project's cluster). > > *Steps to reproduce the issue:* > > # Create a *KubernetesClientApplication* object. > # Submit at least 2 jobs (sequentially or using *Thread* for running in parallel).
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Description: There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, two config maps should be created for each job: one for the driver and one for the executor. However, the library creates only one driver config map for all jobs (and in some cases only one executor config map, in the same manner). So if I run 5 jobs, only one driver config map is generated and used for every job. During those runs we see issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. This problem occurs because the *KubernetesClientUtils* class defines *configMapNameExecutor* and *configMapNameDriver* as constants. This seems incorrect and should be fixed urgently. I've prepared some changes for review to fix the issue (tested in our project's cluster). Steps to reproduce the issue: # Create a *KubernetesClientApplication* object. # Submit at least 2 jobs (sequentially or using *Thread* for running in parallel). The results of my observations according to the steps are as follows: # Spark 3.1.2 - The same config map in K8S is overwritten, which means all the jobs point to the same config map. # Spark 3.3.* - A new config map is created for the first job; for subsequent jobs an exception is thrown (the Fabric8 Kubernetes client library does not allow creating a new config map with an existing name). was: There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, two config maps should be created for each job: one for the driver and one for the executor. 
However, the library creates only one driver config map for all jobs (and in some cases only one executor config map, in the same manner). So if I run 5 jobs, only one driver config map is generated and used for every job. During those runs we see issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. This problem occurs because the *KubernetesClientUtils* class defines *configMapNameExecutor* and *configMapNameDriver* as constants. This seems incorrect and should be fixed urgently. I've prepared some changes for review to fix the issue (tested in our project's cluster). > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes > Affects Versions: 3.1.2, 3.3.0, 3.3.1 > Reporter: Serhii Nesterov > Priority: Major > > > There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, two config maps should be created for each job: one for the driver and one for the executor. However, the library creates only one driver config map for all jobs (and in some cases only one executor config map, in the same manner). So if I run 5 jobs, only one driver config map is generated and used for every job. During those runs we see issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. > This problem occurs because the *KubernetesClientUtils* class defines *configMapNameExecutor* and *configMapNameDriver* as constants. This seems incorrect and should be fixed urgently. I've prepared some changes for review to fix the issue (tested in our project's cluster). 
> > Steps to reproduce the issue: > > # Create a *KubernetesClientApplication* object. > # Submit at least 2 jobs (sequentially or using *Thread* for running in parallel). > > The results of my observations according to the steps are as follows: > # Spark 3.1.2 - The same config map in K8S is overwritten, which means all the jobs point to the same config map. > # Spark 3.3.* - A new config map is created for the first job; for subsequent jobs an exception is thrown (the Fabric8 Kubernetes client library does not allow creating a new config map with an existing name). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset
[ https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631967#comment-17631967 ] Apache Spark commented on SPARK-41105: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38606 > Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate > if a field is set or unset > --- > > Key: SPARK-41105 > URL: https://issues.apache.org/jira/browse/SPARK-41105 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset
[ https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41105: Assignee: Apache Spark > Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate > if a field is set or unset > --- > > Key: SPARK-41105 > URL: https://issues.apache.org/jira/browse/SPARK-41105 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset
[ https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41105: Assignee: (was: Apache Spark) > Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate > if a field is set or unset > --- > > Key: SPARK-41105 > URL: https://issues.apache.org/jira/browse/SPARK-41105 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset
Rui Wang created SPARK-41105: Summary: Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset Key: SPARK-41105 URL: https://issues.apache.org/jira/browse/SPARK-41105 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
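The distinction the proto3 `optional` keyword buys can be illustrated without protobuf at all. The sketch below is plain Python with hypothetical class names: without explicit presence tracking, a scalar field's default value is indistinguishable from "never set"; with presence tracking, a `hasXXX`-style accessor tells the two apart, which is exactly what the generated code offers once a field is declared `optional`:

```python
# Sketch of proto3 field-presence semantics (no protobuf dependency;
# class names are illustrative, not generated code).

class WithoutPresence:
    """Like a plain proto3 int32 field: unset and 0 look identical."""
    def __init__(self):
        self.limit = 0  # default value; no way to know if it was ever set

class WithPresence:
    """Like a proto3 `optional int32 limit`: presence is tracked."""
    def __init__(self):
        self._limit = None  # None means "unset"

    def set_limit(self, value: int) -> None:
        self._limit = value

    def has_limit(self) -> bool:
        # Analogous to the generated hasXXX accessor.
        return self._limit is not None

    @property
    def limit(self) -> int:
        # Reading an unset field still yields the type's default, as in proto3.
        return self._limit if self._limit is not None else 0
```

After `set_limit(0)`, `has_limit()` is true even though the value equals the default — the case that is undecidable without the `optional` keyword.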
[jira] [Commented] (SPARK-41104) Can insert NULL into Hive table with NOT NULL column
[ https://issues.apache.org/jira/browse/SPARK-41104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631946#comment-17631946 ] Rui Wang commented on SPARK-41104: -- Looks like Hive has only enforced `NOT NULL` since Hive 3.0.0: https://issues.apache.org/jira/browse/HIVE-16575 > Can insert NULL into Hive table with NOT NULL column > -- > > Key: SPARK-41104 > URL: https://issues.apache.org/jira/browse/SPARK-41104 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Serge Rielau > Priority: Critical > > spark-sql> CREATE TABLE tttd(c1 int not null); > 22/11/10 14:04:28 WARN ResolveSessionCatalog: A Hive serde table will be > created as there is no table provider specified. You can set > spark.sql.legacy.createHiveTableByDefault to false so that native data source > table will be created instead. > 22/11/10 14:04:28 WARN HiveMetaStore: Location: > file:/Users/serge.rielau/spark/spark-warehouse/tttd specified for > non-external table:tttd > Time taken: 0.078 seconds > spark-sql> INSERT INTO tttd VALUES(null); > Time taken: 0.36 seconds > spark-sql> SELECT * FROM tttd; > NULL > Time taken: 0.074 seconds, Fetched 1 row(s) > spark-sql> > Does Hive not support NOT NULL? That's fine, but then we should fail on > CREATE TABLE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
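Until the constraint is enforced at write time (or the CREATE TABLE rejected, as the reporter suggests), a caller-side guard is one workaround. The sketch below is a hypothetical plain-Python helper, not a Spark or Hive API, that rejects rows carrying NULL in a declared NOT NULL column before they are handed to an insert:

```python
# Hypothetical pre-insert guard: rows are dicts, NULL is modeled as None.
def check_not_null(rows, not_null_cols):
    """Return rows unchanged, or raise if any NOT NULL column holds None."""
    for i, row in enumerate(rows):
        for col in not_null_cols:
            if row.get(col) is None:
                raise ValueError(
                    f"NULL value in NOT NULL column {col!r} at row {i}"
                )
    return rows
```

Against the transcript above, such a guard would have turned the silent `INSERT INTO tttd VALUES(null)` into an error instead of storing a NULL in a `NOT NULL` column.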