[jira] [Assigned] (SPARK-41102) Merge SparkConnectPlanner and SparkConnectCommandPlanner

2022-11-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41102:
---

Assignee: Rui Wang

> Merge SparkConnectPlanner and SparkConnectCommandPlanner
> 
>
> Key: SPARK-41102
> URL: https://issues.apache.org/jira/browse/SPARK-41102
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41102) Merge SparkConnectPlanner and SparkConnectCommandPlanner

2022-11-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41102.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38604
[https://github.com/apache/spark/pull/38604]

> Merge SparkConnectPlanner and SparkConnectCommandPlanner
> 
>
> Key: SPARK-41102
> URL: https://issues.apache.org/jira/browse/SPARK-41102
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41113) Upgrade sbt to 1.8.0

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632133#comment-17632133
 ] 

Apache Spark commented on SPARK-41113:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38620

> Upgrade sbt to 1.8.0
> 
>
> Key: SPARK-41113
> URL: https://issues.apache.org/jira/browse/SPARK-41113
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/sbt/sbt/releases/tag/v1.8.0






[jira] [Commented] (SPARK-41113) Upgrade sbt to 1.8.0

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632132#comment-17632132
 ] 

Apache Spark commented on SPARK-41113:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38620

> Upgrade sbt to 1.8.0
> 
>
> Key: SPARK-41113
> URL: https://issues.apache.org/jira/browse/SPARK-41113
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/sbt/sbt/releases/tag/v1.8.0






[jira] [Assigned] (SPARK-41113) Upgrade sbt to 1.8.0

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41113:


Assignee: (was: Apache Spark)

> Upgrade sbt to 1.8.0
> 
>
> Key: SPARK-41113
> URL: https://issues.apache.org/jira/browse/SPARK-41113
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/sbt/sbt/releases/tag/v1.8.0






[jira] [Assigned] (SPARK-41113) Upgrade sbt to 1.8.0

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41113:


Assignee: Apache Spark

> Upgrade sbt to 1.8.0
> 
>
> Key: SPARK-41113
> URL: https://issues.apache.org/jira/browse/SPARK-41113
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/sbt/sbt/releases/tag/v1.8.0






[jira] [Created] (SPARK-41113) Upgrade sbt to 1.8.0

2022-11-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-41113:


 Summary: Upgrade sbt to 1.8.0
 Key: SPARK-41113
 URL: https://issues.apache.org/jira/browse/SPARK-41113
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/sbt/sbt/releases/tag/v1.8.0






[jira] [Assigned] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41112:


Assignee: Apache Spark

> RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
> 
>
> Key: SPARK-41112
> URL: https://issues.apache.org/jira/browse/SPARK-41112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> The inferred in-subquery filter should apply ColumnPruning before getting plan 
> statistics and checking whether it can be broadcast. Otherwise, the final 
> physical plan will differ from the expected one.






[jira] [Commented] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632128#comment-17632128
 ] 

Apache Spark commented on SPARK-41112:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/38619

> RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
> 
>
> Key: SPARK-41112
> URL: https://issues.apache.org/jira/browse/SPARK-41112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> The inferred in-subquery filter should apply ColumnPruning before getting plan 
> statistics and checking whether it can be broadcast. Otherwise, the final 
> physical plan will differ from the expected one.






[jira] [Assigned] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41112:


Assignee: (was: Apache Spark)

> RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter
> 
>
> Key: SPARK-41112
> URL: https://issues.apache.org/jira/browse/SPARK-41112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> The inferred in-subquery filter should apply ColumnPruning before getting plan 
> statistics and checking whether it can be broadcast. Otherwise, the final 
> physical plan will differ from the expected one.






[jira] [Created] (SPARK-41112) RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter

2022-11-10 Thread XiDuo You (Jira)
XiDuo You created SPARK-41112:
-

 Summary: RuntimeFilter should apply ColumnPruning eagerly with 
in-subquery filter
 Key: SPARK-41112
 URL: https://issues.apache.org/jira/browse/SPARK-41112
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


The inferred in-subquery filter should apply ColumnPruning before getting plan 
statistics and checking whether it can be broadcast. Otherwise, the final 
physical plan will differ from the expected one.
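To illustrate why eager column pruning matters for the broadcast decision, here is a minimal, hypothetical sketch (plain Python, not Spark's optimizer code): a size estimate taken before pruning counts every column of a wide table and can wrongly disqualify a plan from broadcasting, while the pruned plan that only keeps the filter's key column fits under the threshold. All names and numbers below are illustrative assumptions.

```python
# Simplified illustration (NOT Spark internals) of size estimation
# before vs. after column pruning for a broadcast decision.

BROADCAST_THRESHOLD = 10 * 1024 * 1024  # bytes; stand-in for autoBroadcastJoinThreshold

def estimate_size(num_rows, columns, bytes_per_column=8):
    """Crude plan-size estimate: rows * retained columns * avg column width."""
    return num_rows * len(columns) * bytes_per_column

def can_broadcast(num_rows, all_columns, needed_columns):
    # Without pruning: every column is counted, so the estimate is inflated.
    unpruned_ok = estimate_size(num_rows, all_columns) <= BROADCAST_THRESHOLD
    # With eager pruning: only the column(s) the in-subquery filter needs count.
    pruned_ok = estimate_size(num_rows, needed_columns) <= BROADCAST_THRESHOLD
    return unpruned_ok, pruned_ok

# A wide table: 1M rows, 50 columns, but the runtime filter needs one key column.
without_pruning, with_pruning = can_broadcast(
    1_000_000, [f"c{i}" for i in range(50)], ["c0"])
print(without_pruning, with_pruning)  # the pruned estimate qualifies, the unpruned one does not
```

The same plan is thus judged differently depending on when pruning runs, which is the mismatch the issue describes.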






[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632119#comment-17632119
 ] 

Apache Spark commented on SPARK-41005:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38618

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632117#comment-17632117
 ] 

Apache Spark commented on SPARK-41005:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38618

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632118#comment-17632118
 ] 

Apache Spark commented on SPARK-41005:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38618

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632115#comment-17632115
 ] 

Apache Spark commented on SPARK-41108:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38618

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Created] (SPARK-41111) Implement `DataFrame.show`

2022-11-10 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41111:
-

 Summary: Implement `DataFrame.show`
 Key: SPARK-41111
 URL: https://issues.apache.org/jira/browse/SPARK-41111
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-41111) Implement `DataFrame.show`

2022-11-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-41111:
-

Assignee: Ruifeng Zheng

> Implement `DataFrame.show`
> --
>
> Key: SPARK-41111
> URL: https://issues.apache.org/jira/browse/SPARK-41111
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632116#comment-17632116
 ] 

Apache Spark commented on SPARK-41108:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38618

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40798) Alter partition should verify value

2022-11-10 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632106#comment-17632106
 ] 

Ranga Reddy commented on SPARK-40798:
-

The issue below is addressed by the current Jira. Can I add a test case for 
it in InsertSuite.scala?

https://issues.apache.org/jira/browse/SPARK-40988

> Alter partition should verify value
> ---
>
> Key: SPARK-40798
> URL: https://issues.apache.org/jira/browse/SPARK-40798
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
>  
> {code:java}
> CREATE TABLE t (c int) USING PARQUET PARTITIONED BY(p int);
> -- This DDL should fail but worked:
> ALTER TABLE t ADD PARTITION(p='aaa'); {code}
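The kind of verification this issue asks for can be sketched in plain Python (a hypothetical illustration only, not Spark's implementation): before accepting an ADD PARTITION value, attempt to cast it to the partition column's declared type and fail if the cast is impossible. The function name and type map below are assumptions for the sketch.

```python
# Hypothetical sketch of partition-value verification: cast the literal to the
# partition column's declared type, or raise. This mirrors why
# ALTER TABLE t ADD PARTITION(p='aaa') should fail for an INT partition column.

def verify_partition_value(value: str, column_type: str):
    casters = {"int": int, "bigint": int, "double": float, "string": str}
    try:
        return casters[column_type](value)
    except ValueError:
        raise ValueError(
            f"Partition value '{value}' cannot be cast to type {column_type}")

print(verify_partition_value("25", "int"))   # a valid value passes: 25
try:
    verify_partition_value("aaa", "int")     # the DDL from the issue should fail like this
except ValueError as e:
    print(e)
```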






[jira] [Resolved] (SPARK-41095) Convert unresolved operators to internal errors

2022-11-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41095.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38582
[https://github.com/apache/spark/pull/38582]

> Convert unresolved operators to internal errors
> ---
>
> Key: SPARK-41095
> URL: https://issues.apache.org/jira/browse/SPARK-41095
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> The 'unresolved operator' error indicates a bug in most cases. Any such 
> errors need to be converted to internal errors.






[jira] [Commented] (SPARK-40988) Spark3 partition column value is not validated with user provided schema.

2022-11-10 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632102#comment-17632102
 ] 

Ranga Reddy commented on SPARK-40988:
-

The following Jira will resolve the issue by throwing the *CAST_INVALID_INPUT* 
error.

https://issues.apache.org/jira/browse/SPARK-40798

> Spark3 partition column value is not validated with user provided schema.
> -
>
> Key: SPARK-40988
> URL: https://issues.apache.org/jira/browse/SPARK-40988
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Ranga Reddy
>Priority: Major
>
> Spark 3 does not validate the partition column type when inserting data, 
> but on the Hive side an exception is thrown when values of a different 
> type are inserted.
> *Spark Code:*
>  
> {code:java}
> scala> val tableName="test_partition_table"
> tableName: String = test_partition_table
> scala> spark.sql(s"DROP TABLE IF EXISTS $tableName")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql(s"CREATE EXTERNAL TABLE $tableName ( id INT, name STRING ) 
> PARTITIONED BY (age INT) LOCATION 'file:/tmp/spark-warehouse/$tableName'")
> res1: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("SHOW tables").show(truncate=false)
> +---------+--------------------+-----------+
> |namespace|tableName           |isTemporary|
> +---------+--------------------+-----------+
> |default  |test_partition_table|false      |
> +---------+--------------------+-----------+
> scala> spark.sql("SET spark.sql.sources.validatePartitionColumns").show(50, 
> false)
> +------------------------------------------+-----+
> |key                                       |value|
> +------------------------------------------+-----+
> |spark.sql.sources.validatePartitionColumns|true |
> +------------------------------------------+-----+
> scala> spark.sql(s"""INSERT INTO $tableName partition (age=25) VALUES (1, 
> 'Ranga')""")
> res4: org.apache.spark.sql.DataFrame = []
> scala> spark.sql(s"show partitions $tableName").show(50, false)
> +---------+
> |partition|
> +---------+
> |age=25   |
> +---------+
> scala> spark.sql(s"select * from $tableName").show(50, false)
> +---+-----+---+
> |id |name |age|
> +---+-----+---+
> |1  |Ranga|25 |
> +---+-----+---+
> scala> spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") 
> VALUES (2, 'Nishanth')""")
> res7: org.apache.spark.sql.DataFrame = []
> scala> spark.sql(s"show partitions $tableName").show(50, false)
> +------------+
> |partition   |
> +------------+
> |age=25      |
> |age=test_age|
> +------------+
> scala> spark.sql(s"select * from $tableName").show(50, false)
> +---+--------+----+
> |id |name    |age |
> +---+--------+----+
> |1  |Ranga   |25  |
> |2  |Nishanth|null|
> +---+--------+----+ {code}
> *Hive Code:*
>  
>  
> {code:java}
> > INSERT INTO test_partition_table partition (age="test_age2") VALUES (3, 
> > 'Nishanth');
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10248]: Cannot add partition column age of type string as it cannot be 
> converted to type int (state=42000,code=10248){code}
>  
> *Expected Result:*
> When *spark.sql.sources.validatePartitionColumns=true*, the partition value 
> should be validated against the column's data type, and an exception should 
> be thrown if a value of the wrong type is provided.
> *Reference:*
> [https://spark.apache.org/docs/3.3.1/sql-migration-guide.html#data-sources]






[jira] [Resolved] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41059.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38572
[https://github.com/apache/spark/pull/38572]

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
> ---
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
> We should rename all _LEGACY errors to properly named error classes.






[jira] [Commented] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632091#comment-17632091
 ] 

Apache Spark commented on SPARK-40096:
--

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/38617

> Finalize shuffle merge slow due to connection creation fails
> 
>
> Key: SPARK-40096
> URL: https://issues.apache.org/jira/browse/SPARK-40096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Major
> Fix For: 3.4.0
>
>
> *How to reproduce this issue*
>  * Enable push based shuffle
>  * Remove some merger nodes before sending finalize RPCs
>  * The driver tries to connect to those merger shuffle services and sends 
> finalize RPCs one by one; each connection creation times out after 
> SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)
>  
> We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool and 
> handle connection creation exceptions there
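The proposed fix can be sketched with a plain-Python stand-in (stub functions, concurrent.futures instead of Spark's actual scheduler): submit each finalize RPC to a thread pool so one unreachable merger node no longer blocks the rest serially, and catch connection failures per call. Host names and the stub RPC below are illustrative assumptions, not Spark code.

```python
# Simplified sketch of parallel finalize-RPC dispatch with per-call
# exception handling; the pool plays the role of shuffleMergeFinalizeScheduler.
from concurrent.futures import ThreadPoolExecutor, as_completed

def send_finalize_rpc(host):
    # Stub for creating a connection and sending the finalize RPC;
    # an unreachable merger raises instead of hanging the whole loop.
    if host.startswith("dead"):
        raise ConnectionError(f"cannot connect to {host}")
    return f"finalized:{host}"

def finalize_shuffle_merge(merger_hosts, max_threads=4):
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(send_finalize_rpc, h): h for h in merger_hosts}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except ConnectionError as e:
                failures.append(str(e))  # record the failure and keep going
    return results, failures

ok, failed = finalize_shuffle_merge(["node1", "dead-node2", "node3"])
print(sorted(ok), failed)
```

With serial dispatch, each dead node would cost a full connection-creation timeout before the next RPC is attempted; here the healthy nodes finalize regardless.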






[jira] [Commented] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632090#comment-17632090
 ] 

Apache Spark commented on SPARK-40096:
--

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/38617

> Finalize shuffle merge slow due to connection creation fails
> 
>
> Key: SPARK-40096
> URL: https://issues.apache.org/jira/browse/SPARK-40096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Major
> Fix For: 3.4.0
>
>
> *How to reproduce this issue*
>  * Enable push based shuffle
>  * Remove some merger nodes before sending finalize RPCs
>  * The driver tries to connect to those merger shuffle services and sends 
> finalize RPCs one by one; each connection creation times out after 
> SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)
>  
> We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool and 
> handle connection creation exceptions there






[jira] [Commented] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632084#comment-17632084
 ] 

Apache Spark commented on SPARK-41110:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38616

> Implement `DataFrame.sparkSession` in Python client
> ---
>
> Key: SPARK-41110
> URL: https://issues.apache.org/jira/browse/SPARK-41110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41110:


Assignee: Apache Spark

> Implement `DataFrame.sparkSession` in Python client
> ---
>
> Key: SPARK-41110
> URL: https://issues.apache.org/jira/browse/SPARK-41110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632083#comment-17632083
 ] 

Apache Spark commented on SPARK-41110:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38616

> Implement `DataFrame.sparkSession` in Python client
> ---
>
> Key: SPARK-41110
> URL: https://issues.apache.org/jira/browse/SPARK-41110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41110:


Assignee: (was: Apache Spark)

> Implement `DataFrame.sparkSession` in Python client
> ---
>
> Key: SPARK-41110
> URL: https://issues.apache.org/jira/browse/SPARK-41110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Created] (SPARK-41110) Implement `DataFrame.sparkSession` in Python client

2022-11-10 Thread Rui Wang (Jira)
Rui Wang created SPARK-41110:


 Summary: Implement `DataFrame.sparkSession` in Python client
 Key: SPARK-41110
 URL: https://issues.apache.org/jira/browse/SPARK-41110
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang









[jira] [Assigned] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41108:


Assignee: Ruifeng Zheng

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Resolved] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41108.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38612
[https://github.com/apache/spark/pull/38612]

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41109:


Assignee: Apache Spark

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
> --
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41109:


Assignee: (was: Apache Spark)

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
> --
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632078#comment-17632078
 ] 

Apache Spark commented on SPARK-41109:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38615

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
> --
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN

2022-11-10 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-41109:
---

 Summary: Rename the error class _LEGACY_ERROR_TEMP_1216 to 
INVALID_LIKE_PATTERN
 Key: SPARK-41109
 URL: https://issues.apache.org/jira/browse/SPARK-41109
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-40988) Spark3 partition column value is not validated with user provided schema.

2022-11-10 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632073#comment-17632073
 ] 

Ranga Reddy commented on SPARK-40988:
-

In Spark 3.4, if we run the following code we can see the *CAST_INVALID_INPUT* 
exception.
{code:java}
spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") VALUES (2, 
'Nishanth')"""){code}
*Exception:*
{code:java}
[CAST_INVALID_INPUT] The value 'AGE_34' of the type "STRING" cannot be cast to 
"INT" because it is malformed. Correct the value as per the syntax, or change 
its target type. Use `try_cast` to tolerate malformed input and return NULL 
instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this 
error.
== SQL(line 1, position 1) ==
INSERT INTO TABLE partition_table PARTITION(age="AGE_34") VALUES (1, 'ABC')
^org.apache.spark.SparkNumberFormatException:
 [CAST_INVALID_INPUT] The value 'AGE_34' of the type "STRING" cannot be cast to 
"INT" because it is malformed. Correct the value as per the syntax, or change 
its target type. Use `try_cast` to tolerate malformed input and return NULL 
instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this 
error.
== SQL(line 1, position 1) ==
INSERT INTO TABLE partition_table PARTITION(age="AGE_34") VALUES (1, 'ABC')
^    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:161)
    at 
org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
    at 
org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
    at 
org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2(Cast.scala:927)
    at 
org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2$adapted(Cast.scala:927)
    at org.apache.spark.sql.catalyst.expressions.Cast.buildCast(Cast.scala:588)
    at 
org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$1(Cast.scala:927)
    at 
org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:1285)
    at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:526)
    at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:522)
    at 
org.apache.spark.sql.util.PartitioningUtils$.normalizePartitionStringValue(PartitioningUtils.scala:56)
    at 
org.apache.spark.sql.util.PartitioningUtils$.$anonfun$normalizePartitionSpec$1(PartitioningUtils.scala:100)
    at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at 
org.apache.spark.sql.util.PartitioningUtils$.normalizePartitionSpec(PartitioningUtils.scala:76)
    at 
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:382)
    at 
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:426)
    at 
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:420)
    at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
    at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
    at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
    at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
    at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
    at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
    at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
    at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)
    at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)
    at 
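The error message quoted above recommends `try_cast` for tolerating malformed partition values. As a pure-Python sketch of the contract that SQL `try_cast(... AS INT)` provides (this is illustrative only, not Spark code): a malformed string yields NULL (`None` here) instead of raising, unlike an ANSI `CAST`.

```python
def try_cast_int(value):
    """Sketch of SQL try_cast(value AS INT) semantics: a malformed
    string yields None rather than an exception (unlike ANSI CAST)."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None

assert try_cast_int("34") == 34        # well-formed input casts normally
assert try_cast_int("AGE_34") is None  # malformed input becomes None, no error
```

In Spark itself the equivalent would be `SELECT try_cast(age AS INT)`, or setting `spark.sql.ansi.enabled` to `false` as the message itself suggests.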

[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632069#comment-17632069
 ] 

Apache Spark commented on SPARK-41005:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38614

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632070#comment-17632070
 ] 

Apache Spark commented on SPARK-41005:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38614

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632067#comment-17632067
 ] 

Apache Spark commented on SPARK-41005:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38613

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632066#comment-17632066
 ] 

Apache Spark commented on SPARK-41005:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38613

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632058#comment-17632058
 ] 

Apache Spark commented on SPARK-41108:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38612

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41108:


Assignee: Apache Spark

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632057#comment-17632057
 ] 

Apache Spark commented on SPARK-41108:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38612

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41108:


Assignee: (was: Apache Spark)

> Control the max size of arrow batch
> ---
>
> Key: SPARK-41108
> URL: https://issues.apache.org/jira/browse/SPARK-41108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-41108) Control the max size of arrow batch

2022-11-10 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41108:
-

 Summary: Control the max size of arrow batch
 Key: SPARK-41108
 URL: https://issues.apache.org/jira/browse/SPARK-41108
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41063:


Assignee: (was: Apache Spark)

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it yet.
>  
>  
>  






[jira] [Assigned] (SPARK-41107) Install memory-profiler in the CI

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41107:


Assignee: (was: Apache Spark)

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> PySpark memory profiler depends on 
> [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket 
> proposes to install memory-profiler in the CI to enable related tests.
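The `memory-profiler` package that the ticket installs samples a Python function's memory use. As a rough standard-library analogue of what such profiling measures (using `tracemalloc` rather than the actual package; the function name is made up for illustration):

```python
import tracemalloc

def build_rows(n):
    # Allocate a list of tuples so there is something measurable to trace.
    return [(i, str(i)) for i in range(n)]

tracemalloc.start()
rows = build_rows(50_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current ~{current // 1024} KiB, peak ~{peak // 1024} KiB")
```

PySpark's UDF memory profiler builds on the third-party package instead, which is why the CI needs it installed before the related tests can run.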






[jira] [Commented] (SPARK-41107) Install memory-profiler in the CI

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632054#comment-17632054
 ] 

Apache Spark commented on SPARK-41107:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38611

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> PySpark memory profiler depends on 
> [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket 
> proposes to install memory-profiler in the CI to enable related tests.






[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI

2022-11-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41107:
-
Description: PySpark memory profiler depends on 
[memory-profiler|https://pypi.org/project/memory-profiler/] . The ticket 
proposes to install memory-profiler in the CI to enable related tests.  (was: 
PySpark memory profiler depends on 
[memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket 
proposes to install memory-profiler in the CI to enable related tests.)

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> PySpark memory profiler depends on 
> [memory-profiler|https://pypi.org/project/memory-profiler/] . The ticket 
> proposes to install memory-profiler in the CI to enable related tests.






[jira] [Assigned] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41063:


Assignee: Apache Spark

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it yet.
>  
>  
>  






[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI

2022-11-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41107:
-
Description: 
PySpark memory profiler depends on 
[memory-profiler|https://pypi.org/project/memory-profiler/].

The ticket proposes to install memory-profiler in the CI to enable related 
tests.

  was:PySpark memory profiler depends on 
[memory-profiler|https://pypi.org/project/memory-profiler/] . The ticket 
proposes to install memory-profiler in the CI to enable related tests.


> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> PySpark memory profiler depends on 
> [memory-profiler|https://pypi.org/project/memory-profiler/].
> The ticket proposes to install memory-profiler in the CI to enable related 
> tests.






[jira] [Assigned] (SPARK-41107) Install memory-profiler in the CI

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41107:


Assignee: Apache Spark

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> PySpark memory profiler depends on 
> [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket 
> proposes to install memory-profiler in the CI to enable related tests.






[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI

2022-11-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41107:
-
Description: PySpark memory profiler depends on 
[memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket 
proposes to install memory-profiler in the CI to enable related tests.  (was: 
We shall install the Memory Profiler in CI in order to enable memory profiling 
tests.)

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> PySpark memory profiler depends on 
> [memory-profiler](https://pypi.org/project/memory-profiler/) . The ticket 
> proposes to install memory-profiler in the CI to enable related tests.






[jira] [Updated] (SPARK-41107) Install memory-profiler in the CI

2022-11-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41107:
-
Summary: Install memory-profiler in the CI  (was: Install memory-profiler 
in CI)

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> We shall install the Memory Profiler in CI in order to enable memory 
> profiling tests.






[jira] [Updated] (SPARK-41107) Install memory-profiler in CI

2022-11-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41107:
-
Summary: Install memory-profiler in CI  (was: Install the Memory Profiler 
in CI)

> Install memory-profiler in CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> We shall install the Memory Profiler in CI in order to enable memory 
> profiling tests.






[jira] [Reopened] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-41063:
--
  Assignee: (was: Hyukjin Kwon)

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it yet.
>  
>  
>  






[jira] [Updated] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41063:
-
Fix Version/s: (was: 3.4.0)

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it yet.
>  
>  
>  






[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632053#comment-17632053
 ] 

Hyukjin Kwon commented on SPARK-41063:
--

Reverted at 
https://github.com/apache/spark/commit/73bca6e5cace0c2c46938e82fa12ab518faa2248

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it yet.
>  
>  
>  






[jira] [Resolved] (SPARK-41036) `columns` API should use `schema` API to avoid data fetching

2022-11-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-41036.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38546
[https://github.com/apache/spark/pull/38546]

> `columns` API should use `schema` API to avoid data fetching
> 
>
> Key: SPARK-41036
> URL: https://issues.apache.org/jira/browse/SPARK-41036
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41036) `columns` API should use `schema` API to avoid data fetching

2022-11-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-41036:
-

Assignee: Rui Wang

> `columns` API should use `schema` API to avoid data fetching
> 
>
> Key: SPARK-41036
> URL: https://issues.apache.org/jira/browse/SPARK-41036
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Updated] (SPARK-41107) Install the Memory Profiler in CI

2022-11-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41107:
-
Component/s: Tests

> Install the Memory Profiler in CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> We shall install the Memory Profiler in CI in order to enable memory 
> profiling tests.






[jira] [Created] (SPARK-41107) Install the Memory Profiler in CI

2022-11-10 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-41107:


 Summary: Install the Memory Profiler in CI
 Key: SPARK-41107
 URL: https://issues.apache.org/jira/browse/SPARK-41107
 Project: Spark
  Issue Type: Sub-task
  Components: Build, PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


We shall install the Memory Profiler in CI in order to enable memory profiling 
tests.






[jira] [Closed] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException

2022-11-10 Thread Bo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Zhang closed SPARK-41099.


> Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
> 
>
> Key: SPARK-41099
> URL: https://issues.apache.org/jira/browse/SPARK-41099
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bo Zhang
>Priority: Major
>
> This is similar to https://issues.apache.org/jira/browse/SPARK-40488.
> Exceptions thrown in SparkHadoopWriter.write are wrapped with 
> SparkException("Job aborted.").
> This wrapping provides little extra information, but generates a long 
> stacktrace, which hinders debugging when an error happens.
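A Python illustration of the ticket's point (Spark's code is Scala; the function names here are made up): wrapping the real error in a generic "Job aborted." exception adds a traceback layer without adding information.

```python
def write_partition():
    raise OSError("disk quota exceeded")  # the real, actionable error

def write_with_wrapping():
    try:
        write_partition()
    except Exception as exc:
        # Generic wrapper: lengthens the stacktrace, tells you nothing new.
        raise RuntimeError("Job aborted.") from exc

try:
    write_with_wrapping()
except RuntimeError as exc:
    # The root cause is buried one level down in the exception chain.
    assert isinstance(exc.__cause__, OSError)
```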






[jira] [Comment Edited] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException

2022-11-10 Thread Bo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632048#comment-17632048
 ] 

Bo Zhang edited comment on SPARK-41099 at 11/11/22 3:08 AM:


To keep the exceptions exposed to users who use the RDD APIs, we will not 
change this.

See https://github.com/apache/spark/pull/38602#issuecomment-1310755154


was (Author: bozhang):
To keep the exceptions exposed to users who use the RDD APIs, we will not 
change this.

> Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
> 
>
> Key: SPARK-41099
> URL: https://issues.apache.org/jira/browse/SPARK-41099
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bo Zhang
>Priority: Major
>
> This is similar to https://issues.apache.org/jira/browse/SPARK-40488.
> Exceptions thrown in SparkHadoopWriter.write are wrapped with 
> SparkException("Job aborted.").
> This wrapping provides little extra information, but generates a long 
> stacktrace, which hinders debugging when an error happens.






[jira] [Resolved] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41105.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38606
[https://github.com/apache/spark/pull/38606]

> Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate 
> if a field is set or unset 
> ---
>
> Key: SPARK-41105
> URL: https://issues.apache.org/jira/browse/SPARK-41105
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>
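In proto3, the `optional` keyword turns on explicit field presence, so generated code gains a `hasXXX` accessor; a minimal sketch (the message and field names are made up, not from Spark Connect's actual .proto files):

```protobuf
syntax = "proto3";

// Without `optional`, a scalar field left at its default (0 here) is
// indistinguishable from an unset field. With `optional`, the generated
// code exposes a presence check such as hasLimit() / has_limit().
message FetchLimit {
  optional int32 limit = 1;
}
```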







[jira] [Resolved] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException

2022-11-10 Thread Bo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Zhang resolved SPARK-41099.
--
Resolution: Won't Fix

To keep the exceptions exposed to users who use the RDD APIs, we will not 
change this.

> Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
> 
>
> Key: SPARK-41099
> URL: https://issues.apache.org/jira/browse/SPARK-41099
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bo Zhang
>Priority: Major
>
> This is similar to https://issues.apache.org/jira/browse/SPARK-40488.
> Exceptions thrown in SparkHadoopWriter.write are wrapped with 
> SparkException("Job aborted.").
> This wrapping provides little extra information, but generates a long 
> stacktrace, which hinders debugging when an error happens.






[jira] [Assigned] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41105:


Assignee: Rui Wang

> Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate 
> if a field is set or unset 
> ---
>
> Key: SPARK-41105
> URL: https://issues.apache.org/jira/browse/SPARK-41105
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-40281) Memory Profiler on Executors

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40281.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38584
[https://github.com/apache/spark/pull/38584]

> Memory Profiler on Executors
> 
>
> Key: SPARK-40281
> URL: https://issues.apache.org/jira/browse/SPARK-40281
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> The ticket proposes to implement PySpark memory profiling on executors. See 
> more 
> [design|https://docs.google.com/document/d/e/2PACX-1vR2K4TdrM1eAjNDC1bsflCNRH67UWLoC-lCv6TSUVXD91Ruksm99pYTnCeIm7Ui3RgrrRNcQU_D8-oh/pub].
> There are many factors in a PySpark program’s performance. Memory, as one of 
> the key factors of a program’s performance, had been missing in PySpark 
> profiling. A PySpark program on the Spark driver can be profiled with [Memory 
> Profiler|https://pypi.org/project/memory-profiler/]
>  as a normal Python process, but there was not an easy way to profile memory 
> on Spark executors.
> PySpark UDFs, one of the most popular Python APIs, enable users to run custom 
> code on top of the Apache Spark™ engine. However, it is difficult to optimize 
> UDFs without understanding memory consumption.
> The ticket proposes to introduce the PySpark memory profiler, which profiles 
> memory on executors. It provides information about total memory usage and 
> pinpoints which lines of code in a UDF contribute most to memory usage. 
> That will help optimize PySpark UDFs and reduce the likelihood of 
> out-of-memory errors.
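A stdlib-only sketch of the idea behind per-function memory profiling: measure how much memory a UDF-like function allocates so the costly lines can be found. (The actual PySpark feature builds on the third-party Memory Profiler package; `udf_like` is a made-up example function.)

```python
import tracemalloc

def udf_like(n):
    data = [i * 2 for i in range(n)]  # the allocation-heavy line
    return sum(data)

tracemalloc.start()
result = udf_like(100_000)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
# `peak` now reflects the temporary list built inside udf_like, pointing
# at where the memory went.
```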






[jira] [Assigned] (SPARK-40281) Memory Profiler on Executors

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40281:


Assignee: Xinrong Meng

> Memory Profiler on Executors
> 
>
> Key: SPARK-40281
> URL: https://issues.apache.org/jira/browse/SPARK-40281
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> The ticket proposes to implement PySpark memory profiling on executors. See 
> more 
> [design|https://docs.google.com/document/d/e/2PACX-1vR2K4TdrM1eAjNDC1bsflCNRH67UWLoC-lCv6TSUVXD91Ruksm99pYTnCeIm7Ui3RgrrRNcQU_D8-oh/pub].
> There are many factors in a PySpark program’s performance. Memory, as one of 
> the key factors of a program’s performance, had been missing in PySpark 
> profiling. A PySpark program on the Spark driver can be profiled with [Memory 
> Profiler|https://pypi.org/project/memory-profiler/]
>  as a normal Python process, but there was not an easy way to profile memory 
> on Spark executors.
> PySpark UDFs, one of the most popular Python APIs, enable users to run custom 
> code on top of the Apache Spark™ engine. However, it is difficult to optimize 
> UDFs without understanding memory consumption.
> The ticket proposes to introduce the PySpark memory profiler, which profiles 
> memory on executors. It provides information about total memory usage and 
> pinpoints which lines of code in a UDF contribute most to memory usage. 
> That will help optimize PySpark UDFs and reduce the likelihood of 
> out-of-memory errors.






[jira] [Commented] (SPARK-41106) Reduce collection conversion when create AttributeMap

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632041#comment-17632041
 ] 

Apache Spark commented on SPARK-41106:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38610

> Reduce collection conversion when create AttributeMap
> -
>
> Key: SPARK-41106
> URL: https://issues.apache.org/jira/browse/SPARK-41106
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-41106) Reduce collection conversion when create AttributeMap

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41106:


Assignee: (was: Apache Spark)

> Reduce collection conversion when create AttributeMap
> -
>
> Key: SPARK-41106
> URL: https://issues.apache.org/jira/browse/SPARK-41106
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41106) Reduce collection conversion when create AttributeMap

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41106:


Assignee: Apache Spark

> Reduce collection conversion when create AttributeMap
> -
>
> Key: SPARK-41106
> URL: https://issues.apache.org/jira/browse/SPARK-41106
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Created] (SPARK-41106) Reduce collection conversion when create AttributeMap

2022-11-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-41106:


 Summary: Reduce collection conversion when create AttributeMap
 Key: SPARK-41106
 URL: https://issues.apache.org/jira/browse/SPARK-41106
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Commented] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632030#comment-17632030
 ] 

Apache Spark commented on SPARK-40593:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38609

> protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
> ---
>
> Key: SPARK-40593
> URL: https://issues.apache.org/jira/browse/SPARK-40593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Compiling the Connect module on CentOS release 6.3, where the default glibc 
> version is 2.12, causes compilation to fail as follows:
> {code:java}
> [ERROR] PROTOC FAILED: 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
>  {code}
>  
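The errors above can be anticipated by inspecting which GLIBC symbol versions a prebuilt binary requires and comparing them with what the host libc provides; an illustrative diagnosis (using `/bin/sh` as a stand-in for the protoc executable, which may not be present when you run this):

```shell
# List the GLIBC symbol versions the binary needs; the binary fails to
# start when it needs a version newer than the host's libc.
objdump -T /bin/sh | grep -o 'GLIBC_[0-9.]*' | sort -u
# The glibc version the host actually provides:
ldd --version | head -n 1
```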






[jira] [Commented] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632028#comment-17632028
 ] 

Apache Spark commented on SPARK-40593:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38609

> protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
> ---
>
> Key: SPARK-40593
> URL: https://issues.apache.org/jira/browse/SPARK-40593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Compiling the Connect module on CentOS release 6.3, where the default glibc 
> version is 2.12, causes compilation to fail as follows:
> {code:java}
> [ERROR] PROTOC FAILED: 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
>  {code}
>  






[jira] [Assigned] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40593:


Assignee: Apache Spark

> protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
> ---
>
> Key: SPARK-40593
> URL: https://issues.apache.org/jira/browse/SPARK-40593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Compiling the Connect module on CentOS release 6.3, where the default glibc 
> version is 2.12, causes compilation to fail as follows:
> {code:java}
> [ERROR] PROTOC FAILED: 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
>  {code}
>  






[jira] [Assigned] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40593:


Assignee: (was: Apache Spark)

> protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
> ---
>
> Key: SPARK-40593
> URL: https://issues.apache.org/jira/browse/SPARK-40593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Compiling the Connect module on CentOS release 6.3, where the default glibc 
> version is 2.12, causes compilation to fail as follows:
> {code:java}
> [ERROR] PROTOC FAILED: 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
>  {code}
>  






[jira] [Assigned] (SPARK-41077) Rename `ColumnRef` to `Column` in Python client implementation

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41077:


Assignee: Rui Wang

> Rename `ColumnRef` to `Column` in Python client implementation 
> ---
>
> Key: SPARK-41077
> URL: https://issues.apache.org/jira/browse/SPARK-41077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-41077) Rename `ColumnRef` to `Column` in Python client implementation

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41077.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38586
[https://github.com/apache/spark/pull/38586]

> Rename `ColumnRef` to `Column` in Python client implementation 
> ---
>
> Key: SPARK-41077
> URL: https://issues.apache.org/jira/browse/SPARK-41077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41063.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38599
[https://github.com/apache/spark/pull/38599]

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it further.
>  
>  
>  






[jira] [Assigned] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41063:


Assignee: Hyukjin Kwon

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Hyukjin Kwon
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z=HMACV1=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it further.
>  
>  
>  






[jira] [Assigned] (SPARK-41005) Arrow based collect

2022-11-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-41005:
-

Assignee: Ruifeng Zheng

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Resolved] (SPARK-41005) Arrow based collect

2022-11-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-41005.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38468
[https://github.com/apache/spark/pull/38468]

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41080) Support Bit manipulation function SETBIT

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631991#comment-17631991
 ] 

Apache Spark commented on SPARK-41080:
--

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/38608

> Support Bit manipulation function SETBIT
> 
>
> Key: SPARK-41080
> URL: https://issues.apache.org/jira/browse/SPARK-41080
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.4
>Reporter: Vinod KC
>Priority: Minor
>
> Support a function to change a bit at a specified position. It sets the bit 
> at the specified position to 1; if the optional third argument is set to 
> zero, the bit is set to 0 instead.
> SETBIT(integer_type a, INT position [, INT zero_or_one])
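A small Python sketch of the proposed semantics (an illustration of the spec, not Spark's implementation; bit positions are counted from the least-significant bit here, and the SQL function's indexing may differ):

```python
def setbit(a: int, position: int, zero_or_one: int = 1) -> int:
    """Return `a` with the bit at `position` set to 1, or cleared to 0
    when the optional third argument is zero."""
    if zero_or_one == 0:
        return a & ~(1 << position)  # clear the bit
    return a | (1 << position)       # set the bit

assert setbit(0b1000, 1) == 0b1010     # set bit 1
assert setbit(0b1010, 1, 0) == 0b1000  # clear bit 1
```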






[jira] [Assigned] (SPARK-41080) Support Bit manipulation function SETBIT

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41080:


Assignee: (was: Apache Spark)

> Support Bit manipulation function SETBIT
> 
>
> Key: SPARK-41080
> URL: https://issues.apache.org/jira/browse/SPARK-41080
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.4
>Reporter: Vinod KC
>Priority: Minor
>
> Support a function to change a bit at a specified position. It sets the bit 
> at the specified position to 1; if the optional third argument is set to 
> zero, the bit is set to 0 instead.
> SETBIT(integer_type a, INT position [, INT zero_or_one])






[jira] [Commented] (SPARK-41080) Support Bit manipulation function SETBIT

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631990#comment-17631990
 ] 

Apache Spark commented on SPARK-41080:
--

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/38608

> Support Bit manipulation function SETBIT
> 
>
> Key: SPARK-41080
> URL: https://issues.apache.org/jira/browse/SPARK-41080
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.4
>Reporter: Vinod KC
>Priority: Minor
>
> Support a function to change a bit at a specified position. It sets the bit 
> at the specified position to 1; if the optional third argument is set to 
> zero, the bit is set to 0 instead.
> SETBIT(integer_type a, INT position [, INT zero_or_one])






[jira] [Assigned] (SPARK-41080) Support Bit manipulation function SETBIT

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41080:


Assignee: Apache Spark

> Support Bit manipulation function SETBIT
> 
>
> Key: SPARK-41080
> URL: https://issues.apache.org/jira/browse/SPARK-41080
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.4
>Reporter: Vinod KC
>Assignee: Apache Spark
>Priority: Minor
>
> Support a function to change a bit at a specified position. It sets the bit 
> at the specified position to 1; if the optional third argument is set to 
> zero, the bit is set to 0 instead.
> SETBIT(integer_type a, INT position [, INT zero_or_one])






[jira] [Commented] (SPARK-40938) Support Alias for every Relation

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631982#comment-17631982
 ] 

Apache Spark commented on SPARK-40938:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38607

> Support Alias for every Relation
> 
>
> Key: SPARK-40938
> URL: https://issues.apache.org/jira/browse/SPARK-40938
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40938) Support Alias for every Relation

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631983#comment-17631983
 ] 

Apache Spark commented on SPARK-40938:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38607

> Support Alias for every Relation
> 
>
> Key: SPARK-40938
> URL: https://issues.apache.org/jira/browse/SPARK-40938
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-40901) Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-10 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-40901.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38377
[https://github.com/apache/spark/pull/38377]

> Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path
> 
>
> Key: SPARK-40901
> URL: https://issues.apache.org/jira/browse/SPARK-40901
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0, 3.2.2
>Reporter: Swaminathan Balachandran
>Assignee: Swaminathan Balachandran
>Priority: Major
> Fix For: 3.4.0
>
>
> The Spark config spark.driver.log.dfsDir doesn't support an absolute 
> Hadoop-based URI path. It currently accepts only a path without a filesystem 
> scheme, writes only to fs.defaultFS, and does not write logs to any other 
> configured filesystem.
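For context, a hedged sketch of the configuration in question (host names and paths below are illustrative, not taken from the report):

```shell
# Driver log sync to a DFS directory. A plain path, resolved against
# fs.defaultFS, is the case that works today:
spark-submit \
  --conf spark.driver.log.persistToDfs.enabled=true \
  --conf spark.driver.log.dfsDir=/var/log/spark-driver \
  app.jar

# The reported gap: an absolute Hadoop URI pointing at a non-default
# filesystem, e.g.
#   --conf spark.driver.log.dfsDir=hdfs://other-nn:8020/var/log/spark-driver
# is not honored before the fix.
```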






[jira] [Assigned] (SPARK-40901) Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-10 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-40901:
---

Assignee: Swaminathan Balachandran

> Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path
> 
>
> Key: SPARK-40901
> URL: https://issues.apache.org/jira/browse/SPARK-40901
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0, 3.2.2
>Reporter: Swaminathan Balachandran
>Assignee: Swaminathan Balachandran
>Priority: Major
>
> The Spark config spark.driver.log.dfsDir doesn't support an absolute 
> Hadoop-based URI path. It currently accepts only a path without a filesystem 
> scheme, writes only to fs.defaultFS, and does not write logs to any other 
> configured filesystem.






[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-10 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Attachment: Screenshot 2022-11-09 015432.png

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.2, 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
> Attachments: Screenshot 2022-11-09 015432.png
>
>
> *Description of the issue:*
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, for each job 2 config maps should be created: for a 
> driver and an executor. However, the library creates only one driver config 
> map for all jobs (in some cases it generates only one executor map for all 
> jobs in the same manner). So, if I run 5 jobs, then only one driver config 
> map will be generated and used for every job.  During those runs we 
> experience issues when deleting pods from the cluster: executor pods are 
> endlessly created and immediately terminated, overloading cluster resources.
>  
> *The reason for the issue:*
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).
>  
> *Steps to reproduce the issue:*
>  
>  # Create a *KubernetesClientApplication* object.
>  # Submit at least 2 jobs (sequentially or using *Thread* for running in 
> parallel).
>  
> *The results of my observations according to the steps are as follows:*
>  # Spark 3.1.2 - The same config map in K8S will be overwritten which means 
> all the jobs will point to the same config map.
>  # Spark 3.3.* -  For the first job a new config map will be created. For 
> other jobs an exception will be thrown (the K8S Fabric library does not allow 
> creating a new config map with an existing name).
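A minimal sketch of the fix idea (the names below are hypothetical, not Spark's actual internals): derive the ConfigMap name from a per-submission resource prefix instead of a shared constant, so two jobs can never collide.

```python
import uuid

def config_map_name(resource_prefix: str, role: str) -> str:
    """Derive a per-application ConfigMap name instead of a shared constant.

    `resource_prefix` is assumed to be unique per submission (e.g. the app
    name plus a random suffix), so two jobs never share a ConfigMap name.
    """
    # Kubernetes object names must be lowercase RFC 1123 labels.
    return f"{resource_prefix}-{role}-conf-map".lower()

# Two submissions get distinct prefixes, hence distinct ConfigMap names.
job_a = config_map_name(f"myapp-{uuid.uuid4().hex[:8]}", "driver")
job_b = config_map_name(f"myapp-{uuid.uuid4().hex[:8]}", "driver")
assert job_a != job_b
```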






[jira] [Commented] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-10 Thread Serhii Nesterov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631972#comment-17631972
 ] 

Serhii Nesterov commented on SPARK-41060:
-

After applying the fixes from the pull request config maps are created 
correctly:

!Screenshot 2022-11-09 015432.png!

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.2, 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
> Attachments: Screenshot 2022-11-09 015432.png
>
>
> *Description of the issue:*
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, for each job 2 config maps should be created: for a 
> driver and an executor. However, the library creates only one driver config 
> map for all jobs (in some cases it generates only one executor map for all 
> jobs in the same manner). So, if I run 5 jobs, then only one driver config 
> map will be generated and used for every job.  During those runs we 
> experience issues when deleting pods from the cluster: executor pods are 
> endlessly created and immediately terminated, overloading cluster resources.
>  
> *The reason for the issue:*
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).
>  
> *Steps to reproduce the issue:*
>  
>  # Create a *KubernetesClientApplication* object.
>  # Submit at least 2 jobs (sequentially or using *Thread* for running in 
> parallel).
>  
> *The results of my observations according to the steps are as follows:*
>  # Spark 3.1.2 - The same config map in K8S will be overwritten which means 
> all the jobs will point to the same config map.
>  # Spark 3.3.* -  For the first job a new config map will be created. For 
> other jobs an exception will be thrown (the K8S Fabric library does not allow 
> creating a new config map with an existing name).






[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-10 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Description: 
*Description of the issue:*

There's a problem with submitting Spark jobs to a K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, for each job 2 config maps should be created: for a driver and an 
executor. However, the library creates only one driver config map for all jobs 
(in some cases it generates only one executor map for all jobs in the same 
manner). So, if I run 5 jobs, then only one driver config map will be generated 
and used for every job.  During those runs we experience issues when deleting 
pods from the cluster: executor pods are endlessly created and immediately 
terminated, overloading cluster resources.

 

*The reason for the issue:*

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).

 

*Steps to reproduce the issue:*

 
 # Create a *KubernetesClientApplication* object.
 # Submit at least 2 jobs (sequentially or using *Thread* for running in 
parallel).

 

*The results of my observations according to the steps are as follows:*
 # Spark 3.1.2 - The same config map in K8S will be overwritten which means all 
the jobs will point to the same config map.
 # Spark 3.3.* -  For the first job a new config map will be created. For other 
jobs an exception will be thrown (the K8S Fabric library does not allow 
creating a new config map with an existing name).

  was:
There's a problem with submitting spark jobs to K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, for each job 2 config maps should be created: for a driver and an 
executor. However, the library creates only one driver config map for all jobs 
(in some cases it generates only one executor map for all jobs in the same 
manner). So, if I run 5 jobs, then only one driver config map will be generated 
and used for every job.  During those runs we experience issues when deleting 
pods from the cluster: executors pods are endlessly created and immediately 
terminated overloading cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).

 

Steps to reproduce the issue:

 
 # Create a *KubernetesClientApplication* object.
 # Submit at least 2 jobs (sequentially or using *Thread* for running in 
parallel).

 

The results of my observations according to the steps are as follows:
 # Spark 3.1.2 - The same config map in K8S will be overwritten which means all 
the jobs will point to the same config map.
 # Spark 3.3.* -  For the first job a new config map will be created. For other 
jobs an exception will be thrown (the K8S Fabric library does not allow to 
create a new config map with the existing name).


> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.2, 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> *Description of the issue:*
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, for each job 2 config maps should be created: for a 
> driver and an executor. However, the library creates only one driver config 
> map for all jobs (in some cases it generates only one executor map for all 
> jobs in the same manner). So, if I run 5 jobs, then only one driver config 
> map will be generated and used for every job.  During those runs we 
> experience issues when deleting pods from the cluster: executor pods are 
> endlessly created and immediately terminated, overloading cluster resources.
>  
> *The reason for the issue:*
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).
>  
> *Steps to reproduce the issue:*
>  
>  # Create a *KubernetesClientApplication* object.
>  # Submit at least 2 jobs (sequentially or using *Thread* for running in 
> parallel).

[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-10 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Description: 
There's a problem with submitting Spark jobs to a K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, for each job 2 config maps should be created: for a driver and an 
executor. However, the library creates only one driver config map for all jobs 
(in some cases it generates only one executor map for all jobs in the same 
manner). So, if I run 5 jobs, then only one driver config map will be generated 
and used for every job.  During those runs we experience issues when deleting 
pods from the cluster: executor pods are endlessly created and immediately 
terminated, overloading cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).

 

Steps to reproduce the issue:

 
 # Create a *KubernetesClientApplication* object.
 # Submit at least 2 jobs (sequentially or using *Thread* for running in 
parallel).

 

The results of my observations according to the steps are as follows:
 # Spark 3.1.2 - The same config map in K8S will be overwritten which means all 
the jobs will point to the same config map.
 # Spark 3.3.* -  For the first job a new config map will be created. For other 
jobs an exception will be thrown (the K8S Fabric library does not allow 
creating a new config map with an existing name).

  was:
There's a problem with submitting spark jobs to K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, for each job 2 config maps should be created: for a driver and an 
executor. However, the library creates only one driver config map for all jobs 
(in some cases it generates only one executor map for all jobs in the same 
manner). So, if I run 5 jobs, then only one driver config map will be generated 
and used for every job.  During those runs we experience issues when deleting 
pods from the cluster: executors pods are endlessly created and immediately 
terminated overloading cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).


> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.2, 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, for each job 2 config maps should be created: for a 
> driver and an executor. However, the library creates only one driver config 
> map for all jobs (in some cases it generates only one executor map for all 
> jobs in the same manner). So, if I run 5 jobs, then only one driver config 
> map will be generated and used for every job.  During those runs we 
> experience issues when deleting pods from the cluster: executor pods are 
> endlessly created and immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).
>  
> Steps to reproduce the issue:
>  
>  # Create a *KubernetesClientApplication* object.
>  # Submit at least 2 jobs (sequentially or using *Thread* for running in 
> parallel).
>  
> The results of my observations according to the steps are as follows:
>  # Spark 3.1.2 - The same config map in K8S will be overwritten which means 
> all the jobs will point to the same config map.
>  # Spark 3.3.* -  For the first job a new config map will be created. For 
> other jobs an exception will be thrown (the K8S Fabric library does not allow 
> creating a new config map with an existing name).




[jira] [Commented] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset

2022-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631967#comment-17631967
 ] 

Apache Spark commented on SPARK-41105:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38606

> Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate 
> if a field is set or unset 
> ---
>
> Key: SPARK-41105
> URL: https://issues.apache.org/jira/browse/SPARK-41105
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41105:


Assignee: Apache Spark

> Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate 
> if a field is set or unset 
> ---
>
> Key: SPARK-41105
> URL: https://issues.apache.org/jira/browse/SPARK-41105
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset

2022-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41105:


Assignee: (was: Apache Spark)

> Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate 
> if a field is set or unset 
> ---
>
> Key: SPARK-41105
> URL: https://issues.apache.org/jira/browse/SPARK-41105
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Created] (SPARK-41105) Adopt `optional` keyword from proto3 which offers `hasXXX` to differentiate if a field is set or unset

2022-11-10 Thread Rui Wang (Jira)
Rui Wang created SPARK-41105:


 Summary: Adopt `optional` keyword from proto3 which offers 
`hasXXX` to differentiate if a field is set or unset 
 Key: SPARK-41105
 URL: https://issues.apache.org/jira/browse/SPARK-41105
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang
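For illustration, this is what the `optional` keyword buys in proto3 (the message and field names below are hypothetical): marking a scalar field `optional` enables explicit presence tracking, so the generated code exposes a hasXXX accessor that distinguishes "unset" from the default value.

```protobuf
syntax = "proto3";

message Relation {
  // Without `optional`, an unset int32 is indistinguishable from 0.
  int32 plan_id = 1;

  // With `optional`, generated code gains explicit presence tracking:
  // e.g. hasLimit() in Java / HasField("limit") in Python tells "unset"
  // apart from "0".
  optional int32 limit = 2;
}
```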









[jira] [Commented] (SPARK-41104) Can insert NULL into Hive table with NOT NULL column

2022-11-10 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631946#comment-17631946
 ] 

Rui Wang commented on SPARK-41104:
--

Looks like Hive only enforces `NOT NULL` since Hive 3.0.0: 
https://issues.apache.org/jira/browse/HIVE-16575

> Can insert NULL into Hive table with NOT NULL column
> --
>
> Key: SPARK-41104
> URL: https://issues.apache.org/jira/browse/SPARK-41104
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Critical
>
> spark-sql> CREATE TABLE tttd(c1 int not null);
> 22/11/10 14:04:28 WARN ResolveSessionCatalog: A Hive serde table will be 
> created as there is no table provider specified. You can set 
> spark.sql.legacy.createHiveTableByDefault to false so that native data source 
> table will be created instead.
> 22/11/10 14:04:28 WARN HiveMetaStore: Location: 
> file:/Users/serge.rielau/spark/spark-warehouse/tttd specified for 
> non-external table:tttd
> Time taken: 0.078 seconds
> spark-sql> INSERT INTO tttd VALUES(null);
> Time taken: 0.36 seconds
> spark-sql> SELECT * FROM tttd;
> NULL
> Time taken: 0.074 seconds, Fetched 1 row(s)
> spark-sql> 
> Does Hive not support NOT NULL? That's fine, but then we should fail on 
> CREATE TABLE.





