[jira] [Created] (SPARK-48682) Use ICU in InitCap expression (UTF8_BINARY collation)
Uroš Bojanić created SPARK-48682: Summary: Use ICU in InitCap expression (UTF8_BINARY collation) Key: SPARK-48682 URL: https://issues.apache.org/jira/browse/SPARK-48682 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48681) Use ICU in Lower/Upper expressions (UTF8_BINARY collation)
Uroš Bojanić created SPARK-48681: Summary: Use ICU in Lower/Upper expressions (UTF8_BINARY collation) Key: SPARK-48681 URL: https://issues.apache.org/jira/browse/SPARK-48681 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48680) Add char/varchar doc to language specific tables
Kent Yao created SPARK-48680: Summary: Add char/varchar doc to language specific tables Key: SPARK-48680 URL: https://issues.apache.org/jira/browse/SPARK-48680 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
[ https://issues.apache.org/jira/browse/SPARK-48656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48656. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47019 [https://github.com/apache/spark/pull/47019] > ArrayIndexOutOfBoundsException in CartesianRDD getPartitions > > > Key: SPARK-48656 > URL: https://issues.apache.org/jira/browse/SPARK-48656 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Nick Young >Assignee: Wei Guo >Priority: Major > Fix For: 4.0.0 > > > ```val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536) > val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = > 65536) > rdd2.cartesian(rdd1).partitions``` > Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because > `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0. We > should provide a better error message that indicates that the number of partitions > overflows, so it's easier for the user to debug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
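To make the overflow above concrete, here is a minimal sketch (plain Scala, no Spark session needed) of why `s1.index * numPartitionsInRdd2 + s2.index` wraps to 0 when both parent RDDs have 65536 partitions; the variable names below are illustrative, not the actual CartesianRDD fields.
{code:scala}
// Illustrative only: mirrors the Int arithmetic used for the flat partition index,
// s1.index * numPartitionsInRdd2 + s2.index, in CartesianRDD.getPartitions.
val numPartitionsInRdd1 = 65536
val numPartitionsInRdd2 = 65536

// 65536 * 65536 = 2^32, which wraps to 0 in 32-bit Int arithmetic, so the
// partitions array ends up empty and indexing it throws AIOOBE: 0.
val totalAsInt = numPartitionsInRdd1 * numPartitionsInRdd2
println(totalAsInt)                                        // prints 0

// The same product in Long shows the intended partition count.
println(numPartitionsInRdd1.toLong * numPartitionsInRdd2)  // prints 4294967296
{code}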
[jira] [Assigned] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
[ https://issues.apache.org/jira/browse/SPARK-48656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48656: Assignee: Wei Guo > ArrayIndexOutOfBoundsException in CartesianRDD getPartitions > > > Key: SPARK-48656 > URL: https://issues.apache.org/jira/browse/SPARK-48656 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Nick Young >Assignee: Wei Guo >Priority: Major > > ```val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536) > val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = > 65536) > rdd2.cartesian(rdd1).partitions``` > Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because > `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0. We > should provide a better error message that indicates that the number of partitions > overflows, so it's easier for the user to debug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48630) Make merge_spark_pr properly format revert PR
[ https://issues.apache.org/jira/browse/SPARK-48630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-48630: - Assignee: Ruifeng Zheng > Make merge_spark_pr properly format revert PR > - > > Key: SPARK-48630 > URL: https://issues.apache.org/jira/browse/SPARK-48630 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48630) Make merge_spark_pr properly format revert PR
[ https://issues.apache.org/jira/browse/SPARK-48630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48630. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46988 [https://github.com/apache/spark/pull/46988] > Make merge_spark_pr properly format revert PR > - > > Key: SPARK-48630 > URL: https://issues.apache.org/jira/browse/SPARK-48630 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48672) Update Jakarta Servlet reference in security page
[ https://issues.apache.org/jira/browse/SPARK-48672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48672: Assignee: Cheng Pan > Update Jakarta Servlet reference in security page > - > > Key: SPARK-48672 > URL: https://issues.apache.org/jira/browse/SPARK-48672 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48672) Update Jakarta Servlet reference in security page
[ https://issues.apache.org/jira/browse/SPARK-48672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48672. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47044 [https://github.com/apache/spark/pull/47044] > Update Jakarta Servlet reference in security page > - > > Key: SPARK-48672 > URL: https://issues.apache.org/jira/browse/SPARK-48672 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48631) Fix test case "error during accessing host local dirs for executors"
[ https://issues.apache.org/jira/browse/SPARK-48631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wu Yi resolved SPARK-48631. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46989 [https://github.com/apache/spark/pull/46989] > Fix test case "error during accessing host local dirs for executors" > > > Key: SPARK-48631 > URL: https://issues.apache.org/jira/browse/SPARK-48631 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > There is a logical error in test case "error during accessing host local dirs > for executors" in ShuffleBlockFetcherIteratorSuite. > It tries to test fetching host-local blocks, but the host-local > BlockManagerId is configured incorrectly, and ShuffleBlockFetcherIterator > will treat those blocks as remote blocks instead. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48631) Fix test case "error during accessing host local dirs for executors"
[ https://issues.apache.org/jira/browse/SPARK-48631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wu Yi reassigned SPARK-48631: - Assignee: Bo Zhang > Fix test case "error during accessing host local dirs for executors" > > > Key: SPARK-48631 > URL: https://issues.apache.org/jira/browse/SPARK-48631 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > > There is a logical error in test case "error during accessing host local dirs > for executors" in ShuffleBlockFetcherIteratorSuite. > It tries to test fetching host-local blocks, but the host-local > BlockManagerId is configured incorrectly, and ShuffleBlockFetcherIterator > will treat those blocks as remote blocks instead. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48661) Upgrade RoaringBitmap to 1.1.0
[ https://issues.apache.org/jira/browse/SPARK-48661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48661: Assignee: Wei Guo > Upgrade RoaringBitmap to 1.1.0 > -- > > Key: SPARK-48661 > URL: https://issues.apache.org/jira/browse/SPARK-48661 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48661) Upgrade RoaringBitmap to 1.1.0
[ https://issues.apache.org/jira/browse/SPARK-48661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48661. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47020 [https://github.com/apache/spark/pull/47020] > Upgrade RoaringBitmap to 1.1.0 > -- > > Key: SPARK-48661 > URL: https://issues.apache.org/jira/browse/SPARK-48661 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48677) Upgrade `scalafmt` to 3.8.2
[ https://issues.apache.org/jira/browse/SPARK-48677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48677. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47048 [https://github.com/apache/spark/pull/47048] > Upgrade `scalafmt` to 3.8.2 > --- > > Key: SPARK-48677 > URL: https://issues.apache.org/jira/browse/SPARK-48677 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48679) Upgrade checkstyle and spotbugs version
Zhou JIANG created SPARK-48679: -- Summary: Upgrade checkstyle and spotbugs version Key: SPARK-48679 URL: https://issues.apache.org/jira/browse/SPARK-48679 Project: Spark Issue Type: Sub-task Components: k8s Affects Versions: kubernetes-operator-0.1.0 Reporter: Zhou JIANG Upgrade checkstyle/spotbugs versions to latest in operator -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48653) Fix Python data source error class references
[ https://issues.apache.org/jira/browse/SPARK-48653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48653. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47013 [https://github.com/apache/spark/pull/47013] > Fix Python data source error class references > - > > Key: SPARK-48653 > URL: https://issues.apache.org/jira/browse/SPARK-48653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Fix invalid error class references. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48653) Fix Python data source error class references
[ https://issues.apache.org/jira/browse/SPARK-48653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48653: Assignee: Allison Wang > Fix Python data source error class references > - > > Key: SPARK-48653 > URL: https://issues.apache.org/jira/browse/SPARK-48653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Fix invalid error class references. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48635) Assign classes to join type errors and as-of join error
[ https://issues.apache.org/jira/browse/SPARK-48635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48635: Assignee: Wei Guo > Assign classes to join type errors and as-of join error > - > > Key: SPARK-48635 > URL: https://issues.apache.org/jira/browse/SPARK-48635 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Labels: pull-request-available > > join type errors: > LEGACY_ERROR_TEMP[1319, 3216] > as-of join error: > _LEGACY_ERROR_TEMP_3217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48635) Assign classes to join type errors and as-of join error
[ https://issues.apache.org/jira/browse/SPARK-48635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48635. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46994 [https://github.com/apache/spark/pull/46994] > Assign classes to join type errors and as-of join error > - > > Key: SPARK-48635 > URL: https://issues.apache.org/jira/browse/SPARK-48635 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > join type errors: > LEGACY_ERROR_TEMP[1319, 3216] > as-of join error: > _LEGACY_ERROR_TEMP_3217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48677) Upgrade `scalafmt` to 3.8.2
BingKun Pan created SPARK-48677: --- Summary: Upgrade `scalafmt` to 3.8.2 Key: SPARK-48677 URL: https://issues.apache.org/jira/browse/SPARK-48677 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48676) Structured Logging Framework Scala Style Migration [Part 2]
Amanda Liu created SPARK-48676: -- Summary: Structured Logging Framework Scala Style Migration [Part 2] Key: SPARK-48676 URL: https://issues.apache.org/jira/browse/SPARK-48676 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Amanda Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48674) Refactor SparkConnect Service to extracted error handling functions to trait
[ https://issues.apache.org/jira/browse/SPARK-48674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856559#comment-17856559 ] Arun sethia commented on SPARK-48674: - I think we can do cherry-picking from 3.5 (ErrorUtils). > Refactor SparkConnect Service to extracted error handling functions to trait > > > Key: SPARK-48674 > URL: https://issues.apache.org/jira/browse/SPARK-48674 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.4.3 >Reporter: Arun sethia >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > The SparkConnect gRPC server can have multiple services (via the addService > function on NettyServerBuilder), and these functions can be reused across > services, especially when we would like to extend SparkConnect with various > services. > We can extract the error handling functions from SparkConnectService into a trait, > which will increase code reusability. By doing this we can reuse these > functions across multiple service implementations. Since we can add multiple > Bindable service handlers to the SparkConnect gRPC server, it will be easy to use > such common functions to handle errors and exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48674) Refactor SparkConnect Service to extracted error handling functions to trait
[ https://issues.apache.org/jira/browse/SPARK-48674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun sethia updated SPARK-48674: Affects Version/s: (was: 3.5.0) (was: 3.5.1) > Refactor SparkConnect Service to extracted error handling functions to trait > > > Key: SPARK-48674 > URL: https://issues.apache.org/jira/browse/SPARK-48674 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.4.3 >Reporter: Arun sethia >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > The SparkConnect gRPC server can have multiple services (via the addService > function on NettyServerBuilder), and these functions can be reused across > services, especially when we would like to extend SparkConnect with various > services. > We can extract the error handling functions from SparkConnectService into a trait, > which will increase code reusability. By doing this we can reuse these > functions across multiple service implementations. Since we can add multiple > Bindable service handlers to the SparkConnect gRPC server, it will be easy to use > such common functions to handle errors and exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48675) Cache table doesn't work with collated column
Nikola Mandic created SPARK-48675: - Summary: Cache table doesn't work with collated column Key: SPARK-48675 URL: https://issues.apache.org/jira/browse/SPARK-48675 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Nikola Mandic The following sequence of queries produces the error: {code:java} > cache lazy table t as select col from values ('a' collate utf8_lcase) as > (col); > select col from t; org.apache.spark.SparkException: not support type: org.apache.spark.sql.types.StringType@1. at org.apache.spark.sql.errors.QueryExecutionErrors$.notSupportTypeError(QueryExecutionErrors.scala:1069) at org.apache.spark.sql.execution.columnar.ColumnBuilder$.apply(ColumnBuilder.scala:200) at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.$anonfun$next$1(InMemoryRelation.scala:85) at scala.collection.immutable.List.map(List.scala:247) at scala.collection.immutable.List.map(List.scala:79) at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.next(InMemoryRelation.scala:84) at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.next(InMemoryRelation.scala:82) at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.next(InMemoryRelation.scala:296) at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.next(InMemoryRelation.scala:293) ... {code} This is also a problem with non-lazy cached tables. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43496) Have a separate config for Memory limits for kubernetes pods
[ https://issues.apache.org/jira/browse/SPARK-43496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856508#comment-17856508 ] James Boylan edited comment on SPARK-43496 at 6/20/24 2:30 PM: --- I can't emphasize enough how important this feature is, and how badly it is needed. Also, we should update the ticket to show that it impacts 3.4.0 and 3.5.0. While I agree that having a default behavior of just setting the limits and requests based on the Cores and Memory settings makes sense as a default, the configuration is completely counter to standard Kubernetes practice and actually makes it difficult to manage Spark processes on a cluster in a cost-effective manner. [~julienlau] *said:* {quote}new options: _spark.kubernetes.driver.requests.cpu_ _spark.kubernetes.driver.requests.memory_ _spark.kubernetes.driver.limits.cpu_ _spark.kubernetes.driver.limits.memory_ _spark.kubernetes.executor.requests.cpu_ _spark.kubernetes.executor.requests.memory_ _spark.kubernetes.executor.limits.cpu_ _spark.kubernetes.executor.limits.memory_ if unset then stay consistent with current behavior if set to 0 then disable this definition This would also solve the issue that driver/executor core is defined as an Integer and cannot be 0.5 for a driver. {quote} Honestly, this would be the absolute perfect implementation of the feature, and lines up exactly with how applications should support Kubernetes. This is an area where Spark is painfully losing out to applications like Flink. Since Flink does not manage the creation of the Task Managers, it allows administrators to build out the manifest to specifically meet the needs of their environment. I understand why Spark does manage the executor deployments, and I agree with the reasoning, but the configuration options need to be available to handle all of the settings required within the deployment onto Kubernetes. This is almost entirely handled by the pod templates, with the exception of Memory and Core limits/requests settings. was (Author: drahkar): I can't emphasize enough how important this feature is, and how badly it is needed. While I agree that having a default behavior of just setting the limits and requests based on the Cores and Memory settings makes sense as a default, the configuration is completely counter to standard Kubernetes practice and actually makes it difficult to manage Spark processes on a cluster in a cost-effective manner. [~julienlau] *said:* {quote}new options: _spark.kubernetes.driver.requests.cpu_ _spark.kubernetes.driver.requests.memory_ _spark.kubernetes.driver.limits.cpu_ _spark.kubernetes.driver.limits.memory_ _spark.kubernetes.executor.requests.cpu_ _spark.kubernetes.executor.requests.memory_ _spark.kubernetes.executor.limits.cpu_ _spark.kubernetes.executor.limits.memory_ if unset then stay consistent with current behavior if set to 0 then disable this definition This would also solve the issue that driver/executor core is defined as an Integer and cannot be 0.5 for a driver.{quote} Honestly, this would be the absolute perfect implementation of the feature, and lines up exactly with how applications should support Kubernetes. This is an area where Spark is painfully losing out to applications like Flink. Since Flink does not manage the creation of the Task Managers, it allows administrators to build out the manifest to specifically meet the needs of their environment. 
I understand why Spark does manage the executor deployments, and I agree with the reasoning, but the configuration options need to be available to handle all of the settings required within the deployment onto Kubernetes. This is almost entirely handled by the pod templates, with the exception of Memory and Core limits/requests settings. > Have a separate config for Memory limits for kubernetes pods > > > Key: SPARK-43496 > URL: https://issues.apache.org/jira/browse/SPARK-43496 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Alexander Yerenkow >Priority: Major > Labels: pull-request-available > > The whole memory allocated to the JVM is set into pod resources as both request and > limits. > This means there's no way to use more memory for burst-like jobs in a > shared environment. > For example, if a Spark job uses an external process (outside of the JVM) to access > data, a bit of extra memory is required for that, and having configured higher > limits for memory could be of use. > Another thought here - having a way to configure different JVM/pod memory > requests could also be a valid use case. > > Github PR: [https://github.com/apache/spark/pull/41067] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (SPARK-43496) Have a separate config for Memory limits for kubernetes pods
[ https://issues.apache.org/jira/browse/SPARK-43496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856508#comment-17856508 ] James Boylan commented on SPARK-43496: -- I can't emphasize enough how important this feature is, and how badly it is needed. While I agree that having a default behavior of just setting the limits and requests based on the Cores and Memory settings makes sense as a default, the configuration is completely counter to standard Kubernetes practice and actually makes it difficult to manage Spark processes on a cluster in a cost-effective manner. [~julienlau] *said:* {quote}new options: _spark.kubernetes.driver.requests.cpu_ _spark.kubernetes.driver.requests.memory_ _spark.kubernetes.driver.limits.cpu_ _spark.kubernetes.driver.limits.memory_ _spark.kubernetes.executor.requests.cpu_ _spark.kubernetes.executor.requests.memory_ _spark.kubernetes.executor.limits.cpu_ _spark.kubernetes.executor.limits.memory_ if unset then stay consistent with current behavior if set to 0 then disable this definition This would also solve the issue that driver/executor core is defined as an Integer and cannot be 0.5 for a driver.{quote} Honestly, this would be the absolute perfect implementation of the feature, and lines up exactly with how applications should support Kubernetes. This is an area where Spark is painfully losing out to applications like Flink. Since Flink does not manage the creation of the Task Managers, it allows administrators to build out the manifest to specifically meet the needs of their environment. I understand why Spark does manage the executor deployments, and I agree with the reasoning, but the configuration options need to be available to handle all of the settings required within the deployment onto Kubernetes. This is almost entirely handled by the pod templates, with the exception of Memory and Core limits/requests settings. > Have a separate config for Memory limits for kubernetes pods > > > Key: SPARK-43496 > URL: https://issues.apache.org/jira/browse/SPARK-43496 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Alexander Yerenkow >Priority: Major > Labels: pull-request-available > > The whole memory allocated to the JVM is set into pod resources as both request and > limits. > This means there's no way to use more memory for burst-like jobs in a > shared environment. > For example, if a Spark job uses an external process (outside of the JVM) to access > data, a bit of extra memory is required for that, and having configured higher > limits for memory could be of use. > Another thought here - having a way to configure different JVM/pod memory > requests could also be a valid use case. > > Github PR: [https://github.com/apache/spark/pull/41067] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
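The requests/limits options quoted in these comments are a proposal only; they do not exist in Spark today. Purely as a sketch of how the proposal might look if adopted (all four `requests`/`limits` keys below are hypothetical):
{code:scala}
import org.apache.spark.SparkConf

// Hypothetical sketch of the *proposed* options quoted above. The four
// requests/limits keys are NOT real Spark configs today; currently both the
// pod request and limit are derived from spark.executor.memory (+ overhead).
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")                      // existing setting
  .set("spark.kubernetes.executor.requests.memory", "4g")  // proposed: pod memory request
  .set("spark.kubernetes.executor.limits.memory", "6g")    // proposed: allow bursting to 6g
  .set("spark.kubernetes.executor.requests.cpu", "1")      // proposed: pod CPU request
  .set("spark.kubernetes.executor.limits.cpu", "2")        // proposed: pod CPU limit
{code}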
[jira] [Created] (SPARK-48674) Refactor SparkConnect Service to extracted error handling functions to trait
Arun sethia created SPARK-48674: --- Summary: Refactor SparkConnect Service to extracted error handling functions to trait Key: SPARK-48674 URL: https://issues.apache.org/jira/browse/SPARK-48674 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.3, 3.5.1, 3.5.0, 3.4.1, 3.4.0, 3.4.2 Reporter: Arun sethia The SparkConnect gRPC server can have multiple services (via the addService function on NettyServerBuilder), and these functions can be reused across services, especially when we would like to extend SparkConnect with various services. We can extract the error handling functions from SparkConnectService into a trait, which will increase code reusability. By doing this we can reuse these functions across multiple service implementations. Since we can add multiple Bindable service handlers to the SparkConnect gRPC server, it will be easy to use such common functions to handle errors and exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
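As a rough illustration of the refactoring described here, a minimal sketch of extracting shared gRPC error handling into a trait; the trait, method, and service names are made up for the example and are not the actual Spark Connect internals:
{code:scala}
import io.grpc.stub.StreamObserver

// Hypothetical names throughout -- this only sketches the shape of the proposal.
trait ErrorHandlingSupport {
  // Shared wrapper that turns any thrown exception into a gRPC error on the
  // observer, so every service implementation reuses the same handling logic.
  def withErrorHandling[T](observer: StreamObserver[T])(body: => Unit): Unit =
    try body
    catch {
      case e: Exception =>
        // A real implementation would map exceptions to proper gRPC Status codes.
        observer.onError(e)
    }
}

// Any additional service registered via addService could mix the trait in.
class ExampleConnectService extends ErrorHandlingSupport {
  def executeSomething(request: String, observer: StreamObserver[String]): Unit =
    withErrorHandling(observer) {
      observer.onNext(s"handled: $request")
      observer.onCompleted()
    }
}
{code}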
[jira] [Created] (SPARK-48673) Scheduling Across Applications in k8s mode
Samba Shiva created SPARK-48673: --- Summary: Scheduling Across Applications in k8s mode Key: SPARK-48673 URL: https://issues.apache.org/jira/browse/SPARK-48673 Project: Spark Issue Type: New Feature Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit Affects Versions: 3.5.1 Reporter: Samba Shiva I have been trying autoscaling in Kubernetes for Spark jobs. When the first job is triggered, worker pods scale based on load, which is fine, but when a second job is submitted it is not getting allocated any resources because the first job is consuming all of them. The second job stays in a waiting state until the first job is finished. I have gone through the documentation on setting max cores in standalone mode, which is not an ideal solution as we are planning autoscaling based on load and the jobs submitted. Is there any solution for this, or any alternatives? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
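For context, the static workaround the reporter refers to is capping each application's total cores, e.g. via `spark.cores.max` in standalone mode. A small sketch with illustrative values follows; it lets a second submission get resources, but it is not the load-based autoscaling the ticket asks for:
{code:scala}
import org.apache.spark.sql.SparkSession

// Static workaround only: hard-cap the first application's core usage so a
// second submission is not starved. Values are illustrative.
val spark = SparkSession.builder()
  .appName("job-1")
  .config("spark.cores.max", "8")        // total cores this app may take (standalone mode)
  .config("spark.executor.memory", "4g")
  .getOrCreate()
{code}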
[jira] [Created] (SPARK-48672) Update Jakarta Servlet reference in security page
Cheng Pan created SPARK-48672: - Summary: Update Jakarta Servlet reference in security page Key: SPARK-48672 URL: https://issues.apache.org/jira/browse/SPARK-48672 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Labels: newbie (was: ) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > Labels: newbie > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting, resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below are the logical and physical plans which I am getting: > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I am using the same query and table in Redshift, which is based on > PostgreSQL, I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift: > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Issue Type: Brainstorming (was: Question) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Minor > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting, resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below are the logical and physical plans which I am getting: > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I am using the same query and table in Redshift, which is based on > PostgreSQL, I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift: > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Priority: Blocker (was: Minor) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting, resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below are the logical and physical plans which I am getting: > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I am using the same query and table in Redshift, which is based on > PostgreSQL, I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift: > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
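A workaround sketch for the behaviour described above (it does not change Spark's type-coercion rule itself): compare the string column to a string literal so no implicit cast of the column is introduced. The session setup below is illustrative.
{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative repro of the workaround: quoting the literal keeps the
// comparison string-vs-string, so the analyzed plan no longer contains
// cast(id AS INT) and both rows are returned.
val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._

Seq(("a", "amit"), ("b", "abhishek"))
  .toDF("id", "name")
  .createOrReplaceTempView("person_ddf")

spark.sql("SELECT * FROM person_ddf WHERE id <> '-1'").show()  // both rows survive
{code}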
[jira] [Created] (SPARK-48671) Add test cases for Hex.hex
Wei Guo created SPARK-48671: --- Summary: Add test cases for Hex.hex Key: SPARK-48671 URL: https://issues.apache.org/jira/browse/SPARK-48671 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wei Guo -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48666) A filter should not be pushed down if it contains Unevaluable expression
[ https://issues.apache.org/jira/browse/SPARK-48666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856433#comment-17856433 ] Yokesh NK commented on SPARK-48666: --- During `PruneFileSourcePartitions` optimization, the expression `isnotnull(getdata(cast(snapshot_date#2 as string))#30)` is converted into `isnotnull(getdata(cast(input[1, int, true] as string))#30)`. In this case, `getdata` is a Python User-Defined Function (PythonUDF). However, when attempting to evaluate the transformed expression `getdata(cast(input[1, int, true] as string))`, the function fails to execute correctly. Just to test: excluding the rule `PruneFileSourcePartitions` lets this execution complete with no issue. So, this bug should be fixed in Spark. > A filter should not be pushed down if it contains Unevaluable expression > > > Key: SPARK-48666 > URL: https://issues.apache.org/jira/browse/SPARK-48666 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wei Zheng >Priority: Major > > We should avoid pushing down an Unevaluable expression as it can cause > unexpected failures. For example, the code snippet below (assuming there is a > table {{_t_}} with a partition column {{{_}p{_})}} > {code:java} > from pyspark import SparkConf > from pyspark.sql import SparkSession > from pyspark.sql.types import StringType > import pyspark.sql.functions as f > def getdata(p: str) -> str: > return "data" > NEW_COLUMN = 'new_column' > P_COLUMN = 'p' > f_getdata = f.udf(getdata, StringType()) > rows = spark.sql("select * from default.t") > table = rows.withColumn(NEW_COLUMN, f_getdata(f.col(P_COLUMN))) > df = table.alias('t1').join(table.alias('t2'), (f.col(f"t1.{NEW_COLUMN}") == > f.col(f"t2.{NEW_COLUMN}")), how='inner') > df.show(){code} > will cause an error like: > {code:java} > org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: > getdata(input[0, string, true])#16 > at org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at org.apache.spark.SparkException$.internalError(SparkException.scala:96) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:66) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:391) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:390) > at > org.apache.spark.sql.catalyst.expressions.PythonUDF.eval(PythonUDF.scala:71) > at > org.apache.spark.sql.catalyst.expressions.IsNotNull.eval(nullExpressions.scala:384) > at > org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:52) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$1(ExternalCatalogUtils.scala:166) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$1$adapted(ExternalCatalogUtils.scala:165) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
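A sketch of the kind of guard the ticket title suggests, assuming access to Catalyst's `Expression` and `Unevaluable` types; it is illustrative only and not the actual rule code:
{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Expression, Unevaluable}

// Illustrative guard only: a partition filter is only safe to evaluate during
// pruning if no sub-expression is Unevaluable (e.g. a PythonUDF).
def safeToPushDown(filter: Expression): Boolean =
  filter.find(_.isInstanceOf[Unevaluable]).isEmpty
{code}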
[jira] [Updated] (SPARK-48669) K8s resource name prefix follows DNS Subdomain Names rule
[ https://issues.apache.org/jira/browse/SPARK-48669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Chen updated SPARK-48669: Summary: K8s resource name prefix follows DNS Subdomain Names rule (was: Limit K8s pod name length to follow DNS Subdomain Names rule) > K8s resource name prefix follows DNS Subdomain Names rule > - > > Key: SPARK-48669 > URL: https://issues.apache.org/jira/browse/SPARK-48669 > Project: Spark > Issue Type: Bug > Components: k8s >Affects Versions: 3.5.1 >Reporter: Xi Chen >Priority: Major > > In SPARK-39614, we extended the allowed name length from 63 to 253 for > executor pods and config maps. > However, when the pod name exceeds length 253, we don't truncate it. This > leads to errors when creating the Spark pods. > Error example: > {code:java} > Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure > executing: POST at: > https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/foo/pods. > Message: Pod "some-super-long-spark-pod-name-exceeded-length-253-driver" is > invalid: metadata.name: Invalid value: > "some-super-long-spark-pod-name-exceeded-length-253-driver": must be no more > than 253 characters. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48669) Limit K8s pod name length to follow DNS Subdomain Names rule
Xi Chen created SPARK-48669: --- Summary: Limit K8s pod name length to follow DNS Subdomain Names rule Key: SPARK-48669 URL: https://issues.apache.org/jira/browse/SPARK-48669 Project: Spark Issue Type: Bug Components: k8s Affects Versions: 3.5.1 Reporter: Xi Chen In SPARK-39614, we extended the allowed name length from 63 to 253 for executor pods and config maps. However, when the pod name exceeds length 253, we don't truncate it. This leads to errors when creating the Spark pods. Error example: {code:java} Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/foo/pods. Message: Pod "some-super-long-spark-pod-name-exceeded-length-253-driver" is invalid: metadata.name: Invalid value: "some-super-long-spark-pod-name-exceeded-length-253-driver": must be no more than 253 characters. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
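To make the intended behaviour concrete, a minimal truncation sketch under the DNS Subdomain Names rule (at most 253 characters); the helper name and logic are illustrative, not Spark's actual resource-naming code:
{code:scala}
// Illustrative helper, not the actual Spark implementation. DNS Subdomain Names
// (RFC 1123) allow at most 253 characters of lowercase alphanumerics, '-' and '.',
// and the name must end with an alphanumeric character.
val MaxSubdomainLength = 253

def truncatedResourceName(prefix: String, suffix: String): String = {
  val budget = MaxSubdomainLength - suffix.length
  val cut = prefix.take(budget)
  // Drop any trailing '-' or '.' left over from the cut so the name stays valid.
  val trimmed = cut.reverse.dropWhile(c => !c.isLetterOrDigit).reverse
  trimmed + suffix
}

// e.g. truncatedResourceName(veryLongAppName, "-driver").length is always <= 253
{code}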
[jira] [Created] (SPARK-48668) Support ALTER NAMESPACE ... UNSET PROPERTIES in v2
BingKun Pan created SPARK-48668: --- Summary: Support ALTER NAMESPACE ... UNSET PROPERTIES in v2 Key: SPARK-48668 URL: https://issues.apache.org/jira/browse/SPARK-48668 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org