[jira] [Created] (SPARK-48682) Use ICU in InitCap expression (UTF8_BINARY collation)
Uroš Bojanić created SPARK-48682: Summary: Use ICU in InitCap expression (UTF8_BINARY collation) Key: SPARK-48682 URL: https://issues.apache.org/jira/browse/SPARK-48682 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48681) Use ICU in Lower/Upper expressions (UTF8_BINARY collation)
Uroš Bojanić created SPARK-48681: Summary: Use ICU in Lower/Upper expressions (UTF8_BINARY collation) Key: SPARK-48681 URL: https://issues.apache.org/jira/browse/SPARK-48681 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48680) Add char/varchar doc to language specific tables
Kent Yao created SPARK-48680: Summary: Add char/varchar doc to language specific tables Key: SPARK-48680 URL: https://issues.apache.org/jira/browse/SPARK-48680 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
[ https://issues.apache.org/jira/browse/SPARK-48656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48656. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47019 [https://github.com/apache/spark/pull/47019] > ArrayIndexOutOfBoundsException in CartesianRDD getPartitions > > > Key: SPARK-48656 > URL: https://issues.apache.org/jira/browse/SPARK-48656 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Nick Young >Assignee: Wei Guo >Priority: Major > Fix For: 4.0.0 > > > ```val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536) > val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = > 65536) > rdd2.cartesian(rdd1).partitions``` > Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because > `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0. We > should provide a better error message that indicates that the number of partitions > overflows, so it's easier for the user to debug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
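To make the overflow above concrete, here is a minimal sketch (plain Scala, no Spark session needed) of why `s1.index * numPartitionsInRdd2 + s2.index` wraps to 0 when both parent RDDs have 65536 partitions; the variable names below are illustrative, not the actual CartesianRDD fields.
{code:scala}
// Illustrative only: mirrors the Int arithmetic used for the flat partition index,
// s1.index * numPartitionsInRdd2 + s2.index, in CartesianRDD.getPartitions.
val numPartitionsInRdd1 = 65536
val numPartitionsInRdd2 = 65536

// 65536 * 65536 = 2^32, which wraps to 0 in 32-bit Int arithmetic, so the
// partitions array ends up empty and indexing it throws AIOOBE: 0.
val totalAsInt = numPartitionsInRdd1 * numPartitionsInRdd2
println(totalAsInt)                                        // prints 0

// The same product in Long shows the intended partition count.
println(numPartitionsInRdd1.toLong * numPartitionsInRdd2)  // prints 4294967296
{code}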
[jira] [Assigned] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
[ https://issues.apache.org/jira/browse/SPARK-48656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48656: Assignee: Wei Guo > ArrayIndexOutOfBoundsException in CartesianRDD getPartitions > > > Key: SPARK-48656 > URL: https://issues.apache.org/jira/browse/SPARK-48656 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Nick Young >Assignee: Wei Guo >Priority: Major > > ```val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536) > val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = > 65536) > rdd2.cartesian(rdd1).partitions``` > Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because > `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0. We > should provide a better error message that indicates that the number of partitions > overflows, so it's easier for the user to debug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48630) Make merge_spark_pr properly format revert PR
[ https://issues.apache.org/jira/browse/SPARK-48630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-48630: - Assignee: Ruifeng Zheng > Make merge_spark_pr properly format revert PR > - > > Key: SPARK-48630 > URL: https://issues.apache.org/jira/browse/SPARK-48630 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48630) Make merge_spark_pr properly format revert PR
[ https://issues.apache.org/jira/browse/SPARK-48630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48630. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46988 [https://github.com/apache/spark/pull/46988] > Make merge_spark_pr properly format revert PR > - > > Key: SPARK-48630 > URL: https://issues.apache.org/jira/browse/SPARK-48630 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48672) Update Jakarta Servlet reference in security page
[ https://issues.apache.org/jira/browse/SPARK-48672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48672: Assignee: Cheng Pan > Update Jakarta Servlet reference in security page > - > > Key: SPARK-48672 > URL: https://issues.apache.org/jira/browse/SPARK-48672 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48672) Update Jakarta Servlet reference in security page
[ https://issues.apache.org/jira/browse/SPARK-48672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48672. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47044 [https://github.com/apache/spark/pull/47044] > Update Jakarta Servlet reference in security page > - > > Key: SPARK-48672 > URL: https://issues.apache.org/jira/browse/SPARK-48672 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48631) Fix test case "error during accessing host local dirs for executors"
[ https://issues.apache.org/jira/browse/SPARK-48631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wu Yi resolved SPARK-48631. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46989 [https://github.com/apache/spark/pull/46989] > Fix test case "error during accessing host local dirs for executors" > > > Key: SPARK-48631 > URL: https://issues.apache.org/jira/browse/SPARK-48631 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > There is a logical error in test case "error during accessing host local dirs > for executors" in ShuffleBlockFetcherIteratorSuite. > It tries to test fetching host-local blocks, but the host-local > BlockManagerId is configured incorrectly, and ShuffleBlockFetcherIterator > will treat those blocks as remote blocks instead. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48631) Fix test case "error during accessing host local dirs for executors"
[ https://issues.apache.org/jira/browse/SPARK-48631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wu Yi reassigned SPARK-48631: - Assignee: Bo Zhang > Fix test case "error during accessing host local dirs for executors" > > > Key: SPARK-48631 > URL: https://issues.apache.org/jira/browse/SPARK-48631 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > > There is a logical error in test case "error during accessing host local dirs > for executors" in ShuffleBlockFetcherIteratorSuite. > It tries to test fetching host-local blocks, but the host-local > BlockManagerId is configured incorrectly, and ShuffleBlockFetcherIterator > will treat those blocks as remote blocks instead. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48661) Upgrade RoaringBitmap to 1.1.0
[ https://issues.apache.org/jira/browse/SPARK-48661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48661: Assignee: Wei Guo > Upgrade RoaringBitmap to 1.1.0 > -- > > Key: SPARK-48661 > URL: https://issues.apache.org/jira/browse/SPARK-48661 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48661) Upgrade RoaringBitmap to 1.1.0
[ https://issues.apache.org/jira/browse/SPARK-48661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48661. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47020 [https://github.com/apache/spark/pull/47020] > Upgrade RoaringBitmap to 1.1.0 > -- > > Key: SPARK-48661 > URL: https://issues.apache.org/jira/browse/SPARK-48661 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48677) Upgrade `scalafmt` to 3.8.2
[ https://issues.apache.org/jira/browse/SPARK-48677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48677. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47048 [https://github.com/apache/spark/pull/47048] > Upgrade `scalafmt` to 3.8.2 > --- > > Key: SPARK-48677 > URL: https://issues.apache.org/jira/browse/SPARK-48677 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48679) Upgrade checkstyle and spotbugs version
Zhou JIANG created SPARK-48679: -- Summary: Upgrade checkstyle and spotbugs version Key: SPARK-48679 URL: https://issues.apache.org/jira/browse/SPARK-48679 Project: Spark Issue Type: Sub-task Components: k8s Affects Versions: kubernetes-operator-0.1.0 Reporter: Zhou JIANG Upgrade checkstyle/spotbugs versions to latest in operator -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48653) Fix Python data source error class references
[ https://issues.apache.org/jira/browse/SPARK-48653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48653. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47013 [https://github.com/apache/spark/pull/47013] > Fix Python data source error class references > - > > Key: SPARK-48653 > URL: https://issues.apache.org/jira/browse/SPARK-48653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Fix invalid error class references. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48653) Fix Python data source error class references
[ https://issues.apache.org/jira/browse/SPARK-48653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48653: Assignee: Allison Wang > Fix Python data source error class references > - > > Key: SPARK-48653 > URL: https://issues.apache.org/jira/browse/SPARK-48653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Fix invalid error class references. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48635) Assign classes to join type errors and as-of join error
[ https://issues.apache.org/jira/browse/SPARK-48635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48635: Assignee: Wei Guo > Assign classes to join type errors and as-of join error > - > > Key: SPARK-48635 > URL: https://issues.apache.org/jira/browse/SPARK-48635 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Labels: pull-request-available > > join type errors: > LEGACY_ERROR_TEMP[1319, 3216] > as-of join error: > _LEGACY_ERROR_TEMP_3217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48635) Assign classes to join type errors and as-of join error
[ https://issues.apache.org/jira/browse/SPARK-48635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48635. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46994 [https://github.com/apache/spark/pull/46994] > Assign classes to join type errors and as-of join error > - > > Key: SPARK-48635 > URL: https://issues.apache.org/jira/browse/SPARK-48635 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > join type errors: > LEGACY_ERROR_TEMP[1319, 3216] > as-of join error: > _LEGACY_ERROR_TEMP_3217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48677) Upgrade `scalafmt` to 3.8.2
BingKun Pan created SPARK-48677: --- Summary: Upgrade `scalafmt` to 3.8.2 Key: SPARK-48677 URL: https://issues.apache.org/jira/browse/SPARK-48677 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48676) Structured Logging Framework Scala Style Migration [Part 2]
Amanda Liu created SPARK-48676: -- Summary: Structured Logging Framework Scala Style Migration [Part 2] Key: SPARK-48676 URL: https://issues.apache.org/jira/browse/SPARK-48676 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Amanda Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48674) Refactor SparkConnect Service to extracted error handling functions to trait
[ https://issues.apache.org/jira/browse/SPARK-48674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856559#comment-17856559 ] Arun sethia commented on SPARK-48674: - I think we can do cherry-picking from 3.5 (ErrorUtils). > Refactor SparkConnect Service to extracted error handling functions to trait > > > Key: SPARK-48674 > URL: https://issues.apache.org/jira/browse/SPARK-48674 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.4.3 >Reporter: Arun sethia >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > The SparkConnect gRPC server can have multiple services (via the addService > function on NettyServerBuilder), and these functions can be reused across > services, especially when we would like to extend SparkConnect with various > services. > We can extract the error handling functions from SparkConnectService into a trait, > which will increase code reusability. By doing this we can reuse these > functions across multiple service implementations. Since we can add multiple > Bindable service handlers to the SparkConnect gRPC server, it will be easy to use > such common functions to handle errors and exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48674) Refactor SparkConnect Service to extracted error handling functions to trait
[ https://issues.apache.org/jira/browse/SPARK-48674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun sethia updated SPARK-48674: Affects Version/s: (was: 3.5.0) (was: 3.5.1) > Refactor SparkConnect Service to extracted error handling functions to trait > > > Key: SPARK-48674 > URL: https://issues.apache.org/jira/browse/SPARK-48674 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.4.3 >Reporter: Arun sethia >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > The SparkConnect gRPC server can have multiple services (via the addService > function on NettyServerBuilder), and these functions can be reused across > services, especially when we would like to extend SparkConnect with various > services. > We can extract the error handling functions from SparkConnectService into a trait, > which will increase code reusability. By doing this we can reuse these > functions across multiple service implementations. Since we can add multiple > Bindable service handlers to the SparkConnect gRPC server, it will be easy to use > such common functions to handle errors and exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48675) Cache table doesn't work with collated column
Nikola Mandic created SPARK-48675: - Summary: Cache table doesn't work with collated column Key: SPARK-48675 URL: https://issues.apache.org/jira/browse/SPARK-48675 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Nikola Mandic The following sequence of queries produces the error: {code:java} > cache lazy table t as select col from values ('a' collate utf8_lcase) as > (col); > select col from t; org.apache.spark.SparkException: not support type: org.apache.spark.sql.types.StringType@1. at org.apache.spark.sql.errors.QueryExecutionErrors$.notSupportTypeError(QueryExecutionErrors.scala:1069) at org.apache.spark.sql.execution.columnar.ColumnBuilder$.apply(ColumnBuilder.scala:200) at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.$anonfun$next$1(InMemoryRelation.scala:85) at scala.collection.immutable.List.map(List.scala:247) at scala.collection.immutable.List.map(List.scala:79) at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.next(InMemoryRelation.scala:84) at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.next(InMemoryRelation.scala:82) at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.next(InMemoryRelation.scala:296) at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.next(InMemoryRelation.scala:293) ... {code} This is also a problem with non-lazy cached tables. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43496) Have a separate config for Memory limits for kubernetes pods
[ https://issues.apache.org/jira/browse/SPARK-43496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856508#comment-17856508 ] James Boylan edited comment on SPARK-43496 at 6/20/24 2:30 PM: --- I can't emphasize enough how important this feature is, and how badly it is needed. Also, we should update the ticket to show that it impacts 3.4.0 and 3.5.0. While I agree that having a default behavior of just setting the limits and requests based on the Cores and Memory settings makes sense as a default, the configuration is completely counter to standard Kubernetes practice and actually makes it difficult to manage Spark processes on a cluster in a cost-effective manner. [~julienlau] *said:* {quote}new options: _spark.kubernetes.driver.requests.cpu_ _spark.kubernetes.driver.requests.memory_ _spark.kubernetes.driver.limits.cpu_ _spark.kubernetes.driver.limits.memory_ _spark.kubernetes.executor.requests.cpu_ _spark.kubernetes.executor.requests.memory_ _spark.kubernetes.executor.limits.cpu_ _spark.kubernetes.executor.limits.memory_ if unset then stay consistent with current behavior if set to 0 then disable this definition This would also solve the issue that driver/executor core is defined as an Integer and cannot be 0.5 for a driver. {quote} Honestly, this would be the absolute perfect implementation of the feature, and lines up exactly with how applications should support Kubernetes. This is an area where Spark is painfully losing out to applications like Flink. Since Flink does not manage the creation of the Task Managers, it allows administrators to build out the manifest to specifically meet the needs of their environment. I understand why Spark does manage the executor deployments, and I agree with the reasoning, but the configuration options need to be available to handle all of the settings required within the deployment onto Kubernetes. This is almost entirely handled by the pod templates, with the exception of Memory and Core limits/requests settings. was (Author: drahkar): I can't emphasize enough how important this feature is, and how badly it is needed. While I agree that having a default behavior of just setting the limits and requests based on the Cores and Memory settings makes sense as a default, the configuration is completely counter to standard Kubernetes practice and actually makes it difficult to manage Spark processes on a cluster in a cost-effective manner. [~julienlau] *said:* {quote}new options: _spark.kubernetes.driver.requests.cpu_ _spark.kubernetes.driver.requests.memory_ _spark.kubernetes.driver.limits.cpu_ _spark.kubernetes.driver.limits.memory_ _spark.kubernetes.executor.requests.cpu_ _spark.kubernetes.executor.requests.memory_ _spark.kubernetes.executor.limits.cpu_ _spark.kubernetes.executor.limits.memory_ if unset then stay consistent with current behavior if set to 0 then disable this definition This would also solve the issue that driver/executor core is defined as an Integer and cannot be 0.5 for a driver.{quote} Honestly, this would be the absolute perfect implementation of the feature, and lines up exactly with how applications should support Kubernetes. This is an area where Spark is painfully losing out to applications like Flink. Since Flink does not manage the creation of the Task Managers, it allows administrators to build out the manifest to specifically meet the needs of their environment. 
I understand why Spark does manage the executor deployments, and I agree with the reasoning, but the configuration options need to be available to handle all of the settings required within the deployment onto Kubernetes. This is almost entirely handled by the pod templates, with the exception of Memory and Core limits/requests settings. > Have a separate config for Memory limits for kubernetes pods > > > Key: SPARK-43496 > URL: https://issues.apache.org/jira/browse/SPARK-43496 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Alexander Yerenkow >Priority: Major > Labels: pull-request-available > > The whole memory allocated to the JVM is set into pod resources as both request and > limits. > This means there's no way to use more memory for burst-like jobs in a > shared environment. > For example, if a Spark job uses an external process (outside of the JVM) to access > data, a bit of extra memory is required for that, and having configured higher > limits for memory could be of use. > Another thought here - having a way to configure different JVM/pod memory > requests could also be a valid use case. > > Github PR: [https://github.com/apache/spark/pull/41067] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (SPARK-43496) Have a separate config for Memory limits for kubernetes pods
[ https://issues.apache.org/jira/browse/SPARK-43496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856508#comment-17856508 ] James Boylan commented on SPARK-43496: -- I can't emphasize enough how important this feature is, and how badly it is needed. While I agree that having a default behavior of just setting the limits and requests based on the Cores and Memory settings makes sense as a default, the configuration is completely counter to standard Kubernetes practice and actually makes it difficult to manage Spark processes on a cluster in a cost-effective manner. [~julienlau] *said:* {quote}new options: _spark.kubernetes.driver.requests.cpu_ _spark.kubernetes.driver.requests.memory_ _spark.kubernetes.driver.limits.cpu_ _spark.kubernetes.driver.limits.memory_ _spark.kubernetes.executor.requests.cpu_ _spark.kubernetes.executor.requests.memory_ _spark.kubernetes.executor.limits.cpu_ _spark.kubernetes.executor.limits.memory_ if unset then stay consistent with current behavior if set to 0 then disable this definition This would also solve the issue that driver/executor core is defined as an Integer and cannot be 0.5 for a driver.{quote} Honestly, this would be the absolute perfect implementation of the feature, and lines up exactly with how applications should support Kubernetes. This is an area where Spark is painfully losing out to applications like Flink. Since Flink does not manage the creation of the Task Managers, it allows administrators to build out the manifest to specifically meet the needs of their environment. I understand why Spark does manage the executor deployments, and I agree with the reasoning, but the configuration options need to be available to handle all of the settings required within the deployment onto Kubernetes. This is almost entirely handled by the pod templates, with the exception of Memory and Core limits/requests settings. > Have a separate config for Memory limits for kubernetes pods > > > Key: SPARK-43496 > URL: https://issues.apache.org/jira/browse/SPARK-43496 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Alexander Yerenkow >Priority: Major > Labels: pull-request-available > > The whole memory allocated to the JVM is set into pod resources as both request and > limits. > This means there's no way to use more memory for burst-like jobs in a > shared environment. > For example, if a Spark job uses an external process (outside of the JVM) to access > data, a bit of extra memory is required for that, and having configured higher > limits for memory could be of use. > Another thought here - having a way to configure different JVM/pod memory > requests could also be a valid use case. > > Github PR: [https://github.com/apache/spark/pull/41067] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
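The requests/limits options quoted in these comments are a proposal only; they do not exist in Spark today. Purely as a sketch of how the proposal might look if adopted (all four `requests`/`limits` keys below are hypothetical):
{code:scala}
import org.apache.spark.SparkConf

// Hypothetical sketch of the *proposed* options quoted above. The four
// requests/limits keys are NOT real Spark configs today; currently both the
// pod request and limit are derived from spark.executor.memory (+ overhead).
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")                      // existing setting
  .set("spark.kubernetes.executor.requests.memory", "4g")  // proposed: pod memory request
  .set("spark.kubernetes.executor.limits.memory", "6g")    // proposed: allow bursting to 6g
  .set("spark.kubernetes.executor.requests.cpu", "1")      // proposed: pod CPU request
  .set("spark.kubernetes.executor.limits.cpu", "2")        // proposed: pod CPU limit
{code}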
[jira] [Created] (SPARK-48674) Refactor SparkConnect Service to extracted error handling functions to trait
Arun sethia created SPARK-48674: --- Summary: Refactor SparkConnect Service to extracted error handling functions to trait Key: SPARK-48674 URL: https://issues.apache.org/jira/browse/SPARK-48674 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.3, 3.5.1, 3.5.0, 3.4.1, 3.4.0, 3.4.2 Reporter: Arun sethia The SparkConnect gRPC server can have multiple services (via the addService function on NettyServerBuilder), and these functions can be reused across services, especially when we would like to extend SparkConnect with various services. We can extract the error handling functions from SparkConnectService into a trait, which will increase code reusability. By doing this we can reuse these functions across multiple service implementations. Since we can add multiple Bindable service handlers to the SparkConnect gRPC server, it will be easy to use such common functions to handle errors and exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
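As a rough illustration of the refactoring described here, a minimal sketch of extracting shared gRPC error handling into a trait; the trait, method, and service names are made up for the example and are not the actual Spark Connect internals:
{code:scala}
import io.grpc.stub.StreamObserver

// Hypothetical names throughout -- this only sketches the shape of the proposal.
trait ErrorHandlingSupport {
  // Shared wrapper that turns any thrown exception into a gRPC error on the
  // observer, so every service implementation reuses the same handling logic.
  def withErrorHandling[T](observer: StreamObserver[T])(body: => Unit): Unit =
    try body
    catch {
      case e: Exception =>
        // A real implementation would map exceptions to proper gRPC Status codes.
        observer.onError(e)
    }
}

// Any additional service registered via addService could mix the trait in.
class ExampleConnectService extends ErrorHandlingSupport {
  def executeSomething(request: String, observer: StreamObserver[String]): Unit =
    withErrorHandling(observer) {
      observer.onNext(s"handled: $request")
      observer.onCompleted()
    }
}
{code}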
[jira] [Created] (SPARK-48673) Scheduling Across Applications in k8s mode
Samba Shiva created SPARK-48673: --- Summary: Scheduling Across Applications in k8s mode Key: SPARK-48673 URL: https://issues.apache.org/jira/browse/SPARK-48673 Project: Spark Issue Type: New Feature Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit Affects Versions: 3.5.1 Reporter: Samba Shiva I have been trying autoscaling in Kubernetes for Spark jobs. When the first job is triggered, worker pods scale based on load, which is fine, but when a second job is submitted it is not getting allocated any resources because the first job is consuming all of them. The second job stays in a waiting state until the first job is finished. I have gone through the documentation on setting max cores in standalone mode, which is not an ideal solution as we are planning autoscaling based on load and the jobs submitted. Is there any solution for this, or any alternatives? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
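For context, the static workaround the reporter refers to is capping each application's total cores, e.g. via `spark.cores.max` in standalone mode. A small sketch with illustrative values follows; it lets a second submission get resources, but it is not the load-based autoscaling the ticket asks for:
{code:scala}
import org.apache.spark.sql.SparkSession

// Static workaround only: hard-cap the first application's core usage so a
// second submission is not starved. Values are illustrative.
val spark = SparkSession.builder()
  .appName("job-1")
  .config("spark.cores.max", "8")        // total cores this app may take (standalone mode)
  .config("spark.executor.memory", "4g")
  .getOrCreate()
{code}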
[jira] [Created] (SPARK-48672) Update Jakarta Servlet reference in security page
Cheng Pan created SPARK-48672: - Summary: Update Jakarta Servlet reference in security page Key: SPARK-48672 URL: https://issues.apache.org/jira/browse/SPARK-48672 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Labels: newbie (was: ) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > Labels: newbie > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting, resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below are the logical and physical plans which I am getting: > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I am using the same query and table in Redshift, which is based on > PostgreSQL, I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift: > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Issue Type: Brainstorming (was: Question) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Minor > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting, resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below are the logical and physical plans which I am getting: > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I am using the same query and table in Redshift, which is based on > PostgreSQL, I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift: > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Priority: Blocker (was: Minor) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting, resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below are the logical and physical plans which I am getting: > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I am using the same query and table in Redshift, which is based on > PostgreSQL, I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift: > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
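A workaround sketch for the behaviour described above (it does not change Spark's type-coercion rule itself): compare the string column to a string literal so no implicit cast of the column is introduced. The session setup below is illustrative.
{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative repro of the workaround: quoting the literal keeps the
// comparison string-vs-string, so the analyzed plan no longer contains
// cast(id AS INT) and both rows are returned.
val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._

Seq(("a", "amit"), ("b", "abhishek"))
  .toDF("id", "name")
  .createOrReplaceTempView("person_ddf")

spark.sql("SELECT * FROM person_ddf WHERE id <> '-1'").show()  // both rows survive
{code}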
[jira] [Created] (SPARK-48671) Add test cases for Hex.hex
Wei Guo created SPARK-48671: --- Summary: Add test cases for Hex.hex Key: SPARK-48671 URL: https://issues.apache.org/jira/browse/SPARK-48671 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wei Guo -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48666) A filter should not be pushed down if it contains Unevaluable expression
[ https://issues.apache.org/jira/browse/SPARK-48666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856433#comment-17856433 ] Yokesh NK commented on SPARK-48666: --- During `PruneFileSourcePartitions` optimization, the expression `isnotnull(getdata(cast(snapshot_date#2 as string))#30)` is converted into `isnotnull(getdata(cast(input[1, int, true] as string))#30)`. In this case, `getdata` is a Python User-Defined Function (PythonUDF). However, when attempting to evaluate the transformed expression `getdata(cast(input[1, int, true] as string))`, the function fails to execute correctly. Just to test: excluding the rule `PruneFileSourcePartitions` lets this execution complete with no issue. So, this bug should be fixed in Spark. > A filter should not be pushed down if it contains Unevaluable expression > > > Key: SPARK-48666 > URL: https://issues.apache.org/jira/browse/SPARK-48666 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wei Zheng >Priority: Major > > We should avoid pushing down an Unevaluable expression as it can cause > unexpected failures. For example, the code snippet below (assuming there is a > table {{_t_}} with a partition column {{{_}p{_})}} > {code:java} > from pyspark import SparkConf > from pyspark.sql import SparkSession > from pyspark.sql.types import StringType > import pyspark.sql.functions as f > def getdata(p: str) -> str: > return "data" > NEW_COLUMN = 'new_column' > P_COLUMN = 'p' > f_getdata = f.udf(getdata, StringType()) > rows = spark.sql("select * from default.t") > table = rows.withColumn(NEW_COLUMN, f_getdata(f.col(P_COLUMN))) > df = table.alias('t1').join(table.alias('t2'), (f.col(f"t1.{NEW_COLUMN}") == > f.col(f"t2.{NEW_COLUMN}")), how='inner') > df.show(){code} > will cause an error like: > {code:java} > org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: > getdata(input[0, string, true])#16 > at org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at org.apache.spark.SparkException$.internalError(SparkException.scala:96) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:66) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:391) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:390) > at > org.apache.spark.sql.catalyst.expressions.PythonUDF.eval(PythonUDF.scala:71) > at > org.apache.spark.sql.catalyst.expressions.IsNotNull.eval(nullExpressions.scala:384) > at > org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:52) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$1(ExternalCatalogUtils.scala:166) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$1$adapted(ExternalCatalogUtils.scala:165) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
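A sketch of the kind of guard the ticket title suggests, assuming access to Catalyst's `Expression` and `Unevaluable` types; it is illustrative only and not the actual rule code:
{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Expression, Unevaluable}

// Illustrative guard only: a partition filter is only safe to evaluate during
// pruning if no sub-expression is Unevaluable (e.g. a PythonUDF).
def safeToPushDown(filter: Expression): Boolean =
  filter.find(_.isInstanceOf[Unevaluable]).isEmpty
{code}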
[jira] [Updated] (SPARK-48669) K8s resource name prefix follows DNS Subdomain Names rule
[ https://issues.apache.org/jira/browse/SPARK-48669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Chen updated SPARK-48669: Summary: K8s resource name prefix follows DNS Subdomain Names rule (was: Limit K8s pod name length to follow DNS Subdomain Names rule) > K8s resource name prefix follows DNS Subdomain Names rule > - > > Key: SPARK-48669 > URL: https://issues.apache.org/jira/browse/SPARK-48669 > Project: Spark > Issue Type: Bug > Components: k8s >Affects Versions: 3.5.1 >Reporter: Xi Chen >Priority: Major > > In SPARK-39614, we extended the allowed name length from 63 to 253 for > executor pods and config maps. > However, when the pod name exceeds length 253, we don't truncate it. This > leads to errors when creating the Spark pods. > Error example: > {code:java} > Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure > executing: POST at: > https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/foo/pods. > Message: Pod "some-super-long-spark-pod-name-exceeded-length-253-driver" is > invalid: metadata.name: Invalid value: > "some-super-long-spark-pod-name-exceeded-length-253-driver": must be no more > than 253 characters. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48669) Limit K8s pod name length to follow DNS Subdomain Names rule
Xi Chen created SPARK-48669: --- Summary: Limit K8s pod name length to follow DNS Subdomain Names rule Key: SPARK-48669 URL: https://issues.apache.org/jira/browse/SPARK-48669 Project: Spark Issue Type: Bug Components: k8s Affects Versions: 3.5.1 Reporter: Xi Chen In SPARK-39614, we extended the allowed name length from 63 to 253 for executor pods and config maps. However, when the pod name exceeds length 253, we don't truncate it. This leads to errors when creating the Spark pods. Error example: {code:java} Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/foo/pods. Message: Pod "some-super-long-spark-pod-name-exceeded-length-253-driver" is invalid: metadata.name: Invalid value: "some-super-long-spark-pod-name-exceeded-length-253-driver": must be no more than 253 characters. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
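To make the intended behaviour concrete, a minimal truncation sketch under the DNS Subdomain Names rule (at most 253 characters); the helper name and logic are illustrative, not Spark's actual resource-naming code:
{code:scala}
// Illustrative helper, not the actual Spark implementation. DNS Subdomain Names
// (RFC 1123) allow at most 253 characters of lowercase alphanumerics, '-' and '.',
// and the name must end with an alphanumeric character.
val MaxSubdomainLength = 253

def truncatedResourceName(prefix: String, suffix: String): String = {
  val budget = MaxSubdomainLength - suffix.length
  val cut = prefix.take(budget)
  // Drop any trailing '-' or '.' left over from the cut so the name stays valid.
  val trimmed = cut.reverse.dropWhile(c => !c.isLetterOrDigit).reverse
  trimmed + suffix
}

// e.g. truncatedResourceName(veryLongAppName, "-driver").length is always <= 253
{code}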
[jira] [Created] (SPARK-48668) Support ALTER NAMESPACE ... UNSET PROPERTIES in v2
BingKun Pan created SPARK-48668: --- Summary: Support ALTER NAMESPACE ... UNSET PROPERTIES in v2 Key: SPARK-48668 URL: https://issues.apache.org/jira/browse/SPARK-48668 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org