[jira] [Resolved] (SPARK-48608) Spark 3.5: fails to build with value defaultValueNotConstantError is not a member of object org.apache.spark.sql.errors.QueryCompilationErrors
[ https://issues.apache.org/jira/browse/SPARK-48608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48608.
------------------------------
Fix Version/s: 3.5.2
Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/46978

> Spark 3.5: fails to build with value defaultValueNotConstantError is not a
> member of object org.apache.spark.sql.errors.QueryCompilationErrors
> --------------------------------------------------------------------------
>
> Key: SPARK-48608
> URL: https://issues.apache.org/jira/browse/SPARK-48608
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.2
> Reporter: Thomas Graves
> Priority: Blocker
> Fix For: 3.5.2
>
> PR https://github.com/apache/spark/pull/46594 seems to have broken the
> Spark 3.5 build:
> [ERROR] [Error]
> ...sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:299:
> value defaultValueNotConstantError is not a member of object
> org.apache.spark.sql.errors.QueryCompilationErrors
> I don't see that method defined on the 3.5 branch:
> https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
> I see it defined on master by
> https://issues.apache.org/jira/browse/SPARK-46905, which only went into 4.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy
[ https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48308:
-----------------------------
Fix Version/s: 3.5.2

> Unify getting data schema without partition columns in FileSourceStrategy
> -------------------------------------------------------------------------
>
> Key: SPARK-48308
> URL: https://issues.apache.org/jira/browse/SPARK-48308
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.1
> Reporter: Johan Lasperas
> Assignee: Johan Lasperas
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> In [FileSourceStrategy|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191],
> the schema of the data excluding partition columns is computed twice, in
> slightly different ways:
>
> {code:java}
> val dataColumnsWithoutPartitionCols =
>   dataColumns.filterNot(partitionSet.contains)
> {code}
> vs
> {code:java}
> val readDataColumns = dataColumns
>   .filterNot(partitionColumns.contains)
> {code}
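The duplication described in SPARK-48308 can be sketched outside of Spark. This is a minimal plain-Python illustration (the function and variable names are illustrative, not Spark's identifiers) of computing the read schema once through a single shared helper:

```python
def columns_without_partitions(data_columns, partition_columns):
    """Compute the data schema once, excluding partition columns."""
    partition_set = set(partition_columns)
    return [c for c in data_columns if c not in partition_set]

# Both call sites in the ticket would share this one definition instead of
# filtering against slightly different collections that can drift apart.
print(columns_without_partitions(["id", "name", "dt", "value"], ["dt"]))
# ['id', 'name', 'value']
```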
[jira] [Updated] (SPARK-48991) FileStreamSink.hasMetadata handles invalid path
[ https://issues.apache.org/jira/browse/SPARK-48991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48991:
-----------------------------
Fix Version/s: 3.5.3
               (was: 3.5.2)

> FileStreamSink.hasMetadata handles invalid path
> -----------------------------------------------
>
> Key: SPARK-48991
> URL: https://issues.apache.org/jira/browse/SPARK-48991
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.4.4, 3.5.3
[jira] [Created] (SPARK-48991) FileStreamSink.hasMetadata handles invalid path
Kent Yao created SPARK-48991:
-----------------------------

Summary: FileStreamSink.hasMetadata handles invalid path
Key: SPARK-48991
URL: https://issues.apache.org/jira/browse/SPARK-48991
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.3, 3.5.1, 4.0.0
Reporter: Kent Yao
[jira] [Created] (SPARK-48963) Support JIRA_ACCESS_TOKEN in translate-contributors.py
Kent Yao created SPARK-48963:
-----------------------------

Summary: Support JIRA_ACCESS_TOKEN in translate-contributors.py
Key: SPARK-48963
URL: https://issues.apache.org/jira/browse/SPARK-48963
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Commented] (SPARK-48921) ScalaUDF in subquery should run through analyzer
[ https://issues.apache.org/jira/browse/SPARK-48921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867174#comment-17867174 ]

Kent Yao commented on SPARK-48921:
----------------------------------
Collected to 3.5.2

> ScalaUDF in subquery should run through analyzer
> ------------------------------------------------
>
> Key: SPARK-48921
> URL: https://issues.apache.org/jira/browse/SPARK-48921
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> We got a customer issue that a `MergeInto` query on an Iceberg table worked
> earlier but fails after upgrading to Spark 3.4.
> The error looks like:
> ```
> Caused by: org.apache.spark.SparkRuntimeException: Error while decoding:
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to
> nullable on unresolved object
> upcast(getcolumnbyordinal(0, StringType), StringType, - root class:
> java.lang.String).toString.
> ```
> The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark
> invokes the deserializer of the input encoder of the `ScalaUDF` while the
> deserializer is not resolved yet.
> The encoders of `ScalaUDF` are resolved by the rule `ResolveEncodersInUDF`,
> which is applied at the end of the analysis phase.
> While rewriting `MergeInto` to a `ReplaceData` query, Spark creates an
> `Exists` subquery, and `ScalaUDF` is part of the subquery's plan. Note
> that the `ScalaUDF` is already resolved by the analyzer.
> Then `ResolveSubquery`, the rule that resolves subqueries, only resolves
> the subquery plan if it is not resolved yet. Because the subquery containing
> `ScalaUDF` is already resolved, the rule skips it, so `ResolveEncodersInUDF`
> is never applied to it. The analyzed `ReplaceData` query therefore contains a
> `ScalaUDF` with unresolved encoders, which causes the error.
[jira] [Updated] (SPARK-48934) Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
[ https://issues.apache.org/jira/browse/SPARK-48934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48934:
-----------------------------
Fix Version/s: 3.5.2
               (was: 3.5.3)

> Python datetime types converted incorrectly for setting timeout in
> applyInPandasWithState
> ------------------------------------------------------------------
>
> Key: SPARK-48934
> URL: https://issues.apache.org/jira/browse/SPARK-48934
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> In applyInPandasWithState(), when state.setTimeoutTimestamp() is passed in
> with datetime.datetime type, it doesn't function as expected.
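Bugs in this class — a Python datetime silently converted to the wrong timeout value — usually come down to the epoch-milliseconds conversion. A hedged, stdlib-only sketch of doing that conversion explicitly (this helper is illustrative, not Spark's internal code):

```python
from datetime import datetime, timezone

def to_epoch_millis(dt):
    """Convert a datetime to epoch milliseconds, treating naive values as UTC."""
    if dt.tzinfo is None:
        # Assume UTC rather than silently applying the local time zone,
        # which is one way such conversions go wrong.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_millis(datetime(1970, 1, 1, tzinfo=timezone.utc)))  # 0
print(to_epoch_millis(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200000
```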
[jira] [Updated] (SPARK-48921) ScalaUDF in subquery should run through analyzer
[ https://issues.apache.org/jira/browse/SPARK-48921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48921:
-----------------------------
Fix Version/s: 3.5.2
               (was: 3.5.3)

> ScalaUDF in subquery should run through analyzer
> ------------------------------------------------
>
> Key: SPARK-48921
> URL: https://issues.apache.org/jira/browse/SPARK-48921
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> We got a customer issue that a `MergeInto` query on an Iceberg table worked
> earlier but fails after upgrading to Spark 3.4.
> The error looks like:
> ```
> Caused by: org.apache.spark.SparkRuntimeException: Error while decoding:
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to
> nullable on unresolved object
> upcast(getcolumnbyordinal(0, StringType), StringType, - root class:
> java.lang.String).toString.
> ```
> The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark
> invokes the deserializer of the input encoder of the `ScalaUDF` while the
> deserializer is not resolved yet.
> The encoders of `ScalaUDF` are resolved by the rule `ResolveEncodersInUDF`,
> which is applied at the end of the analysis phase.
> While rewriting `MergeInto` to a `ReplaceData` query, Spark creates an
> `Exists` subquery, and `ScalaUDF` is part of the subquery's plan. Note
> that the `ScalaUDF` is already resolved by the analyzer.
> Then `ResolveSubquery`, the rule that resolves subqueries, only resolves
> the subquery plan if it is not resolved yet. Because the subquery containing
> `ScalaUDF` is already resolved, the rule skips it, so `ResolveEncodersInUDF`
> is never applied to it. The analyzed `ReplaceData` query therefore contains a
> `ScalaUDF` with unresolved encoders, which causes the error.
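The skip-if-resolved behavior described in SPARK-48921 can be modeled in a few lines. This is a toy in plain Python, not Spark's analyzer; every class name, rule name, and flag here is illustrative:

```python
class Node:
    """Toy plan node with a 'resolved' flag, loosely mimicking a TreeNode."""
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)
        self.resolved = False
        self.encoders_resolved = False

def resolve_subquery(node):
    # Mimics a rule that only descends into unresolved subtrees.
    if node.resolved:
        return  # skip: this is where the nested UDF fix-up is missed
    for child in node.children:
        resolve_subquery(child)
    if node.kind == "scala_udf":
        node.encoders_resolved = True  # stands in for ResolveEncodersInUDF
    node.resolved = True

# The subquery was copied into the rewritten plan already marked resolved,
# so the rule never reaches the UDF inside it.
udf = Node("scala_udf")
subquery = Node("exists", [udf])
udf.resolved = True
subquery.resolved = True
resolve_subquery(subquery)
print(udf.encoders_resolved)  # False: the encoder fix-up was skipped
```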
[jira] [Updated] (SPARK-48791) Perf regression due to accumulator registration overhead using CopyOnWriteArrayList
[ https://issues.apache.org/jira/browse/SPARK-48791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48791:
-----------------------------
Fix Version/s: 3.5.2
               (was: 3.5.3)

> Perf regression due to accumulator registration overhead using
> CopyOnWriteArrayList
> --------------------------------------------------------------
>
> Key: SPARK-48791
> URL: https://issues.apache.org/jira/browse/SPARK-48791
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.0.0, 3.5.1, 3.3.4, 3.4.3
> Reporter: wuyi
> Assignee: wuyi
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2, 3.4.4
>
> We noticed a query performance regression and located the root cause: the
> overhead introduced when registering accumulators using CopyOnWriteArrayList.
[jira] [Commented] (SPARK-48791) Perf regression due to accumulator registration overhead using CopyOnWriteArrayList
[ https://issues.apache.org/jira/browse/SPARK-48791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867175#comment-17867175 ]

Kent Yao commented on SPARK-48791:
----------------------------------
Collected to 3.5.2

> Perf regression due to accumulator registration overhead using
> CopyOnWriteArrayList
> --------------------------------------------------------------
>
> Key: SPARK-48791
> URL: https://issues.apache.org/jira/browse/SPARK-48791
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.0.0, 3.5.1, 3.3.4, 3.4.3
> Reporter: wuyi
> Assignee: wuyi
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2, 3.4.4
>
> We noticed a query performance regression and located the root cause: the
> overhead introduced when registering accumulators using CopyOnWriteArrayList.
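The cost profile behind SPARK-48791 is easy to quantify: each append to a copy-on-write list copies every existing element, so N registrations copy N*(N-1)/2 elements in total. A small illustrative counter (plain Python with a hypothetical helper name, not Spark's code):

```python
def cow_append_copy_cost(n):
    """Total elements copied when appending n items to a copy-on-write list."""
    copied = 0
    size = 0
    for _ in range(n):
        copied += size  # copy-on-write duplicates the current contents first
        size += 1
    return copied

# Quadratic growth: 1000 accumulator registrations copy ~500k elements.
print(cow_append_copy_cost(1000))  # 499500
```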
[jira] [Commented] (SPARK-48934) Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
[ https://issues.apache.org/jira/browse/SPARK-48934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867173#comment-17867173 ]

Kent Yao commented on SPARK-48934:
----------------------------------
Collected this to 3.5.2

> Python datetime types converted incorrectly for setting timeout in
> applyInPandasWithState
> ------------------------------------------------------------------
>
> Key: SPARK-48934
> URL: https://issues.apache.org/jira/browse/SPARK-48934
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> In applyInPandasWithState(), when state.setTimeoutTimestamp() is passed in
> with datetime.datetime type, it doesn't function as expected.
[jira] [Resolved] (SPARK-48865) Add try_url_decode function
[ https://issues.apache.org/jira/browse/SPARK-48865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48865.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 47294
https://github.com/apache/spark/pull/47294

> Add try_url_decode function
> ---------------------------
>
> Key: SPARK-48865
> URL: https://issues.apache.org/jira/browse/SPARK-48865
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Zhen Wang
> Assignee: Zhen Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Add a `try_url_decode` function that performs the same operation as
> `url_decode`, but returns a NULL value instead of raising an error if the
> decoding cannot be performed.
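The try_* semantics can be sketched with Python's stdlib rather than Spark SQL: decode a percent-encoded string and return None (standing in for NULL) instead of raising when the decoded bytes are not valid UTF-8. This mirrors the described behavior only loosely:

```python
from urllib.parse import unquote

def try_url_decode(s):
    """Decode a percent-encoded string; return None if decoding fails."""
    try:
        return unquote(s, errors="strict")
    except UnicodeDecodeError:
        return None

print(try_url_decode("a%20b"))      # a b
print(try_url_decode("%E4%B8%AD"))  # 中
print(try_url_decode("%FF"))        # None (0xFF is not valid UTF-8)
```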
[jira] [Resolved] (SPARK-48908) GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-48908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48908.
------------------------------
Resolution: Not A Problem

Only 3.5 has such an issue

> GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
> -------------------------------------------------------------
>
> Key: SPARK-48908
> URL: https://issues.apache.org/jira/browse/SPARK-48908
> Project: Spark
> Issue Type: Bug
> Components: Project Infra
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: Kent Yao
> Priority: Major
[jira] [Created] (SPARK-48908) GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
Kent Yao created SPARK-48908:
-----------------------------

Summary: GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
Key: SPARK-48908
URL: https://issues.apache.org/jira/browse/SPARK-48908
Project: Spark
Issue Type: Bug
Components: Project Infra
Affects Versions: 3.4.3, 3.5.1, 4.0.0
Reporter: Kent Yao
[jira] [Created] (SPARK-48905) Add a guideline for version updates in DOC and various API DOCs
Kent Yao created SPARK-48905:
-----------------------------

Summary: Add a guideline for version updates in DOC and various API DOCs
Key: SPARK-48905
URL: https://issues.apache.org/jira/browse/SPARK-48905
Project: Spark
Issue Type: Sub-task
Components: Documentation
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Created] (SPARK-48904) Update doc's version field to align with fixedVersion field of JIRA ticket
Kent Yao created SPARK-48904:
-----------------------------

Summary: Update doc's version field to align with fixedVersion field of JIRA ticket
Key: SPARK-48904
URL: https://issues.apache.org/jira/browse/SPARK-48904
Project: Spark
Issue Type: Umbrella
Components: Documentation, Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Resolved] (SPARK-48885) Make some inheritances of RuntimeReplaceable override replacement to lazy val
[ https://issues.apache.org/jira/browse/SPARK-48885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48885.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 47333
https://github.com/apache/spark/pull/47333

> Make some inheritances of RuntimeReplaceable override replacement to lazy val
> -----------------------------------------------------------------------------
>
> Key: SPARK-48885
> URL: https://issues.apache.org/jira/browse/SPARK-48885
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48885) Make some inheritances of RuntimeReplaceable override replacement to lazy val
[ https://issues.apache.org/jira/browse/SPARK-48885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-48885:
--------------------------------
Assignee: Kent Yao

> Make some inheritances of RuntimeReplaceable override replacement to lazy val
> -----------------------------------------------------------------------------
>
> Key: SPARK-48885
> URL: https://issues.apache.org/jira/browse/SPARK-48885
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup
[ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47819:
-----------------------------
Fix Version/s: (was: 4.0.0)
               (was: 3.5.2)

> Use asynchronous callback for execution cleanup
> -----------------------------------------------
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0, 3.5.1
> Reporter: Xi Lyu
> Priority: Major
> Labels: pull-request-available
>
> Expired sessions are regularly checked and cleaned up by a maintenance
> thread. However, currently, this process is synchronous. Therefore, in rare
> cases, interrupting the execution thread of a query in a session can take
> hours, causing the entire maintenance process to stall, resulting in a large
> amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup,
> avoiding synchronous joins of execution threads, and preventing the
> maintenance thread from stalling in the above scenarios. To be more specific,
> instead of calling {{runner.join()}} in ExecutorHolder.close(), we set a
> post-cleanup function as the callback through
> {{runner.processOnCompletion}}, which will be called asynchronously once
> the execution runner is completed or interrupted. In this way, the
> maintenance thread won't get blocked on {{join}}ing an execution thread.
[jira] [Updated] (SPARK-47652) Spark Remote Connect to multiple Spark Sessions
[ https://issues.apache.org/jira/browse/SPARK-47652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47652:
-----------------------------
Fix Version/s: (was: 3.5.2)

> Spark Remote Connect to multiple Spark Sessions
> -----------------------------------------------
>
> Key: SPARK-47652
> URL: https://issues.apache.org/jira/browse/SPARK-47652
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.5.0
> Reporter: Nagharajan Raghavendran
> Priority: Major
>
> Spark Remote Connect currently appears to support a single remote session.
> Can it be extended to support multiple Spark sessions? This would help in
> creating decentralized Kubernetes/custom cloud environments and reduce the
> compute load of the current Spark session, making Spark work like a fully
> remote API for multiple datasets where required.
[jira] [Updated] (SPARK-47947) Add AssertDataFrameEquality util function for scala
[ https://issues.apache.org/jira/browse/SPARK-47947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47947:
-----------------------------
Fix Version/s: (was: 3.5.2)

> Add AssertDataFrameEquality util function for scala
> ---------------------------------------------------
>
> Key: SPARK-47947
> URL: https://issues.apache.org/jira/browse/SPARK-47947
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.5.2
> Reporter: Anh Tuan Pham
> Priority: Major
[jira] [Updated] (SPARK-48315) Create user-facing error for null locale in CSV options
[ https://issues.apache.org/jira/browse/SPARK-48315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48315:
-----------------------------
Fix Version/s: (was: 4.0.0)
               (was: 3.5.2)

> Create user-facing error for null locale in CSV options
> -------------------------------------------------------
>
> Key: SPARK-48315
> URL: https://issues.apache.org/jira/browse/SPARK-48315
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 4.0.0, 3.5.1, 3.5.2
> Reporter: Michael Zhang
> Priority: Major
>
> When a user incorrectly sets the `locale` option to `null` with CSV, a null
> pointer exception is thrown. We should wrap the exception so the user
> understands what the issue is.
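The requested fix amounts to validating the option up front and raising a descriptive error instead of letting a null value flow into library code. A minimal sketch, assuming hypothetical option handling (the helper name and the `en-US` default here are illustrative, not Spark's):

```python
def parse_locale_option(options):
    """Validate the CSV 'locale' option, failing with a user-facing message."""
    if "locale" in options and options["locale"] is None:
        # Surface a clear error instead of a downstream NullPointerException.
        raise ValueError("CSV option 'locale' must not be null")
    return options.get("locale") or "en-US"

print(parse_locale_option({"locale": "fr-FR"}))  # fr-FR
```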
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47759:
-----------------------------
Fix Version/s: (was: 3.5.0)
               (was: 4.0.0)
               (was: 3.5.1)
               (was: 3.5.2)

> Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate
> time string
> -------------------------------------------------------------------------
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.0, 3.5.1
> Reporter: Bo Xiong
> Assignee: Bo Xiong
> Priority: Critical
> Labels: hang, pull-request-available, stuck, threadsafe
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected
> stack trace when reading/parsing a legitimate time string. Note that we
> manually killed the stuck app instances and the retry went through on the
> same cluster (without requiring any app code change).
>
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a
> legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time
> must be specified as seconds (s), milliseconds (ms), microseconds (us),
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
> at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
> at
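For reference, the kind of parsing JavaUtils.timeStringAs performs can be sketched as follows (illustrative, not Spark's implementation). Note that "120s" parses cleanly under this logic, which is why the stack trace above is surprising; the ticket's labels suggest a thread-safety problem rather than a bad input string:

```python
import re

# Suffixes mirroring the error message above: us, ms, s, m/min, h, d.
_UNIT_TO_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "min": 60,
                    "h": 3600, "d": 86400}

def time_string_as_sec(s):
    """Parse a time string like '120s' or '2min' into seconds."""
    m = re.fullmatch(r"(\d+)\s*([a-z]+)?", s.strip().lower())
    if not m or (m.group(2) or "s") not in _UNIT_TO_SECONDS:
        raise ValueError(f"Failed to parse time string: {s}")
    return int(m.group(1)) * _UNIT_TO_SECONDS[m.group(2) or "s"]

print(time_string_as_sec("120s"))  # 120
```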
[jira] [Resolved] (SPARK-47307) Spark 3.3 produces invalid base64
[ https://issues.apache.org/jira/browse/SPARK-47307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-47307.
------------------------------
Fix Version/s: 4.0.0
Assignee: Zhen Wang
Resolution: Fixed

> Spark 3.3 produces invalid base64
> ---------------------------------
>
> Key: SPARK-47307
> URL: https://issues.apache.org/jira/browse/SPARK-47307
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 4.0.0, 3.5.2, 3.4.4
> Reporter: Willi Raschkowski
> Assignee: Zhen Wang
> Priority: Blocker
> Labels: correctness, pull-request-available
> Fix For: 4.0.0
>
> SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}}
> (which may be fine in itself, but shouldn't happen between minor versions).
> {code:title=Spark 3.2}
> >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
> 'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYQ=='
> {code}
> Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines).
> {code:title=Spark 3.3}
> >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
> 'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh\r\nYQ=='
> {code}
> The former decodes fine with {{base64}} on my machine but the latter does
> not:
> {code}
> $ pbpaste | base64 --decode
> aa%
> $ pbpaste | base64 --decode
> base64: stdin: (null): error decoding base64 input stream
> {code}
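The newline difference above has a direct stdlib analogue: MIME-style base64 inserts a line break every 76 output characters, while plain base64 does not. Python demonstrates both behaviors (this reproduces the class of difference, not Spark's code path):

```python
import base64

data = b"a" * 58  # 58 input bytes -> 80 base64 characters, past the 76-char MIME limit
plain = base64.b64encode(data).decode("ascii")      # no line breaks
mime = base64.encodebytes(data).decode("ascii")     # MIME wrapping at 76 chars

print("\n" in plain)       # False
print(mime.count("\n"))    # 2: one mid-output wrap plus a trailing newline
```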
[jira] [Created] (SPARK-48885) Make some inheritances of RuntimeReplaceable override replacement to lazy val
Kent Yao created SPARK-48885:
-----------------------------

Summary: Make some inheritances of RuntimeReplaceable override replacement to lazy val
Key: SPARK-48885
URL: https://issues.apache.org/jira/browse/SPARK-48885
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Resolved] (SPARK-48845) GenericUDF Can not CatchException From Child UDFs
[ https://issues.apache.org/jira/browse/SPARK-48845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48845.
------------------------------
Fix Version/s: 3.5.2
               4.0.0
Resolution: Fixed

Issue resolved by pull request 47268
https://github.com/apache/spark/pull/47268

> GenericUDF Can not CatchException From Child UDFs
> -------------------------------------------------
>
> Key: SPARK-48845
> URL: https://issues.apache.org/jira/browse/SPARK-48845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 3.5.1
> Reporter: Junqing Li
> Assignee: Junqing Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
> During the upgrade from Spark 3.3.1 to 3.5.1, we encountered issues with
> this PR. The problem arose from DeferredObject currently passing a value
> instead of a function, which prevented users from catching exceptions in
> GenericUDF, resulting in semantic differences.
> Here is an example case we encountered. Originally, the semantics were that
> {{str_to_map_udf}} would throw an exception due to issues with the input
> string, while {{merge_map_udf}} could catch the exception and return a null
> value. However, currently, any exception encountered by {{str_to_map_udf}}
> will cause the program to fail.
> {code:java}
> select merge_map_udf(str_to_map_udf(col1), parse_map_udf(col2), map("key",
> "value")) from table
> {code}
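The DeferredObject point can be modeled in plain Python (this is a toy, not Hive's GenericUDF API): when the child's result is computed eagerly, it raises before the parent runs; passing a deferred thunk lets the parent catch the exception and return null:

```python
def failing_child():
    # Stands in for a child UDF that raises on bad input (e.g. str_to_map_udf).
    raise ValueError("bad input string")

def parent_udf(child_thunk):
    # Deferred: the parent invokes the thunk itself, so it can catch the error
    # and return None (null) instead of failing the whole query.
    try:
        return child_thunk()
    except ValueError:
        return None

print(parent_udf(failing_child))  # None
# Eager evaluation -- parent_udf(failing_child()) -- would raise before
# parent_udf ever runs, which is the behavior change the ticket describes.
```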
[jira] [Resolved] (SPARK-48876) Upgrade Guava used by the connect module to 33.2.1-jre
[ https://issues.apache.org/jira/browse/SPARK-48876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48876.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 47296
https://github.com/apache/spark/pull/47296

> Upgrade Guava used by the connect module to 33.2.1-jre
> ------------------------------------------------------
>
> Key: SPARK-48876
> URL: https://issues.apache.org/jira/browse/SPARK-48876
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-48879) Expand the charset list with Chinese Standard Charsets
Kent Yao created SPARK-48879: Summary: Expand the charset list with Chinese Standard Charsets Key: SPARK-48879 URL: https://issues.apache.org/jira/browse/SPARK-48879 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
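As context for the ticket above: the JDK already ships decoders for the Chinese national-standard charsets (GB2312, GBK, GB18030) that a charset list could expose. The sketch below only demonstrates JDK-level support; it does not show the Spark-side change, and the class name `ChineseCharsets` is illustrative.

```java
import java.nio.charset.Charset;

// Check JDK availability of the Chinese national-standard charsets and
// round-trip a string through GB18030, which covers all of Unicode.
public class ChineseCharsets {
    public static void main(String[] args) {
        for (String name : new String[]{"GB2312", "GBK", "GB18030"}) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
        Charset gb18030 = Charset.forName("GB18030");
        byte[] encoded = "你好".getBytes(gb18030);
        String decoded = new String(encoded, gb18030);
        System.out.println(decoded.equals("你好")); // true: lossless round-trip
    }
}
```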
[jira] [Resolved] (SPARK-48874) Upgrade MySQL docker image version to 9.0.0
[ https://issues.apache.org/jira/browse/SPARK-48874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48874. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47311 [https://github.com/apache/spark/pull/47311] > Upgrade MySQL docker image version to 9.0.0 > --- > > Key: SPARK-48874 > URL: https://issues.apache.org/jira/browse/SPARK-48874 > Project: Spark > Issue Type: Improvement > Components: Spark Docker, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48874) Upgrade MySQL docker image version to 9.0.0
[ https://issues.apache.org/jira/browse/SPARK-48874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48874: Assignee: BingKun Pan > Upgrade MySQL docker image version to 9.0.0 > --- > > Key: SPARK-48874 > URL: https://issues.apache.org/jira/browse/SPARK-48874 > Project: Spark > Issue Type: Improvement > Components: Spark Docker, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (SPARK-47651) Better Documentation of Spark Remote Connect for Pyspark
[ https://issues.apache.org/jira/browse/SPARK-47651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47651: - Issue Type: Question (was: Improvement) > Better Documentation of Spark Remote Connect for Pyspark > - > > Key: SPARK-47651 > URL: https://issues.apache.org/jira/browse/SPARK-47651 > Project: Spark > Issue Type: Question > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Nagharajan Raghavendran >Priority: Major > Fix For: 3.5.2 > > > Is there better documentation for Spark Remote Connect on Kubernetes?
[jira] [Updated] (SPARK-48093) Add config to switch between client side listener and server side listener
[ https://issues.apache.org/jira/browse/SPARK-48093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48093: - Affects Version/s: 4.0.0 (was: 3.5.0) (was: 3.5.1) (was: 3.5.2) > Add config to switch between client side listener and server side listener > --- > > Key: SPARK-48093 > URL: https://issues.apache.org/jira/browse/SPARK-48093 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > We are moving the implementation of the Streaming Query Listener from server to > client. For clients already running the client-side listener, to prevent > regression, we should add a config that lets them decide which type of listener > to use. > > This is only added to published 3.5.x versions. For 4.0 and upwards we only > use the client-side listener.
[jira] [Updated] (SPARK-47801) Use simdjson-java in JSON related UDFs
[ https://issues.apache.org/jira/browse/SPARK-47801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47801: - Affects Version/s: 4.0.0 (was: 3.5.2) > Use simdjson-java in JSON related UDFs > -- > > Key: SPARK-47801 > URL: https://issues.apache.org/jira/browse/SPARK-47801 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Zheng Shao >Priority: Major > > JSON parsing speed is important. > Right now, functions like GET_JSON_OBJECT, FROM_JSON are slow because they > don't use [https://github.com/simdjson/simdjson-java] > > We should consider adopting [https://github.com/simdjson/simdjson-java] for > those UDFs.
[jira] [Resolved] (SPARK-46814) Build and Run with Java 21
[ https://issues.apache.org/jira/browse/SPARK-46814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46814. -- Resolution: Duplicate > Build and Run with Java 21 > -- > > Key: SPARK-46814 > URL: https://issues.apache.org/jira/browse/SPARK-46814 > Project: Spark > Issue Type: New Feature > Components: Build >Affects Versions: 3.5.2, 3.4.3 >Reporter: Madhavan >Priority: Major > Labels: Releasenotes, releasenotes > > Apache Spark supports Java 8, Java 11 (LTS) and Java 17 (LTS). The next Java > LTS version is {*}21{*}. > ||Version||Release Date|| > |Java 21 (LTS)|19th September 2023| > Apache Spark publishes its release plan, code freeze, and release branch > cut details here: > - [https://spark.apache.org/versioning-policy.html] > Supporting a new Java version is considered a new feature, which we cannot > backport.
[jira] [Created] (SPARK-48866) Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET
Kent Yao created SPARK-48866: Summary: Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET Key: SPARK-48866 URL: https://issues.apache.org/jira/browse/SPARK-48866 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Assigned] (SPARK-48855) Make ExecutorPodsAllocatorSuite independent from default allocation batch size
[ https://issues.apache.org/jira/browse/SPARK-48855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48855: Assignee: Dongjoon Hyun > Make ExecutorPodsAllocatorSuite independent from default allocation batch size > -- > > Key: SPARK-48855 > URL: https://issues.apache.org/jira/browse/SPARK-48855 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48855) Make ExecutorPodsAllocatorSuite independent from default allocation batch size
[ https://issues.apache.org/jira/browse/SPARK-48855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48855. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47279 [https://github.com/apache/spark/pull/47279] > Make ExecutorPodsAllocatorSuite independent from default allocation batch size > -- > > Key: SPARK-48855 > URL: https://issues.apache.org/jira/browse/SPARK-48855 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48857) Restrict charsets in CSVOptions
[ https://issues.apache.org/jira/browse/SPARK-48857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48857. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47280 [https://github.com/apache/spark/pull/47280] > Restrict charsets in CSVOptions > --- > > Key: SPARK-48857 > URL: https://issues.apache.org/jira/browse/SPARK-48857 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48857) Restrict charsets in CSVOptions
Kent Yao created SPARK-48857: Summary: Restrict charsets in CSVOptions Key: SPARK-48857 URL: https://issues.apache.org/jira/browse/SPARK-48857 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48854) Add missing options in CSV documentation
[ https://issues.apache.org/jira/browse/SPARK-48854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48854. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47278 [https://github.com/apache/spark/pull/47278] > Add missing options in CSV documentation > > > Key: SPARK-48854 > URL: https://issues.apache.org/jira/browse/SPARK-48854 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48854) Add missing options in CSV documentation
[ https://issues.apache.org/jira/browse/SPARK-48854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48854: Assignee: Kent Yao > Add missing options in CSV documentation > > > Key: SPARK-48854 > URL: https://issues.apache.org/jira/browse/SPARK-48854 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-48854) Add missing options in CSV documentation
Kent Yao created SPARK-48854: Summary: Add missing options in CSV documentation Key: SPARK-48854 URL: https://issues.apache.org/jira/browse/SPARK-48854 Project: Spark Issue Type: Improvement Components: Documentation, SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48807) Binary Support for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-48807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48807. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47212 [https://github.com/apache/spark/pull/47212] > Binary Support for CSV datasource > - > > Key: SPARK-48807 > URL: https://issues.apache.org/jira/browse/SPARK-48807 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48816) Perf improvement for CSV UnivocityParser with ANSI Intervals
[ https://issues.apache.org/jira/browse/SPARK-48816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48816: Assignee: Kent Yao > Perf improvement for CSV UnivocityParser with ANSI Intervals > > > Key: SPARK-48816 > URL: https://issues.apache.org/jira/browse/SPARK-48816 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48816) Perf improvement for CSV UnivocityParser with ANSI Intervals
[ https://issues.apache.org/jira/browse/SPARK-48816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48816. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47227 [https://github.com/apache/spark/pull/47227] > Perf improvement for CSV UnivocityParser with ANSI Intervals > > > Key: SPARK-48816 > URL: https://issues.apache.org/jira/browse/SPARK-48816 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48804) Add classIsLoadable & OutputCommitter.isAssignableFrom check for outputCommitterClasses
[ https://issues.apache.org/jira/browse/SPARK-48804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48804. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47209 [https://github.com/apache/spark/pull/47209] > Add classIsLoadable & OutputCommitter.isAssignableFrom check for > outputCommitterClasses > --- > > Key: SPARK-48804 > URL: https://issues.apache.org/jira/browse/SPARK-48804 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48640) Perf improvement for format hex from byte array
[ https://issues.apache.org/jira/browse/SPARK-48640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48640. -- Resolution: Not A Problem > Perf improvement for format hex from byte array > --- > > Key: SPARK-48640 > URL: https://issues.apache.org/jira/browse/SPARK-48640 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor >
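As a reference point for the byte-array-to-hex discussion above: since JDK 17 the standard library provides `java.util.HexFormat`, which formats byte arrays without per-byte string concatenation. This is only a baseline sketch, not Spark's actual `hex()` implementation, and the class name `HexDemo` is illustrative.

```java
import java.util.HexFormat;

// Format a byte array as a hex string using the JDK 17+ HexFormat API.
public class HexDemo {
    public static String toHex(byte[] bytes) {
        // HexFormat.of() produces lowercase digits by default.
        return HexFormat.of().formatHex(bytes);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[]{0x01, (byte) 0xAB, (byte) 0xFF})); // 01abff
    }
}
```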
[jira] [Assigned] (SPARK-48792) INSERT with partial column list to table with char/varchar crashes
[ https://issues.apache.org/jira/browse/SPARK-48792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48792: Assignee: Kent Yao > INSERT with partial column list to table with char/varchar crashes > -- > > Key: SPARK-48792 > URL: https://issues.apache.org/jira/browse/SPARK-48792 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > ``` > 24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type > VarcharType(64). SQLSTATE: XX000 > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:96) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.makeWriter(ParquetWriteSupport.scala:266) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.$anonfun$init$2(ParquetWriteSupport.scala:111) > at scala.collection.immutable.List.map(List.scala:247) > at scala.collection.immutable.List.map(List.scala:79) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:111) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:478) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:422) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:500) > at > org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:180) > at > 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:165) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:391) > at > org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:107) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:896) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:333) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) > at org.apache.spark.scheduler.Task.run(Task.scala:146) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:640) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at java.base/java.lang.Thread.run(Thread.java:840) > ```
[jira] [Resolved] (SPARK-48792) INSERT with partial column list to table with char/varchar crashes
[ https://issues.apache.org/jira/browse/SPARK-48792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48792. -- Fix Version/s: 4.0.0 Resolution: Fixed > INSERT with partial column list to table with char/varchar crashes > -- > > Key: SPARK-48792 > URL: https://issues.apache.org/jira/browse/SPARK-48792 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > ``` > 24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type > VarcharType(64). SQLSTATE: XX000 > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:96) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.makeWriter(ParquetWriteSupport.scala:266) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.$anonfun$init$2(ParquetWriteSupport.scala:111) > at scala.collection.immutable.List.map(List.scala:247) > at scala.collection.immutable.List.map(List.scala:79) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:111) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:478) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:422) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:500) > at > org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:180) > at > 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:165) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:391) > at > org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:107) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:896) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:333) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) > at org.apache.spark.scheduler.Task.run(Task.scala:146) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:640) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at java.base/java.lang.Thread.run(Thread.java:840) > ```
[jira] [Created] (SPARK-48816) Perf improvement for CSV UnivocityParser with ANSI Intervals
Kent Yao created SPARK-48816: Summary: Perf improvement for CSV UnivocityParser with ANSI Intervals Key: SPARK-48816 URL: https://issues.apache.org/jira/browse/SPARK-48816 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48806) Pass actual exception when url_decode fails
[ https://issues.apache.org/jira/browse/SPARK-48806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48806. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 47211 [https://github.com/apache/spark/pull/47211] > Pass actual exception when url_decode fails > --- > > Key: SPARK-48806 > URL: https://issues.apache.org/jira/browse/SPARK-48806 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > Currently the url_decode function discards the actual exception, which > contains information useful for quickly locating the problem. > > For example, executing this SQL: > {code:java} > select url_decode('https%3A%2F%2spark.apache.org'); {code} > yields only this error message: > {code:java} > org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The > provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure > that the URL is properly formatted and try again. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:376) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:118) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala) > {code} > while the actually useful exception information is discarded: > {code:java} > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" {code}
[jira] [Assigned] (SPARK-48806) Pass actual exception when url_decode fails
[ https://issues.apache.org/jira/browse/SPARK-48806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48806: Assignee: Zhen Wang > Pass actual exception when url_decode fails > --- > > Key: SPARK-48806 > URL: https://issues.apache.org/jira/browse/SPARK-48806 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Minor > Labels: pull-request-available > > Currently the url_decode function discards the actual exception, which > contains information useful for quickly locating the problem. > > For example, executing this SQL: > {code:java} > select url_decode('https%3A%2F%2spark.apache.org'); {code} > yields only this error message: > {code:java} > org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The > provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure > that the URL is properly formatted and try again. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:376) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:118) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala) > {code} > while the actually useful exception information is discarded: > {code:java} > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" {code}
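The root-cause exception quoted in the ticket can be reproduced outside Spark: `java.net.URLDecoder` is the JDK class that rejects the malformed escape `%2s` in the URL above (the class name `UrlDecodeDemo` is illustrative).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Reproduce the underlying IllegalArgumentException for the ticket's input:
// "%3A" and "%2F" decode fine, but "%2s" is not a valid hex escape.
public class UrlDecodeDemo {
    public static void main(String[] args) {
        String url = "https%3A%2F%2spark.apache.org";
        try {
            URLDecoder.decode(url, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // This is the detail the fix now surfaces instead of swallowing:
            // "URLDecoder: Illegal hex characters in escape (%) pattern ..."
            System.out.println(e.getMessage());
        }
    }
}
```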
[jira] [Assigned] (SPARK-48808) NPE when connecting thriftserver through Hive 1.2.1
[ https://issues.apache.org/jira/browse/SPARK-48808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48808: Assignee: Kent Yao > NPE when connecting thriftserver through Hive 1.2.1 > --- > > Key: SPARK-48808 > URL: https://issues.apache.org/jira/browse/SPARK-48808 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48808) NPE when connecting thriftserver through Hive 1.2.1
[ https://issues.apache.org/jira/browse/SPARK-48808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48808. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47213 [https://github.com/apache/spark/pull/47213] > NPE when connecting thriftserver through Hive 1.2.1 > --- > > Key: SPARK-48808 > URL: https://issues.apache.org/jira/browse/SPARK-48808 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48808) NPE when connecting thriftserver through Hive 1.2.1
Kent Yao created SPARK-48808: Summary: NPE when connecting thriftserver through Hive 1.2.1 Key: SPARK-48808 URL: https://issues.apache.org/jira/browse/SPARK-48808 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Created] (SPARK-48807) Binary Support for CSV datasource
Kent Yao created SPARK-48807: Summary: Binary Support for CSV datasource Key: SPARK-48807 URL: https://issues.apache.org/jira/browse/SPARK-48807 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Created] (SPARK-48804) Add classIsLoadable & OutputCommitter.isAssignableFrom check for outputCommitterClasses
Kent Yao created SPARK-48804: Summary: Add classIsLoadable & OutputCommitter.isAssignableFrom check for outputCommitterClasses Key: SPARK-48804 URL: https://issues.apache.org/jira/browse/SPARK-48804 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Created] (SPARK-48803) Throw internal error in OrcDeserializer to align with ParquetWriteSupport
Kent Yao created SPARK-48803: Summary: Throw internal error in OrcDeserializer to align with ParquetWriteSupport Key: SPARK-48803 URL: https://issues.apache.org/jira/browse/SPARK-48803 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48795) Upgrade mysql-connector-j to 9.0.0
[ https://issues.apache.org/jira/browse/SPARK-48795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48795. -- Fix Version/s: 4.0.0 Assignee: Wei Guo Resolution: Fixed Issue resolved by https://github.com/apache/spark/pull/47200 > Upgrade mysql-connector-j to 9.0.0 > -- > > Key: SPARK-48795 > URL: https://issues.apache.org/jira/browse/SPARK-48795 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48792) INSERT with partial column list to table with char/varchar crashes
Kent Yao created SPARK-48792: Summary: INSERT with partial column list to table with char/varchar crashes Key: SPARK-48792 URL: https://issues.apache.org/jira/browse/SPARK-48792 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.1 Reporter: Kent Yao ``` 24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type VarcharType(64). SQLSTATE: XX000 at org.apache.spark.SparkException$.internalError(SparkException.scala:92) at org.apache.spark.SparkException$.internalError(SparkException.scala:96) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.makeWriter(ParquetWriteSupport.scala:266) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.$anonfun$init$2(ParquetWriteSupport.scala:111) at scala.collection.immutable.List.map(List.scala:247) at scala.collection.immutable.List.map(List.scala:79) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:111) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:478) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:422) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:36) at org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:500) at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:180) at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:165) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:391) at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:107) 
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:896) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369) at org.apache.spark.rdd.RDD.iterator(RDD.scala:333) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) at org.apache.spark.scheduler.Task.run(Task.scala:146) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:640) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
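The stack trace above shows the physical Parquet writer receiving a raw VarcharType(64) instead of the string type it expects. A plausible minimal reproduction, given the issue title (table and column names are invented, not taken from the report):

```sql
-- Illustrative reproduction; table and column names are invented.
CREATE TABLE t (id INT, name VARCHAR(64)) USING parquet;

-- A full-column INSERT works: varchar is replaced by string before writing.
INSERT INTO t VALUES (1, 'a');

-- Partial column list: the path that fills in the unmentioned varchar column
-- appears to leak VarcharType(64) down to ParquetWriteSupport, which throws
-- the INTERNAL_ERROR seen in the stack trace.
INSERT INTO t (id) VALUES (2);
```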
[jira] [Assigned] (SPARK-48749) Simplify UnaryPositive with RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-48749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48749: Assignee: Kent Yao > Simplify UnaryPositive with RuntimeReplaceable > - > > Key: SPARK-48749 > URL: https://issues.apache.org/jira/browse/SPARK-48749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48749) Simplify UnaryPositive with RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-48749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48749. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47143 [https://github.com/apache/spark/pull/47143] > Simplify UnaryPositive with RuntimeReplaceable > - > > Key: SPARK-48749 > URL: https://issues.apache.org/jira/browse/SPARK-48749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48748) Cache numChars in UTF8String
[ https://issues.apache.org/jira/browse/SPARK-48748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48748: Assignee: Uroš Bojanić > Cache numChars in UTF8String > > > Key: SPARK-48748 > URL: https://issues.apache.org/jira/browse/SPARK-48748 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > > Thread-safe cache for numChars value in UTF8String to allow faster access. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48748) Cache numChars in UTF8String
[ https://issues.apache.org/jira/browse/SPARK-48748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48748. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47142 [https://github.com/apache/spark/pull/47142] > Cache numChars in UTF8String > > > Key: SPARK-48748 > URL: https://issues.apache.org/jira/browse/SPARK-48748 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Fix For: 4.0.0 > > > Thread-safe cache for numChars value in UTF8String to allow faster access. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
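The description is terse; the pattern it names is the benign-race field cache familiar from java.lang.String.hashCode. A sketch of that idea outside Spark (the class here is invented, not Spark's UTF8String):

```java
// Sketch of the pattern the description names (an invented class, not Spark's
// UTF8String): compute an expensive per-object value once and cache it in a
// field. As with java.lang.String.hashCode, the data race is benign -- racing
// threads can only ever write the same value, so no locking is needed.
public class CachedLength {
    private final byte[] bytes;
    private int cachedNumChars = -1; // -1 means "not computed yet"

    public CachedLength(byte[] bytes) {
        this.bytes = bytes;
    }

    public int numChars() {
        if (cachedNumChars == -1) {
            cachedNumChars = countUtf8Chars();
        }
        return cachedNumChars;
    }

    // Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
    private int countUtf8Chars() {
        int n = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) != 0x80) {
                n++;
            }
        }
        return n;
    }
}
```

The worst case under concurrency is a redundant recomputation; repeated reads on hot paths then pay a single field load instead of a full scan of the bytes.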
[jira] [Resolved] (SPARK-48673) Scheduling Across Applications in k8s mode
[ https://issues.apache.org/jira/browse/SPARK-48673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48673. -- Resolution: Information Provided Please use the dev or user mailing lists for questions: https://spark.apache.org/community.html > Scheduling Across Applications in k8s mode > --- > > Key: SPARK-48673 > URL: https://issues.apache.org/jira/browse/SPARK-48673 > Project: Spark > Issue Type: Question > Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit >Affects Versions: 3.5.1 >Reporter: Samba Shiva >Priority: Trivial > > I have been trying autoscaling in Kubernetes for Spark jobs. When the first job is triggered, worker pods scale based on load, which is fine. But when a second job is submitted, it is not allocated any resources because the first job is consuming all of them. > The second job stays in the waiting state until the first job finishes. I have gone through the documentation on setting max cores in standalone mode, which is not an ideal solution, since we are planning autoscaling based on load and the jobs submitted. > Is there any solution for this, or any alternatives? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
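For readers landing here from search: the standalone-mode cap the reporter refers to is spark.cores.max (per application), with spark.deploy.defaultCores as the cluster-wide default for applications that do not set it. A sketch of a spark-defaults.conf entry; the values are illustrative:

```
# Illustrative spark-defaults.conf snippet: cap each application's total cores
# so one job cannot starve later submissions on a standalone cluster.
spark.cores.max           12
spark.deploy.defaultCores 4
```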
[jira] [Updated] (SPARK-48673) Scheduling Across Applications in k8s mode
[ https://issues.apache.org/jira/browse/SPARK-48673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48673: - Issue Type: Question (was: Improvement) > Scheduling Across Applications in k8s mode > --- > > Key: SPARK-48673 > URL: https://issues.apache.org/jira/browse/SPARK-48673 > Project: Spark > Issue Type: Question > Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit >Affects Versions: 3.5.1 >Reporter: Samba Shiva >Priority: Blocker -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48673) Scheduling Across Applications in k8s mode
[ https://issues.apache.org/jira/browse/SPARK-48673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48673: - Priority: Trivial (was: Blocker) > Scheduling Across Applications in k8s mode > --- > > Key: SPARK-48673 > URL: https://issues.apache.org/jira/browse/SPARK-48673 > Project: Spark > Issue Type: Question > Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit >Affects Versions: 3.5.1 >Reporter: Samba Shiva >Priority: Trivial -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48749) Simplify UnaryPositive with RuntimeReplaceable
Kent Yao created SPARK-48749: Summary: Simplify UnaryPositive with RuntimeReplaceable Key: SPARK-48749 URL: https://issues.apache.org/jira/browse/SPARK-48749 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48709) varchar resolution mismatch for DataSourceV2 CTAS
[ https://issues.apache.org/jira/browse/SPARK-48709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48709: - Fix Version/s: 3.5.2 > varchar resolution mismatch for DataSourceV2 CTAS > - > > Key: SPARK-48709 > URL: https://issues.apache.org/jira/browse/SPARK-48709 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 4.0.0, 3.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
[ https://issues.apache.org/jira/browse/SPARK-46957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46957: - Fix Version/s: 3.5.2 3.4.4 > Migrated shuffle data files from the decommissioned node should be removed > when job completed > - > > Key: SPARK-46957 > URL: https://issues.apache.org/jira/browse/SPARK-46957 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yu-Jhe Li >Assignee: wuyi >Priority: Major > Fix For: 4.0.0, 3.5.2, 3.4.4 > > > Hi, we have a long-lived Spark application run on a standalone cluster on GCP > and we are using spot instances. To reduce the impact of preempted instances, > we have enabled node decommission to let the preempted node migrate its > shuffle data to other instances before it is deleted by GCP. > However, we found the migrated shuffle data from the decommissioned node is > never removed. (same behavior on spark-3.5) > *Reproduce steps:* > 1. Start spark-shell with 3 executors and enable decommission on both > driver/worker > {code:java} > start-worker.sh[3331]: Spark Command: > /usr/lib/jvm/java-17-openjdk-amd64/bin/java -cp > /opt/spark/conf/:/opt/spark/jars/* -Dspark.worker.cleanup.appDataTtl=1800 > -Dspark.decommission.enabled=true -Xmx1g > org.apache.spark.deploy.worker.Worker --webui-port 8081 > spark://master-01.com:7077 {code} > {code:java} > /opt/spark/bin/spark-shell --master spark://master-01.spark.com:7077 \ > --total-executor-cores 12 \ > --conf spark.decommission.enabled=true \ > --conf spark.storage.decommission.enabled=true \ > --conf spark.storage.decommission.shuffleBlocks.enabled=true \ > --conf spark.storage.decommission.rddBlocks.enabled=true{code} > > 2. Manually stop 1 worker during execution > {code:java} > (1 to 10).foreach { i => > println(s"start iter $i ...") > val longString = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. > Integer eget tortor id libero ultricies faucibus nec ac neque. 
Vivamus ac > risus vitae mi efficitur lacinia. Quisque dignissim quam vel tellus placerat, > non laoreet elit rhoncus. Nam et magna id dui tempor sagittis. Aliquam erat > volutpat. Integer tristique purus ac eros bibendum, at varius velit viverra. > Sed eleifend luctus massa, ac accumsan leo feugiat ac. Sed id nisl et enim > tristique auctor. Sed vel ante nec leo placerat tincidunt. Ut varius, risus > nec sodales tempor, odio augue euismod ipsum, nec tristique e" > val df = (1 to 1 * i).map(j => (j, s"${j}_${longString}")).toDF("id", > "mystr") > df.repartition(6).count() > System.gc() > println(s"finished iter $i, wait 15s for next round") > Thread.sleep(15*1000) > } > System.gc() > start iter 1 ... > finished iter 1, wait 15s for next round > ... {code} > > 3. Check the migrated shuffle data files on the remaining workers > {*}decommissioned node{*}: migrated shuffle file successfully > {code:java} > less /mnt/spark_work/app-20240202084807-0003/1/stdout | grep 'Migrated ' > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_41 to BlockManagerId(2, 10.67.5.139, 35949, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_38 to BlockManagerId(0, 10.67.5.134, 36175, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_47 to BlockManagerId(0, 10.67.5.134, 36175, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_44 to BlockManagerId(2, 10.67.5.139, 35949, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_5_52 to BlockManagerId(0, 10.67.5.134, 36175, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_5_55 to BlockManagerId(2, 10.67.5.139, 35949, None) {code} > {*}remaining shuffle data files on the other workers{*}: the migrated shuffle > files are never removed > {code:java} > 10.67.5.134 | CHANGED | rc=0 >> > -rw-r--r-- 1 spark spark 126 Feb 2 08:48 > 
/mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/13/shuffle_4_47_0.data > -rw-r--r-- 1 spark spark 126 Feb 2 08:48 > /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/31/shuffle_4_38_0.data > -rw-r--r-- 1 spark spark 32 Feb 2 08:48 > /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/3a/shuffle_5_52_0.data > 10.67.5.139 | CHANGED | rc=0 >> > -rw-r--r-- 1 spark spark 126 Feb 2 08:48 >
[jira] [Assigned] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48735: Assignee: Kent Yao > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > {code:java} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48735. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47119 [https://github.com/apache/spark/pull/47119] > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > {code:java} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
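The linked PR is not quoted here, so the following is only an illustration of the kind of change that yields a speedup like the one in the benchmark: replacing per-digit string concatenation with java.lang.Long.toBinaryString, which writes the digits into a char array in a single pass. Both method names are invented:

```java
// Illustrative only -- the linked PR is not quoted here. One common way to
// speed up a BIN-style function is to drop per-digit string concatenation in
// favor of java.lang.Long.toBinaryString, which fills a char array directly.
public class Bin {
    // Baseline: builds the result one digit at a time with String concatenation.
    static String binSlow(long x) {
        if (x == 0L) return "0";
        long v = x;
        String s = "";
        while (v != 0L) {
            s = (v & 1L) + s; // allocates a fresh String every iteration
            v >>>= 1;         // unsigned shift handles negative inputs too
        }
        return s;
    }

    // Faster: single-pass conversion provided by the JDK.
    static String binFast(long x) {
        return Long.toBinaryString(x);
    }
}
```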
[jira] [Commented] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
[ https://issues.apache.org/jira/browse/SPARK-46957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860437#comment-17860437 ] Kent Yao commented on SPARK-46957: -- https://github.com/apache/spark/commit/7aa12b6cd01da88cbbb3e8c6e50863e6139315b7 https://github.com/apache/spark/commit/f8b1040ea006fe48df6bb52e0ace4dce54ab6d56 Reverted it from 3.4 and 3.5 to fix the CI > Migrated shuffle data files from the decommissioned node should be removed > when job completed > - > > Key: SPARK-46957 > URL: https://issues.apache.org/jira/browse/SPARK-46957 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yu-Jhe Li >Assignee: wuyi >Priority: Major > Fix For: 4.0.0
[jira] [Updated] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
[ https://issues.apache.org/jira/browse/SPARK-46957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46957: - Fix Version/s: (was: 3.5.2) (was: 3.4.4) > Migrated shuffle data files from the decommissioned node should be removed > when job completed > - > > Key: SPARK-46957 > URL: https://issues.apache.org/jira/browse/SPARK-46957 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yu-Jhe Li >Assignee: wuyi >Priority: Major > Fix For: 4.0.0
[jira] [Updated] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48735: - Description: {code:java} --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -BIN 2657 2661 5 3.8 265.7 1.0X +BIN 1524 1567 61 6.6 152.4 1.0X {code} was: {code:diff} --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -BIN 2657 2661 5 3.8 265.7 1.0X +BIN 1524 1567 61 6.6 152.4 1.0X {code} > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48735: - Description: {code:diff} --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -BIN 2657 2661 5 3.8 265.7 1.0X +BIN 1524 1567 61 6.6 152.4 1.0X {code} > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > > {code:diff} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48735) Performance Improvement for BIN function
Kent Yao created SPARK-48735: Summary: Performance Improvement for BIN function Key: SPARK-48735 URL: https://issues.apache.org/jira/browse/SPARK-48735 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
[ https://issues.apache.org/jira/browse/SPARK-48712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48712: - Parent: SPARK-48624 Issue Type: Sub-task (was: Improvement) > Perf Improvement for Encode with empty string and UTF-8 charset > --- > > Key: SPARK-48712 > URL: https://issues.apache.org/jira/browse/SPARK-48712 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -UTF-8 3672 3697 > 22 5.4 183.6 1.0X > +UTF-8 79270 79698 > 448 0.3 3963.5 1.0X -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
[ https://issues.apache.org/jira/browse/SPARK-48712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48712. -- Fix Version/s: 4.0.0 Assignee: Kent Yao Resolution: Fixed Resolved by https://github.com/apache/spark/pull/47096 > Perf Improvement for Encode with empty string and UTF-8 charset > --- > > Key: SPARK-48712 > URL: https://issues.apache.org/jira/browse/SPARK-48712 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -UTF-8 3672 3697 > 22 5.4 183.6 1.0X > +UTF-8 79270 79698 > 448 0.3 3963.5 1.0X -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48723) Run `git cherry-pick --abort` if backporting is denied by committer
Kent Yao created SPARK-48723: Summary: Run `git cherry-pick --abort` if backporting is denied by committer Key: SPARK-48723 URL: https://issues.apache.org/jira/browse/SPARK-48723 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48713) Add index range check for UnsafeRow.pointTo when baseObject is byte array
[ https://issues.apache.org/jira/browse/SPARK-48713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48713. -- Fix Version/s: 4.0.0 Assignee: wuyi Resolution: Fixed > Add index range check for UnsafeRow.pointTo when baseObject is byte array > - > > Key: SPARK-48713 > URL: https://issues.apache.org/jira/browse/SPARK-48713 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48721) Fix Decode doc in SQL API page
Kent Yao created SPARK-48721: Summary: Fix Decode doc in SQL API page Key: SPARK-48721 URL: https://issues.apache.org/jira/browse/SPARK-48721 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.3, 3.5.1, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48706) Python UDF in higher order functions should not throw internal error
[ https://issues.apache.org/jira/browse/SPARK-48706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48706: Assignee: Hyukjin Kwon > Python UDF in higher order functions should not throw internal error > > > Key: SPARK-48706 > URL: https://issues.apache.org/jira/browse/SPARK-48706 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > {code} > from pyspark.sql.functions import transform, udf, col, array > spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: > y)(x))).collect() > {code} > throws an internal error: > {code} > at > org.apache.spark.SparkException$.internalError(SparkException.scala:88) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:73) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:507) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:506) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48706) Python UDF in higher order functions should not throw internal error
[ https://issues.apache.org/jira/browse/SPARK-48706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48706. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47079 [https://github.com/apache/spark/pull/47079] > Python UDF in higher order functions should not throw internal error > > > Key: SPARK-48706 > URL: https://issues.apache.org/jira/browse/SPARK-48706 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 4.0.0 > > > {code} > from pyspark.sql.functions import transform, udf, col, array > spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: > y)(x))).collect() > {code} > throws an internal error: > {code} > at > org.apache.spark.SparkException$.internalError(SparkException.scala:88) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:73) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:507) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:506) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48573. -- Target Version/s: 4.0.0 Assignee: Mihailo Milosevic Resolution: Fixed > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48573: - Fix Version/s: 4.0.0 > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
[ https://issues.apache.org/jira/browse/SPARK-48712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48712: - Description: Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -UTF-8 3672 3697 22 5.4 183.6 1.0X +UTF-8 79270 79698 448 0.3 3963.5 1.0X > Perf Improvement for Encode with empty string and UTF-8 charset > --- > > Key: SPARK-48712 > URL: https://issues.apache.org/jira/browse/SPARK-48712 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -UTF-8 3672 3697 > 22 5.4 183.6 1.0X > +UTF-8 79270 79698 > 448 0.3 3963.5 1.0X -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
Kent Yao created SPARK-48712: Summary: Perf Improvement for Encode with empty string and UTF-8 charset Key: SPARK-48712 URL: https://issues.apache.org/jira/browse/SPARK-48712 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48693) simplify and unify toString of Invoke and StaticInvoke
[ https://issues.apache.org/jira/browse/SPARK-48693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48693. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47066 [https://github.com/apache/spark/pull/47066] > simplify and unify toString of Invoke and StaticInvoke > -- > > Key: SPARK-48693 > URL: https://issues.apache.org/jira/browse/SPARK-48693 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48658) Encode/Decode functions report coding error instead of mojibake
[ https://issues.apache.org/jira/browse/SPARK-48658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48658. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47017 [https://github.com/apache/spark/pull/47017] > Encode/Decode functions report coding error instead of mojibake > --- > > Key: SPARK-48658 > URL: https://issues.apache.org/jira/browse/SPARK-48658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48658) Encode/Decode functions report coding error instead of mojibake
[ https://issues.apache.org/jira/browse/SPARK-48658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48658: Assignee: Kent Yao > Encode/Decode functions report coding error instead of mojibake > --- > > Key: SPARK-48658 > URL: https://issues.apache.org/jira/browse/SPARK-48658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
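[Editor's note] The distinction SPARK-48658 is about — failing loudly on invalid bytes versus silently emitting replacement characters (mojibake) — can be illustrated with Python's codec error handlers. This is an analogy, not Spark code:

```python
# "Coding error vs. mojibake", sketched in Python (not Spark code).
bad = b'\xe4\xbd'  # truncated multi-byte UTF-8 sequence

# Lenient decoding substitutes U+FFFD -- the silent "mojibake" behavior.
lenient = bad.decode('utf-8', errors='replace')
print(repr(lenient))

# Strict decoding raises instead -- the "report a coding error" behavior
# that the encode/decode functions now follow.
try:
    bad.decode('utf-8', errors='strict')
except UnicodeDecodeError as e:
    print('coding error:', e.reason)
```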
[jira] [Created] (SPARK-48696) Also truncate the schema row for show function
Kent Yao created SPARK-48696: Summary: Also truncate the schema row for show function Key: SPARK-48696 URL: https://issues.apache.org/jira/browse/SPARK-48696 Project: Spark Issue Type: Improvement Components: Connect, SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48680) Add char/varchar doc to language specific tables
[ https://issues.apache.org/jira/browse/SPARK-48680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48680. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47052 [https://github.com/apache/spark/pull/47052] > Add char/varchar doc to language specific tables > > > Key: SPARK-48680 > URL: https://issues.apache.org/jira/browse/SPARK-48680 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48693) simplify and unify toString of Invoke and StaticInvoke
Kent Yao created SPARK-48693: Summary: simplify and unify toString of Invoke and StaticInvoke Key: SPARK-48693 URL: https://issues.apache.org/jira/browse/SPARK-48693 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48684) Print related JIRA summary before proceeding merge
[ https://issues.apache.org/jira/browse/SPARK-48684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48684. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47057 [https://github.com/apache/spark/pull/47057] > Print related JIRA summary before proceeding merge > -- > > Key: SPARK-48684 > URL: https://issues.apache.org/jira/browse/SPARK-48684 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48684) Print related JIRA summary before proceeding merge
[ https://issues.apache.org/jira/browse/SPARK-48684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48684: - Component/s: Project Infra (was: SQL) > Print related JIRA summary before proceeding merge > -- > > Key: SPARK-48684 > URL: https://issues.apache.org/jira/browse/SPARK-48684 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48684) Print related JIRA summary before proceeding merge
[ https://issues.apache.org/jira/browse/SPARK-48684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48684: - Priority: Minor (was: Major) > Print related JIRA summary before proceeding merge > -- > > Key: SPARK-48684 > URL: https://issues.apache.org/jira/browse/SPARK-48684 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48684) Print related JIRA summary before proceeding merge
Kent Yao created SPARK-48684: Summary: Print related JIRA summary before proceeding merge Key: SPARK-48684 URL: https://issues.apache.org/jira/browse/SPARK-48684 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48680) Add char/varchar doc to language specific tables
Kent Yao created SPARK-48680: Summary: Add char/varchar doc to language specific tables Key: SPARK-48680 URL: https://issues.apache.org/jira/browse/SPARK-48680 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
[ https://issues.apache.org/jira/browse/SPARK-48656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48656. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47019 [https://github.com/apache/spark/pull/47019] > ArrayIndexOutOfBoundsException in CartesianRDD getPartitions > > > Key: SPARK-48656 > URL: https://issues.apache.org/jira/browse/SPARK-48656 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Nick Young >Assignee: Wei Guo >Priority: Major > Fix For: 4.0.0 > > > {code:java}
> val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
> val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
> rdd2.cartesian(rdd1).partitions
> {code}
> Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because
> `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0. We
> should provide a better error message indicating that the number of
> partitions overflowed, so it's easier for the user to debug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
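[Editor's note] The overflow reported in SPARK-48656 is easy to reproduce outside Spark: `s1.index * numPartitionsInRdd2 + s2.index` is evaluated in Scala's 32-bit `Int`, and 65536 × 65536 is exactly 2^32, which wraps to 0. A Python sketch simulating the 32-bit arithmetic (partition counts taken from the report above):

```python
# SPARK-48656 in miniature: the partition index is computed in 32-bit Int
# arithmetic, so 65536 partitions on each side of the cartesian product
# makes 65536 * 65536 == 2**32, which wraps to 0.
def to_int32(x: int) -> int:
    # Simulate Scala/Java 32-bit two's-complement Int semantics.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

num_partitions = 65536
print(to_int32(num_partitions * num_partitions))  # → 0, hence the spurious index 0
```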