[jira] [Resolved] (SPARK-48608) Spark 3.5: fails to build with value defaultValueNotConstantError is not a member of object org.apache.spark.sql.errors.QueryCompilationErrors
[ https://issues.apache.org/jira/browse/SPARK-48608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48608.
------------------------------
Fix Version/s: 3.5.2
Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/46978

> Spark 3.5: fails to build with value defaultValueNotConstantError is not a
> member of object org.apache.spark.sql.errors.QueryCompilationErrors
> --------------------------------------------------------------------------
>
> Key: SPARK-48608
> URL: https://issues.apache.org/jira/browse/SPARK-48608
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.2
> Reporter: Thomas Graves
> Priority: Blocker
> Fix For: 3.5.2
>
> PR https://github.com/apache/spark/pull/46594 seems to have broken the
> Spark 3.5 build:
> [ERROR] [Error]
> ...sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:299:
> value defaultValueNotConstantError is not a member of object
> org.apache.spark.sql.errors.QueryCompilationErrors
> I don't see that method defined on the 3.5 branch:
> https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
> I see it defined on master by
> https://issues.apache.org/jira/browse/SPARK-46905, which only went into 4.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy
[ https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48308:
-----------------------------
Fix Version/s: 3.5.2

> Unify getting data schema without partition columns in FileSourceStrategy
> -------------------------------------------------------------------------
>
> Key: SPARK-48308
> URL: https://issues.apache.org/jira/browse/SPARK-48308
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.1
> Reporter: Johan Lasperas
> Assignee: Johan Lasperas
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> In [FileSourceStrategy|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191],
> the schema of the data excluding partition columns is computed twice, in
> slightly different ways:
>
> {code:java}
> val dataColumnsWithoutPartitionCols =
>   dataColumns.filterNot(partitionSet.contains)
> {code}
> vs
> {code:java}
> val readDataColumns = dataColumns
>   .filterNot(partitionColumns.contains)
> {code}
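The duplication described in SPARK-48308 can be sketched outside of Spark. This is a minimal plain-Python illustration (the function and variable names are illustrative, not Spark's identifiers) of computing the read schema once through a single shared helper:

```python
def columns_without_partitions(data_columns, partition_columns):
    """Compute the data schema once, excluding partition columns."""
    partition_set = set(partition_columns)
    return [c for c in data_columns if c not in partition_set]

# Both call sites in the ticket would share this one definition instead of
# filtering against slightly different collections that can drift apart.
print(columns_without_partitions(["id", "name", "dt", "value"], ["dt"]))
# ['id', 'name', 'value']
```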
[jira] [Updated] (SPARK-48991) FileStreamSink.hasMetadata handles invalid path
[ https://issues.apache.org/jira/browse/SPARK-48991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48991:
-----------------------------
Fix Version/s: 3.5.3
               (was: 3.5.2)

> FileStreamSink.hasMetadata handles invalid path
> -----------------------------------------------
>
> Key: SPARK-48991
> URL: https://issues.apache.org/jira/browse/SPARK-48991
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.4.4, 3.5.3
[jira] [Created] (SPARK-48991) FileStreamSink.hasMetadata handles invalid path
Kent Yao created SPARK-48991:
-----------------------------

Summary: FileStreamSink.hasMetadata handles invalid path
Key: SPARK-48991
URL: https://issues.apache.org/jira/browse/SPARK-48991
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.3, 3.5.1, 4.0.0
Reporter: Kent Yao
[jira] [Created] (SPARK-48963) Support JIRA_ACCESS_TOKEN in translate-contributors.py
Kent Yao created SPARK-48963:
-----------------------------

Summary: Support JIRA_ACCESS_TOKEN in translate-contributors.py
Key: SPARK-48963
URL: https://issues.apache.org/jira/browse/SPARK-48963
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Commented] (SPARK-48921) ScalaUDF in subquery should run through analyzer
[ https://issues.apache.org/jira/browse/SPARK-48921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867174#comment-17867174 ]

Kent Yao commented on SPARK-48921:
----------------------------------
Collected to 3.5.2

> ScalaUDF in subquery should run through analyzer
> ------------------------------------------------
>
> Key: SPARK-48921
> URL: https://issues.apache.org/jira/browse/SPARK-48921
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> We got a customer issue that a `MergeInto` query on an Iceberg table worked
> earlier but fails after upgrading to Spark 3.4.
> The error looks like:
> ```
> Caused by: org.apache.spark.SparkRuntimeException: Error while decoding:
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to
> nullable on unresolved object
> upcast(getcolumnbyordinal(0, StringType), StringType, - root class:
> java.lang.String).toString.
> ```
> The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark
> invokes the deserializer of the input encoder of the `ScalaUDF` while the
> deserializer is not resolved yet.
> The encoders of `ScalaUDF` are resolved by the rule `ResolveEncodersInUDF`,
> which is applied at the end of the analysis phase.
> While rewriting `MergeInto` to a `ReplaceData` query, Spark creates an
> `Exists` subquery, and `ScalaUDF` is part of the subquery's plan. Note
> that the `ScalaUDF` is already resolved by the analyzer.
> Then `ResolveSubquery`, the rule that resolves subqueries, only resolves
> the subquery plan if it is not resolved yet. Because the subquery containing
> `ScalaUDF` is already resolved, the rule skips it, so `ResolveEncodersInUDF`
> is never applied to it. The analyzed `ReplaceData` query therefore contains a
> `ScalaUDF` with unresolved encoders, which causes the error.
[jira] [Updated] (SPARK-48934) Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
[ https://issues.apache.org/jira/browse/SPARK-48934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48934:
-----------------------------
Fix Version/s: 3.5.2
               (was: 3.5.3)

> Python datetime types converted incorrectly for setting timeout in
> applyInPandasWithState
> ------------------------------------------------------------------
>
> Key: SPARK-48934
> URL: https://issues.apache.org/jira/browse/SPARK-48934
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> In applyInPandasWithState(), when state.setTimeoutTimestamp() is passed in
> with datetime.datetime type, it doesn't function as expected.
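Bugs in this class — a Python datetime silently converted to the wrong timeout value — usually come down to the epoch-milliseconds conversion. A hedged, stdlib-only sketch of doing that conversion explicitly (this helper is illustrative, not Spark's internal code):

```python
from datetime import datetime, timezone

def to_epoch_millis(dt):
    """Convert a datetime to epoch milliseconds, treating naive values as UTC."""
    if dt.tzinfo is None:
        # Assume UTC rather than silently applying the local time zone,
        # which is one way such conversions go wrong.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_millis(datetime(1970, 1, 1, tzinfo=timezone.utc)))  # 0
print(to_epoch_millis(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200000
```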
[jira] [Updated] (SPARK-48921) ScalaUDF in subquery should run through analyzer
[ https://issues.apache.org/jira/browse/SPARK-48921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48921:
-----------------------------
Fix Version/s: 3.5.2
               (was: 3.5.3)

> ScalaUDF in subquery should run through analyzer
> ------------------------------------------------
>
> Key: SPARK-48921
> URL: https://issues.apache.org/jira/browse/SPARK-48921
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> We got a customer issue that a `MergeInto` query on an Iceberg table worked
> earlier but fails after upgrading to Spark 3.4.
> The error looks like:
> ```
> Caused by: org.apache.spark.SparkRuntimeException: Error while decoding:
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to
> nullable on unresolved object
> upcast(getcolumnbyordinal(0, StringType), StringType, - root class:
> java.lang.String).toString.
> ```
> The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark
> invokes the deserializer of the input encoder of the `ScalaUDF` while the
> deserializer is not resolved yet.
> The encoders of `ScalaUDF` are resolved by the rule `ResolveEncodersInUDF`,
> which is applied at the end of the analysis phase.
> While rewriting `MergeInto` to a `ReplaceData` query, Spark creates an
> `Exists` subquery, and `ScalaUDF` is part of the subquery's plan. Note
> that the `ScalaUDF` is already resolved by the analyzer.
> Then `ResolveSubquery`, the rule that resolves subqueries, only resolves
> the subquery plan if it is not resolved yet. Because the subquery containing
> `ScalaUDF` is already resolved, the rule skips it, so `ResolveEncodersInUDF`
> is never applied to it. The analyzed `ReplaceData` query therefore contains a
> `ScalaUDF` with unresolved encoders, which causes the error.
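The skip-if-resolved behavior described in SPARK-48921 can be modeled in a few lines. This is a toy in plain Python, not Spark's analyzer; every class name, rule name, and flag here is illustrative:

```python
class Node:
    """Toy plan node with a 'resolved' flag, loosely mimicking a TreeNode."""
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)
        self.resolved = False
        self.encoders_resolved = False

def resolve_subquery(node):
    # Mimics a rule that only descends into unresolved subtrees.
    if node.resolved:
        return  # skip: this is where the nested UDF fix-up is missed
    for child in node.children:
        resolve_subquery(child)
    if node.kind == "scala_udf":
        node.encoders_resolved = True  # stands in for ResolveEncodersInUDF
    node.resolved = True

# The subquery was copied into the rewritten plan already marked resolved,
# so the rule never reaches the UDF inside it.
udf = Node("scala_udf")
subquery = Node("exists", [udf])
udf.resolved = True
subquery.resolved = True
resolve_subquery(subquery)
print(udf.encoders_resolved)  # False: the encoder fix-up was skipped
```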
[jira] [Updated] (SPARK-48791) Perf regression due to accumulator registration overhead using CopyOnWriteArrayList
[ https://issues.apache.org/jira/browse/SPARK-48791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48791:
-----------------------------
Fix Version/s: 3.5.2
               (was: 3.5.3)

> Perf regression due to accumulator registration overhead using
> CopyOnWriteArrayList
> --------------------------------------------------------------
>
> Key: SPARK-48791
> URL: https://issues.apache.org/jira/browse/SPARK-48791
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.0.0, 3.5.1, 3.3.4, 3.4.3
> Reporter: wuyi
> Assignee: wuyi
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2, 3.4.4
>
> We noticed a query performance regression and located the root cause: the
> overhead introduced when registering accumulators using CopyOnWriteArrayList.
[jira] [Commented] (SPARK-48791) Perf regression due to accumulator registration overhead using CopyOnWriteArrayList
[ https://issues.apache.org/jira/browse/SPARK-48791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867175#comment-17867175 ]

Kent Yao commented on SPARK-48791:
----------------------------------
Collected to 3.5.2

> Perf regression due to accumulator registration overhead using
> CopyOnWriteArrayList
> --------------------------------------------------------------
>
> Key: SPARK-48791
> URL: https://issues.apache.org/jira/browse/SPARK-48791
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.0.0, 3.5.1, 3.3.4, 3.4.3
> Reporter: wuyi
> Assignee: wuyi
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2, 3.4.4
>
> We noticed a query performance regression and located the root cause: the
> overhead introduced when registering accumulators using CopyOnWriteArrayList.
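The cost profile behind SPARK-48791 is easy to quantify: each append to a copy-on-write list copies every existing element, so N registrations copy N*(N-1)/2 elements in total. A small illustrative counter (plain Python with a hypothetical helper name, not Spark's code):

```python
def cow_append_copy_cost(n):
    """Total elements copied when appending n items to a copy-on-write list."""
    copied = 0
    size = 0
    for _ in range(n):
        copied += size  # copy-on-write duplicates the current contents first
        size += 1
    return copied

# Quadratic growth: 1000 accumulator registrations copy ~500k elements.
print(cow_append_copy_cost(1000))  # 499500
```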
[jira] [Commented] (SPARK-48934) Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
[ https://issues.apache.org/jira/browse/SPARK-48934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867173#comment-17867173 ]

Kent Yao commented on SPARK-48934:
----------------------------------
Collected this to 3.5.2

> Python datetime types converted incorrectly for setting timeout in
> applyInPandasWithState
> ------------------------------------------------------------------
>
> Key: SPARK-48934
> URL: https://issues.apache.org/jira/browse/SPARK-48934
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
> In applyInPandasWithState(), when state.setTimeoutTimestamp() is passed in
> with datetime.datetime type, it doesn't function as expected.
[jira] [Resolved] (SPARK-48865) Add try_url_decode function
[ https://issues.apache.org/jira/browse/SPARK-48865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48865.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 47294
https://github.com/apache/spark/pull/47294

> Add try_url_decode function
> ---------------------------
>
> Key: SPARK-48865
> URL: https://issues.apache.org/jira/browse/SPARK-48865
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Zhen Wang
> Assignee: Zhen Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Add a `try_url_decode` function that performs the same operation as
> `url_decode`, but returns a NULL value instead of raising an error if the
> decoding cannot be performed.
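The try_* semantics can be sketched with Python's stdlib rather than Spark SQL: decode a percent-encoded string and return None (standing in for NULL) instead of raising when the decoded bytes are not valid UTF-8. This mirrors the described behavior only loosely:

```python
from urllib.parse import unquote

def try_url_decode(s):
    """Decode a percent-encoded string; return None if decoding fails."""
    try:
        return unquote(s, errors="strict")
    except UnicodeDecodeError:
        return None

print(try_url_decode("a%20b"))      # a b
print(try_url_decode("%E4%B8%AD"))  # 中
print(try_url_decode("%FF"))        # None (0xFF is not valid UTF-8)
```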
[jira] [Resolved] (SPARK-48908) GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-48908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48908.
------------------------------
Resolution: Not A Problem

Only 3.5 has such an issue

> GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
> -------------------------------------------------------------
>
> Key: SPARK-48908
> URL: https://issues.apache.org/jira/browse/SPARK-48908
> Project: Spark
> Issue Type: Bug
> Components: Project Infra
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: Kent Yao
> Priority: Major
[jira] [Created] (SPARK-48908) GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
Kent Yao created SPARK-48908:
-----------------------------

Summary: GitHub API Rate Limit Exceeded Problem in spark-rm Dockerfile
Key: SPARK-48908
URL: https://issues.apache.org/jira/browse/SPARK-48908
Project: Spark
Issue Type: Bug
Components: Project Infra
Affects Versions: 3.4.3, 3.5.1, 4.0.0
Reporter: Kent Yao
[jira] [Created] (SPARK-48905) Add a guideline for version updates in DOC and various API DOCs
Kent Yao created SPARK-48905:
-----------------------------

Summary: Add a guideline for version updates in DOC and various API DOCs
Key: SPARK-48905
URL: https://issues.apache.org/jira/browse/SPARK-48905
Project: Spark
Issue Type: Sub-task
Components: Documentation
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Created] (SPARK-48904) Update doc's version field to align with fixedVersion field of JIRA ticket
Kent Yao created SPARK-48904:
-----------------------------

Summary: Update doc's version field to align with fixedVersion field of JIRA ticket
Key: SPARK-48904
URL: https://issues.apache.org/jira/browse/SPARK-48904
Project: Spark
Issue Type: Umbrella
Components: Documentation, Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Resolved] (SPARK-48885) Make some inheritances of RuntimeReplaceable override replacement to lazy val
[ https://issues.apache.org/jira/browse/SPARK-48885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48885.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 47333
https://github.com/apache/spark/pull/47333

> Make some inheritances of RuntimeReplaceable override replacement to lazy val
> -----------------------------------------------------------------------------
>
> Key: SPARK-48885
> URL: https://issues.apache.org/jira/browse/SPARK-48885
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48885) Make some inheritances of RuntimeReplaceable override replacement to lazy val
[ https://issues.apache.org/jira/browse/SPARK-48885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-48885:
--------------------------------
Assignee: Kent Yao

> Make some inheritances of RuntimeReplaceable override replacement to lazy val
> -----------------------------------------------------------------------------
>
> Key: SPARK-48885
> URL: https://issues.apache.org/jira/browse/SPARK-48885
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup
[ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47819:
-----------------------------
Fix Version/s: (was: 4.0.0)
               (was: 3.5.2)

> Use asynchronous callback for execution cleanup
> -----------------------------------------------
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0, 3.5.1
> Reporter: Xi Lyu
> Priority: Major
> Labels: pull-request-available
>
> Expired sessions are regularly checked and cleaned up by a maintenance
> thread. However, currently, this process is synchronous. Therefore, in rare
> cases, interrupting the execution thread of a query in a session can take
> hours, causing the entire maintenance process to stall, resulting in a large
> amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup,
> avoiding synchronous joins of execution threads, and preventing the
> maintenance thread from stalling in the above scenarios. To be more specific,
> instead of calling {{runner.join()}} in ExecutorHolder.close(), we set a
> post-cleanup function as the callback through
> {{runner.processOnCompletion}}, which will be called asynchronously once
> the execution runner is completed or interrupted. In this way, the
> maintenance thread won't get blocked on {{join}}ing an execution thread.
[jira] [Updated] (SPARK-47652) Spark Remote Connect to multiple Spark Sessions
[ https://issues.apache.org/jira/browse/SPARK-47652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47652:
-----------------------------
Fix Version/s: (was: 3.5.2)

> Spark Remote Connect to multiple Spark Sessions
> -----------------------------------------------
>
> Key: SPARK-47652
> URL: https://issues.apache.org/jira/browse/SPARK-47652
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.5.0
> Reporter: Nagharajan Raghavendran
> Priority: Major
>
> Spark Remote Connect currently appears to support a single remote session.
> Can it be extended to support multiple Spark sessions? This would help in
> creating decentralized Kubernetes/custom cloud environments and reduce the
> compute load of the current Spark session, making Spark work like a fully
> remote API for multiple datasets where required.
[jira] [Updated] (SPARK-47947) Add AssertDataFrameEquality util function for scala
[ https://issues.apache.org/jira/browse/SPARK-47947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47947:
-----------------------------
Fix Version/s: (was: 3.5.2)

> Add AssertDataFrameEquality util function for scala
> ---------------------------------------------------
>
> Key: SPARK-47947
> URL: https://issues.apache.org/jira/browse/SPARK-47947
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.5.2
> Reporter: Anh Tuan Pham
> Priority: Major
[jira] [Updated] (SPARK-48315) Create user-facing error for null locale in CSV options
[ https://issues.apache.org/jira/browse/SPARK-48315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48315:
-----------------------------
Fix Version/s: (was: 4.0.0)
               (was: 3.5.2)

> Create user-facing error for null locale in CSV options
> -------------------------------------------------------
>
> Key: SPARK-48315
> URL: https://issues.apache.org/jira/browse/SPARK-48315
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 4.0.0, 3.5.1, 3.5.2
> Reporter: Michael Zhang
> Priority: Major
>
> When a user incorrectly sets the `locale` option to `null` with CSV, a null
> pointer exception is thrown. We should wrap the exception so the user
> understands what the issue is.
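The requested fix amounts to validating the option up front and raising a descriptive error instead of letting a null value flow into library code. A minimal sketch, assuming hypothetical option handling (the helper name and the `en-US` default here are illustrative, not Spark's):

```python
def parse_locale_option(options):
    """Validate the CSV 'locale' option, failing with a user-facing message."""
    if "locale" in options and options["locale"] is None:
        # Surface a clear error instead of a downstream NullPointerException.
        raise ValueError("CSV option 'locale' must not be null")
    return options.get("locale") or "en-US"

print(parse_locale_option({"locale": "fr-FR"}))  # fr-FR
```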
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47759:
-----------------------------
Fix Version/s: (was: 3.5.0)
               (was: 4.0.0)
               (was: 3.5.1)
               (was: 3.5.2)

> Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate
> time string
> -------------------------------------------------------------------------
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.0, 3.5.1
> Reporter: Bo Xiong
> Assignee: Bo Xiong
> Priority: Critical
> Labels: hang, pull-request-available, stuck, threadsafe
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected
> stack trace when reading/parsing a legitimate time string. Note that we
> manually killed the stuck app instances and the retry went through on the
> same cluster (without requiring any app code change).
>
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a
> legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time
> must be specified as seconds (s), milliseconds (ms), microseconds (us),
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
> at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
> at
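For reference, the kind of parsing JavaUtils.timeStringAs performs can be sketched as follows (illustrative, not Spark's implementation). Note that "120s" parses cleanly under this logic, which is why the stack trace above is surprising; the ticket's labels suggest a thread-safety problem rather than a bad input string:

```python
import re

# Suffixes mirroring the error message above: us, ms, s, m/min, h, d.
_UNIT_TO_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "min": 60,
                    "h": 3600, "d": 86400}

def time_string_as_sec(s):
    """Parse a time string like '120s' or '2min' into seconds."""
    m = re.fullmatch(r"(\d+)\s*([a-z]+)?", s.strip().lower())
    if not m or (m.group(2) or "s") not in _UNIT_TO_SECONDS:
        raise ValueError(f"Failed to parse time string: {s}")
    return int(m.group(1)) * _UNIT_TO_SECONDS[m.group(2) or "s"]

print(time_string_as_sec("120s"))  # 120
```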
[jira] [Resolved] (SPARK-47307) Spark 3.3 produces invalid base64
[ https://issues.apache.org/jira/browse/SPARK-47307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-47307.
------------------------------
Fix Version/s: 4.0.0
Assignee: Zhen Wang
Resolution: Fixed

> Spark 3.3 produces invalid base64
> ---------------------------------
>
> Key: SPARK-47307
> URL: https://issues.apache.org/jira/browse/SPARK-47307
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 4.0.0, 3.5.2, 3.4.4
> Reporter: Willi Raschkowski
> Assignee: Zhen Wang
> Priority: Blocker
> Labels: correctness, pull-request-available
> Fix For: 4.0.0
>
> SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}}
> (which may be fine in itself, but shouldn't happen between minor versions).
> {code:title=Spark 3.2}
> >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
> 'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYQ=='
> {code}
> Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines).
> {code:title=Spark 3.3}
> >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
> 'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh\r\nYQ=='
> {code}
> The former decodes fine with {{base64}} on my machine but the latter does
> not:
> {code}
> $ pbpaste | base64 --decode
> aa%
> $ pbpaste | base64 --decode
> base64: stdin: (null): error decoding base64 input stream
> {code}
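The newline difference above has a direct stdlib analogue: MIME-style base64 inserts a line break every 76 output characters, while plain base64 does not. Python demonstrates both behaviors (this reproduces the class of difference, not Spark's code path):

```python
import base64

data = b"a" * 58  # 58 input bytes -> 80 base64 characters, past the 76-char MIME limit
plain = base64.b64encode(data).decode("ascii")      # no line breaks
mime = base64.encodebytes(data).decode("ascii")     # MIME wrapping at 76 chars

print("\n" in plain)       # False
print(mime.count("\n"))    # 2: one mid-output wrap plus a trailing newline
```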
[jira] [Created] (SPARK-48885) Make some inheritances of RuntimeReplaceable override replacement to lazy val
Kent Yao created SPARK-48885:
-----------------------------

Summary: Make some inheritances of RuntimeReplaceable override replacement to lazy val
Key: SPARK-48885
URL: https://issues.apache.org/jira/browse/SPARK-48885
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao
[jira] [Resolved] (SPARK-48845) GenericUDF Can not CatchException From Child UDFs
[ https://issues.apache.org/jira/browse/SPARK-48845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48845.
------------------------------
Fix Version/s: 3.5.2
               4.0.0
Resolution: Fixed

Issue resolved by pull request 47268
https://github.com/apache/spark/pull/47268

> GenericUDF Can not CatchException From Child UDFs
> -------------------------------------------------
>
> Key: SPARK-48845
> URL: https://issues.apache.org/jira/browse/SPARK-48845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 3.5.1
> Reporter: Junqing Li
> Assignee: Junqing Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
> During the upgrade from Spark 3.3.1 to 3.5.1, we encountered issues with
> this PR. The problem arose from DeferredObject currently passing a value
> instead of a function, which prevented users from catching exceptions in
> GenericUDF, resulting in semantic differences.
> Here is an example case we encountered. Originally, the semantics were that
> {{str_to_map_udf}} would throw an exception due to issues with the input
> string, while {{merge_map_udf}} could catch the exception and return a null
> value. However, currently, any exception encountered by {{str_to_map_udf}}
> will cause the program to fail.
> {code:java}
> select merge_map_udf(str_to_map_udf(col1), parse_map_udf(col2), map("key",
> "value")) from table
> {code}
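The DeferredObject point can be modeled in plain Python (this is a toy, not Hive's GenericUDF API): when the child's result is computed eagerly, it raises before the parent runs; passing a deferred thunk lets the parent catch the exception and return null:

```python
def failing_child():
    # Stands in for a child UDF that raises on bad input (e.g. str_to_map_udf).
    raise ValueError("bad input string")

def parent_udf(child_thunk):
    # Deferred: the parent invokes the thunk itself, so it can catch the error
    # and return None (null) instead of failing the whole query.
    try:
        return child_thunk()
    except ValueError:
        return None

print(parent_udf(failing_child))  # None
# Eager evaluation -- parent_udf(failing_child()) -- would raise before
# parent_udf ever runs, which is the behavior change the ticket describes.
```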
[jira] [Resolved] (SPARK-48876) Upgrade Guava used by the connect module to 33.2.1-jre
[ https://issues.apache.org/jira/browse/SPARK-48876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48876.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 47296
https://github.com/apache/spark/pull/47296

> Upgrade Guava used by the connect module to 33.2.1-jre
> ------------------------------------------------------
>
> Key: SPARK-48876
> URL: https://issues.apache.org/jira/browse/SPARK-48876
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-48879) Expand the charset list with Chinese Standard Charsets
Kent Yao created SPARK-48879: Summary: Expand the charset list with Chinese Standard Charsets Key: SPARK-48879 URL: https://issues.apache.org/jira/browse/SPARK-48879 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
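As context for the ticket above: the JDK already ships decoders for the Chinese national-standard charsets (GB2312, GBK, GB18030) that a charset list could expose. The sketch below only demonstrates JDK-level support; it does not show the Spark-side change, and the class name `ChineseCharsets` is illustrative.

```java
import java.nio.charset.Charset;

// Check JDK availability of the Chinese national-standard charsets and
// round-trip a string through GB18030, which covers all of Unicode.
public class ChineseCharsets {
    public static void main(String[] args) {
        for (String name : new String[]{"GB2312", "GBK", "GB18030"}) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
        Charset gb18030 = Charset.forName("GB18030");
        byte[] encoded = "你好".getBytes(gb18030);
        String decoded = new String(encoded, gb18030);
        System.out.println(decoded.equals("你好")); // true: lossless round-trip
    }
}
```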
[jira] [Resolved] (SPARK-48874) Upgrade MySQL docker image version to 9.0.0
[ https://issues.apache.org/jira/browse/SPARK-48874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48874. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47311 [https://github.com/apache/spark/pull/47311] > Upgrade MySQL docker image version to 9.0.0 > --- > > Key: SPARK-48874 > URL: https://issues.apache.org/jira/browse/SPARK-48874 > Project: Spark > Issue Type: Improvement > Components: Spark Docker, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48874) Upgrade MySQL docker image version to 9.0.0
[ https://issues.apache.org/jira/browse/SPARK-48874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48874: Assignee: BingKun Pan > Upgrade MySQL docker image version to 9.0.0 > --- > > Key: SPARK-48874 > URL: https://issues.apache.org/jira/browse/SPARK-48874 > Project: Spark > Issue Type: Improvement > Components: Spark Docker, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (SPARK-47651) Better Documentation of Spark Remote Connect for Pyspark
[ https://issues.apache.org/jira/browse/SPARK-47651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47651: - Issue Type: Question (was: Improvement) > Better Documentation of Spark Remote Connect for Pyspark > - > > Key: SPARK-47651 > URL: https://issues.apache.org/jira/browse/SPARK-47651 > Project: Spark > Issue Type: Question > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Nagharajan Raghavendran >Priority: Major > Fix For: 3.5.2 > > > Is there better documentation for Spark Remote Connect on Kubernetes?
[jira] [Updated] (SPARK-48093) Add config to switch between client side listener and server side listener
[ https://issues.apache.org/jira/browse/SPARK-48093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48093: - Affects Version/s: 4.0.0 (was: 3.5.0) (was: 3.5.1) (was: 3.5.2) > Add config to switch between client side listener and server side listener > --- > > Key: SPARK-48093 > URL: https://issues.apache.org/jira/browse/SPARK-48093 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > We are moving the implementation of the Streaming Query Listener from server to > client. For clients already running the client-side listener, to prevent > regression, we should add a config that lets them decide which type of listener > to use. > > This is only added to published 3.5.x versions. For 4.0 and upwards we only > use the client-side listener.
[jira] [Updated] (SPARK-47801) Use simdjson-java in JSON related UDFs
[ https://issues.apache.org/jira/browse/SPARK-47801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47801: - Affects Version/s: 4.0.0 (was: 3.5.2) > Use simdjson-java in JSON related UDFs > -- > > Key: SPARK-47801 > URL: https://issues.apache.org/jira/browse/SPARK-47801 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Zheng Shao >Priority: Major > > JSON parsing speed is important. > Right now, functions like GET_JSON_OBJECT, FROM_JSON are slow because they > don't use [https://github.com/simdjson/simdjson-java] > > We should consider adopting [https://github.com/simdjson/simdjson-java] for > those UDFs.
[jira] [Resolved] (SPARK-46814) Build and Run with Java 21
[ https://issues.apache.org/jira/browse/SPARK-46814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46814. -- Resolution: Duplicate > Build and Run with Java 21 > -- > > Key: SPARK-46814 > URL: https://issues.apache.org/jira/browse/SPARK-46814 > Project: Spark > Issue Type: New Feature > Components: Build >Affects Versions: 3.5.2, 3.4.3 >Reporter: Madhavan >Priority: Major > Labels: Releasenotes, releasenotes > > Apache Spark supports Java 8, Java 11 (LTS) and Java 17 (LTS). The next Java > LTS version is {*}21{*}. > ||Version||Release Date|| > |Java 21 (LTS)|19th September 2023| > Apache Spark publishes its release plan, code freeze, and release branch > cut details here: > - [https://spark.apache.org/versioning-policy.html] > Supporting a new Java version is considered a new feature, which we cannot > backport.
[jira] [Created] (SPARK-48866) Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET
Kent Yao created SPARK-48866: Summary: Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET Key: SPARK-48866 URL: https://issues.apache.org/jira/browse/SPARK-48866 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Assigned] (SPARK-48855) Make ExecutorPodsAllocatorSuite independent from default allocation batch size
[ https://issues.apache.org/jira/browse/SPARK-48855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48855: Assignee: Dongjoon Hyun > Make ExecutorPodsAllocatorSuite independent from default allocation batch size > -- > > Key: SPARK-48855 > URL: https://issues.apache.org/jira/browse/SPARK-48855 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48855) Make ExecutorPodsAllocatorSuite independent from default allocation batch size
[ https://issues.apache.org/jira/browse/SPARK-48855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48855. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47279 [https://github.com/apache/spark/pull/47279] > Make ExecutorPodsAllocatorSuite independent from default allocation batch size > -- > > Key: SPARK-48855 > URL: https://issues.apache.org/jira/browse/SPARK-48855 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48857) Restrict charsets in CSVOptions
[ https://issues.apache.org/jira/browse/SPARK-48857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48857. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47280 [https://github.com/apache/spark/pull/47280] > Restrict charsets in CSVOptions > --- > > Key: SPARK-48857 > URL: https://issues.apache.org/jira/browse/SPARK-48857 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48857) Restrict charsets in CSVOptions
Kent Yao created SPARK-48857: Summary: Restrict charsets in CSVOptions Key: SPARK-48857 URL: https://issues.apache.org/jira/browse/SPARK-48857 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48854) Add missing options in CSV documentation
[ https://issues.apache.org/jira/browse/SPARK-48854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48854. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47278 [https://github.com/apache/spark/pull/47278] > Add missing options in CSV documentation > > > Key: SPARK-48854 > URL: https://issues.apache.org/jira/browse/SPARK-48854 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48854) Add missing options in CSV documentation
[ https://issues.apache.org/jira/browse/SPARK-48854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48854: Assignee: Kent Yao > Add missing options in CSV documentation > > > Key: SPARK-48854 > URL: https://issues.apache.org/jira/browse/SPARK-48854 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-48854) Add missing options in CSV documentation
Kent Yao created SPARK-48854: Summary: Add missing options in CSV documentation Key: SPARK-48854 URL: https://issues.apache.org/jira/browse/SPARK-48854 Project: Spark Issue Type: Improvement Components: Documentation, SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48807) Binary Support for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-48807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48807. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47212 [https://github.com/apache/spark/pull/47212] > Binary Support for CSV datasource > - > > Key: SPARK-48807 > URL: https://issues.apache.org/jira/browse/SPARK-48807 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48816) Perf improvement for CSV UnivocityParser with ANSI Intervals
[ https://issues.apache.org/jira/browse/SPARK-48816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48816: Assignee: Kent Yao > Perf improvement for CSV UnivocityParser with ANSI Intervals > > > Key: SPARK-48816 > URL: https://issues.apache.org/jira/browse/SPARK-48816 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48816) Perf improvement for CSV UnivocityParser with ANSI Intervals
[ https://issues.apache.org/jira/browse/SPARK-48816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48816. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47227 [https://github.com/apache/spark/pull/47227] > Perf improvement for CSV UnivocityParser with ANSI Intervals > > > Key: SPARK-48816 > URL: https://issues.apache.org/jira/browse/SPARK-48816 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48804) Add classIsLoadable & OutputCommitter.isAssignableFrom check for outputCommitterClasses
[ https://issues.apache.org/jira/browse/SPARK-48804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48804. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47209 [https://github.com/apache/spark/pull/47209] > Add classIsLoadable & OutputCommitter.isAssignableFrom check for > outputCommitterClasses > --- > > Key: SPARK-48804 > URL: https://issues.apache.org/jira/browse/SPARK-48804 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48640) Perf improvement for format hex from byte array
[ https://issues.apache.org/jira/browse/SPARK-48640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48640. -- Resolution: Not A Problem > Perf improvement for format hex from byte array > --- > > Key: SPARK-48640 > URL: https://issues.apache.org/jira/browse/SPARK-48640 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor >
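As a reference point for the byte-array-to-hex discussion above: since JDK 17 the standard library provides `java.util.HexFormat`, which formats byte arrays without per-byte string concatenation. This is only a baseline sketch, not Spark's actual `hex()` implementation, and the class name `HexDemo` is illustrative.

```java
import java.util.HexFormat;

// Format a byte array as a hex string using the JDK 17+ HexFormat API.
public class HexDemo {
    public static String toHex(byte[] bytes) {
        // HexFormat.of() produces lowercase digits by default.
        return HexFormat.of().formatHex(bytes);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[]{0x01, (byte) 0xAB, (byte) 0xFF})); // 01abff
    }
}
```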
[jira] [Assigned] (SPARK-48792) INSERT with partial column list to table with char/varchar crashes
[ https://issues.apache.org/jira/browse/SPARK-48792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48792: Assignee: Kent Yao > INSERT with partial column list to table with char/varchar crashes > -- > > Key: SPARK-48792 > URL: https://issues.apache.org/jira/browse/SPARK-48792 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > ``` > 24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type > VarcharType(64). SQLSTATE: XX000 > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:96) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.makeWriter(ParquetWriteSupport.scala:266) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.$anonfun$init$2(ParquetWriteSupport.scala:111) > at scala.collection.immutable.List.map(List.scala:247) > at scala.collection.immutable.List.map(List.scala:79) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:111) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:478) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:422) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:500) > at > org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:180) > at > 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:165) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:391) > at > org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:107) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:896) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:333) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) > at org.apache.spark.scheduler.Task.run(Task.scala:146) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:640) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at java.base/java.lang.Thread.run(Thread.java:840) > ```
[jira] [Resolved] (SPARK-48792) INSERT with partial column list to table with char/varchar crashes
[ https://issues.apache.org/jira/browse/SPARK-48792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48792. -- Fix Version/s: 4.0.0 Resolution: Fixed > INSERT with partial column list to table with char/varchar crashes > -- > > Key: SPARK-48792 > URL: https://issues.apache.org/jira/browse/SPARK-48792 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > ``` > 24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type > VarcharType(64). SQLSTATE: XX000 > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:96) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.makeWriter(ParquetWriteSupport.scala:266) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.$anonfun$init$2(ParquetWriteSupport.scala:111) > at scala.collection.immutable.List.map(List.scala:247) > at scala.collection.immutable.List.map(List.scala:79) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:111) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:478) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:422) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:500) > at > org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:180) > at > 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:165) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:391) > at > org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:107) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:896) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:333) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) > at org.apache.spark.scheduler.Task.run(Task.scala:146) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:640) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at java.base/java.lang.Thread.run(Thread.java:840) > ```
[jira] [Created] (SPARK-48816) Perf improvement for CSV UnivocityParser with ANSI Intervals
Kent Yao created SPARK-48816: Summary: Perf improvement for CSV UnivocityParser with ANSI Intervals Key: SPARK-48816 URL: https://issues.apache.org/jira/browse/SPARK-48816 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48806) Pass actual exception when url_decode fails
[ https://issues.apache.org/jira/browse/SPARK-48806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48806. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 47211 [https://github.com/apache/spark/pull/47211] > Pass actual exception when url_decode fails > --- > > Key: SPARK-48806 > URL: https://issues.apache.org/jira/browse/SPARK-48806 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > Currently the url_decode function discards the actual exception, which > contains information useful for quickly locating the problem. > > For example, executing this SQL: > {code:java} > select url_decode('https%3A%2F%2spark.apache.org'); {code} > yields only this error message: > {code:java} > org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The > provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure > that the URL is properly formatted and try again. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:376) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:118) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala) > {code} > while the actually useful exception information is discarded: > {code:java} > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" {code}
[jira] [Assigned] (SPARK-48806) Pass actual exception when url_decode fails
[ https://issues.apache.org/jira/browse/SPARK-48806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48806: Assignee: Zhen Wang > Pass actual exception when url_decode fails > --- > > Key: SPARK-48806 > URL: https://issues.apache.org/jira/browse/SPARK-48806 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Minor > Labels: pull-request-available > > Currently the url_decode function discards the actual exception, which > contains information useful for quickly locating the problem. > > For example, executing this SQL: > {code:java} > select url_decode('https%3A%2F%2spark.apache.org'); {code} > yields only this error message: > {code:java} > org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The > provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure > that the URL is properly formatted and try again. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:376) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:118) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala) > {code} > while the actually useful exception information is discarded: > {code:java} > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" {code}
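The root-cause exception quoted in the ticket can be reproduced outside Spark: `java.net.URLDecoder` is the JDK class that rejects the malformed escape `%2s` in the URL above (the class name `UrlDecodeDemo` is illustrative).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Reproduce the underlying IllegalArgumentException for the ticket's input:
// "%3A" and "%2F" decode fine, but "%2s" is not a valid hex escape.
public class UrlDecodeDemo {
    public static void main(String[] args) {
        String url = "https%3A%2F%2spark.apache.org";
        try {
            URLDecoder.decode(url, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // This is the detail the fix now surfaces instead of swallowing:
            // "URLDecoder: Illegal hex characters in escape (%) pattern ..."
            System.out.println(e.getMessage());
        }
    }
}
```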
[jira] [Assigned] (SPARK-48808) NPE when connecting thriftserver through Hive 1.2.1
[ https://issues.apache.org/jira/browse/SPARK-48808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48808: Assignee: Kent Yao > NPE when connecting thriftserver through Hive 1.2.1 > --- > > Key: SPARK-48808 > URL: https://issues.apache.org/jira/browse/SPARK-48808 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48808) NPE when connecting thriftserver through Hive 1.2.1
[ https://issues.apache.org/jira/browse/SPARK-48808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48808. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47213 [https://github.com/apache/spark/pull/47213] > NPE when connecting thriftserver through Hive 1.2.1 > --- > > Key: SPARK-48808 > URL: https://issues.apache.org/jira/browse/SPARK-48808 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48808) NPE when connecting thriftserver through Hive 1.2.1
Kent Yao created SPARK-48808: Summary: NPE when connecting thriftserver through Hive 1.2.1 Key: SPARK-48808 URL: https://issues.apache.org/jira/browse/SPARK-48808 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Created] (SPARK-48807) Binary Support for CSV datasource
Kent Yao created SPARK-48807: Summary: Binary Support for CSV datasource Key: SPARK-48807 URL: https://issues.apache.org/jira/browse/SPARK-48807 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Created] (SPARK-48804) Add classIsLoadable & OutputCommitter.isAssignableFrom check for outputCommitterClasses
Kent Yao created SPARK-48804: Summary: Add classIsLoadable & OutputCommitter.isAssignableFrom check for outputCommitterClasses Key: SPARK-48804 URL: https://issues.apache.org/jira/browse/SPARK-48804 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Created] (SPARK-48803) Throw internal error in OrcDeserializer to align with ParquetWriteSupport
Kent Yao created SPARK-48803: Summary: Throw internal error in OrcDeserializer to align with ParquetWriteSupport Key: SPARK-48803 URL: https://issues.apache.org/jira/browse/SPARK-48803 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-48795) Upgrade mysql-connector-j to 9.0.0
[ https://issues.apache.org/jira/browse/SPARK-48795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48795. -- Fix Version/s: 4.0.0 Assignee: Wei Guo Resolution: Fixed Issue resolved by https://github.com/apache/spark/pull/47200 > Upgrade mysql-connector-j to 9.0.0 > -- > > Key: SPARK-48795 > URL: https://issues.apache.org/jira/browse/SPARK-48795 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48792) INSERT with partial column list to table with char/varchar crashes
Kent Yao created SPARK-48792: Summary: INSERT with partial column list to table with char/varchar crashes Key: SPARK-48792 URL: https://issues.apache.org/jira/browse/SPARK-48792 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.1 Reporter: Kent Yao ``` 24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type VarcharType(64). SQLSTATE: XX000 at org.apache.spark.SparkException$.internalError(SparkException.scala:92) at org.apache.spark.SparkException$.internalError(SparkException.scala:96) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.makeWriter(ParquetWriteSupport.scala:266) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.$anonfun$init$2(ParquetWriteSupport.scala:111) at scala.collection.immutable.List.map(List.scala:247) at scala.collection.immutable.List.map(List.scala:79) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:111) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:478) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:422) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:36) at org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:500) at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:180) at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:165) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:391) at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:107) 
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:896) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369) at org.apache.spark.rdd.RDD.iterator(RDD.scala:333) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) at org.apache.spark.scheduler.Task.run(Task.scala:146) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:640) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
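The stack trace above shows the physical Parquet writer receiving a raw VarcharType(64) instead of the string type it expects. A plausible minimal reproduction, given the issue title (table and column names are invented, not taken from the report):

```sql
-- Illustrative reproduction; table and column names are invented.
CREATE TABLE t (id INT, name VARCHAR(64)) USING parquet;

-- A full-column INSERT works: varchar is replaced by string before writing.
INSERT INTO t VALUES (1, 'a');

-- Partial column list: the path that fills in the unmentioned varchar column
-- appears to leak VarcharType(64) down to ParquetWriteSupport, which throws
-- the INTERNAL_ERROR seen in the stack trace.
INSERT INTO t (id) VALUES (2);
```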
[jira] [Assigned] (SPARK-48749) Simplify UnaryPositive with RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-48749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48749: Assignee: Kent Yao > Simplify UnaryPositive with RuntimeReplaceable > - > > Key: SPARK-48749 > URL: https://issues.apache.org/jira/browse/SPARK-48749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48749) Simplify UnaryPositive with RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-48749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48749. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47143 [https://github.com/apache/spark/pull/47143] > Simplify UnaryPositive with RuntimeReplaceable > - > > Key: SPARK-48749 > URL: https://issues.apache.org/jira/browse/SPARK-48749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48748) Cache numChars in UTF8String
[ https://issues.apache.org/jira/browse/SPARK-48748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48748: Assignee: Uroš Bojanić > Cache numChars in UTF8String > > > Key: SPARK-48748 > URL: https://issues.apache.org/jira/browse/SPARK-48748 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > > Thread-safe cache for numChars value in UTF8String to allow faster access. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48748) Cache numChars in UTF8String
[ https://issues.apache.org/jira/browse/SPARK-48748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48748. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47142 [https://github.com/apache/spark/pull/47142] > Cache numChars in UTF8String > > > Key: SPARK-48748 > URL: https://issues.apache.org/jira/browse/SPARK-48748 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Fix For: 4.0.0 > > > Thread-safe cache for numChars value in UTF8String to allow faster access. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
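The description is terse; the pattern it names is the benign-race field cache familiar from java.lang.String.hashCode. A sketch of that idea outside Spark (the class here is invented, not Spark's UTF8String):

```java
// Sketch of the pattern the description names (an invented class, not Spark's
// UTF8String): compute an expensive per-object value once and cache it in a
// field. As with java.lang.String.hashCode, the data race is benign -- racing
// threads can only ever write the same value, so no locking is needed.
public class CachedLength {
    private final byte[] bytes;
    private int cachedNumChars = -1; // -1 means "not computed yet"

    public CachedLength(byte[] bytes) {
        this.bytes = bytes;
    }

    public int numChars() {
        if (cachedNumChars == -1) {
            cachedNumChars = countUtf8Chars();
        }
        return cachedNumChars;
    }

    // Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
    private int countUtf8Chars() {
        int n = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) != 0x80) {
                n++;
            }
        }
        return n;
    }
}
```

The worst case under concurrency is a redundant recomputation; repeated reads on hot paths then pay a single field load instead of a full scan of the bytes.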
[jira] [Resolved] (SPARK-48673) Scheduling Across Applications in k8s mode
[ https://issues.apache.org/jira/browse/SPARK-48673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48673. -- Resolution: Information Provided Please use the dev or user mailing lists for questions: https://spark.apache.org/community.html > Scheduling Across Applications in k8s mode > --- > > Key: SPARK-48673 > URL: https://issues.apache.org/jira/browse/SPARK-48673 > Project: Spark > Issue Type: Question > Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit >Affects Versions: 3.5.1 >Reporter: Samba Shiva >Priority: Trivial > > I have been trying autoscaling in Kubernetes for Spark jobs. When the first job is triggered, worker pods scale based on load, which is fine. But when a second job is submitted, it is not allocated any resources because the first job is consuming all of them. > The second job stays in the waiting state until the first job finishes. I have gone through the documentation on setting max cores in standalone mode, which is not an ideal solution, since we are planning autoscaling based on load and the jobs submitted. > Is there any solution for this, or any alternatives? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
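For readers landing here from search: the standalone-mode cap the reporter refers to is spark.cores.max (per application), with spark.deploy.defaultCores as the cluster-wide default for applications that do not set it. A sketch of a spark-defaults.conf entry; the values are illustrative:

```
# Illustrative spark-defaults.conf snippet: cap each application's total cores
# so one job cannot starve later submissions on a standalone cluster.
spark.cores.max           12
spark.deploy.defaultCores 4
```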
[jira] [Updated] (SPARK-48673) Scheduling Across Applications in k8s mode
[ https://issues.apache.org/jira/browse/SPARK-48673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48673: - Issue Type: Question (was: Improvement) > Scheduling Across Applications in k8s mode > --- > > Key: SPARK-48673 > URL: https://issues.apache.org/jira/browse/SPARK-48673 > Project: Spark > Issue Type: Question > Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit >Affects Versions: 3.5.1 >Reporter: Samba Shiva >Priority: Blocker -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48673) Scheduling Across Applications in k8s mode
[ https://issues.apache.org/jira/browse/SPARK-48673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48673: - Priority: Trivial (was: Blocker) > Scheduling Across Applications in k8s mode > --- > > Key: SPARK-48673 > URL: https://issues.apache.org/jira/browse/SPARK-48673 > Project: Spark > Issue Type: Question > Components: k8s, Kubernetes, Scheduler, Spark Shell, Spark Submit >Affects Versions: 3.5.1 >Reporter: Samba Shiva >Priority: Trivial -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48749) Simplify UnaryPositive with RuntimeReplaceable
Kent Yao created SPARK-48749: Summary: Simplify UnaryPositive with RuntimeReplaceable Key: SPARK-48749 URL: https://issues.apache.org/jira/browse/SPARK-48749 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48709) varchar resolution mismatch for DataSourceV2 CTAS
[ https://issues.apache.org/jira/browse/SPARK-48709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48709: - Fix Version/s: 3.5.2 > varchar resolution mismatch for DataSourceV2 CTAS > - > > Key: SPARK-48709 > URL: https://issues.apache.org/jira/browse/SPARK-48709 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 4.0.0, 3.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
[ https://issues.apache.org/jira/browse/SPARK-46957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46957: - Fix Version/s: 3.5.2 3.4.4 > Migrated shuffle data files from the decommissioned node should be removed > when job completed > - > > Key: SPARK-46957 > URL: https://issues.apache.org/jira/browse/SPARK-46957 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yu-Jhe Li >Assignee: wuyi >Priority: Major > Fix For: 4.0.0, 3.5.2, 3.4.4 > > > Hi, we have a long-lived Spark application run on a standalone cluster on GCP > and we are using spot instances. To reduce the impact of preempted instances, > we have enabled node decommission to let the preempted node migrate its > shuffle data to other instances before it is deleted by GCP. > However, we found the migrated shuffle data from the decommissioned node is > never removed. (same behavior on spark-3.5) > *Reproduce steps:* > 1. Start spark-shell with 3 executors and enable decommission on both > driver/worker > {code:java} > start-worker.sh[3331]: Spark Command: > /usr/lib/jvm/java-17-openjdk-amd64/bin/java -cp > /opt/spark/conf/:/opt/spark/jars/* -Dspark.worker.cleanup.appDataTtl=1800 > -Dspark.decommission.enabled=true -Xmx1g > org.apache.spark.deploy.worker.Worker --webui-port 8081 > spark://master-01.com:7077 {code} > {code:java} > /opt/spark/bin/spark-shell --master spark://master-01.spark.com:7077 \ > --total-executor-cores 12 \ > --conf spark.decommission.enabled=true \ > --conf spark.storage.decommission.enabled=true \ > --conf spark.storage.decommission.shuffleBlocks.enabled=true \ > --conf spark.storage.decommission.rddBlocks.enabled=true{code} > > 2. Manually stop 1 worker during execution > {code:java} > (1 to 10).foreach { i => > println(s"start iter $i ...") > val longString = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. > Integer eget tortor id libero ultricies faucibus nec ac neque. 
Vivamus ac > risus vitae mi efficitur lacinia. Quisque dignissim quam vel tellus placerat, > non laoreet elit rhoncus. Nam et magna id dui tempor sagittis. Aliquam erat > volutpat. Integer tristique purus ac eros bibendum, at varius velit viverra. > Sed eleifend luctus massa, ac accumsan leo feugiat ac. Sed id nisl et enim > tristique auctor. Sed vel ante nec leo placerat tincidunt. Ut varius, risus > nec sodales tempor, odio augue euismod ipsum, nec tristique e" > val df = (1 to 1 * i).map(j => (j, s"${j}_${longString}")).toDF("id", > "mystr") > df.repartition(6).count() > System.gc() > println(s"finished iter $i, wait 15s for next round") > Thread.sleep(15*1000) > } > System.gc() > start iter 1 ... > finished iter 1, wait 15s for next round > ... {code} > > 3. Check the migrated shuffle data files on the remaining workers > {*}decommissioned node{*}: migrated shuffle file successfully > {code:java} > less /mnt/spark_work/app-20240202084807-0003/1/stdout | grep 'Migrated ' > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_41 to BlockManagerId(2, 10.67.5.139, 35949, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_38 to BlockManagerId(0, 10.67.5.134, 36175, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_47 to BlockManagerId(0, 10.67.5.134, 36175, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_4_44 to BlockManagerId(2, 10.67.5.139, 35949, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_5_52 to BlockManagerId(0, 10.67.5.134, 36175, None) > 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated > migrate_shuffle_5_55 to BlockManagerId(2, 10.67.5.139, 35949, None) {code} > {*}remaining shuffle data files on the other workers{*}: the migrated shuffle > files are never removed > {code:java} > 10.67.5.134 | CHANGED | rc=0 >> > -rw-r--r-- 1 spark spark 126 Feb 2 08:48 > 
/mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/13/shuffle_4_47_0.data > -rw-r--r-- 1 spark spark 126 Feb 2 08:48 > /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/31/shuffle_4_38_0.data > -rw-r--r-- 1 spark spark 32 Feb 2 08:48 > /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/3a/shuffle_5_52_0.data > 10.67.5.139 | CHANGED | rc=0 >> > -rw-r--r-- 1 spark spark 126 Feb 2 08:48 >
[jira] [Assigned] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48735: Assignee: Kent Yao > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > {code:java} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48735. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47119 [https://github.com/apache/spark/pull/47119] > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > {code:java} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
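The linked PR is not quoted here, so the following is only an illustration of the kind of change that yields a speedup like the one in the benchmark: replacing per-digit string concatenation with java.lang.Long.toBinaryString, which writes the digits into a char array in a single pass. Both method names are invented:

```java
// Illustrative only -- the linked PR is not quoted here. One common way to
// speed up a BIN-style function is to drop per-digit string concatenation in
// favor of java.lang.Long.toBinaryString, which fills a char array directly.
public class Bin {
    // Baseline: builds the result one digit at a time with String concatenation.
    static String binSlow(long x) {
        if (x == 0L) return "0";
        long v = x;
        String s = "";
        while (v != 0L) {
            s = (v & 1L) + s; // allocates a fresh String every iteration
            v >>>= 1;         // unsigned shift handles negative inputs too
        }
        return s;
    }

    // Faster: single-pass conversion provided by the JDK.
    static String binFast(long x) {
        return Long.toBinaryString(x);
    }
}
```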
[jira] [Commented] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
[ https://issues.apache.org/jira/browse/SPARK-46957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860437#comment-17860437 ] Kent Yao commented on SPARK-46957: -- https://github.com/apache/spark/commit/7aa12b6cd01da88cbbb3e8c6e50863e6139315b7 https://github.com/apache/spark/commit/f8b1040ea006fe48df6bb52e0ace4dce54ab6d56 Reverted it from 3.4 and 3.5 to fix the CI > Migrated shuffle data files from the decommissioned node should be removed > when job completed > - > > Key: SPARK-46957 > URL: https://issues.apache.org/jira/browse/SPARK-46957 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yu-Jhe Li >Assignee: wuyi >Priority: Major > Fix For: 4.0.0
[jira] [Updated] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
[ https://issues.apache.org/jira/browse/SPARK-46957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46957: - Fix Version/s: (was: 3.5.2) (was: 3.4.4) > Migrated shuffle data files from the decommissioned node should be removed > when job completed > - > > Key: SPARK-46957 > URL: https://issues.apache.org/jira/browse/SPARK-46957 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yu-Jhe Li >Assignee: wuyi >Priority: Major > Fix For: 4.0.0
[jira] [Updated] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48735: - Description: {code:java} --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -BIN 2657 2661 5 3.8 265.7 1.0X +BIN 1524 1567 61 6.6 152.4 1.0X {code} was: {code:diff} --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -BIN 2657 2661 5 3.8 265.7 1.0X +BIN 1524 1567 61 6.6 152.4 1.0X {code} > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48735) Performance Improvement for BIN function
[ https://issues.apache.org/jira/browse/SPARK-48735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48735: - Description: {code:diff} --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -BIN 2657 2661 5 3.8 265.7 1.0X +BIN 1524 1567 61 6.6 152.4 1.0X {code} > Performance Improvement for BIN function > > > Key: SPARK-48735 > URL: https://issues.apache.org/jira/browse/SPARK-48735 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > > {code:diff} > --- a/sql/core/benchmarks/MathFunctionBenchmark-results.txt > +++ b/sql/core/benchmarks/MathFunctionBenchmark-results.txt > @@ -2,5 +2,5 @@ OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -BIN 2657 2661 > 5 3.8 265.7 1.0X > +BIN 1524 1567 > 61 6.6 152.4 1.0X {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48735) Performance Improvement for BIN function
Kent Yao created SPARK-48735: Summary: Performance Improvement for BIN function Key: SPARK-48735 URL: https://issues.apache.org/jira/browse/SPARK-48735 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
[ https://issues.apache.org/jira/browse/SPARK-48712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48712: - Parent: SPARK-48624 Issue Type: Sub-task (was: Improvement) > Perf Improvement for Encode with empty string and UTF-8 charset > --- > > Key: SPARK-48712 > URL: https://issues.apache.org/jira/browse/SPARK-48712 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -UTF-8 3672 3697 > 22 5.4 183.6 1.0X > +UTF-8 79270 79698 > 448 0.3 3963.5 1.0X -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
[ https://issues.apache.org/jira/browse/SPARK-48712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48712. -- Fix Version/s: 4.0.0 Assignee: Kent Yao Resolution: Fixed Resolved by https://github.com/apache/spark/pull/47096 > Perf Improvement for Encode with empty string and UTF-8 charset > --- > > Key: SPARK-48712 > URL: https://issues.apache.org/jira/browse/SPARK-48712 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -UTF-8 3672 3697 > 22 5.4 183.6 1.0X > +UTF-8 79270 79698 > 448 0.3 3963.5 1.0X -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48723) Run `git cherry-pick --abort` if backporting is denied by committer
Kent Yao created SPARK-48723: Summary: Run `git cherry-pick --abort` if backporting is denied by committer Key: SPARK-48723 URL: https://issues.apache.org/jira/browse/SPARK-48723 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48713) Add index range check for UnsafeRow.pointTo when baseObject is byte array
[ https://issues.apache.org/jira/browse/SPARK-48713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48713. -- Fix Version/s: 4.0.0 Assignee: wuyi Resolution: Fixed > Add index range check for UnsafeRow.pointTo when baseObject is byte array > - > > Key: SPARK-48713 > URL: https://issues.apache.org/jira/browse/SPARK-48713 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48721) Fix Decode doc in SQL API page
Kent Yao created SPARK-48721: Summary: Fix Decode doc in SQL API page Key: SPARK-48721 URL: https://issues.apache.org/jira/browse/SPARK-48721 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.3, 3.5.1, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48706) Python UDF in higher order functions should not throw internal error
[ https://issues.apache.org/jira/browse/SPARK-48706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48706: Assignee: Hyukjin Kwon > Python UDF in higher order functions should not throw internal error > > > Key: SPARK-48706 > URL: https://issues.apache.org/jira/browse/SPARK-48706 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > {code} > from pyspark.sql.functions import transform, udf, col, array > spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: > y)(x))).collect() > {code} > throws an internal error: > {code} > at > org.apache.spark.SparkException$.internalError(SparkException.scala:88) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:73) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:507) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:506) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48706) Python UDF in higher order functions should not throw internal error
[ https://issues.apache.org/jira/browse/SPARK-48706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48706. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47079 [https://github.com/apache/spark/pull/47079] > Python UDF in higher order functions should not throw internal error > > > Key: SPARK-48706 > URL: https://issues.apache.org/jira/browse/SPARK-48706 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 4.0.0 > > > {code} > from pyspark.sql.functions import transform, udf, col, array > spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: > y)(x))).collect() > {code} > throws an internal error: > {code} > at > org.apache.spark.SparkException$.internalError(SparkException.scala:88) > at > org.apache.spark.SparkException$.internalError(SparkException.scala:92) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:73) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:507) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:506) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48573. -- Target Version/s: 4.0.0 Assignee: Mihailo Milosevic Resolution: Fixed > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48573: - Fix Version/s: 4.0.0 > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
[ https://issues.apache.org/jira/browse/SPARK-48712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48712: - Description: Apple M2 Max encode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -UTF-8 3672 3697 22 5.4 183.6 1.0X +UTF-8 79270 79698 448 0.3 3963.5 1.0X > Perf Improvement for Encode with empty string and UTF-8 charset > --- > > Key: SPARK-48712 > URL: https://issues.apache.org/jira/browse/SPARK-48712 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > > Apple M2 Max > encode: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > > > -UTF-8 3672 3697 > 22 5.4 183.6 1.0X > +UTF-8 79270 79698 > 448 0.3 3963.5 1.0X -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48712) Perf Improvement for Encode with empty string and UTF-8 charset
Kent Yao created SPARK-48712: Summary: Perf Improvement for Encode with empty string and UTF-8 charset Key: SPARK-48712 URL: https://issues.apache.org/jira/browse/SPARK-48712 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48693) simplify and unify toString of Invoke and StaticInvoke
[ https://issues.apache.org/jira/browse/SPARK-48693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48693. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47066 [https://github.com/apache/spark/pull/47066] > simplify and unify toString of Invoke and StaticInvoke > -- > > Key: SPARK-48693 > URL: https://issues.apache.org/jira/browse/SPARK-48693 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48658) Encode/Decode functions report coding error instead of mojibake
[ https://issues.apache.org/jira/browse/SPARK-48658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48658. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47017 [https://github.com/apache/spark/pull/47017] > Encode/Decode functions report coding error instead of mojibake > --- > > Key: SPARK-48658 > URL: https://issues.apache.org/jira/browse/SPARK-48658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48658) Encode/Decode functions report coding error instead of mojibake
[ https://issues.apache.org/jira/browse/SPARK-48658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48658: Assignee: Kent Yao > Encode/Decode functions report coding error instead of mojibake > --- > > Key: SPARK-48658 > URL: https://issues.apache.org/jira/browse/SPARK-48658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
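[Editor's note] The distinction SPARK-48658 is about — failing loudly on invalid bytes versus silently emitting replacement characters (mojibake) — can be illustrated with Python's codec error handlers. This is an analogy, not Spark code:

```python
# "Coding error vs. mojibake", sketched in Python (not Spark code).
bad = b'\xe4\xbd'  # truncated multi-byte UTF-8 sequence

# Lenient decoding substitutes U+FFFD -- the silent "mojibake" behavior.
lenient = bad.decode('utf-8', errors='replace')
print(repr(lenient))

# Strict decoding raises instead -- the "report a coding error" behavior
# that the encode/decode functions now follow.
try:
    bad.decode('utf-8', errors='strict')
except UnicodeDecodeError as e:
    print('coding error:', e.reason)
```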
[jira] [Created] (SPARK-48696) Also truncate the schema row for show function
Kent Yao created SPARK-48696: Summary: Also truncate the schema row for show function Key: SPARK-48696 URL: https://issues.apache.org/jira/browse/SPARK-48696 Project: Spark Issue Type: Improvement Components: Connect, SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48680) Add char/varchar doc to language specific tables
[ https://issues.apache.org/jira/browse/SPARK-48680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48680. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47052 [https://github.com/apache/spark/pull/47052] > Add char/varchar doc to language specific tables > > > Key: SPARK-48680 > URL: https://issues.apache.org/jira/browse/SPARK-48680 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48693) simplify and unify toString of Invoke and StaticInvoke
Kent Yao created SPARK-48693: Summary: simplify and unify toString of Invoke and StaticInvoke Key: SPARK-48693 URL: https://issues.apache.org/jira/browse/SPARK-48693 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48684) Print related JIRA summary before proceeding merge
[ https://issues.apache.org/jira/browse/SPARK-48684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48684. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47057 [https://github.com/apache/spark/pull/47057] > Print related JIRA summary before proceeding merge > -- > > Key: SPARK-48684 > URL: https://issues.apache.org/jira/browse/SPARK-48684 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48684) Print related JIRA summary before proceeding merge
[ https://issues.apache.org/jira/browse/SPARK-48684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48684: - Component/s: Project Infra (was: SQL) > Print related JIRA summary before proceeding merge > -- > > Key: SPARK-48684 > URL: https://issues.apache.org/jira/browse/SPARK-48684 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48684) Print related JIRA summary before proceeding merge
[ https://issues.apache.org/jira/browse/SPARK-48684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48684: - Priority: Minor (was: Major) > Print related JIRA summary before proceeding merge > -- > > Key: SPARK-48684 > URL: https://issues.apache.org/jira/browse/SPARK-48684 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48684) Print related JIRA summary before proceeding merge
Kent Yao created SPARK-48684: Summary: Print related JIRA summary before proceeding merge Key: SPARK-48684 URL: https://issues.apache.org/jira/browse/SPARK-48684 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48680) Add char/varchar doc to language specific tables
Kent Yao created SPARK-48680: Summary: Add char/varchar doc to language specific tables Key: SPARK-48680 URL: https://issues.apache.org/jira/browse/SPARK-48680 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
[ https://issues.apache.org/jira/browse/SPARK-48656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48656. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47019 [https://github.com/apache/spark/pull/47019] > ArrayIndexOutOfBoundsException in CartesianRDD getPartitions > > > Key: SPARK-48656 > URL: https://issues.apache.org/jira/browse/SPARK-48656 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Nick Young >Assignee: Wei Guo >Priority: Major > Fix For: 4.0.0 > > > {code:java}
> val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
> val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
> rdd2.cartesian(rdd1).partitions
> {code}
> Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because
> `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0. We
> should provide a better error message indicating that the number of
> partitions overflowed, so it's easier for the user to debug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
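[Editor's note] The overflow reported in SPARK-48656 is easy to reproduce outside Spark: `s1.index * numPartitionsInRdd2 + s2.index` is evaluated in Scala's 32-bit `Int`, and 65536 × 65536 is exactly 2^32, which wraps to 0. A Python sketch simulating the 32-bit arithmetic (partition counts taken from the report above):

```python
# SPARK-48656 in miniature: the partition index is computed in 32-bit Int
# arithmetic, so 65536 partitions on each side of the cartesian product
# makes 65536 * 65536 == 2**32, which wraps to 0.
def to_int32(x: int) -> int:
    # Simulate Scala/Java 32-bit two's-complement Int semantics.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

num_partitions = 65536
print(to_int32(num_partitions * num_partitions))  # → 0, hence the spurious index 0
```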