[jira] [Created] (SPARK-44034) Add a new test group for sql module
Yang Jie created SPARK-44034: Summary: Add a new test group for sql module Key: SPARK-44034 URL: https://issues.apache.org/jira/browse/SPARK-44034 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43885) DataSource V2: Handle MERGE commands for delta-based sources
[ https://issues.apache.org/jira/browse/SPARK-43885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43885. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41448 [https://github.com/apache/spark/pull/41448] > DataSource V2: Handle MERGE commands for delta-based sources > > > Key: SPARK-43885 > URL: https://issues.apache.org/jira/browse/SPARK-43885 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.5.0 > > > We should handle MERGE commands for delta-based sources, just like DELETE and > UPDATE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
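For context, the class of statement this sub-task makes work against delta-based (row-level-operation) DataSource V2 sources is a standard SQL MERGE, as in the following minimal PySpark sketch; the {{target}} and {{updates}} table names are purely illustrative and would need to be backed by a catalog whose source supports row-level operations:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A delta-based source rewrites only the changed rows, rather than copying
# whole matching files/partitions as a group-based source does.
spark.sql("""
    MERGE INTO target t
    USING updates s
    ON t.id = s.id
    WHEN MATCHED AND s.deleted = true THEN DELETE
    WHEN MATCHED THEN UPDATE SET t.value = s.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
""")
{code}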
[jira] [Assigned] (SPARK-43885) DataSource V2: Handle MERGE commands for delta-based sources
[ https://issues.apache.org/jira/browse/SPARK-43885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43885: - Assignee: Anton Okolnychyi > DataSource V2: Handle MERGE commands for delta-based sources > > > Key: SPARK-43885 > URL: https://issues.apache.org/jira/browse/SPARK-43885 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > We should handle MERGE commands for delta-based sources, just like DELETE and > UPDATE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44032) Remove the bytecode version exclusion of threeten-extra when orc-core fixes the violation
[ https://issues.apache.org/jira/browse/SPARK-44032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44032: Affects Version/s: 3.4.0 (was: 3.4.1) > Remove the bytecode version exclusion of threeten-extra when orc-core fixes > the violation > - > > Key: SPARK-44032 > URL: https://issues.apache.org/jira/browse/SPARK-44032 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.0 >Reporter: Bowen Liang >Priority: Minor > > Remove the exclusion of threeten-extra when orc-core fixes the violation. > > `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its > `package-info.class` files violate the bytecode version enforcer rule, which > caps the max bytecode version at 52. Both 1.7.1 and the latest available > version, 1.7.2, have the same problem, so an exclusion for threeten-extra was > added to the enforcer rule. Remove it once orc-core fixes the upstream > dependency's bytecode version violation. > > threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
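For reference on what the enforcer rule is flagging: a compiled `.class` file stores its bytecode version right after the 0xCAFEBABE magic number, and major version 52 corresponds to Java 8 while 53 corresponds to Java 9. A minimal sketch for inspecting a class file directly (the path is illustrative, taken from the listing above):
{code:python}
import struct

def class_file_major_version(path: str) -> int:
    # A .class file begins with u4 magic (0xCAFEBABE), u2 minor_version,
    # u2 major_version, all big-endian.
    with open(path, "rb") as f:
        magic, _minor, major = struct.unpack(">IHH", f.read(8))
    if magic != 0xCAFEBABE:
        raise ValueError(f"{path} is not a class file")
    return major

# The report above shows these files at major version 53 (Java 9 bytecode),
# above the enforced maximum of 52 (Java 8).
print(class_file_major_version("org/threeten/extra/package-info.class"))
{code}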
[jira] [Updated] (SPARK-44032) Remove the bytecode version exclusion of threeten-extra when orc-core fixes the violation
[ https://issues.apache.org/jira/browse/SPARK-44032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44032: Target Version/s: (was: 3.4.1) > Remove the bytecode version exclusion of threeten-extra when orc-core fixes > the violation > - > > Key: SPARK-44032 > URL: https://issues.apache.org/jira/browse/SPARK-44032 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.1 >Reporter: Bowen Liang >Priority: Minor > > Remove the exclusion of threeten-extra when orc-core fixes the violation. > > `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its > `package-info.class` files violate the bytecode version enforcer rule, which > caps the max bytecode version at 52. Both 1.7.1 and the latest available > version, 1.7.2, have the same problem, so an exclusion for threeten-extra was > added to the enforcer rule. Remove it once orc-core fixes the upstream > dependency's bytecode version violation. > > threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44022) Enforce max bytecode version on Maven dependencies
[ https://issues.apache.org/jira/browse/SPARK-44022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44022: Target Version/s: (was: 3.4.1) > Enforce max bytecode version on Maven dependencies > -- > > Key: SPARK-44022 > URL: https://issues.apache.org/jira/browse/SPARK-44022 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Bowen Liang >Priority: Major > > Enforce Java's max bytecode version on Maven dependencies by using the > `enforceBytecodeVersion` enforcer rule. > This prevents introducing dependencies, including transitive dependencies, > that require a higher Java version (11+). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44025) CSV Table Read Error with CharType(length) column
[ https://issues.apache.org/jira/browse/SPARK-44025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731835#comment-17731835 ] BingKun Pan commented on SPARK-44025: - [~cloud_fan] Can I try to fix it? > CSV Table Read Error with CharType(length) column > - > > Key: SPARK-44025 > URL: https://issues.apache.org/jira/browse/SPARK-44025 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 > Environment: {{apache/spark:v3.4.0 image}} >Reporter: Fengyu Cao >Priority: Major > > Problem: > # read a CSV format table > # table has a `CharType(length)` column > # read table failed with Exception: `org.apache.spark.SparkException: Job > aborted due to stage failure: Task 0 in stage 36.0 failed 4 times, most > recent failure: Lost task 0.3 in stage 36.0 (TID 72) (10.113.9.208 executor > 11): java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct).` > > reproduce with official image: > # {{docker run -it apache/spark:v3.4.0 /opt/spark/bin/spark-sql}} > # {{CREATE TABLE csv_bug (name STRING, age INT, job CHAR(4)) USING CSV > OPTIONS ('header' = 'true', 'sep' = ';') LOCATION > "/opt/spark/examples/src/main/resources/people.csv";}} > # SELECT * FROM csv_bug; > # ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44033) Support list-like for binary ops
Haejoon Lee created SPARK-44033: --- Summary: Support list-like for binary ops Key: SPARK-44033 URL: https://issues.apache.org/jira/browse/SPARK-44033 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee We should fix the error below: {code:java} >>> pser = pd.Series([1, 2, 3, 4, 5, 6], name="x") >>> psser = ps.from_pandas(pser) >>> other = [np.nan, 1, 3, 4, np.nan, 6] >>> psser <= other Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/haejoon.lee/Desktop/git_store/spark/python/pyspark/pandas/base.py", line 412, in __le__ return self._dtype_op.le(self, other) File "/Users/haejoon.lee/Desktop/git_store/spark/python/pyspark/pandas/data_type_ops/num_ops.py", line 242, in le _sanitize_list_like(right) File "/Users/haejoon.lee/Desktop/git_store/spark/python/pyspark/pandas/data_type_ops/base.py", line 199, in _sanitize_list_like raise TypeError("The operation can not be applied to %s." % type(operand).__name__) TypeError: The operation can not be applied to list.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
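For reference, the behavior being targeted is plain pandas' elementwise comparison against a list-like of equal length, where comparisons involving NaN evaluate to False:
{code:python}
import numpy as np
import pandas as pd

pser = pd.Series([1, 2, 3, 4, 5, 6], name="x")
other = [np.nan, 1, 3, 4, np.nan, 6]

# Elementwise comparison; positions compared against NaN come out False.
print(pser <= other)
# 0    False
# 1    False
# 2     True
# 3     True
# 4    False
# 5     True
# Name: x, dtype: bool
{code}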
[jira] [Assigned] (SPARK-43798) Initial support for Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-43798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43798: Assignee: Allison Wang > Initial support for Python UDTFs > > > Key: SPARK-43798 > URL: https://issues.apache.org/jira/browse/SPARK-43798 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Support Python user-defined table functions with batch eval. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43798) Initial support for Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-43798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43798. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41316 [https://github.com/apache/spark/pull/41316] > Initial support for Python UDTFs > > > Key: SPARK-43798 > URL: https://issues.apache.org/jira/browse/SPARK-43798 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.5.0 > > > Support Python user-defined table functions with batch eval. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
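To give a sense of the feature, a minimal Python UDTF with batch eval along the lines of what this ticket introduces might look like the sketch below; the class and column names are illustrative:
{code:python}
from pyspark.sql.functions import lit, udtf

# A UDTF returns zero or more rows per input row: eval() is invoked for
# each input row and yields output tuples matching the declared schema.
@udtf(returnType="num: int, squared: int")
class SquareNumbers:
    def eval(self, start: int, end: int):
        for num in range(start, end + 1):
            yield (num, num * num)

SquareNumbers(lit(1), lit(3)).show()
# +---+-------+
# |num|squared|
# +---+-------+
# |  1|      1|
# |  2|      4|
# |  3|      9|
# +---+-------+
{code}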
[jira] [Commented] (SPARK-43944) Add string functions to Scala and Python - part 2
[ https://issues.apache.org/jira/browse/SPARK-43944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731820#comment-17731820 ] BingKun Pan commented on SPARK-43944: - I work on it. > Add string functions to Scala and Python - part 2 > - > > Key: SPARK-43944 > URL: https://issues.apache.org/jira/browse/SPARK-43944 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * replace > * split_part > * substr > * parse_url > * printf > * url_decode > * url_encode > * position > * endswith > * startswith > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
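Once these land, usage from the Python API might look roughly like the following sketch (signatures assumed to mirror the existing SQL builtins of the same names):
{code:python}
from pyspark.sql import SparkSession, functions as sf

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a.b.c", "a")], ["s", "prefix"])

# Column-function versions of what previously required expr()/selectExpr().
df.select(
    sf.startswith(df.s, df.prefix).alias("starts"),              # true
    sf.split_part(df.s, sf.lit("."), sf.lit(2)).alias("part"),   # "b"
    sf.url_encode(sf.lit("a b")).alias("encoded"),               # "a+b"
).show()
{code}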
[jira] [Updated] (SPARK-44022) Enforce max bytecode version on Maven dependencies
[ https://issues.apache.org/jira/browse/SPARK-44022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44022: Summary: Enforce max bytecode version on Maven dependencies (was: Enforce Java max bytecode version to maven dependencies) > Enforce max bytecode version on Maven dependencies > -- > > Key: SPARK-44022 > URL: https://issues.apache.org/jira/browse/SPARK-44022 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Bowen Liang >Priority: Major > > Enforce Java's max bytecode version on Maven dependencies by using the > `enforceBytecodeVersion` enforcer rule. > This prevents introducing dependencies, including transitive dependencies, > that require a higher Java version (11+). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44022) Enforce Java max bytecode version to maven dependencies
[ https://issues.apache.org/jira/browse/SPARK-44022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44022: Target Version/s: 3.4.1 Affects Version/s: 3.4.0 (was: 3.5.0) > Enforce Java max bytecode version to maven dependencies > --- > > Key: SPARK-44022 > URL: https://issues.apache.org/jira/browse/SPARK-44022 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Bowen Liang >Priority: Major > > Enforce Java's max bytecode version on Maven dependencies by using the > `enforceBytecodeVersion` enforcer rule. > This prevents introducing dependencies, including transitive dependencies, > that require a higher Java version (11+). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44032) Remove the bytecode version exclusion of threeten-extra when orc-core fixes the violation
[ https://issues.apache.org/jira/browse/SPARK-44032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44032: Target Version/s: 3.4.1 > Remove the bytecode version exclusion of threeten-extra when orc-core fixes > the violation > - > > Key: SPARK-44032 > URL: https://issues.apache.org/jira/browse/SPARK-44032 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.1 >Reporter: Bowen Liang >Priority: Minor > > Remove the exclusion of threeten-extra when orc-core fixes the violation. > > `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its > `package-info.class` files violate the bytecode version enforcer rule, which > caps the max bytecode version at 52. Both 1.7.1 and the latest available > version, 1.7.2, have the same problem, so an exclusion for threeten-extra was > added to the enforcer rule. Remove it once orc-core fixes the upstream > dependency's bytecode version violation. > > threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44032) Remove the bytecode version exclusion of threeten-extra when orc-core fixes the violation
[ https://issues.apache.org/jira/browse/SPARK-44032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44032: Description: Remove the exclusion of threeten-extra when orc-core fixes the violation. `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its `package-info.class` files violate the bytecode version enforcer rule, which caps the max bytecode version at 52. Both 1.7.1 and the latest available version, 1.7.2, have the same problem, so an exclusion for threeten-extra was added to the enforcer rule. Remove it once orc-core fixes the upstream dependency's bytecode version violation. threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 was: Remove the exclusion of threeten-extra when orc-core fixes the violation. `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its `package-info.class` files violate the bytecode version enforcer rule. Both 1.7.1 and the latest available version, 1.7.2, have the same problem, so an exclusion for threeten-extra was added to the enforcer rule. Remove it once orc-core fixes the upstream dependency's bytecode version violation. threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 > Remove the bytecode version exclusion of threeten-extra when orc-core fixes > the violation > - > > Key: SPARK-44032 > URL: https://issues.apache.org/jira/browse/SPARK-44032 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.1 >Reporter: Bowen Liang >Priority: Minor > > Remove the exclusion of threeten-extra when orc-core fixes the violation. > > `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its > `package-info.class` files violate the bytecode version enforcer rule, which > caps the max bytecode version at 52. Both 1.7.1 and the latest available > version, 1.7.2, have the same problem, so an exclusion for threeten-extra was > added to the enforcer rule. Remove it once orc-core fixes the upstream > dependency's bytecode version violation. > > threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44032) Remove the bytecode version exclusion of threeten-extra when orc-core fixes the violation
[ https://issues.apache.org/jira/browse/SPARK-44032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Liang updated SPARK-44032: Description: Remove the exclusion of threeten-extra when orc-core fixes the violation. `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its `package-info.class` files violate the bytecode version enforcer rule. Both 1.7.1 and the latest available version, 1.7.2, have the same problem, so an exclusion for threeten-extra was added to the enforcer rule. Remove it once orc-core fixes the upstream dependency's bytecode version violation. threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 was: Remove the exclusion of threeten-extra when orc-core fixes the violation. `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its `package-info.class` files violate the bytecode version enforcer rule. Both 1.7.1 and the latest available version, 1.7.2, have the same problem, so an exclusion for threeten-extra was added to the enforcer rule. Remove it once orc-core fixes the upstream dependency's bytecode version violation. {{threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53}} > Remove the bytecode version exclusion of threeten-extra when orc-core fixes > the violation > - > > Key: SPARK-44032 > URL: https://issues.apache.org/jira/browse/SPARK-44032 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.1 >Reporter: Bowen Liang >Priority: Minor > > Remove the exclusion of threeten-extra when orc-core fixes the violation. > > `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its > `package-info.class` files violate the bytecode version enforcer rule. Both > 1.7.1 and the latest available version, 1.7.2, have the same problem, so an > exclusion for threeten-extra was added to the enforcer rule. Remove it once > orc-core fixes the upstream dependency's bytecode version violation. > > threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major > version:53 > threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44032) Remove the bytecode version exclusion of threeten-extra when orc-core fixes the violation
Bowen Liang created SPARK-44032: --- Summary: Remove the bytecode version exclusion of threeten-extra when orc-core fixes the violation Key: SPARK-44032 URL: https://issues.apache.org/jira/browse/SPARK-44032 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.4.1 Reporter: Bowen Liang Remove the exclusion of threeten-extra when orc-core fixes the violation. `threeten-extra` 1.7.1 is a dependency of `orc-core` 1.8.0, and its `package-info.class` files violate the bytecode version enforcer rule. Both 1.7.1 and the latest available version, 1.7.2, have the same problem, so an exclusion for threeten-extra was added to the enforcer rule. Remove it once orc-core fixes the upstream dependency's bytecode version violation. {{threeten-extra-1.7.1/org/threeten/extra/scale/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/chrono/package-info.class, major version:53 threeten-extra-1.7.1/org/threeten/extra/package-info.class, major version:53}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44023) Add System.gc at beforeEach in PruneFileSourcePartitionsSuite
[ https://issues.apache.org/jira/browse/SPARK-44023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44023. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41533 [https://github.com/apache/spark/pull/41533] > Add System.gc at beforeEach in PruneFileSourcePartitionsSuite > - > > Key: SPARK-44023 > URL: https://issues.apache.org/jira/browse/SPARK-44023 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44023) Add System.gc at beforeEach in PruneFileSourcePartitionsSuite
[ https://issues.apache.org/jira/browse/SPARK-44023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44023: Assignee: Yang Jie > Add System.gc at beforeEach in PruneFileSourcePartitionsSuite > - > > Key: SPARK-44023 > URL: https://issues.apache.org/jira/browse/SPARK-44023 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44029) Observation.py supports streaming dataframes
[ https://issues.apache.org/jira/browse/SPARK-44029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu resolved SPARK-44029. - Resolution: Not A Problem > Observation.py supports streaming dataframes > > > Key: SPARK-44029 > URL: https://issues.apache.org/jira/browse/SPARK-44029 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
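For context, the Observation API in question attaches named metrics to a batch DataFrame and collects them as a side effect of the first action on the observed plan; a rough PySpark sketch (names illustrative):
{code:python}
from pyspark.sql import Observation, SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

obs = Observation("my_metrics")
observed = df.observe(obs, sf.count(sf.lit(1)).alias("cnt"),
                      sf.max("id").alias("max_id"))
observed.collect()   # metrics are populated by the action
print(obs.get)       # {'cnt': 10, 'max_id': 9}
{code}
For streaming DataFrames the blocking {{Observation.get}} does not fit the execution model; observed metrics surface per micro-batch through StreamingQueryListener progress events instead, which is presumably why this ticket was closed as Not A Problem.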
[jira] [Resolved] (SPARK-42298) Assign name to _LEGACY_ERROR_TEMP_2132
[ https://issues.apache.org/jira/browse/SPARK-42298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42298. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40632 [https://github.com/apache/spark/pull/40632] > Assign name to _LEGACY_ERROR_TEMP_2132 > -- > > Key: SPARK-42298 > URL: https://issues.apache.org/jira/browse/SPARK-42298 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Jia Fan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42298) Assign name to _LEGACY_ERROR_TEMP_2132
[ https://issues.apache.org/jira/browse/SPARK-42298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42298: Assignee: Jia Fan > Assign name to _LEGACY_ERROR_TEMP_2132 > -- > > Key: SPARK-42298 > URL: https://issues.apache.org/jira/browse/SPARK-42298 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Jia Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44031) Upgrade silencer to 1.7.13
Dongjoon Hyun created SPARK-44031: - Summary: Upgrade silencer to 1.7.13 Key: SPARK-44031 URL: https://issues.apache.org/jira/browse/SPARK-44031 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44030) Move out json and expression private methods from DataType
Rui Wang created SPARK-44030: Summary: Move out json and expression private methods from DataType Key: SPARK-44030 URL: https://issues.apache.org/jira/browse/SPARK-44030 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43511) Implemented State APIs for Spark Connect Scala
[ https://issues.apache.org/jira/browse/SPARK-43511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722891#comment-17722891 ] Bo Gao edited comment on SPARK-43511 at 6/12/23 6:59 PM: - Created PR [https://github.com/apache/spark/pull/41558] was (Author: JIRAUSER300429): Created PR https://github.com/apache/spark/pull/40959 > Implemented State APIs for Spark Connect Scala > -- > > Key: SPARK-43511 > URL: https://issues.apache.org/jira/browse/SPARK-43511 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Bo Gao >Priority: Major > > Implemented MapGroupsWithState and FlatMapGroupsWithState APIs for Spark > Connect Scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44029) Observation.py supports streaming dataframes
Wei Liu created SPARK-44029: --- Summary: Observation.py supports streaming dataframes Key: SPARK-44029 URL: https://issues.apache.org/jira/browse/SPARK-44029 Project: Spark Issue Type: Documentation Components: Documentation, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44028) Upgrade commons-io to 2.13.0
Yang Jie created SPARK-44028: Summary: Upgrade commons-io to 2.13.0 Key: SPARK-44028 URL: https://issues.apache.org/jira/browse/SPARK-44028 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie https://commons.apache.org/proper/commons-io/changes-report.html#a2.13.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44027) create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-44027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Bode updated SPARK-44027: Description: currently only *_temporary_ Spark Views* can be created from a DataFrame: * [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] * [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] * [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] * [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark SQL ({{{}CREATE VIEW AS SELECT...{}}}). Sometimes it is easier and more readable to specify the desired logic of the view through {_}Scala/PySpark DataFrame API{_}. Therefore, I'd like to suggest to implement a new PySpark method that allows creating a _*permanent*_ *Spark View* from a DataFrame (e.g. {{{}DataFrame.createOrReplaceView{}}}). see also: * [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark] * [https://lists.apache.org/thread/jzkznvt7cfjhmo77w1tlksxkwyvmvvfb] was: currently only *_temporary_ Spark Views* can be created from a DataFrame: * [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] * [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] * [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] * [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark SQL ({{{}CREATE VIEW AS SELECT...{}}}). Sometimes it is easier and more readable to specify the desired logic of the view through {_}Scala/PySpark DataFrame API{_}. Therefore, I'd like to suggest to implement a new PySpark method that allows creating a _*permanent*_ *Spark View* from a DataFrame (e.g. {{{}DataFrame.createOrReplaceView{}}}). 
see also: * [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark] * https://lists.apache.org/thread/jzkznvt7cfjhmo77w1tlksxkwyvmvvfb > create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API > -- > > Key: SPARK-44027 > URL: https://issues.apache.org/jira/browse/SPARK-44027 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Martin Bode >Priority: Major > Labels: features, newbie > > currently only *_temporary_ Spark Views* can be created from a DataFrame: > * > [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] > * > [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] > * > [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] > * > [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] > When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark > SQL ({{{}CREATE VIEW AS SELECT...{}}}). > Sometimes it is easier and more readable to specify the desired logic of the > view through {_}Scala/PySpark DataFrame API{_}. > Therefore, I'd like to
[jira] [Updated] (SPARK-44027) create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-44027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Bode updated SPARK-44027: Affects Version/s: 3.5.0 (was: 3.4.0) > create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API > -- > > Key: SPARK-44027 > URL: https://issues.apache.org/jira/browse/SPARK-44027 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Martin Bode >Priority: Major > Labels: features, newbie > > currently only *_temporary_ Spark Views* can be created from a DataFrame: > * > [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] > * > [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] > * > [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] > * > [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] > When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark > SQL ({{{}CREATE VIEW AS SELECT...{}}}). > Sometimes it is easier and more readable to specify the desired logic of the > view through {_}Scala/PySpark DataFrame API{_}. > Therefore, I'd like to suggest to implement a new PySpark method that allows > creating a _*permanent*_ *Spark View* from a DataFrame (e.g. > {{{}DataFrame.createOrReplaceView{}}}). > see also: > * > [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark] > * https://lists.apache.org/thread/jzkznvt7cfjhmo77w1tlksxkwyvmvvfb -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44027) create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-44027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Bode updated SPARK-44027: Description: currently only *_temporary_ Spark Views* can be created from a DataFrame: * [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] * [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] * [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] * [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark SQL ({{{}CREATE VIEW AS SELECT...{}}}). Sometimes it is easier and more readable to specify the desired logic of the view through {_}Scala/PySpark DataFrame API{_}. Therefore, I'd like to suggest to implement a new PySpark method that allows creating a _*permanent*_ *Spark View* from a DataFrame (e.g. {{{}DataFrame.createOrReplaceView{}}}). see also: * [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark] * https://lists.apache.org/thread/jzkznvt7cfjhmo77w1tlksxkwyvmvvfb was: currently only *_temporary_ Spark Views* can be created from a DataFrame: * [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] * [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] * [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] * [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark SQL ({{{}CREATE VIEW AS SELECT...{}}}). Sometimes it is easier and more readable to specify the desired logic of the view through {_}Scala/PySpark DataFrame API{_}. Therefore, I'd like to suggest to implement a new PySpark method that allows creating a _*permanent*_ *Spark View* from a DataFrame (e.g. {{{}DataFrame.createOrReplaceView{}}}). 
see also: [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark] > create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API > -- > > Key: SPARK-44027 > URL: https://issues.apache.org/jira/browse/SPARK-44027 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Martin Bode >Priority: Major > Labels: features, newbie > > currently only *_temporary_ Spark Views* can be created from a DataFrame: > * > [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] > * > [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] > * > [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] > * > [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] > When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark > SQL ({{{}CREATE VIEW AS SELECT...{}}}). > Sometimes it is easier and more readable to specify the desired logic of the > view through {_}Scala/PySpark DataFrame API{_}. > Therefore, I'd like to suggest to implement a new PySpark method that allows > creating a
[jira] [Created] (SPARK-44027) create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API
Martin Bode created SPARK-44027: --- Summary: create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API Key: SPARK-44027 URL: https://issues.apache.org/jira/browse/SPARK-44027 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 3.4.0 Reporter: Martin Bode Currently, only *_temporary_ Spark Views* can be created from a DataFrame: * [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] * [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] * [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] * [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] When a user needs a _*permanent*_ *Spark View*, he has to fall back to Spark SQL ({{CREATE VIEW AS SELECT...}}). Sometimes it is easier and more readable to specify the desired logic of the view through the {_}Scala/PySpark DataFrame API{_}. Therefore, I'd like to suggest implementing a new PySpark method that allows creating a _*permanent*_ *Spark View* from a DataFrame (e.g. {{DataFrame.createOrReplaceView}}). see also: [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
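As a concrete sketch of the gap this ticket describes (table and view names are illustrative):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("source_table").filter("value > 0")  # hypothetical table

# Today: only a *temporary* view can be created straight from a DataFrame.
df.createOrReplaceTempView("my_temp_view")   # gone when the session ends

# A *permanent* view requires restating the logic as SQL text (a permanent
# view is not allowed to reference a temporary view):
spark.sql("CREATE OR REPLACE VIEW my_view AS "
          "SELECT * FROM source_table WHERE value > 0")

# The proposed (hypothetical, not yet existing) API would instead be:
# df.createOrReplaceView("my_view")
{code}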
[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression
[ https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731695#comment-17731695 ] Dongjoon Hyun commented on SPARK-44018: --- Hi, [~beliefer]. Thank you for reporting. BTW, it seems that the `Affected Version` number is 3.5.0. Could you update it? > Improve the hashCode for Some DS V2 Expression > -- > > Key: SPARK-44018 > URL: https://issues.apache.org/jira/browse/SPARK-44018 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not > good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are > missing hashCode(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43744) Spark Connect scala UDF serialization pulling in unrelated classes not available on server
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43744: -- Summary: Spark Connect scala UDF serialization pulling in unrelated classes not available on server (was: Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite) > Spark Connect scala UDF serialization pulling in unrelated classes not > available on server > -- > > Key: SPARK-43744 > URL: https://issues.apache.org/jira/browse/SPARK-43744 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > Labels: SPARK-43745 > > [https://github.com/apache/spark/pull/41487] moved "interrupt all - > background queries, foreground interrupt" and "interrupt all - foreground > queries, background interrupt" tests from ClientE2ETestSuite into a new > isolated suite SparkSessionE2ESuite to avoid an inexplicable UDF > serialization issue. > > When these tests are moved back to ClientE2ETestSuite and when testing with > {code:java} > build/mvn clean install -DskipTests -Phive > build/mvn test -pl connector/connect/client/jvm -Dtest=none > -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} > > the tests fail with > {code:java} > 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . > SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. > java.lang.NoClassDefFoundError: > org/apache/spark/sql/connect/client/SparkResult > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) > at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) > at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at
[jira] [Updated] (SPARK-43744) Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43744: -- Description: [https://github.com/apache/spark/pull/41487] moved "interrupt all - background queries, foreground interrupt" and "interrupt all - foreground queries, background interrupt" tests from ClientE2ETestSuite into a new isolated suite SparkSessionE2ESuite to avoid an inexplicable UDF serialization issue. When these tests are moved back to ClientE2ETestSuite and when testing with {code:java} build/mvn clean install -DskipTests -Phive build/mvn test -pl connector/connect/client/jvm -Dtest=none -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} the tests fail with {code:java} 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.getDeclaredMethod(Class.java:2128) at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) at org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) at
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) at org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:166) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:611) at
[jira] [Reopened] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski reopened SPARK-43744: --- [https://github.com/apache/spark/pull/41487] has swept the issue under the rug by moving the tests around to a different suite. Reopening this to get to the bottom of the issue, as I believe it remains a real issue that is not covered. > Maven test failure of ClientE2ETestSuite "interrupt all" tests > -- > > Key: SPARK-43744 > URL: https://issues.apache.org/jira/browse/SPARK-43744 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > Labels: SPARK-43745 > > When testing with > {code:java} > build/mvn clean install -DskipTests -Phive > build/mvn test -pl connector/connect/client/jvm -Dtest=none > -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} > > the tests fail with > {code:java} > 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . > SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. > java.lang.NoClassDefFoundError: > org/apache/spark/sql/connect/client/SparkResult > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) > at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) > at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) > at > 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) > at >
[jira] [Updated] (SPARK-43744) Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43744: -- Summary: Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite (was: Maven test failure of ClientE2ETestSuite "interrupt all" tests) > Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite > - > > Key: SPARK-43744 > URL: https://issues.apache.org/jira/browse/SPARK-43744 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > Labels: SPARK-43745 > > When testing with > {code:java} > build/mvn clean install -DskipTests -Phive > build/mvn test -pl connector/connect/client/jvm -Dtest=none > -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} > > the tests fail with > {code:java} > 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . > SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. > java.lang.NoClassDefFoundError: > org/apache/spark/sql/connect/client/SparkResult > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) > at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) > at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) > at >
org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) > at >
[jira] [Created] (SPARK-44026) Update helper methods to create SQLMetric with initial value
Johan Lasperas created SPARK-44026: -- Summary: Update helper methods to create SQLMetric with initial value Key: SPARK-44026 URL: https://issues.apache.org/jira/browse/SPARK-44026 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Johan Lasperas The helper methods in [SQLMetrics.scala|https://github.com/apache/spark/blob/7107742a381cde2e6de9425e3e436282a8c0d27c/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L38] all use a fixed initial value of `-1`. Callers may want the metric to start with a different initial value. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
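For illustration, a hypothetical sketch of what such a helper could look like, written as if it lived inside SQLMetrics.scala in Spark's own source tree (register() is private[spark]); the `initValue` parameter and its default are assumptions about the proposal, not the actual change:

{code:scala}
// Hypothetical sketch only, not the actual patch for SPARK-44026.
package org.apache.spark.sql.execution.metric

import org.apache.spark.SparkContext

object SQLMetricsWithInitValue {
  // Like SQLMetrics.createSizeMetric, but lets the caller choose the initial
  // value instead of the fixed -1 (which conventionally means "no update yet").
  def createSizeMetric(sc: SparkContext, name: String, initValue: Long = -1L): SQLMetric = {
    val acc = new SQLMetric("size", initValue)
    acc.register(sc, name = Some(name), countFailedValues = false)
    acc
  }
}
{code}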
[jira] [Updated] (SPARK-43781) IllegalStateException when cogrouping two datasets derived from the same source
[ https://issues.apache.org/jira/browse/SPARK-43781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-43781: Affects Version/s: 3.4.0 > IllegalStateException when cogrouping two datasets derived from the same > source > --- > > Key: SPARK-43781 > URL: https://issues.apache.org/jira/browse/SPARK-43781 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 > Environment: Reproduces in a unit test, using Spark 3.3.1, the Java > API, and a {{local[2]}} SparkSession. >Reporter: Derek Murray >Priority: Major > > Attempting to {{cogroup}} two datasets derived from the same source dataset > yields an {{IllegalStateException}} when the query is executed. > Minimal reproducer: > {code:java} > StructType inputType = DataTypes.createStructType( > new StructField[]{ > DataTypes.createStructField("id", DataTypes.LongType, false), > DataTypes.createStructField("type", DataTypes.StringType, false) > } > ); > StructType keyType = DataTypes.createStructType( > new StructField[]{ > DataTypes.createStructField("id", DataTypes.LongType, false) > } > ); > List<Row> inputRows = new ArrayList<>(); > inputRows.add(RowFactory.create(1L, "foo")); > inputRows.add(RowFactory.create(1L, "bar")); > inputRows.add(RowFactory.create(2L, "foo")); > Dataset<Row> input = sparkSession.createDataFrame(inputRows, inputType); > KeyValueGroupedDataset<Row, Row> fooGroups = input > .filter("type = 'foo'") > .groupBy("id") > .as(RowEncoder.apply(keyType), RowEncoder.apply(inputType)); > KeyValueGroupedDataset<Row, Row> barGroups = input > .filter("type = 'bar'") > .groupBy("id") > .as(RowEncoder.apply(keyType), RowEncoder.apply(inputType)); > Dataset<Row> result = fooGroups.cogroup( > barGroups, > (CoGroupFunction<Row, Row, Row, Row>) (row, iterator, iterator1) -> new > ArrayList<Row>().iterator(), > RowEncoder.apply(inputType)); > result.explain(); > result.show();{code} > Explain output (note mismatch in column IDs between Sort/Exchange and > LocalTableScan on the first input to the CoGroup): > {code:java} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- SerializeFromObject > [validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 0, id), LongType, false) AS id#37L, > staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, > fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 1, type), StringType, false), true, false, > true) AS type#38] > +- CoGroup > org.apache.spark.sql.KeyValueGroupedDataset$$Lambda$1478/1869116781@77856cc5, > createexternalrow(id#16L, StructField(id,LongType,false)), > createexternalrow(id#16L, type#17.toString, StructField(id,LongType,false), > StructField(type,StringType,false)), createexternalrow(id#16L, > type#17.toString, StructField(id,LongType,false), > StructField(type,StringType,false)), [id#39L], [id#39L], [id#39L, type#40], > [id#39L, type#40], obj#36: org.apache.spark.sql.Row > :- !Sort [id#39L ASC NULLS FIRST], false, 0 > : +- !Exchange hashpartitioning(id#39L, 2), ENSURE_REQUIREMENTS, > [plan_id=19] > : +- LocalTableScan [id#16L, type#17] > +- Sort [id#39L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(id#39L, 2), ENSURE_REQUIREMENTS, > [plan_id=20] > +- LocalTableScan [id#39L, type#40]{code} > Exception: > {code:java} > java.lang.IllegalStateException: Couldn't find id#39L in [id#16L,type#17] > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80) > at >
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:698) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589) > at >
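Until the root cause is fixed, one possible workaround is sketched below. This is untested, an assumption based on the plan mismatch shown in the reproducer above, not a confirmed fix: building each side from an independently created DataFrame keeps the two grouped datasets from sharing one LocalTableScan and its attribute IDs. It targets the Spark 3.3/3.4 APIs used in the report (RowEncoder.apply):

{code:scala}
// Untested workaround sketch for SPARK-43781; names mirror the Java reproducer.
import java.util.Arrays
import org.apache.spark.sql.{Encoder, Row, RowFactory, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.DataTypes

val spark = SparkSession.builder().master("local[2]").getOrCreate()

val inputType = DataTypes.createStructType(Array(
  DataTypes.createStructField("id", DataTypes.LongType, false),
  DataTypes.createStructField("type", DataTypes.StringType, false)))
val keyType = DataTypes.createStructType(Array(
  DataTypes.createStructField("id", DataTypes.LongType, false)))

// A fresh DataFrame per side, so the two plans carry distinct attribute IDs.
def freshInput() = spark.createDataFrame(
  Arrays.asList(RowFactory.create(1L, "foo"), RowFactory.create(1L, "bar"),
    RowFactory.create(2L, "foo")),
  inputType)

val fooGroups = freshInput().filter("type = 'foo'").groupBy("id")
  .as(RowEncoder(keyType), RowEncoder(inputType))
val barGroups = freshInput().filter("type = 'bar'").groupBy("id")
  .as(RowEncoder(keyType), RowEncoder(inputType))

implicit val rowEnc: Encoder[Row] = RowEncoder(inputType)
val result = fooGroups.cogroup(barGroups)((_, _, _) => Seq.empty[Row].iterator)
result.show()
{code}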
[jira] [Updated] (SPARK-43915) Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]
[ https://issues.apache.org/jira/browse/SPARK-43915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-43915: --- Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445] (was: Assign a name to the error class _LEGACY_ERROR_TEMP_2428) > Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445] > -- > > Key: SPARK-43915 > URL: https://issues.apache.org/jira/browse/SPARK-43915 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731503#comment-17731503 ] Enrico Minack commented on SPARK-38200: --- Related: SPARK-19335 > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL for different databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > > [~yao] > > https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
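To make the proposal concrete, here is a hedged sketch of the kind of per-dialect SQL such a feature would have to emit; `mysqlUpsert` and its shape are illustrative assumptions, not an existing Spark JdbcDialect API:

{code:scala}
// Illustrative sketch only: Spark's JdbcDialect exposes no upsert hook today.
// MySQL's INSERT ... ON DUPLICATE KEY UPDATE is used as the example dialect.
object UpsertSqlSketch {
  def mysqlUpsert(table: String, columns: Seq[String], keyColumns: Seq[String]): String = {
    val cols = columns.mkString(", ")
    val placeholders = columns.map(_ => "?").mkString(", ")
    // Key columns identify the row, so only the remaining columns are updated.
    val updates = columns.filterNot(keyColumns.contains)
      .map(c => s"$c = VALUES($c)")
      .mkString(", ")
    s"INSERT INTO $table ($cols) VALUES ($placeholders) ON DUPLICATE KEY UPDATE $updates"
  }
}
{code}

For example, UpsertSqlSketch.mysqlUpsert("t", Seq("id", "type"), Seq("id")) yields INSERT INTO t (id, type) VALUES (?, ?) ON DUPLICATE KEY UPDATE type = VALUES(type); MERGE-based dialects (Oracle, DB2, Derby, H2) would each need their own template, which is the pattern the SeaTunnel dialect classes linked above follow.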
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731502#comment-17731502 ] Enrico Minack commented on SPARK-38200: --- Sadly, MERGE is shown to perform worse than UPDATE+INSERT in some databases: http://www.dba-oracle.com/t_merge_upsert_performance.htm https://michalmolka.medium.com/sql-server-merge-vs-upsert-877702d23674 > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL for different databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > > [~yao] > > https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43398) Executor timeout should be max of idleTimeout rddTimeout shuffleTimeout
[ https://issues.apache.org/jira/browse/SPARK-43398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43398. --- Fix Version/s: 3.3.3 3.5.0 3.4.1 Resolution: Fixed Issue resolved by pull request 41082 [https://github.com/apache/spark/pull/41082] > Executor timeout should be max of idleTimeout rddTimeout shuffleTimeout > --- > > Key: SPARK-43398 > URL: https://issues.apache.org/jira/browse/SPARK-43398 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.1.3, 3.2.4, 3.3.2, 3.4.0 >Reporter: Zhongwei Zhu >Assignee: Zhongwei Zhu >Priority: Major > Fix For: 3.3.3, 3.5.0, 3.4.1 > > > When dynamic allocation is enabled, the executor timeout should be the max of > idleTimeout, rddTimeout and shuffleTimeout. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
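To make the intended semantics concrete, a minimal sketch follows; the method and parameter names are assumptions for illustration, not the actual ExecutorMonitor patch:

{code:scala}
// An executor that still holds cached RDD blocks or shuffle data should only
// time out after the longest applicable timeout, not the plain idle timeout.
def executorTimeoutMs(
    idleTimeoutMs: Long,
    rddTimeoutMs: Long,
    shuffleTimeoutMs: Long,
    hasCachedBlocks: Boolean,
    hasShuffleData: Boolean): Long = {
  var timeoutMs = idleTimeoutMs
  if (hasCachedBlocks) timeoutMs = math.max(timeoutMs, rddTimeoutMs)
  if (hasShuffleData) timeoutMs = math.max(timeoutMs, shuffleTimeoutMs)
  timeoutMs
}

// E.g. with idle=60s, rdd=240s, shuffle=300s and both kinds of data present,
// the executor is kept for 300s rather than being removed after 60s.
{code}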
[jira] [Assigned] (SPARK-43398) Executor timeout should be max of idleTimeout rddTimeout shuffleTimeout
[ https://issues.apache.org/jira/browse/SPARK-43398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43398: - Assignee: Zhongwei Zhu > Executor timeout should be max of idleTimeout rddTimeout shuffleTimeout > --- > > Key: SPARK-43398 > URL: https://issues.apache.org/jira/browse/SPARK-43398 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.1.3, 3.2.4, 3.3.2, 3.4.0 >Reporter: Zhongwei Zhu >Assignee: Zhongwei Zhu >Priority: Major > > When dynamic allocation is enabled, the executor timeout should be the max of > idleTimeout, rddTimeout and shuffleTimeout. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43617) Enable pyspark.pandas.spark.functions.product in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43617. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41547 [https://github.com/apache/spark/pull/41547] > Enable pyspark.pandas.spark.functions.product in Spark Connect. > --- > > Key: SPARK-43617 > URL: https://issues.apache.org/jira/browse/SPARK-43617 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > > Enable pyspark.pandas.spark.functions.product in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org