[jira] [Updated] (SPARK-45274) Implementation of a new DAG drawing approach to avoid fork
[ https://issues.apache.org/jira/browse/SPARK-45274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45274: --- Labels: pull-request-available (was: ) > Implementation of a new DAG drawing approach to avoid fork > --- > > Key: SPARK-45274 > URL: https://issues.apache.org/jira/browse/SPARK-45274 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45274) Implementation of a new DAG drawing approach to avoid fork
Kent Yao created SPARK-45274: Summary: Implementation of a new DAG drawing approach to avoid fork Key: SPARK-45274 URL: https://issues.apache.org/jira/browse/SPARK-45274 Project: Spark Issue Type: Improvement Components: UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45270) Upgrade `Volcano` to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-45270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45270: - Assignee: Dongjoon Hyun > Upgrade `Volcano` to 1.8.0 > -- > > Key: SPARK-45270 > URL: https://issues.apache.org/jira/browse/SPARK-45270 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45270) Upgrade `Volcano` to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-45270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45270. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43050 [https://github.com/apache/spark/pull/43050] > Upgrade `Volcano` to 1.8.0 > -- > > Key: SPARK-45270 > URL: https://issues.apache.org/jira/browse/SPARK-45270 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45269) Use Java 21-jre in K8s Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45269. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43048 [https://github.com/apache/spark/pull/43048] > Use Java 21-jre in K8s Dockerfile > - > > Key: SPARK-45269 > URL: https://issues.apache.org/jira/browse/SPARK-45269 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45269) Use Java 21-jre in K8s Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45269: - Assignee: Dongjoon Hyun > Use Java 21-jre in K8s Dockerfile > - > > Key: SPARK-45269 > URL: https://issues.apache.org/jira/browse/SPARK-45269 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45273) Http header Attack【HttpSecurityFilter】
chenyu created SPARK-45273: -- Summary: Http header Attack【HttpSecurityFilter】 Key: SPARK-45273 URL: https://issues.apache.org/jira/browse/SPARK-45273 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: chenyu There is an HTTP Host header attack vulnerability in the target URL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map
[ https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43655: --- Labels: pull-request-available (was: ) > Enable NamespaceParityTests.test_get_index_map > -- > > Key: SPARK-43655 > URL: https://issues.apache.org/jira/browse/SPARK-43655 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Enable NamespaceParityTests.test_get_index_map -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43877) Fix behavior difference for compare binary functions.
[ https://issues.apache.org/jira/browse/SPARK-43877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43877: Epic Link: SPARK-39375 > Fix behavior difference for compare binary functions. > - > > Key: SPARK-43877 > URL: https://issues.apache.org/jira/browse/SPARK-43877 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark, PySpark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > In [https://github.com/apache/spark/pull/41362,] we add `result = > result.fillna(False)` for filling the gap between pandas <> pandas API on > Spark, but it should be internally fixed from Spark Connect side. Please > refer to the reproducible code below: > > {code:java} > import pandas as pd > import pyspark.pandas as ps > from pyspark.sql.utils import pyspark_column_op > pser = pd.Series([None, None, None]) > psser = ps.from_pandas(pser) > pyspark_column_op("__ge__")(psser, psser) > # Wrong result: > # 0 None > # 1 None > # 2 None > # dtype: object > # Expected result: > pser > pser > # 0 False > # 1 False > # 2 False > dtype: bool{code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43877) Fix behavior difference for compare binary functions.
[ https://issues.apache.org/jira/browse/SPARK-43877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43877: Parent: (was: SPARK-42497) Issue Type: Improvement (was: Sub-task) > Fix behavior difference for compare binary functions. > - > > Key: SPARK-43877 > URL: https://issues.apache.org/jira/browse/SPARK-43877 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark, PySpark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > In [https://github.com/apache/spark/pull/41362,] we add `result = > result.fillna(False)` for filling the gap between pandas <> pandas API on > Spark, but it should be internally fixed from Spark Connect side. Please > refer to the reproducible code below: > > {code:java} > import pandas as pd > import pyspark.pandas as ps > from pyspark.sql.utils import pyspark_column_op > pser = pd.Series([None, None, None]) > psser = ps.from_pandas(pser) > pyspark_column_op("__ge__")(psser, psser) > # Wrong result: > # 0 None > # 1 None > # 2 None > # dtype: object > # Expected result: > pser > pser > # 0 False > # 1 False > # 2 False > dtype: bool{code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
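For readers skimming the digest, a minimal pandas-only sketch of the stopgap mentioned in SPARK-43877 (`result = result.fillna(False)` from apache/spark#41362) may help; the variable name is illustrative, and the snippet only shows why filling the missing comparison results with False restores the boolean output that plain pandas produces.

{code:python}
import pandas as pd

# Under Spark Connect the comparison on an all-None Series currently comes back
# as None values instead of booleans (see the repro quoted above).
result = pd.Series([None, None, None], dtype="object")

# Stopgap applied in apache/spark#41362: treat missing comparison results as False,
# matching plain pandas, where `pser > pser` on an all-None Series is all False.
result = result.fillna(False).astype(bool)

print(result)
# 0    False
# 1    False
# 2    False
# dtype: bool
{code}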
[jira] [Updated] (SPARK-45209) Flame Graph Support For Executor Thread Dump Page
[ https://issues.apache.org/jira/browse/SPARK-45209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45209: --- Labels: pull-request-available (was: ) > Flame Graph Support For Executor Thread Dump Page > - > > Key: SPARK-45209 > URL: https://issues.apache.org/jira/browse/SPARK-45209 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767843#comment-17767843 ] Bo Xiong edited comment on SPARK-45227 at 9/22/23 6:12 AM: --- I've submitted [a fix|https://github.com/apache/spark/pull/43021]. Please help get it merged. If possible, please also help patch v3.3.1 and above. Thanks! was (Author: JIRAUSER302302): I've submitted a fix. Please help get it merged. If possible, please also help patch v3.3.1 and above. Thanks! > Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an > executor process randomly gets stuck > > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, pull-request-available, > race-condition, stuck, threadsafe > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking > at Spark UI, we saw that an executor process hung over 1 hour. After we > manually killed the executor process, the app succeeded. Note that the same > EMR cluster with two worker nodes was able to run the same app without any > issue before and after the incident. > h2. Observations > Below is what's observed from relevant container logs and thread dump. > * A regular task that's sent to the executor, which also reported back to > the driver upon the task completion. > {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} * Another task that's sent to the executor but didn't get launched > since the single-threaded dispatcher was stuck (presumably in an "infinite > loop" as explained later). > {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote}* Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spar
[jira] [Comment Edited] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767843#comment-17767843 ] Bo Xiong edited comment on SPARK-45227 at 9/22/23 6:11 AM: --- I've submitted a fix. Please help get it merged. If possible, please also help patch v3.3.1 and above. Thanks! was (Author: JIRAUSER302302): I've submitted a fix. Please help get it merged. If possible, please also help patch v3.3.1 and above. Thanks, Bo > Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an > executor process randomly gets stuck > > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, pull-request-available, > race-condition, stuck, threadsafe > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking > at Spark UI, we saw that an executor process hung over 1 hour. After we > manually killed the executor process, the app succeeded. Note that the same > EMR cluster with two worker nodes was able to run the same app without any > issue before and after the incident. > h2. Observations > Below is what's observed from relevant container logs and thread dump. > * A regular task that's sent to the executor, which also reported back to > the driver upon the task completion. > {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} * Another task that's sent to the executor but didn't get launched > since the single-threaded dispatcher was stuck (presumably in an "infinite > loop" as explained later). > {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote}* Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spark.rpc.netty.Inbox$$Lambda$323/193082670
[jira] [Commented] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767843#comment-17767843 ] Bo Xiong commented on SPARK-45227: -- I've submitted a fix. Please help get it merged. If possible, please also help patch v3.3.1 and above. Thanks, Bo > Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an > executor process randomly gets stuck > > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, pull-request-available, > race-condition, stuck, threadsafe > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking > at Spark UI, we saw that an executor process hung over 1 hour. After we > manually killed the executor process, the app succeeded. Note that the same > EMR cluster with two worker nodes was able to run the same app without any > issue before and after the incident. > h2. Observations > Below is what's observed from relevant container logs and thread dump. > * A regular task that's sent to the executor, which also reported back to > the driver upon the task completion. > {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} * Another task that's sent to the executor but didn't get launched > since the single-threaded dispatcher was stuck (presumably in an "infinite > loop" as explained later). > {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote}* Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown > Source) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.net
[jira] [Resolved] (SPARK-43623) Enable DefaultIndexParityTests.test_index_distributed_sequence_cleanup.
[ https://issues.apache.org/jira/browse/SPARK-43623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43623. - Resolution: Duplicate > Enable DefaultIndexParityTests.test_index_distributed_sequence_cleanup. > --- > > Key: SPARK-43623 > URL: https://issues.apache.org/jira/browse/SPARK-43623 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Enable DefaultIndexParityTests.test_index_distributed_sequence_cleanup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Summary: Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck (was: Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe) > Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an > executor process randomly gets stuck > > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, pull-request-available, > race-condition, stuck, threadsafe > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking > at Spark UI, we saw that an executor process hung over 1 hour. After we > manually killed the executor process, the app succeeded. Note that the same > EMR cluster with two worker nodes was able to run the same app without any > issue before and after the incident. > h2. Observations > Below is what's observed from relevant container logs and thread dump. > * A regular task that's sent to the executor, which also reported back to > the driver upon the task completion. > {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} * Another task that's sent to the executor but didn't get launched > since the single-threaded dispatcher was stuck (presumably in an "infinite > loop" as explained later). > {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote}* Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown > Source) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) > at org.apa
[jira] [Updated] (SPARK-45272) Remove Scala version specific comments, and scala-2.13 profile usage
[ https://issues.apache.org/jira/browse/SPARK-45272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45272: --- Labels: pull-request-available (was: ) > Remove Scala version specific comments, and scala-2.13 profile usage > > > Key: SPARK-45272 > URL: https://issues.apache.org/jira/browse/SPARK-45272 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation, Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > SPARK-44113 applied some changes directly from > {{dev/change-scala-version.sh}}. We should clean them up. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45269) Use Java 21-jre in K8s Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45269: -- Summary: Use Java 21-jre in K8s Dockerfile (was: Use 21-jre in K8s Dockerfile) > Use Java 21-jre in K8s Dockerfile > - > > Key: SPARK-45269 > URL: https://issues.apache.org/jira/browse/SPARK-45269 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43159) Refine `column_op` to use lambda function instead of Column API.
[ https://issues.apache.org/jira/browse/SPARK-43159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43159. - Resolution: Won't Fix > Refine `column_op` to use lambda function instead of Column API. > > > Key: SPARK-43159 > URL: https://issues.apache.org/jira/browse/SPARK-43159 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Refine `column_op(Column.__eq__)(left, right)` to use a lambda function such > as `column_op(lambda x, y: x.__eq__(y))(left, right)` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
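To make the proposed refinement concrete, the sketch below uses a toy stand-in for `column_op` (not Spark's actual helper, which operates on Spark Columns) to show that handing over a bound dunder method and wrapping the call in an explicit lambda are equivalent in result; the ticket is only about which form the internal code uses.

{code:python}
def column_op(f):
    # Toy stand-in: apply a binary function element-wise to two sequences,
    # the way the real helper applies it to the underlying Spark Columns.
    def wrapper(left, right):
        return [f(x, y) for x, y in zip(left, right)]
    return wrapper

left, right = [1, 2, 3], [1, 0, 3]

# Current style: pass the bound Column API method directly (Column.__eq__ in Spark).
current = column_op(int.__eq__)(left, right)

# Refined style proposed by the ticket: pass an explicit lambda instead.
refined = column_op(lambda x, y: x.__eq__(y))(left, right)

assert current == refined == [True, False, True]
{code}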
[jira] [Created] (SPARK-45272) Remove Scala version specific comments, and scala-2.13 profile usage
Hyukjin Kwon created SPARK-45272: Summary: Remove Scala version specific comments, and scala-2.13 profile usage Key: SPARK-45272 URL: https://issues.apache.org/jira/browse/SPARK-45272 Project: Spark Issue Type: Improvement Components: Build, Documentation, Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-44113 applied some changes directly from {{dev/change-scala-version.sh}}. We should clean them up. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43711) Support `pyspark.ml.feature.Bucketizer` and `pyspark.mllib.stat.KernelDensity` to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43711: --- Labels: pull-request-available (was: ) > Support `pyspark.ml.feature.Bucketizer` and > `pyspark.mllib.stat.KernelDensity` to work with Spark Connect. > -- > > Key: SPARK-43711 > URL: https://issues.apache.org/jira/browse/SPARK-43711 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, MLlib >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Repro: run `DataFramePlotParityTests.test_compute_hist_multi_columns` or ` > SeriesPlotMatplotlibParityTests.test_kde_plot` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45271) Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors
[ https://issues.apache.org/jira/browse/SPARK-45271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-45271: Summary: Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors (was: Merge _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_FEATURE.TABLE_OPERATION & delete some unused method in QueryCompilationErrors) > Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused > method in QueryCompilationErrors > > > Key: SPARK-45271 > URL: https://issues.apache.org/jira/browse/SPARK-45271 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45271) Merge _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_FEATURE.TABLE_OPERATION & delete some unused method in QueryCompilationErrors
BingKun Pan created SPARK-45271: --- Summary: Merge _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_FEATURE.TABLE_OPERATION & delete some unused method in QueryCompilationErrors Key: SPARK-45271 URL: https://issues.apache.org/jira/browse/SPARK-45271 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-42965: Parent: (was: SPARK-42497) Issue Type: Improvement (was: Sub-task) > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because metadata is > different something like `\{'__autoGeneratedAlias': 'true'}` but they have > same name, type and nullable, so the function just works well. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-42965: Epic Link: SPARK-39375 > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because metadata is > different something like `\{'__autoGeneratedAlias': 'true'}` but they have > same name, type and nullable, so the function just works well. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
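A small self-contained sketch of the mismatch described in SPARK-42965: two `StructField`s that agree on name, type and nullability but differ only in metadata compare unequal, which is enough to trip the internal assertion even though queries still behave correctly. The metadata key mirrors the one quoted above; nothing else about the real test setup is implied.

{code:python}
from pyspark.sql.types import LongType, StructField

plain = StructField("bool", LongType(), False)
aliased = StructField("bool", LongType(), False,
                      metadata={"__autoGeneratedAlias": "true"})

# Metadata participates in StructField equality...
assert plain != aliased
# ...even though the parts that matter for the schema check are identical.
assert (plain.name, plain.dataType, plain.nullable) == \
       (aliased.name, aliased.dataType, aliased.nullable)
{code}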
[jira] [Updated] (SPARK-45270) Upgrade `Volcano` to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-45270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45270: --- Labels: pull-request-available (was: ) > Upgrade `Volcano` to 1.8.0 > -- > > Key: SPARK-45270 > URL: https://issues.apache.org/jira/browse/SPARK-45270 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45270) Upgrade `Volcano` to 1.8.0
Dongjoon Hyun created SPARK-45270: - Summary: Upgrade `Volcano` to 1.8.0 Key: SPARK-45270 URL: https://issues.apache.org/jira/browse/SPARK-45270 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45269) Use 21-jre in K8s Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45269: --- Labels: pull-request-available (was: ) > Use 21-jre in K8s Dockerfile > > > Key: SPARK-45269 > URL: https://issues.apache.org/jira/browse/SPARK-45269 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45269) Use 21-jre in K8s Dockerfile
Dongjoon Hyun created SPARK-45269: - Summary: Use 21-jre in K8s Dockerfile Key: SPARK-45269 URL: https://issues.apache.org/jira/browse/SPARK-45269 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44113) Make Scala 2.13+ as default Scala version
[ https://issues.apache.org/jira/browse/SPARK-44113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44113. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43008 [https://github.com/apache/spark/pull/43008] > Make Scala 2.13+ as default Scala version > - > > Key: SPARK-44113 > URL: https://issues.apache.org/jira/browse/SPARK-44113 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45268) python function categories should be consistent with SQL function groups
[ https://issues.apache.org/jira/browse/SPARK-45268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45268: --- Labels: pull-request-available (was: ) > python function categories should be consistent with SQL function groups > > > Key: SPARK-45268 > URL: https://issues.apache.org/jira/browse/SPARK-45268 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45268) python function categories should be consistent with SQL function groups
Ruifeng Zheng created SPARK-45268: - Summary: python function categories should be consistent with SQL function groups Key: SPARK-45268 URL: https://issues.apache.org/jira/browse/SPARK-45268 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45251) Add client_type field for FetchErrorDetails
[ https://issues.apache.org/jira/browse/SPARK-45251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45251. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43031 [https://github.com/apache/spark/pull/43031] > Add client_type field for FetchErrorDetails > --- > > Key: SPARK-45251 > URL: https://issues.apache.org/jira/browse/SPARK-45251 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45251) Add client_type field for FetchErrorDetails
[ https://issues.apache.org/jira/browse/SPARK-45251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45251: Assignee: Yihong He > Add client_type field for FetchErrorDetails > --- > > Key: SPARK-45251 > URL: https://issues.apache.org/jira/browse/SPARK-45251 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45253) Correct the group of `ShiftLeft` and `ArraySize`
[ https://issues.apache.org/jira/browse/SPARK-45253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45253: Assignee: Ruifeng Zheng > Correct the group of `ShiftLeft` and `ArraySize` > > > Key: SPARK-45253 > URL: https://issues.apache.org/jira/browse/SPARK-45253 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45253) Correct the group of `ShiftLeft` and `ArraySize`
[ https://issues.apache.org/jira/browse/SPARK-45253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45253. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43033 [https://github.com/apache/spark/pull/43033] > Correct the group of `ShiftLeft` and `ArraySize` > > > Key: SPARK-45253 > URL: https://issues.apache.org/jira/browse/SPARK-45253 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior
[ https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43433. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42994 [https://github.com/apache/spark/pull/42994] > Match `GroupBy.nth` behavior with new pandas behavior > - > > Key: SPARK-43433 > URL: https://issues.apache.org/jira/browse/SPARK-43433 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Match behavior with > https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior
[ https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43433: Assignee: Haejoon Lee > Match `GroupBy.nth` behavior with new pandas behavior > - > > Key: SPARK-43433 > URL: https://issues.apache.org/jira/browse/SPARK-43433 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Match behavior with > https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior
[ https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43433: --- Labels: pull-request-available (was: ) > Match `GroupBy.nth` behavior with new pandas behavior > - > > Key: SPARK-43433 > URL: https://issues.apache.org/jira/browse/SPARK-43433 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Match behavior with > https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44444) Enabled ANSI mode by default
[ https://issues.apache.org/jira/browse/SPARK-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767804#comment-17767804 ] Dongjoon Hyun commented on SPARK-4: --- +1 > Enabled ANSI mode by default > > > Key: SPARK-4 > URL: https://issues.apache.org/jira/browse/SPARK-4 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > > To avoid data issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44442) Drop mesos support
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767803#comment-17767803 ] Dongjoon Hyun commented on SPARK-2: --- +1 for the removal. We need some discussions as the final step in the dev mailing list. > Drop mesos support > -- > > Key: SPARK-2 > URL: https://issues.apache.org/jira/browse/SPARK-2 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > [https://spark.apache.org/docs/latest/running-on-mesos.html] > > {_}Note{_}: Apache Mesos support is deprecated as of Apache Spark 3.2.0. It > will be removed in a future version. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45093) AddArtifacts should give proper error messages if it fails
[ https://issues.apache.org/jira/browse/SPARK-45093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45093. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42949 [https://github.com/apache/spark/pull/42949] > AddArtifacts should give proper error messages if it fails > -- > > Key: SPARK-45093 > URL: https://issues.apache.org/jira/browse/SPARK-45093 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Alice Sayutina >Assignee: Alice Sayutina >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > I've been trying to do some testing of udf's using code in other module, so > that AddArtifact is necessary. > > I got the following error: > > > {code:java} > Traceback (most recent call last): > File "/Users/alice.sayutina/db-connect-playground/udf2.py", line 5, in > > spark.addArtifacts("udf2_support.py", pyfile=True) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/session.py", > line 744, in addArtifacts > self._client.add_artifacts(*path, pyfile=pyfile, archive=archive, > file=file) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", > line 1582, in add_artifacts > self._artifact_manager.add_artifacts(*path, pyfile=pyfile, > archive=archive, file=file) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", > line 283, in add_artifacts > self._request_add_artifacts(requests) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", > line 259, in _request_add_artifacts > response: proto.AddArtifactsResponse = self._retrieve_responses(requests) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", > line 256, in _retrieve_responses > return self._stub.AddArtifacts(requests, metadata=self._metadata) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py", > line 1246, in __call__ > return _end_unary_response_blocking(state, call, False, None) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py", > line 910, in _end_unary_response_blocking > raise _InactiveRpcError(state) # pytype: disable=not-instantiable > grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated > with: > status = StatusCode.UNKNOWN > details = "Exception iterating requests!" > debug_error_string = "None" > {code} > > Which doesn't give any clue about what happens. > Only after noticeable investigation I found the problem: I'm specifying the > wrong path and the artifact fails to upload. Specifically what happens is > that ArtifactManager doesn't read the file immediately, but rather creates > iterator object which will incrementally generate requests to send. This > iterator is passed to grpc's stream_unary to consume and actually send, and > while grpc catches the error (see above), it suppresses the underlying > exception. > I think we should improve pyspark user experience. One of the possible ways > to fix this is to wrap ArtifactsManager._create_requests with an iterator > wrapper which would log the throwable into spark connect logger so that user > would see something like below at least when the debug mode is on. 
> > {code:java} > FileNotFoundError: [Errno 2] No such file or directory: > '/Users/alice.sayutina/udf2_support.py' {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45093) AddArtifacts should give proper error messages if it fails
[ https://issues.apache.org/jira/browse/SPARK-45093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45093: Assignee: Alice Sayutina > AddArtifacts should give proper error messages if it fails > -- > > Key: SPARK-45093 > URL: https://issues.apache.org/jira/browse/SPARK-45093 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Alice Sayutina >Assignee: Alice Sayutina >Priority: Major > Labels: pull-request-available > > I've been trying to do some testing of udf's using code in other module, so > that AddArtifact is necessary. > > I got the following error: > > > {code:java} > Traceback (most recent call last): > File "/Users/alice.sayutina/db-connect-playground/udf2.py", line 5, in > > spark.addArtifacts("udf2_support.py", pyfile=True) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/session.py", > line 744, in addArtifacts > self._client.add_artifacts(*path, pyfile=pyfile, archive=archive, > file=file) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", > line 1582, in add_artifacts > self._artifact_manager.add_artifacts(*path, pyfile=pyfile, > archive=archive, file=file) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", > line 283, in add_artifacts > self._request_add_artifacts(requests) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", > line 259, in _request_add_artifacts > response: proto.AddArtifactsResponse = self._retrieve_responses(requests) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py", > line 256, in _retrieve_responses > return self._stub.AddArtifacts(requests, metadata=self._metadata) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py", > line 1246, in __call__ > return _end_unary_response_blocking(state, call, False, None) > File > "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py", > line 910, in _end_unary_response_blocking > raise _InactiveRpcError(state) # pytype: disable=not-instantiable > grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated > with: > status = StatusCode.UNKNOWN > details = "Exception iterating requests!" > debug_error_string = "None" > {code} > > Which doesn't give any clue about what happens. > Only after noticeable investigation I found the problem: I'm specifying the > wrong path and the artifact fails to upload. Specifically what happens is > that ArtifactManager doesn't read the file immediately, but rather creates > iterator object which will incrementally generate requests to send. This > iterator is passed to grpc's stream_unary to consume and actually send, and > while grpc catches the error (see above), it suppresses the underlying > exception. > I think we should improve pyspark user experience. One of the possible ways > to fix this is to wrap ArtifactsManager._create_requests with an iterator > wrapper which would log the throwable into spark connect logger so that user > would see something like below at least when the debug mode is on. 
> > {code:java} > FileNotFoundError: [Errno 2] No such file or directory: > '/Users/alice.sayutina/udf2_support.py' {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
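A minimal sketch of the kind of iterator wrapper SPARK-45093 proposes: it surfaces the exception (for example the FileNotFoundError above) before gRPC swallows it and reports only "Exception iterating requests!". The method name `_create_requests` follows the issue text; the logger name and the commented-out usage inside the client are assumptions, not the actual PySpark implementation.

{code:python}
import logging

logger = logging.getLogger("pyspark.sql.connect")  # assumed logger name


def logged_requests(request_iterator):
    """Yield AddArtifacts requests, logging any exception raised while the
    requests are being generated (e.g. a missing local file) before it is
    suppressed by gRPC's stream-unary machinery."""
    try:
        for request in request_iterator:
            yield request
    except Exception:
        logger.exception("Failed while generating AddArtifacts requests")
        raise

# Hypothetical use inside the artifact manager, mirroring the issue's suggestion:
#   requests = self._create_requests(*path, pyfile=pyfile, archive=archive, file=file)
#   self._retrieve_responses(logged_requests(requests))
{code}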
[jira] [Resolved] (SPARK-45257) Enable spark.eventLog.compress by default
[ https://issues.apache.org/jira/browse/SPARK-45257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45257. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43036 [https://github.com/apache/spark/pull/43036] > Enable spark.eventLog.compress by default > - > > Key: SPARK-45257 > URL: https://issues.apache.org/jira/browse/SPARK-45257 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45257) Enable spark.eventLog.compress by default
[ https://issues.apache.org/jira/browse/SPARK-45257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45257: - Assignee: Dongjoon Hyun > Enable spark.eventLog.compress by default > - > > Key: SPARK-45257 > URL: https://issues.apache.org/jira/browse/SPARK-45257 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41086) Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-41086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41086. - Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43010 [https://github.com/apache/spark/pull/43010] > Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE > -- > > Key: SPARK-41086 > URL: https://issues.apache.org/jira/browse/SPARK-41086 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > SECOND_FUNCTION_ARGUMENT_NOT_INTEGER > _LEGACY_ERROR_TEMP_1104 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45267) Change the default value for `numeric_only`.
[ https://issues.apache.org/jira/browse/SPARK-45267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45267: --- Labels: pull-request-available (was: ) > Change the default value for `numeric_only`. > > > Key: SPARK-45267 > URL: https://issues.apache.org/jira/browse/SPARK-45267 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > To follow the Pandas 2.0.0 and above. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45267) Change the default value for `numeric_only`.
[ https://issues.apache.org/jira/browse/SPARK-45267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-45267: Summary: Change the default value for `numeric_only`. (was: Changed the default value for `numeric_only`.) > Change the default value for `numeric_only`. > > > Key: SPARK-45267 > URL: https://issues.apache.org/jira/browse/SPARK-45267 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > To follow the Pandas 2.0.0 and above. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45267) Changed the default value for `numeric_only`.
Haejoon Lee created SPARK-45267: --- Summary: Changed the default value for `numeric_only`. Key: SPARK-45267 URL: https://issues.apache.org/jira/browse/SPARK-45267 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee To follow the Pandas 2.0.0 and above. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
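SPARK-45267 aligns the `numeric_only` default in the Pandas API on Spark with pandas 2.x. A minimal sketch of what that behavior looks like, assuming pandas 2.x semantics where `numeric_only` defaults to False and non-numeric columns must be excluded explicitly rather than silently dropped:

{code:python}
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]})

# With the pandas-2.x-style default, aggregating a frame that contains a
# non-numeric column requires opting in explicitly instead of relying on
# column "c" being dropped behind the scenes.
print(psdf.mean(numeric_only=True))
{code}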
[jira] [Assigned] (SPARK-45244) Correct spelling in VolcanoTestsSuite
[ https://issues.apache.org/jira/browse/SPARK-45244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You reassigned SPARK-45244: - Assignee: Binjie Yang > Correct spelling in VolcanoTestsSuite > - > > Key: SPARK-45244 > URL: https://issues.apache.org/jira/browse/SPARK-45244 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.5.0 >Reporter: Binjie Yang >Assignee: Binjie Yang >Priority: Minor > Labels: pull-request-available > > Typo in method naming checkAnnotaion -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45244) Correct spelling in VolcanoTestsSuite
[ https://issues.apache.org/jira/browse/SPARK-45244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You resolved SPARK-45244. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43026 [https://github.com/apache/spark/pull/43026] > Correct spelling in VolcanoTestsSuite > - > > Key: SPARK-45244 > URL: https://issues.apache.org/jira/browse/SPARK-45244 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.5.0 >Reporter: Binjie Yang >Assignee: Binjie Yang >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Typo in method naming checkAnnotaion -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used
[ https://issues.apache.org/jira/browse/SPARK-45266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45266: --- Labels: pull-request-available (was: ) > Refactor ResolveFunctions analyzer rule to delay making lateral join when > table arguments are used > -- > > Key: SPARK-45266 > URL: https://issues.apache.org/jira/browse/SPARK-45266 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45191) InMemoryTableScanExec simpleStringWithNodeId adds columnar info
[ https://issues.apache.org/jira/browse/SPARK-45191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You resolved SPARK-45191. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42967 [https://github.com/apache/spark/pull/42967] > InMemoryTableScanExec simpleStringWithNodeId adds columnar info > --- > > Key: SPARK-45191 > URL: https://issues.apache.org/jira/browse/SPARK-45191 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > InMemoryTableScanExec supports both row-based and columnar input and output, > which is based on the cache serializer. It would be more user-friendly if > we provided columnar info showing whether the input/output is columnar. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
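A minimal PySpark sketch that produces an InMemoryTableScan node whose plan string SPARK-45191 improves. Whether the plan text actually shows columnar in/out info depends on the Spark version (the change above lands in 4.0.0) and the cache serializer in use.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("imts-demo").getOrCreate()

df = spark.range(1000).withColumnRenamed("id", "value")
df.cache()   # materialize into the in-memory cache
df.count()   # force the cache to be built

# The physical plan now contains an InMemoryTableScan node; SPARK-45191
# proposes including columnar in/out info in its node string.
df.explain(mode="formatted")
{code}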
[jira] [Updated] (SPARK-44307) Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.
[ https://issues.apache.org/jira/browse/SPARK-44307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44307: --- Labels: pull-request-available (was: ) > Bloom filter is not added for left outer join if the left side table is > smaller than broadcast threshold. > - > > Key: SPARK-44307 > URL: https://issues.apache.org/jira/browse/SPARK-44307 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.4.1 >Reporter: mahesh kumar behera >Priority: Major > Labels: pull-request-available > > In the case of a left outer join, even if the left-side table is small enough to be > broadcast, a shuffle join is used. This is because of the semantics of the > left outer join: if the left side is broadcast in a left outer join, the > result generated will be wrong. But this is not taken care of in the bloom > filter logic. While injecting the bloom filter, if the left side is smaller than the > broadcast threshold, the bloom filter is not added. It assumes that the left side > will be broadcast and there is no need for a bloom filter. This causes the bloom > filter optimization to be missed in the case of a left outer join with a small left > side and a huge right-side table. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
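A minimal PySpark sketch of the scenario SPARK-44307 describes: a small left side, a huge right side, and a left outer join. The `spark.sql.optimizer.runtime.bloomFilter.enabled` flag is the runtime bloom filter switch documented for Spark 3.3+; the table sizes here are made up, and whether a bloom filter is injected in the plan is exactly what the ticket reports as missing.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bloom-loj-demo").getOrCreate()

# Enable the runtime bloom filter join optimization (available since Spark 3.3).
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "true")

small_left = spark.range(10_000).withColumnRenamed("id", "k")        # below broadcast threshold
huge_right = spark.range(100_000_000).withColumnRenamed("id", "k")   # large probe side

# For a LEFT OUTER JOIN the small left side cannot be broadcast (that would
# produce wrong results), so a shuffle join is used -- and per SPARK-44307 the
# optimizer also skips injecting a bloom filter on the right side.
joined = small_left.join(huge_right, on="k", how="left_outer")
joined.explain()
{code}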
[jira] [Assigned] (SPARK-45163) Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_TABLE_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-45163: --- Assignee: BingKun Pan > Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into > UNSUPPORTED_TABLE_OPERATION and refactor some logic > > > Key: SPARK-45163 > URL: https://issues.apache.org/jira/browse/SPARK-45163 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45163) Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_TABLE_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45163. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42917 [https://github.com/apache/spark/pull/42917] > Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into > UNSUPPORTED_TABLE_OPERATION and refactor some logic > > > Key: SPARK-45163 > URL: https://issues.apache.org/jira/browse/SPARK-45163 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used
Takuya Ueshin created SPARK-45266: - Summary: Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used Key: SPARK-45266 URL: https://issues.apache.org/jira/browse/SPARK-45266 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45264) Configurable error when generating Python docs
[ https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767764#comment-17767764 ] Haejoon Lee edited comment on SPARK-45264 at 9/22/23 12:28 AM: --- Currently the PySpark documentation build requires installing the latest Pandas version specified from [https://github.com/apache/spark/blob/master/python/pyspark/pandas/supported_api_gen.py#L101] to generate [Supported pandas API page|https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html#supported-pandas-api]. Current supported Pandas version is 2.1.0, so we should install the Pandas 2.1.0 instead of 2.0.3 for building the documentation to get the proper supported API list. was (Author: itholic): Currently the PySpark documentation build requires installing the latest Pandas version specified from [https://github.com/apache/spark/blob/master/python/pyspark/pandas/supported_api_gen.py#L101] to generate [Supported pandas API page|https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html#supported-pandas-api]. > Configurable error when generating Python docs > -- > > Key: SPARK-45264 > URL: https://issues.apache.org/jira/browse/SPARK-45264 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > {{cd python/docs}} > {{make html }} > > Gives a Configuration error: > There is a programmable error in your configuration file: > ImportError: Warning: Latest version of pandas (2.1.0) is required to > generate the documentation; however, your version was 2.0.3 > make: *** [html] Error 2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
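A small sanity-check sketch for the workflow described in the comment above: the docs build requires the exact pandas version pinned in supported_api_gen.py (2.1.0 at the time of the comment). The check below is illustrative only, not part of the Spark build tooling.

{code:python}
# Run before building the PySpark docs (cd python/docs && make html).
import pandas as pd

required = "2.1.0"  # taken from the comment above; the pin moves over time
if pd.__version__ != required:
    raise SystemExit(
        f"pandas {required} is required to generate the documentation; "
        f"found {pd.__version__}. Try: pip install pandas=={required}"
    )
print("pandas version OK:", pd.__version__)
{code}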
[jira] [Commented] (SPARK-45264) Configurable error when generating Python docs
[ https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767764#comment-17767764 ] Haejoon Lee commented on SPARK-45264: - Currently the PySpark documentation build requires installing the latest Pandas version specified from [https://github.com/apache/spark/blob/master/python/pyspark/pandas/supported_api_gen.py#L101] to generate [Supported pandas API page|https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html#supported-pandas-api]. > Configurable error when generating Python docs > -- > > Key: SPARK-45264 > URL: https://issues.apache.org/jira/browse/SPARK-45264 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > {{cd python/docs}} > {{make html }} > > Gives a Configuration error: > There is a programmable error in your configuration file: > ImportError: Warning: Latest version of pandas (2.1.0) is required to > generate the documentation; however, your version was 2.0.3 > make: *** [html] Error 2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43152) User-defined output metadata path (_spark_metadata)
[ https://issues.apache.org/jira/browse/SPARK-43152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43152: --- Labels: pull-request-available (was: ) > User-defined output metadata path (_spark_metadata) > --- > > Key: SPARK-43152 > URL: https://issues.apache.org/jira/browse/SPARK-43152 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wojciech Indyk >Priority: Major > Labels: pull-request-available > > Currently the path of the output metadata is hardcoded: the metadata is > saved under the output path in the _spark_metadata folder. This is a constraint on the > structure of paths that could easily be relaxed by making the output metadata path > configurable. It would help with issues like [changing output directory of > spark streaming > job|https://kb.databricks.com/en_US/streaming/file-sink-streaming], [two jobs > writing to the same output > path|https://issues.apache.org/jira/browse/SPARK-30542] or [partition > discovery|https://stackoverflow.com/questions/61904732/is-it-possible-to-change-location-of-spark-metadata-folder-in-spark-structured/61905158]. > It would also help separate metadata from data in the path structure. > The main target of the change is the getMetadataLogPath method in FileStreamSink. It > has access to sqlConf, so this method can override the default > _spark_metadata path if one is defined in config. Introducing a configurable > metadata path requires reconsidering the meaning of the hasMetadata method in > FileStreamSink. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
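A minimal structured streaming sketch showing where the hardcoded metadata directory ends up today. The paths are made up for illustration; the configurable metadata location proposed in SPARK-43152 does not exist yet, so no such option is shown.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-sink-metadata-demo").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# With today's FileStreamSink the sink metadata is always written to
# <output path>/_spark_metadata, alongside the data files themselves.
query = (
    stream.writeStream
    .format("parquet")
    .option("path", "/tmp/demo-output")             # data AND _spark_metadata land here
    .option("checkpointLocation", "/tmp/demo-ckpt")
    .start()
)
query.awaitTermination(30)
query.stop()
{code}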
[jira] [Commented] (SPARK-44838) Enhance raise_error() to exploit the new error framework
[ https://issues.apache.org/jira/browse/SPARK-44838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767762#comment-17767762 ] Hudson commented on SPARK-44838: User 'srielau' has created a pull request for this issue: https://github.com/apache/spark/pull/42985 > Enhance raise_error() to exploit the new error framework > > > Key: SPARK-44838 > URL: https://issues.apache.org/jira/browse/SPARK-44838 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Serge Rielau >Priority: Major > > raise_error() and assert_true() do not presently utilize the new error > framework. > We want to generalize raise_error() to take an error class, SQLSTATE, and > message parameters as arguments to compose a well-formed error condition. > The existing assert_true() and raise_error() versions should return an error > class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
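A minimal sketch of today's raise_error/assert_true usage in PySpark (both exist in pyspark.sql.functions since 3.1). The error-class and SQLSTATE parameters discussed in SPARK-44838 are the proposed enhancement and are not part of the current signatures.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raise-error-demo").getOrCreate()
df = spark.range(5)

# Current API: both functions take only a message / condition, with no error
# class or SQLSTATE -- which is exactly what SPARK-44838 wants to generalize.
checked = df.select(F.assert_true(F.col("id") >= 0, "id must be non-negative"))
checked.collect()  # passes; a failing condition raises a plain runtime error

# df.select(F.raise_error("something went wrong")).collect()  # would always fail
{code}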
[jira] [Created] (SPARK-45265) Support Hive 4.0 metastore
Attila Zsolt Piros created SPARK-45265: -- Summary: Support Hive 4.0 metastore Key: SPARK-45265 URL: https://issues.apache.org/jira/browse/SPARK-45265 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Attila Zsolt Piros Assignee: Attila Zsolt Piros Although Hive 4.0.0 is still in beta, I would like to work on this as Hive 4.0.0 will support the pushdown of partition column filters with VARCHAR/CHAR types. For details please see HIVE-26661: Support partition filter for char and varchar types on Hive metastore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45264) Configurable error when generating Python docs
[ https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767758#comment-17767758 ] Ruifeng Zheng commented on SPARK-45264: --- I think we already support 2.1.0? [~itholic] > Configurable error when generating Python docs > -- > > Key: SPARK-45264 > URL: https://issues.apache.org/jira/browse/SPARK-45264 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > {{cd python/docs}} > {{make html }} > > Gives a Configuration error: > There is a programmable error in your configuration file: > ImportError: Warning: Latest version of pandas (2.1.0) is required to > generate the documentation; however, your version was 2.0.3 > make: *** [html] Error 2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45262) Improve the description for `LIKE`
[ https://issues.apache.org/jira/browse/SPARK-45262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45262: --- Labels: pull-request-available (was: ) > Improve the description for `LIKE` > -- > > Key: SPARK-45262 > URL: https://issues.apache.org/jira/browse/SPARK-45262 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > The description of `LIKE` says: > {code} > ... in order to match "\abc", the pattern should be "\\abc" > {code} > but in Spark SQL shell: > {code:sql} > spark-sql (default)> SELECT c FROM t; > \abc > spark-sql (default)> SELECT c LIKE "\\abc" FROM t; > [INVALID_FORMAT.ESC_IN_THE_MIDDLE] The format is invalid: '\\abc'. The escape > character is not allowed to precede 'a'. > spark-sql (default)> SELECT c LIKE "abc" FROM t; > true > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44937) Add SSL/TLS support for RPC and Shuffle communications
[ https://issues.apache.org/jira/browse/SPARK-44937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44937: --- Labels: pull-request-available (was: ) > Add SSL/TLS support for RPC and Shuffle communications > -- > > Key: SPARK-44937 > URL: https://issues.apache.org/jira/browse/SPARK-44937 > Project: Spark > Issue Type: New Feature > Components: Block Manager, Security, Shuffle, Spark Core >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > > Add support for SSL/TLS based communication for Spark RPCs and block > transfers - providing an alternative to the existing encryption / > authentication implementation documented at > [https://spark.apache.org/docs/latest/security.html#spark-rpc-communication-protocol-between-spark-processes] > This is a superset of the functionality discussed in > https://issues.apache.org/jira/browse/SPARK-6373 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45252) `sbt doc` execution failed.
[ https://issues.apache.org/jira/browse/SPARK-45252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45252: - Assignee: Yang Jie > `sbt doc` execution failed. > --- > > Key: SPARK-45252 > URL: https://issues.apache.org/jira/browse/SPARK-45252 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > > run > > {code:java} > build/sbt clean doc -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl > -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pvolcano > {code} > > {code:java} > [info] Main Scala API documentation successful. > [error] sbt.inc.Doc$JavadocGenerationFailed > [error] at > sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51) > [error] at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62) > [error] at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57) > [error] at sbt.inc.Doc$.go$1(Doc.scala:73) > [error] at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82) > [error] at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81) > [error] at > sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85) > [error] at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68) > [error] at > sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178) > [error] at scala.Function1.$anonfun$compose$1(Function1.scala:49) > [error] at > sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63) > [error] at sbt.std.Transform$$anon$4.work(Transform.scala:69) > [error] at sbt.Execute.$anonfun$submit$2(Execute.scala:283) > [error] at > sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24) > [error] at sbt.Execute.work(Execute.scala:292) > [error] at sbt.Execute.$anonfun$submit$1(Execute.scala:283) > [error] at > sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265) > [error] at > sbt.CompletionService$$anon$2.call(CompletionService.scala:65) > [error] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [error] at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [error] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [error] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [error] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [error] at java.lang.Thread.run(Thread.java:750) > [error] sbt.inc.Doc$JavadocGenerationFailed > [error] at > sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51) > [error] at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62) > [error] at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57) > [error] at sbt.inc.Doc$.go$1(Doc.scala:73) > [error] at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82) > [error] at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81) > [error] at > sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85) > [error] at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68) > [error] at > sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178) > [error] at scala.Function1.$anonfun$compose$1(Function1.scala:49) > [error] at > 
sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63) > [error] at sbt.std.Transform$$anon$4.work(Transform.scala:69) > [error] at sbt.Execute.$anonfun$submit$2(Execute.scala:283) > [error] at > sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24) > [error] at sbt.Execute.work(Execute.scala:292) > [error] at sbt.Execute.$anonfun$submit$1(Execute.scala:283) > [error] at > sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265) > [error] at > sbt.CompletionService$$anon$2.call(CompletionService.scala:65) > [error] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [error] at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [error] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [error] at > java.util.concurrent.ThreadPoolExecutor.r
[jira] [Resolved] (SPARK-45252) `sbt doc` execution failed.
[ https://issues.apache.org/jira/browse/SPARK-45252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45252. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43032 [https://github.com/apache/spark/pull/43032] > `sbt doc` execution failed. > --- > > Key: SPARK-45252 > URL: https://issues.apache.org/jira/browse/SPARK-45252 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > run > > {code:java} > build/sbt clean doc -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl > -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pvolcano > {code} > > {code:java} > [info] Main Scala API documentation successful. > [error] sbt.inc.Doc$JavadocGenerationFailed > [error] at > sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51) > [error] at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62) > [error] at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57) > [error] at sbt.inc.Doc$.go$1(Doc.scala:73) > [error] at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82) > [error] at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81) > [error] at > sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85) > [error] at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68) > [error] at > sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178) > [error] at scala.Function1.$anonfun$compose$1(Function1.scala:49) > [error] at > sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63) > [error] at sbt.std.Transform$$anon$4.work(Transform.scala:69) > [error] at sbt.Execute.$anonfun$submit$2(Execute.scala:283) > [error] at > sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24) > [error] at sbt.Execute.work(Execute.scala:292) > [error] at sbt.Execute.$anonfun$submit$1(Execute.scala:283) > [error] at > sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265) > [error] at > sbt.CompletionService$$anon$2.call(CompletionService.scala:65) > [error] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [error] at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [error] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [error] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [error] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [error] at java.lang.Thread.run(Thread.java:750) > [error] sbt.inc.Doc$JavadocGenerationFailed > [error] at > sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51) > [error] at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62) > [error] at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57) > [error] at sbt.inc.Doc$.go$1(Doc.scala:73) > [error] at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82) > [error] at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81) > [error] at > sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220) > [error] at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85) > [error] at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68) > [error] at > 
sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178) > [error] at scala.Function1.$anonfun$compose$1(Function1.scala:49) > [error] at > sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63) > [error] at sbt.std.Transform$$anon$4.work(Transform.scala:69) > [error] at sbt.Execute.$anonfun$submit$2(Execute.scala:283) > [error] at > sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24) > [error] at sbt.Execute.work(Execute.scala:292) > [error] at sbt.Execute.$anonfun$submit$1(Execute.scala:283) > [error] at > sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265) > [error] at > sbt.CompletionService$$anon$2.call(CompletionService.scala:65) > [error] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [error] at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [erro
[jira] [Resolved] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
[ https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45263. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43040 [https://github.com/apache/spark/pull/43040] > Make EventLoggingListenerSuite independent from spark.eventLog.compress conf > > > Key: SPARK-45263 > URL: https://issues.apache.org/jira/browse/SPARK-45263 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
[ https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45263: - Assignee: Dongjoon Hyun > Make EventLoggingListenerSuite independent from spark.eventLog.compress conf > > > Key: SPARK-45263 > URL: https://issues.apache.org/jira/browse/SPARK-45263 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45264) Configurable error when generating Python docs
[ https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767743#comment-17767743 ] Allison Wang commented on SPARK-45264: -- [~podongfeng] do we have a way to bypass such pandas version errors when generating documentation? > Configurable error when generating Python docs > -- > > Key: SPARK-45264 > URL: https://issues.apache.org/jira/browse/SPARK-45264 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > {{cd python/docs}} > {{make html }} > > Gives a Configuration error: > There is a programmable error in your configuration file: > ImportError: Warning: Latest version of pandas (2.1.0) is required to > generate the documentation; however, your version was 2.0.3 > make: *** [html] Error 2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45264) Configurable error when generating Python docs
Allison Wang created SPARK-45264: Summary: Configurable error when generating Python docs Key: SPARK-45264 URL: https://issues.apache.org/jira/browse/SPARK-45264 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang {{cd python/docs}} {{make html }} Gives a Configuration error: There is a programmable error in your configuration file: ImportError: Warning: Latest version of pandas (2.1.0) is required to generate the documentation; however, your version was 2.0.3 make: *** [html] Error 2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
[ https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45263: -- Affects Version/s: (was: 3.5.0) (was: 3.4.1) > Make EventLoggingListenerSuite independent from spark.eventLog.compress conf > > > Key: SPARK-45263 > URL: https://issues.apache.org/jira/browse/SPARK-45263 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
[ https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45263: -- Affects Version/s: 3.5.0 > Make EventLoggingListenerSuite independent from spark.eventLog.compress conf > > > Key: SPARK-45263 > URL: https://issues.apache.org/jira/browse/SPARK-45263 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case
[ https://issues.apache.org/jira/browse/SPARK-45261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45261: - Assignee: Dongjoon Hyun > Fix EventLogFileWriters to handle `none` codec case > --- > > Key: SPARK-45261 > URL: https://issues.apache.org/jira/browse/SPARK-45261 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case
[ https://issues.apache.org/jira/browse/SPARK-45261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45261. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43038 [https://github.com/apache/spark/pull/43038] > Fix EventLogFileWriters to handle `none` codec case > --- > > Key: SPARK-45261 > URL: https://issues.apache.org/jira/browse/SPARK-45261 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
[ https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45263: --- Labels: pull-request-available (was: ) > Make EventLoggingListenerSuite independent from spark.eventLog.compress conf > > > Key: SPARK-45263 > URL: https://issues.apache.org/jira/browse/SPARK-45263 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.4.1, 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
Dongjoon Hyun created SPARK-45263: - Summary: Make EventLoggingListenerSuite independent from spark.eventLog.compress conf Key: SPARK-45263 URL: https://issues.apache.org/jira/browse/SPARK-45263 Project: Spark Issue Type: Test Components: Spark Core Affects Versions: 3.4.1, 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45220) Refine docstring of `DataFrame.join`
[ https://issues.apache.org/jira/browse/SPARK-45220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45220: --- Labels: pull-request-available (was: ) > Refine docstring of `DataFrame.join` > > > Key: SPARK-45220 > URL: https://issues.apache.org/jira/browse/SPARK-45220 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of `DataFrame.join`. > The examples should also include: left join, left anti join, join on multiple > columns and column names, join on multiple conditions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
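A minimal sketch of the kinds of `DataFrame.join` examples SPARK-45220 asks for (left join, left anti join, join on a list of column names, join on multiple conditions). The column names and data here are made up for illustration.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-docstring-demo").getOrCreate()

emp = spark.createDataFrame([(1, "Alice", 10), (2, "Bob", 20), (3, "Eve", 99)],
                            ["id", "name", "dept_id"])
dept = spark.createDataFrame([(10, "Sales"), (20, "Eng")], ["dept_id", "dept"])

emp.join(dept, on="dept_id", how="left").show()        # left join
emp.join(dept, on="dept_id", how="left_anti").show()   # rows in emp with no matching dept
emp.join(dept, on=["dept_id"], how="inner").show()     # join on a list of column names

# Join on multiple conditions via an expression.
cond = (emp.dept_id == dept.dept_id) & (dept.dept != "Sales")
emp.join(dept, on=cond, how="inner").show()
{code}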
[jira] [Created] (SPARK-45262) Improve the description for `LIKE`
Max Gekk created SPARK-45262: Summary: Improve the description for `LIKE` Key: SPARK-45262 URL: https://issues.apache.org/jira/browse/SPARK-45262 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk The description of `LIKE` says: {code} ... in order to match "\abc", the pattern should be "\\abc" {code} but in Spark SQL shell: {code:sql} spark-sql (default)> SELECT c FROM t; \abc spark-sql (default)> SELECT c LIKE "\\abc" FROM t; [INVALID_FORMAT.ESC_IN_THE_MIDDLE] The format is invalid: '\\abc'. The escape character is not allowed to precede 'a'. spark-sql (default)> SELECT c LIKE "abc" FROM t; true {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case
[ https://issues.apache.org/jira/browse/SPARK-45261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45261: --- Labels: pull-request-available (was: ) > Fix EventLogFileWriters to handle `none` codec case > --- > > Key: SPARK-45261 > URL: https://issues.apache.org/jira/browse/SPARK-45261 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case
Dongjoon Hyun created SPARK-45261: - Summary: Fix EventLogFileWriters to handle `none` codec case Key: SPARK-45261 URL: https://issues.apache.org/jira/browse/SPARK-45261 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45260) Refine docstring of count_distinct
Allison Wang created SPARK-45260: Summary: Refine docstring of count_distinct Key: SPARK-45260 URL: https://issues.apache.org/jira/browse/SPARK-45260 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine docstring of the function `count_distinct`, (e.g provide examples with groupBy) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45259) Refine docstring of `count`
Allison Wang created SPARK-45259: Summary: Refine docstring of `count` Key: SPARK-45259 URL: https://issues.apache.org/jira/browse/SPARK-45259 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of the function `count` (e.g provide examples with groupby) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45258) Refine docstring of `sum`
Allison Wang created SPARK-45258: Summary: Refine docstring of `sum` Key: SPARK-45258 URL: https://issues.apache.org/jira/browse/SPARK-45258 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of function `sum` (e.g provide examples with groupBy) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
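The three docstring tickets above (SPARK-45260, SPARK-45259, SPARK-45258) all ask for groupBy examples for `count_distinct`, `count`, and `sum`. A minimal sketch of the kind of example they call for, with made-up data:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("agg-docstring-demo").getOrCreate()

sales = spark.createDataFrame(
    [("US", "a", 10), ("US", "a", 5), ("US", "b", 7), ("EU", "c", 3)],
    ["region", "product", "amount"],
)

# groupBy-based aggregation examples of the kind the tickets request.
sales.groupBy("region").agg(
    F.sum("amount").alias("total_amount"),
    F.count("product").alias("num_rows"),
    F.count_distinct("product").alias("distinct_products"),
).show()
{code}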
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: java 1.8, sbt 1.9, scala 2.12 I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) EDIT Not sure if it's the right mitigation but explicitly adding guava worked but now I am in the 2nd territory of error {{Sep 21, 2023 8:21:59 PM org.sparkproject.connect.client.io.grpc.NameResolverRegistry getDefaultRegistry}} {{WARNING: No NameResolverProviders found via ServiceLoader, including for DNS. This is probably due to a broken build. If using ProGuard, check your configuration}} {{Exception in thread "main" org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException: org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: No functional channel service provider found. 
Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}} {{ at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)}} {{ at scala.Option.getOrElse(Option.scala:189)}} {{ at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} {{ at org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry.newChannelBuilder(ManagedChannelRegistry.java:179)}} {{ at org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry.newChannelBuilder(ManagedChannelRegistry.java:155)}} {{ at org.sparkproject.connect.client.io.grpc.Grpc.newChannelBuilder(Grpc.java:101)}} {{ at org.sparkproject.connect.client.io.grpc.Grpc.newChannelBuilderForAddress(Grpc.java:111)}} {{ at org.apache.spark.sql.connect.client.SparkConnectClient$Configuration.createChannel(SparkConnectClient.scala:633)}} {{ at org.apache.spark.sql.connect.client.SparkConnectClient$Configu
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: java 1.8, sbt 1.9, scala 2.12 I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) I followed the doc exactly as described. Can somebody help was: I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 
11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) I followed the doc exactly as described. Can somebody help > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > java 1.8, sbt 1.9, scala 2.12 > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" ja
[jira] [Assigned] (SPARK-44113) Make Scala 2.13+ as default Scala version
[ https://issues.apache.org/jira/browse/SPARK-44113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44113: - Assignee: Yang Jie > Make Scala 2.13+ as default Scala version > - > > Key: SPARK-44113 > URL: https://issues.apache.org/jira/browse/SPARK-44113 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45257) Enable spark.eventLog.compress by default
[ https://issues.apache.org/jira/browse/SPARK-45257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45257: --- Labels: pull-request-available (was: ) > Enable spark.eventLog.compress by default > - > > Key: SPARK-45257 > URL: https://issues.apache.org/jira/browse/SPARK-45257 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45257) Enable spark.eventLog.compress by default
Dongjoon Hyun created SPARK-45257: - Summary: Enable spark.eventLog.compress by default Key: SPARK-45257 URL: https://issues.apache.org/jira/browse/SPARK-45257 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
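SPARK-45257 only flips a default; event-log compression can already be turned on explicitly. A minimal standalone sketch of doing so today, assuming a local event-log directory (the path and the example job are illustrative, not part of the ticket):

```scala
import org.apache.spark.sql.SparkSession

// Enable event log compression explicitly. SPARK-45257 proposes making
// spark.eventLog.compress=true the default, which would make that config line unnecessary.
val spark = SparkSession.builder()
  .appName("event-log-compress-demo")
  .master("local[*]")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "file:///tmp/spark-events") // directory must already exist; path is an assumption
  .config("spark.eventLog.compress", "true")
  .getOrCreate()

spark.range(10).count() // run any job so an event log file gets written
spark.stop()
```

The History Server reads compressed event logs transparently, so enabling compression mainly trades a little CPU for smaller files on disk.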
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Issue Type: Bug (was: New Feature) > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{ at scala.App.main$(App.scala:78)}} > {{ at Main$.main(Main.scala:3)}} > {{ at Main.main(Main.scala)}} > {{Caused by: java.lang.ClassNotFoundException: > org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} > {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} > {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything. It's not run neither under a `SPARK_HOME` ( I guess that's the > whole point of connect client ) > > I followed the doc exactly as described. Can somebody help -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622 ] Sebastian Daberdaku edited comment on SPARK-45201 at 9/21/23 4:59 PM: -- After spending hours analyzing the project pom files, I discovered two things. First, the shade plugin is relocating the guava/failureaccess package twice in the connect jars (once by the module shade plugin, once by the base project plugin). I created a simple patch to prevent the relocation of failureacces by the base plugin. I am adding the patch file [^spark-3.5.0.patch] to this Jira issue, I do not have time to create a pull request, you can apply the patch by navigating inside the source folder and run: {{patch -p1 NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0 > > > Key: SPARK-45201 > URL: https://issues.apache.org/jira/browse/SPARK-45201 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Sebastian Daberdaku >Priority: Major > Attachments: Dockerfile, spark-3.5.0.patch > > > I am trying to compile Spark 3.5.0 and make a distribution that supports > Spark Connect and Kubernetes. The compilation seems to complete correctly, > but when I try to run the Spark Connect server on kubernetes I get a > "NoClassDefFoundError" as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010) > at > org.apache.spark.storage.BlockManagerId$.getCachedBlockM
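The patch command in the comment above appears to have been cut off by the archive; judging from the attached file name, it was presumably something like `patch -p1 < spark-3.5.0.patch` run from the source root. As a quick check of whether a given distribution is affected, a hedged diagnostic sketch (not part of the reporter's patch) can probe for the class named in the stack trace:

```scala
// Diagnostic sketch: probe the runtime classpath for the relocated failureaccess class
// that the NoClassDefFoundError reports as missing. The class name is copied verbatim
// from the stack trace; nothing else about the build is assumed.
object ShadedClassCheck {
  def main(args: Array[String]): Unit = {
    val relocated =
      "org.sparkproject.guava.util.concurrent.internal.InternalFutureFailureAccess"
    try {
      Class.forName(relocated)
      println(s"$relocated is present - the shaded jars look consistent")
    } catch {
      case _: ClassNotFoundException =>
        println(s"$relocated is missing - a jar on the classpath (e.g. spark-connect-common) " +
          "may have dropped it or relocated it twice")
    }
  }
}
```

Running it with the distribution's jars/ directory on the classpath mirrors the class-loading environment of the failing Spark Connect server.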
[jira] [Updated] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Daberdaku updated SPARK-45201: Attachment: spark-3.5.0.patch > NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0 > > > Key: SPARK-45201 > URL: https://issues.apache.org/jira/browse/SPARK-45201 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Sebastian Daberdaku >Priority: Major > Attachments: Dockerfile, spark-3.5.0.patch > > > I am trying to compile Spark 3.5.0 and make a distribution that supports > Spark Connect and Kubernetes. The compilation seems to complete correctly, > but when I try to run the Spark Connect server on kubernetes I get a > "NoClassDefFoundError" as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515) > at > 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010) > at > org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146) > at > org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127) > at > org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536) > at org.apache.spark.SparkContext.(SparkContext.scala:625) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1093) > at > org
[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622 ] Sebastian Daberdaku edited comment on SPARK-45201 at 9/21/23 4:57 PM: -- After spending hours analyzing the project pom files, I discovered two things. First, the shade plugin is relocating the guava-failureaccess package twice in the connect jars (once by the module shade plugin, once by the base project plugin). I created a simple patch to prevent the relocation of failureacces by the base plugin. (I am adding the patch file [^spark-3.5.0.patch] to this Jira issue, I do not have time to create a pull request). Second, the spark-connect-common jar produced by make-distribution is redundant and was the cause of the class loading issues. Removing it resolves all these issues I had. was (Author: JIRAUSER302265): After spending hours analyzing the project pom files, I discovered two things. First, the shade plugin is relocating the guava-failureaccess package twice in the connect jars (once by the module shade plugin, once by the base project plugin). I created a simple patch to prevent the relocation of failureacces by the base plugin. (I am adding the patch file to this Jira issue, I do not have time to create a pull request). Second, the spark-connect-common jar produced by make-distribution is redundant and was the cause of the class loading issues. Removing it resolves all these issues I had. > NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0 > > > Key: SPARK-45201 > URL: https://issues.apache.org/jira/browse/SPARK-45201 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Sebastian Daberdaku >Priority: Major > Attachments: Dockerfile, spark-3.5.0.patch > > > I am trying to compile Spark 3.5.0 and make a distribution that supports > Spark Connect and Kubernetes. 
The compilation seems to complete correctly, > but when I try to run the Spark Connect server on kubernetes I get a > "NoClassDefFoundError" as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoade
[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622 ] Sebastian Daberdaku edited comment on SPARK-45201 at 9/21/23 4:56 PM: -- After spending hours analyzing the project pom files, I discovered two things. First, the shade plugin is relocating the guava-failureaccess package twice in the connect jars (once by the module shade plugin, once by the base project plugin). I created a simple patch to prevent the relocation of failureacces by the base plugin. (I am adding the patch file to this Jira issue, I do not have time to create a pull request). Second, the spark-connect-common jar produced by make-distribution is redundant and was the cause of the class loading issues. Removing it resolves all these issues I had. was (Author: JIRAUSER302265): After spending hours analyzing the project pom files, I discovered that by simply deleting the spark-connect-common jar all class loading issues are gone. I hope this might be usefult to others as well. > NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0 > > > Key: SPARK-45201 > URL: https://issues.apache.org/jira/browse/SPARK-45201 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Sebastian Daberdaku >Priority: Major > Attachments: Dockerfile > > > I am trying to compile Spark 3.5.0 and make a distribution that supports > Spark Connect and Kubernetes. The compilation seems to complete correctly, > but when I try to run the Spark Connect server on kubernetes I get a > "NoClassDefFoundError" as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > 
java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079) > at org.sparkproject.g
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) I followed the doc exactly as described. Can somebody help was: I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 
11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) I followed the doc exactly as described. Can somebody help? BTW it did work if I copied the exact shading rules in my project but I wonder if that's the right thing to do? > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the follow
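The workaround the reporter mentions, copying the connect client's shading rules into the application build, can be sketched with sbt-assembly. This is a hedged sketch, not an official fix: the Guava version and the single relocation rule below are inferred from the relocated package prefix in the stack trace, not taken from Spark's actual rule set, and it only matters when the application runs from the assembled jar (shade rules do not affect `sbt run`):

```scala
// build.sbt -- workaround sketch (assumes the sbt-assembly plugin is enabled).
// Pull in plain Guava and relocate it to the prefix the connect client expects,
// mirroring the package seen in the stack trace
// (org.sparkproject.connect.client.com.google.common.*).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0",
  "com.google.guava"  %  "guava"                   % "32.0.1-jre" // version is an assumption
)

assembly / assemblyShadeRules := Seq(
  ShadeRule.rename(
    "com.google.common.**" -> "org.sparkproject.connect.client.com.google.common.@1"
  ).inAll
)
```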
[jira] [Commented] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622 ] Sebastian Daberdaku commented on SPARK-45201: - After spending hours analyzing the project pom files, I discovered that by simply deleting the spark-connect-common jar all class loading issues are gone. I hope this might be usefult to others as well. > NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0 > > > Key: SPARK-45201 > URL: https://issues.apache.org/jira/browse/SPARK-45201 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Sebastian Daberdaku >Priority: Major > Attachments: Dockerfile > > > I am trying to compile Spark 3.5.0 and make a distribution that supports > Spark Connect and Kubernetes. The compilation seems to complete correctly, > but when I try to run the Spark Connect server on kubernetes I get a > "NoClassDefFoundError" as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at > 
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010) > at > org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146) > at > org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127) > at > org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536) > at org.apache.spark.SparkContext.(SparkContext.scala:625) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888) > at > org.apache.spark.sql.SparkSession$B
[jira] [Updated] (SPARK-45256) Arrow DurationWriter fails when vector is at capacity
[ https://issues.apache.org/jira/browse/SPARK-45256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45256: --- Labels: pull-request-available (was: ) > Arrow DurationWriter fails when vector is at capacity > - > > Key: SPARK-45256 > URL: https://issues.apache.org/jira/browse/SPARK-45256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.5.0, 3.5.1 >Reporter: Sander Goos >Priority: Major > Labels: pull-request-available > > The DurationWriter fails if more values are written than the initial capacity > of the DurationVector (4032). Fix by using `setSafe` instead of `set` method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45256) Arrow DurationWriter fails when vector is at capacity
Sander Goos created SPARK-45256: --- Summary: Arrow DurationWriter fails when vector is at capacity Key: SPARK-45256 URL: https://issues.apache.org/jira/browse/SPARK-45256 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.1, 3.4.0, 3.4.2, 3.5.1 Reporter: Sander Goos The DurationWriter fails if more values are written than the initial capacity of the DurationVector (4032). Fix by using `setSafe` instead of the `set` method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
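For context, a minimal sketch of the difference using Arrow's Java API directly, assuming the arrow-vector and arrow-memory-netty artifacts are on the classpath; this is not Spark's DurationWriter code, and the time unit and row count are arbitrary:

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.DurationVector
import org.apache.arrow.vector.types.TimeUnit
import org.apache.arrow.vector.types.pojo.{ArrowType, FieldType}

// Write more values than the initial allocation holds. `setSafe` grows the underlying
// buffers as needed, whereas a plain `set` assumes the slot already exists and breaks
// once the index runs past the allocated capacity (the report mentions 4032 slots).
val allocator = new RootAllocator()
val vector = new DurationVector(
  "d",
  FieldType.nullable(new ArrowType.Duration(TimeUnit.MICROSECOND)),
  allocator)
vector.allocateNew()

val n = 10000
(0 until n).foreach(i => vector.setSafe(i, i.toLong)) // reallocates transparently past the initial capacity
vector.setValueCount(n)
println(vector.getValueCount) // 10000

vector.close()
allocator.close()
```

The ticket's fix is the analogous change inside Spark's DurationWriter, switching its `set` call to `setSafe`.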
[jira] [Updated] (SPARK-45240) Implement Error Enrichment for Python Client
[ https://issues.apache.org/jira/browse/SPARK-45240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45240: --- Labels: pull-request-available (was: ) > Implement Error Enrichment for Python Client > > > Key: SPARK-45240 > URL: https://issues.apache.org/jira/browse/SPARK-45240 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) I followed the doc exactly as described. Can somebody help? BTW it did work if I copied the exact shading rules in my project but I wonder if that's the right thing to do? 
was: I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. Neither under `SPARK_HOME` ( I guess that's the whole point of connect client ) I followed the doc exactly as described. Can somebody help? BTW it did work if I copied the exact shading rules in my project but I wonder if that's the right thing to do? > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("