[jira] [Created] (SPARK-48040) Spark Connect should support setting the scheduler pool name
xie shuiahu created SPARK-48040:
------------------------------------

             Summary: Spark Connect should support setting the scheduler pool name
                 Key: SPARK-48040
                 URL: https://issues.apache.org/jira/browse/SPARK-48040
             Project: Spark
          Issue Type: New Feature
          Components: Connect
    Affects Versions: 3.5.1
            Reporter: xie shuiahu


Spark supports the fair scheduler and grouping jobs into pools, but Spark Connect does not support this feature.
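For reference, with a classic (non-Connect) PySpark session the pool is selected through a SparkContext local property, as in the minimal sketch below (the app name and pool name are placeholders). A Spark Connect client has no client-side SparkContext, so there is currently no equivalent call, which is the gap this request describes.

{code:python}
from pyspark.sql import SparkSession

# Classic PySpark: enable fair scheduling and route this thread's jobs
# into a named pool via a SparkContext local property.
spark = (
    SparkSession.builder
    .appName("pool-demo")                    # placeholder app name
    .config("spark.scheduler.mode", "FAIR")  # schedule jobs fairly across pools
    .getOrCreate()
)
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "production")  # placeholder pool name

spark.range(1000).count()  # this job is scheduled in the "production" pool

# With Spark Connect there is no client-side SparkContext, so
# setLocalProperty is not reachable from the client today.
{code}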
[jira] [Created] (SPARK-46732) Propagate ArtifactSet to broadcast execution thread
xie shuiahu created SPARK-46732:
------------------------------------

             Summary: Propagate ArtifactSet to broadcast execution thread
                 Key: SPARK-46732
                 URL: https://issues.apache.org/jira/browse/SPARK-46732
             Project: Spark
          Issue Type: New Feature
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: xie shuiahu


Similar to SPARK-44794, the JobArtifactState is not propagated to the broadcast execution thread.
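Not Spark code, but a minimal Python sketch of the underlying problem class (all names below are illustrative): state kept in a thread-local or context variable on the submitting thread is not visible on a separate execution thread unless it is explicitly captured and re-applied, which is what the broadcast execution thread would need for the JobArtifactState.

{code:python}
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for per-session state such as JobArtifactState.
job_state = contextvars.ContextVar("job_state", default=None)

def read_state():
    return job_state.get()  # a new thread starts with an empty context

def read_state_in(ctx):
    return ctx.run(job_state.get)  # run inside an explicitly captured context

job_state.set("session-123")  # placeholder state, set on the caller thread
with ThreadPoolExecutor(max_workers=1) as pool:
    print(pool.submit(read_state).result())                                 # None: not propagated
    print(pool.submit(read_state_in, contextvars.copy_context()).result())  # session-123: propagated
{code}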
[jira] [Updated] (SPARK-46732) Propagate JobArtifactSet to broadcast execution thread
[ https://issues.apache.org/jira/browse/SPARK-46732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xie shuiahu updated SPARK-46732:
--------------------------------
    Summary: Propagate JobArtifactSet to broadcast execution thread  (was: Propagate ArtifactSet to broadcast execution thread)

> Propagate JobArtifactSet to broadcast execution thread
> -------------------------------------------------------
>
>                 Key: SPARK-46732
>                 URL: https://issues.apache.org/jira/browse/SPARK-46732
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: xie shuiahu
>            Priority: Major
>
> Similar to SPARK-44794, the JobArtifactState is not propagated to the broadcast execution thread.
[jira] [Created] (SPARK-45821) SparkSession._apply_options shouldn't catch all exceptions
xie shuiahu created SPARK-45821:
------------------------------------

             Summary: SparkSession._apply_options shouldn't catch all exceptions
                 Key: SPARK-45821
                 URL: https://issues.apache.org/jira/browse/SPARK-45821
             Project: Spark
          Issue Type: Bug
          Components: Connect, PySpark
    Affects Versions: 3.5.0
            Reporter: xie shuiahu


`SparkSession._apply_options` catches all exceptions, which is unexpected.
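A minimal sketch of the concern, not the actual pyspark/sql/connect/session.py code (the helper and error names are illustrative): a bare `except Exception: pass` while applying options hides genuine configuration failures, whereas only the anticipated error type should be tolerated.

{code:python}
class ConfigError(Exception):
    """Placeholder for the one failure that is actually safe to ignore."""

def apply_options_too_broad(conf, options):
    for key, value in options.items():
        try:
            conf.set(key, value)
        except Exception:    # swallows everything, including real bugs and transport errors
            pass

def apply_options_narrow(conf, options):
    for key, value in options.items():
        try:
            conf.set(key, value)
        except ConfigError:  # ignore only the expected, recoverable case
            pass
{code}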
[jira] [Created] (SPARK-45814) ArrowConverters.createEmptyArrowBatch may cause memory leak
xie shuiahu created SPARK-45814:
------------------------------------

             Summary: ArrowConverters.createEmptyArrowBatch may cause memory leak
                 Key: SPARK-45814
                 URL: https://issues.apache.org/jira/browse/SPARK-45814
             Project: Spark
          Issue Type: Bug
          Components: Connect, SQL
    Affects Versions: 3.5.0, 3.4.1
            Reporter: xie shuiahu


ArrowConverters.createEmptyArrowBatch never calls hasNext; if TaskContext.get is None, a memory leak happens.
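Not the Scala code in question, but an illustrative Python sketch of the leak pattern being described: when an iterator's cleanup runs only from hasNext (or from a task-completion callback that is missing because TaskContext.get is None), a caller that takes the batch without ever calling hasNext leaves the allocation alive.

{code:python}
class BatchIterator:
    """Illustrative resource-owning iterator; the buffer is released only from has_next()."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._buffer = bytearray(1024 * 1024)  # stands in for an Arrow allocation

    def has_next(self):
        if not self._batches:
            self.close()         # the only place cleanup is triggered
            return False
        return True

    def next(self):
        return self._batches.pop(0)

    def close(self):
        self._buffer = None      # release the "allocation"

it = BatchIterator([b"empty-batch"])
batch = it.next()                # caller takes the single batch without calling has_next()
assert it._buffer is not None    # nothing released the buffer: the leak pattern reported here
{code}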
[jira] [Resolved] (SPARK-45738) Client will wait forever if its session is evicted from the Spark Connect server
[ https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xie shuiahu resolved SPARK-45738.
---------------------------------
    Resolution: Implemented

> Client will wait forever if its session is evicted from the Spark Connect server
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-45738
>                 URL: https://issues.apache.org/jira/browse/SPARK-45738
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: xie shuiahu
>            Priority: Critical
>              Labels: pull-request-available, query-lifecycle
>
> Step 1. Start a Spark Connect server.
> Step 2. Submit a Spark job that runs for a long time:
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.remote("sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL statement that runs longer than creating 100 sessions").show()
> {code}
> Step 3. Concurrently with Step 2, create more than 100 sessions:
> {code:python}
> from pyspark.sql import SparkSession
>
> for i in range(0, 200):
>     spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases")
> {code}
> *When the code in Step 3 runs, the session created in Step 2 is evicted and the client waits forever.*
[jira] [Updated] (SPARK-45738) Client will wait forever if its session is evicted from the Spark Connect server
[ https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xie shuiahu updated SPARK-45738:
--------------------------------
    Labels: pull-request-available query-lifecycle  (was: pull-request-available)

> Client will wait forever if its session is evicted from the Spark Connect server
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-45738
>                 URL: https://issues.apache.org/jira/browse/SPARK-45738
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: xie shuiahu
>            Priority: Critical
>              Labels: pull-request-available, query-lifecycle
>
> Step 1. Start a Spark Connect server.
> Step 2. Submit a Spark job that runs for a long time:
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.remote("sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL statement that runs longer than creating 100 sessions").show()
> {code}
> Step 3. Concurrently with Step 2, create more than 100 sessions:
> {code:python}
> from pyspark.sql import SparkSession
>
> for i in range(0, 200):
>     spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases")
> {code}
> *When the code in Step 3 runs, the session created in Step 2 is evicted and the client waits forever.*
[jira] [Updated] (SPARK-45738) Client will wait forever if its session is evicted from the Spark Connect server
[ https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xie shuiahu updated SPARK-45738:
--------------------------------
    Description:

Step 1. Start a Spark Connect server.

Step 2. Submit a Spark job that runs for a long time:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL statement that runs longer than creating 100 sessions").show()
{code}

Step 3. Concurrently with Step 2, create more than 100 sessions:
{code:python}
from pyspark.sql import SparkSession

for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases")
{code}

*When the code in Step 3 runs, the session created in Step 2 is evicted and the client waits forever.*

  was:

Step 1. Start a Spark Connect server.

Step 2. Submit a Spark job that runs for a long time:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL statement that runs longer than creating 100 sessions").show()
{code}

Step 3. Concurrently with Step 2, create more than 100 sessions:
{code:python}
from pyspark.sql import SparkSession

for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases")
{code}

*When the code in Step 3 runs, the session created in Step 2 is evicted and the client waits forever.*

The server will log an exception like this:

> Client will wait forever if its session is evicted from the Spark Connect server
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-45738
>                 URL: https://issues.apache.org/jira/browse/SPARK-45738
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: xie shuiahu
>            Priority: Critical
>
> Step 1. Start a Spark Connect server.
> Step 2. Submit a Spark job that runs for a long time:
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.remote("sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL statement that runs longer than creating 100 sessions").show()
> {code}
> Step 3. Concurrently with Step 2, create more than 100 sessions:
> {code:python}
> from pyspark.sql import SparkSession
>
> for i in range(0, 200):
>     spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases")
> {code}
> *When the code in Step 3 runs, the session created in Step 2 is evicted and the client waits forever.*
[jira] [Created] (SPARK-45738) Client will wait forever if its session is evicted from the Spark Connect server
xie shuiahu created SPARK-45738:
------------------------------------

             Summary: Client will wait forever if its session is evicted from the Spark Connect server
                 Key: SPARK-45738
                 URL: https://issues.apache.org/jira/browse/SPARK-45738
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: xie shuiahu


Step 1. Start a Spark Connect server.

Step 2. Submit a Spark job that runs for a long time:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL statement that runs longer than creating 100 sessions").show()
{code}

Step 3. Concurrently with Step 2, create more than 100 sessions:
{code:python}
from pyspark.sql import SparkSession

for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases")
{code}

*When the code in Step 3 runs, the session created in Step 2 is evicted and the client waits forever.*

The server will log an exception like this:
[jira] [Created] (SPARK-45538) pyspark connect overwrite_partitions bug
xie shuiahu created SPARK-45538:
------------------------------------

             Summary: pyspark connect overwrite_partitions bug
                 Key: SPARK-45538
                 URL: https://issues.apache.org/jira/browse/SPARK-45538
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
         Environment: pyspark connect 3.5.0
            Reporter: xie shuiahu


DataFrameWriterV2.overwritePartitions sets the mode to *overwrite_partitions* [pyspark/sql/connect/readwriter.py, line 825], but WriteOperationV2 expects *overwrite_partition* [pyspark/sql/connect/plan.py, line 1660].
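A minimal sketch of the mismatch, using simplified stand-ins for the two files rather than the actual source: the writer side emits one spelling of the mode string while the plan side matches on another, so the overwrite-partitions branch can never be taken.

{code:python}
# Illustrative only -- simplified stand-ins for readwriter.py and plan.py.

def mode_sent_by_writer():
    # what DataFrameWriterV2.overwritePartitions puts on the plan
    return "overwrite_partitions"

def build_write_operation(mode):
    # what the plan-side mapping matches on
    if mode == "overwrite_partition":             # singular: never equals the writer's value
        return "overwrite-partitions operation"   # placeholder for the real proto message
    raise ValueError(f"unsupported write mode: {mode}")

try:
    build_write_operation(mode_sent_by_writer())
except ValueError as err:
    print(err)  # unsupported write mode: overwrite_partitions
{code}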
[jira] [Commented] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773976#comment-17773976 ]

xie shuiahu commented on SPARK-45201:
--------------------------------------

[~sdaberdaku] I also have the same issue. I worked around it by passing spark-connect.jar to spark-submit via --jars instead of placing it in SPARK_HOME/jars.

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-45201
>                 URL: https://issues.apache.org/jira/browse/SPARK-45201
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Sebastian Daberdaku
>            Priority: Major
>         Attachments: Dockerfile, spark-3.5.0.patch
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports Spark Connect and Kubernetes. The compilation seems to complete correctly, but when I try to run the Spark Connect server on Kubernetes I get a "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3511)
>     at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3515)
>     at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146)
>     at org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127)
>     at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:625)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
>     at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099)
>     at scal