[jira] [Created] (SPARK-48040) Support setting the scheduler pool name in Spark Connect

2024-04-29 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-48040:
---

 Summary: Support setting the scheduler pool name in Spark Connect
 Key: SPARK-48040
 URL: https://issues.apache.org/jira/browse/SPARK-48040
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.1
Reporter: xie shuiahu


Spark supports the fair scheduler and grouping jobs into scheduler pools, but Spark
Connect does not support this feature.
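
For context, a minimal sketch of how the pool is chosen today with a classic (non-Connect) PySpark session. The FAIR scheduler mode and the spark.scheduler.pool local property are the standard classic-Spark mechanism; the pool name "etl" is just an example, and Spark Connect currently offers no equivalent client-side hook.
{code:python}
from pyspark.sql import SparkSession

# Classic PySpark: enable the FAIR scheduler and route jobs submitted from
# this thread into a named pool via a SparkContext local property.
spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
)

# Jobs submitted from this thread now run in the "etl" pool ("etl" is an
# example name; pools can also be pre-defined in fairscheduler.xml).
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "etl")
spark.range(10_000).count()
{code}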






[jira] [Created] (SPARK-46732) Propagate ArtifactSet to broadcast execution thread

2024-01-16 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-46732:
---

 Summary: Propagate ArtifactSet to broadcast execution thread
 Key: SPARK-46732
 URL: https://issues.apache.org/jira/browse/SPARK-46732
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: xie shuiahu


Similar to SPARK-44794, the JobArtifactState is not propagated to the broadcast
execution thread.
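
As context, a minimal Spark-free sketch of the propagation problem; the names below (artifact_state, run_on_worker_thread) are invented for illustration and are not Spark's internals. State kept in a thread-local on the submitting thread is invisible on a separate worker thread unless it is captured and re-installed explicitly, which is roughly what propagating the state means here.
{code:python}
import threading
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for a per-job artifact state kept in a thread-local.
_local = threading.local()

def current_state():
    return getattr(_local, "artifact_state", None)

def run_on_worker_thread(fn, pool):
    captured = current_state()              # capture on the submitting thread
    def wrapper():
        _local.artifact_state = captured    # re-install on the worker thread
        return fn()
    return pool.submit(wrapper).result()

pool = ThreadPoolExecutor(max_workers=1)
_local.artifact_state = "session-123"

# Without the capture/re-install above, the worker thread would see None here.
print(run_on_worker_thread(current_state, pool))  # -> "session-123"
pool.shutdown()
{code}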






[jira] [Updated] (SPARK-46732) Propagate JobArtifactSet to broadcast execution thread

2024-01-16 Thread xie shuiahu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xie shuiahu updated SPARK-46732:

Summary: Propagate JobArtifactSet to broadcast execution thread  (was: 
Propagate ArtifactSet to broadcast execution thread)

> Propagate JobArtifactSet to broadcast execution thread
> --
>
> Key: SPARK-46732
> URL: https://issues.apache.org/jira/browse/SPARK-46732
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: xie shuiahu
>Priority: Major
>
> Similar to SPARK-44794, the JobArtifactState is not propagated to the
> broadcast execution thread.






[jira] [Created] (SPARK-45821) SparkSession._apply_options shouldn't catch all exceptions

2023-11-07 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-45821:
---

 Summary: SparkSession._apply_options shouldn't catch all exceptions
 Key: SPARK-45821
 URL: https://issues.apache.org/jira/browse/SPARK-45821
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: xie shuiahu


`SparkSession._apply_options` catches all exceptions, which is unexpected; only the specific errors it anticipates should be caught.
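
A simplified, hypothetical sketch of the concern (not the actual pyspark source; UnsupportedOptionError is an invented placeholder): a blanket except Exception hides every failure while session options are applied, whereas only errors that are genuinely expected should be tolerated.
{code:python}
import logging

logger = logging.getLogger(__name__)

class UnsupportedOptionError(Exception):
    """Hypothetical stand-in for an option the server may legitimately reject."""

def apply_options_too_broad(set_conf, options):
    for key, value in options.items():
        try:
            set_conf(key, value)
        except Exception:               # problematic: silently swallows bugs and real errors
            pass

def apply_options_narrower(set_conf, options):
    for key, value in options.items():
        try:
            set_conf(key, value)
        except UnsupportedOptionError:  # only the expected, recoverable case is ignored
            logger.warning("skipping unsupported option %r", key)
{code}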






[jira] [Created] (SPARK-45814) ArrowConverters.createEmptyArrowBatch may cause memory leak

2023-11-06 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-45814:
---

 Summary: ArrowConverters.createEmptyArrowBatch may cause memory 
leak
 Key: SPARK-45814
 URL: https://issues.apache.org/jira/browse/SPARK-45814
 Project: Spark
  Issue Type: Bug
  Components: Connect, SQL
Affects Versions: 3.5.0, 3.4.1
Reporter: xie shuiahu


ArrowConverters.createEmptyArrowBatch does not call hasNext; if TaskContext.get is
None, a memory leak happens.
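
A hypothetical, heavily simplified illustration of the leak pattern described above (not the real Scala code; the class and the add_completion_callback hook are invented): the buffer is only released either when hasNext drives the iterator to exhaustion or through a task-completion callback, so calling next() directly with no TaskContext leaves it unreleased.
{code:python}
class FakeArrowBatchIterator:
    """Yields a single (empty) batch and owns a buffer that must be freed."""

    def __init__(self, task_context=None):
        self._buffer = bytearray(1 << 20)   # stands in for the Arrow allocation
        self._emitted = False
        if task_context is not None:
            # cleanup normally piggybacks on a task-completion callback
            task_context.add_completion_callback(self.close)

    def close(self):
        self._buffer = None                 # releases the allocation

    def has_next(self):
        if self._emitted:
            self.close()                    # exhaustion via has_next frees the buffer
            return False
        return True

    def next(self):
        self._emitted = True
        return b""                          # the "empty batch"; frees nothing by itself

# No task context, and next() is called without ever driving has_next() to
# exhaustion: close() is never reached, so the buffer is leaked.
it = FakeArrowBatchIterator(task_context=None)
empty_batch = it.next()
{code}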






[jira] [Resolved] (SPARK-45738) client will wait forever if session in spark connect server is evicted

2023-11-05 Thread xie shuiahu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xie shuiahu resolved SPARK-45738.
-
Resolution: Implemented

> client will wait forever if session in spark connect server is evicted
> --
>
> Key: SPARK-45738
> URL: https://issues.apache.org/jira/browse/SPARK-45738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: xie shuiahu
>Priority: Critical
>  Labels: pull-request-available, query-lifecycle
>
> Step 1. Start a Spark Connect server.
> Step 2. Submit a Spark job that runs for a long time.
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
>  
> Step 3. Create more than 100 sessions (run this concurrently with Step 2).
> {code:java}
> for i in range(0, 200):
>     spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases") {code}
>  
> *When the Python code in Step 3 is executed, the session created in Step 2 will
> be evicted, and the client will wait forever.*
>  






[jira] [Updated] (SPARK-45738) client will wait forever if session in spark connect server is evicted

2023-11-03 Thread xie shuiahu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xie shuiahu updated SPARK-45738:

Labels: pull-request-available query-lifecycle  (was: 
pull-request-available)

> client will wait forever if session in spark connect server is evicted
> --
>
> Key: SPARK-45738
> URL: https://issues.apache.org/jira/browse/SPARK-45738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: xie shuiahu
>Priority: Critical
>  Labels: pull-request-available, query-lifecycle
>
> Step 1. Start a Spark Connect server.
> Step 2. Submit a Spark job that runs for a long time.
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
>  
> Step 3. Create more than 100 sessions (run this concurrently with Step 2).
> {code:java}
> for i in range(0, 200):
>     spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases") {code}
>  
> *When the Python code in Step 3 is executed, the session created in Step 2 will
> be evicted, and the client will wait forever.*
>  






[jira] [Updated] (SPARK-45738) client will wait forever if session in spark connect server is evicted

2023-10-31 Thread xie shuiahu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xie shuiahu updated SPARK-45738:

Description: 
Step 1. Start a Spark Connect server.

Step 2. Submit a Spark job that runs for a long time.
{code:java}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
 

Step 3. Create more than 100 sessions (run this concurrently with Step 2).
{code:java}
for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases") {code}
 

*When the Python code in Step 3 is executed, the session created in Step 2 will
be evicted, and the client will wait forever.*

 

  was:
Step 1. Start a Spark Connect server.

Step 2. Submit a Spark job that runs for a long time.
{code:java}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
 

Step 3. Create more than 100 sessions (run this concurrently with Step 2).
{code:java}
for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases") {code}
 

*When the Python code in Step 3 is executed, the session created in Step 2 will
be evicted, and the client will wait forever.*

The server will log an exception like this:


> client will wait forever if session in spark connect server is evicted
> --
>
> Key: SPARK-45738
> URL: https://issues.apache.org/jira/browse/SPARK-45738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: xie shuiahu
>Priority: Critical
>
> Step 1. Start a Spark Connect server.
> Step 2. Submit a Spark job that runs for a long time.
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
>  
> Step 3. Create more than 100 sessions (run this concurrently with Step 2).
> {code:java}
> for i in range(0, 200):
>     spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases") {code}
>  
> *When the Python code in Step 3 is executed, the session created in Step 2 will
> be evicted, and the client will wait forever.*
>  






[jira] [Created] (SPARK-45738) client will wait forever if session in spark connect server is evicted

2023-10-31 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-45738:
---

 Summary: client will wait forever if session in spark connect 
server is evicted
 Key: SPARK-45738
 URL: https://issues.apache.org/jira/browse/SPARK-45738
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: xie shuiahu


Step 1. Start a Spark Connect server.

Step 2. Submit a Spark job that runs for a long time.
{code:java}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
 

Step 3. Create more than 100 sessions (run this concurrently with Step 2).
{code:java}
for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases") {code}
 

*When the Python code in Step 3 is executed, the session created in Step 2 will
be evicted, and the client will wait forever.*

The server will log an exception like this:






[jira] [Created] (SPARK-45538) pyspark connect overwrite_partitions bug

2023-10-13 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-45538:
---

 Summary: pyspark connect overwrite_partitions bug
 Key: SPARK-45538
 URL: https://issues.apache.org/jira/browse/SPARK-45538
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
 Environment: pyspark connect 3.5.0
Reporter: xie shuiahu


DataFrameWriterV2.overwritePartitions sets the mode to *overwrite_partitions*
[pyspark/sql/connect/readwriter.py, line 825], but WriteOperationV2 takes it as
*overwrite_partition* [pyspark/sql/connect/plan.py, line 1660].
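
A hypothetical, simplified sketch of the mismatch (not the actual pyspark source; translate_mode and the returned constants are invented stand-ins): the writer sends "overwrite_partitions" while the plan translation only recognizes "overwrite_partition", so the two sides never agree.
{code:python}
# Hypothetical, simplified sketch; the real code lives in
# pyspark/sql/connect/readwriter.py and pyspark/sql/connect/plan.py.
WRITER_MODE = "overwrite_partitions"    # what DataFrameWriterV2.overwritePartitions sends

def translate_mode(mode: str) -> str:
    if mode == "overwrite":
        return "MODE_OVERWRITE"
    elif mode == "overwrite_partition":  # note the missing trailing "s"
        return "MODE_OVERWRITE_PARTITIONS"
    raise ValueError(f"unknown write mode: {mode}")

translate_mode(WRITER_MODE)             # fails because of the spelling mismatch
{code}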






[jira] [Commented] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-10-11 Thread xie shuiahu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773976#comment-17773976
 ] 

xie shuiahu commented on SPARK-45201:
-

[~sdaberdaku] I also have the same issue. I solved it by passing spark-connect.jar 
via spark-submit --jars instead of placing it in SPARK_HOME/jars.

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on Kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146)
>     at 
> org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127)
>     at 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:625)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099)
>     at scal