[jira] [Created] (SPARK-47341) Replace commands with relations in a few tests in SparkConnectClientSuite

2024-03-11 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-47341:


 Summary: Replace commands with relations in a few tests in 
SparkConnectClientSuite
 Key: SPARK-47341
 URL: https://issues.apache.org/jira/browse/SPARK-47341
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Venkata Sai Akhil Gudesa


A few 
[tests|https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala#L481-L527]
 in SparkConnectClientSuite test the result collection of a reattachable 
execution by sending a SQL command. On a real server, however, such a select 
command is not executed eagerly, so the setup is not entirely accurate. The 
tests are not broken, since a dummy server with dummy responses is used, but a 
small improvement would be to construct a relation rather than a command (see 
the sketch below).
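
A minimal sketch of the idea, using the Spark Connect proto builders (the Range relation here is an arbitrary illustration, not necessarily the plan the tests should build):
{code:scala}
import org.apache.spark.connect.proto

// A plan whose root is a relation: a real server executes this lazily,
// only when results are collected, unlike a command.
val relation = proto.Relation
  .newBuilder()
  .setRange(proto.Range.newBuilder().setStart(0).setEnd(10).setStep(1))
  .build()
val plan = proto.Plan.newBuilder().setRoot(relation).build()
{code}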






[jira] [Created] (SPARK-46660) ReattachExecute requests do not refresh aliveness of SessionHolder

2024-01-10 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-46660:


 Summary: ReattachExecute requests do not refresh aliveness of 
SessionHolder
 Key: SPARK-46660
 URL: https://issues.apache.org/jira/browse/SPARK-46660
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Venkata Sai Akhil Gudesa


In the first executePlan request, creating the {{ExecuteHolder}} triggers 
{{getOrCreateIsolatedSession}}, which refreshes the aliveness of the 
{{SessionHolder}}. However, in {{ReattachExecute}}, we fetch the 
{{ExecuteHolder}} directly without going through the {{SessionHolder}}, which 
makes the {{SessionHolder}} appear idle.

This can cause long-running queries (which send no release-execute requests, 
the calls that would otherwise refresh aliveness) to fail, because the 
{{SessionHolder}} may expire during active query execution.
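
A possible direction, as a hypothetical sketch (method names follow the ones mentioned above; the request accessors are assumptions):
{code:scala}
// Route ReattachExecute through the session cache first so that the
// SessionHolder's last-access time is refreshed, then fetch the ExecuteHolder.
val sessionHolder = SparkConnectService.getOrCreateIsolatedSession(
  request.getUserContext.getUserId, request.getSessionId)
val executeHolder = sessionHolder.executeHolder(request.getOperationId)
{code}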






[jira] [Created] (SPARK-46202) Expose new ArtifactManager API to support in-memory artifacts and sub-directory structure

2023-12-01 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-46202:


 Summary: Expose new ArtifactManager API to support in-memory 
artifacts and sub-directory structure
 Key: SPARK-46202
 URL: https://issues.apache.org/jira/browse/SPARK-46202
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Venkata Sai Akhil Gudesa


Currently, without the use of a REPL/class finder, there is no API for adding 
in-memory artifacts to the remote Spark Connect session.

Further, there is currently no API to preserve or impose a sub-directory 
structure on the files we send over the wire.
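
A hypothetical shape for such an API (names and signatures are illustrative assumptions, not the final design):
{code:scala}
trait ConnectArtifactApi {
  // Upload raw bytes as an artifact at a relative target path, preserving
  // the sub-directory structure on the server side, e.g.
  //   addArtifact(classBytes, "classes/com/example/MyUdf.class")
  def addArtifact(bytes: Array[Byte], target: String): Unit
}
{code}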






[jira] [Created] (SPARK-45155) Add API Docs

2023-09-13 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-45155:


 Summary: Add API Docs
 Key: SPARK-45155
 URL: https://issues.apache.org/jira/browse/SPARK-45155
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


Add API docs similar to the pages available for regular Spark: 
[https://spark.apache.org/docs/latest/api/scala/org/apache/spark/index.html] 

 






[jira] [Commented] (SPARK-44851) Update SparkConnectClientParser usage() method to match implementation

2023-09-09 Thread Venkata Sai Akhil Gudesa (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763343#comment-17763343
 ] 

Venkata Sai Akhil Gudesa commented on SPARK-44851:
--

[~harry] Go ahead :) 

> Update SparkConnectClientParser usage() method to match implementation
> --
>
> Key: SPARK-44851
> URL: https://issues.apache.org/jira/browse/SPARK-44851
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Several missing options as well as inconsistent ones (`enable-ssl` vs 
> `use_ssl`)  
> https://github.com/apache/spark/blob/7af4e358f3f4902cc9601e56c2662b8921a925d6/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L31-L42






[jira] [Created] (SPARK-44867) Refactor Spark Connect Docs to incorporate Scala setup

2023-08-18 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44867:


 Summary: Refactor Spark Connect Docs to incorporate Scala setup
 Key: SPARK-44867
 URL: https://issues.apache.org/jira/browse/SPARK-44867
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


The current Spark Connect 
[overview|https://spark.apache.org/docs/latest/spark-connect-overview.html] 
does not include instructions for setting up the Scala REPL, nor for using the 
Scala client in applications.






[jira] [Updated] (SPARK-44851) Update SparkConnectClientParser usage() method to match implementation

2023-08-17 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-44851:
-
Epic Link: SPARK-42554

> Update SparkConnectClientParser usage() method to match implementation
> --
>
> Key: SPARK-44851
> URL: https://issues.apache.org/jira/browse/SPARK-44851
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Several missing options as well as inconsistent ones (`enable-ssl` vs 
> `use_ssl`)  
> https://github.com/apache/spark/blob/7af4e358f3f4902cc9601e56c2662b8921a925d6/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L31-L42






[jira] [Created] (SPARK-44851) Update SparkConnectClientParser usage() method to match implementation

2023-08-17 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44851:


 Summary: Update SparkConnectClientParser usage() method to match 
implementation
 Key: SPARK-44851
 URL: https://issues.apache.org/jira/browse/SPARK-44851
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


The usage() text has several missing options as well as inconsistently named 
ones (`enable-ssl` vs `use_ssl`):

https://github.com/apache/spark/blob/7af4e358f3f4902cc9601e56c2662b8921a925d6/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L31-L42






[jira] [Created] (SPARK-44829) Expose uploadAllArtifactClasses in ArtifactManager to `sql` package

2023-08-16 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44829:


 Summary: Expose uploadAllArtifactClasses in ArtifactManager to 
`sql` package
 Key: SPARK-44829
 URL: https://issues.apache.org/jira/browse/SPARK-44829
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.1
Reporter: Venkata Sai Akhil Gudesa


Currently, the 
[uploadAllClassFilesArtifacts|https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala#L144-L146]
 method is private[client], which limits the ability of non-client features 
to use UDFs (which require the class files). Today this is not an issue 
because class files are uploaded in all analyze/execute operations, but any 
new code paths that cannot call this method would suffer from a 
ClassNotFoundException (CNFE). 






[jira] [Created] (SPARK-44657) Incorrect limit handling and config parsing in Arrow collect

2023-08-03 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44657:


 Summary: Incorrect limit handling and config parsing in Arrow 
collect
 Key: SPARK-44657
 URL: https://issues.apache.org/jira/browse/SPARK-44657
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.1, 3.4.0, 3.4.2, 3.5.0
Reporter: Venkata Sai Akhil Gudesa


In the arrow writer 
[code|https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L154-L163]
 , the condition does not match what the documentation says regarding 
"{_}maxBatchSize and maxRecordsPerBatch, respect whatever smaller{_}": due to 
the _||_ operator, the code actually respects whichever conf is "larger" 
(i.e. less restrictive). See the sketch below.
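
A minimal sketch of the condition logic (variable names are illustrative, not the exact Spark code):
{code:scala}
// Buggy: with ||, writing continues until BOTH limits are exceeded,
// so the less restrictive (larger) limit effectively wins.
while (rowCount < maxRecordsPerBatch || estimatedBytes < maxBatchSize) { /* write next row */ }

// Fixed: with &&, writing stops as soon as EITHER limit is hit,
// respecting whichever is smaller.
while (rowCount < maxRecordsPerBatch && estimatedBytes < maxBatchSize) { /* write next row */ }
{code}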

 

Further, when the {_}CONNECT_GRPC_ARROW_MAX_BATCH_SIZE{_} conf is read, the 
value is not converted from MiB to bytes 
([example|https://github.com/apache/spark/blob/3e5203c64c06cc8a8560dfa0fb6f52e74589b583/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala#L103]).
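
The missing conversion, as a sketch (assuming the conf value is expressed in MiB):
{code:scala}
// The conf is specified in MiB but consumed as a byte limit, so convert first.
val maxBatchSizeBytes = maxBatchSizeMiB * 1024L * 1024L
{code}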






[jira] [Created] (SPARK-44584) AddArtifactsRequest and ArtifactStatusesRequest do not set client_type information

2023-07-28 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44584:


 Summary: AddArtifactsRequest and ArtifactStatusesRequest do not 
set client_type information
 Key: SPARK-44584
 URL: https://issues.apache.org/jira/browse/SPARK-44584
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa









[jira] [Created] (SPARK-44476) JobArtifactSet is populated with all artifacts if it is not associated with an artifact

2023-07-18 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44476:


 Summary: JobArtifactSet is populated with all artifacts if it is 
not associated with an artifact
 Key: SPARK-44476
 URL: https://issues.apache.org/jira/browse/SPARK-44476
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Venkata Sai Akhil Gudesa
 Fix For: 3.5.0, 4.0.0


Consider each artifact type - files/jars/archives. For each artifact type, the 
following bug exists:
 # Initialise a `JobArtifactState` with no artifacts added to it.
 # Create a `JobArtifactSet` from the `JobArtifactState`.
 # Add an artifact with the same active `JobArtifactState`.
 # Create another `JobArtifactSet`.

In the current behaviour, the set created in step 2 contains all the artifacts 
(through `sc.allAddedFiles`, for example), while the set created in step 4 
contains only the single artifact added in step 3. A reproduction sketch 
follows below.
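
A hypothetical reproduction (the names follow the report; the exact internal API is an assumption):
{code:scala}
import java.util.UUID
import org.apache.spark.{JobArtifactSet, JobArtifactState}

// sc: an active SparkContext
val state = JobArtifactState(UUID.randomUUID().toString, replClassDirUri = None)
JobArtifactSet.withActiveJobArtifactState(state) {
  val first = JobArtifactSet.getActiveOrDefault(sc)  // bug: falls back to ALL of sc.allAddedFiles
  sc.addFile("/tmp/data.txt")                        // artifact tied to the active state
  val second = JobArtifactSet.getActiveOrDefault(sc) // contains only /tmp/data.txt
}
{code}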






[jira] [Updated] (SPARK-44388) Using an updated instance of ScalarUserDefinedFunction causes protobuf cast failures on server

2023-07-12 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-44388:
-
Epic Link: SPARK-42554

> Using an updated instance of ScalarUserDefinedFunction causes protobuf cast 
> failures on server
> --
>
> Key: SPARK-44388
> URL: https://issues.apache.org/jira/browse/SPARK-44388
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> When running the following code-
> {code:java}
> class A(x: Int) { def get = x * 7 }
> val myUdf = udf((x: Int) => new A(x).get)
> val modifiedUdf = myUdf.withName("myUdf")
> spark.range(5).select(modifiedUdf(col("id"))).as[Int].collect(){code}
> which modifies the original myUdf instance through the `withName` method 
> causes the following error to occur during execution:
> {noformat}
> java.lang.ClassCastException: org.apache.spark.connect.proto.ScalarScalaUDF 
> cannot be cast to com.google.protobuf.MessageLite
>     at 
> com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:1462)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
>     at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>     at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2285)
>     at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>     at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>     at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
>     at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>     at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>     at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>     at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
>     at org.apache.spark.util.Utils$.deserialize(Utils.scala:148){noformat}






[jira] [Created] (SPARK-44388) Using an updated instance of ScalarUserDefinedFunction causes protobuf cast failures on server

2023-07-12 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44388:


 Summary: Using an updated instance of ScalarUserDefinedFunction 
causes protobuf cast failures on server
 Key: SPARK-44388
 URL: https://issues.apache.org/jira/browse/SPARK-44388
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


Running the following code
{code:java}
class A(x: Int) { def get = x * 7 }
val myUdf = udf((x: Int) => new A(x).get)
val modifiedUdf = myUdf.withName("myUdf")
spark.range(5).select(modifiedUdf(col("id"))).as[Int].collect(){code}
(which modifies the original myUdf instance through the `withName` method) 
causes the following error during execution:
{noformat}
java.lang.ClassCastException: org.apache.spark.connect.proto.ScalarScalaUDF 
cannot be cast to com.google.protobuf.MessageLite
    at 
com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:1462)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
    at org.apache.spark.util.Utils$.deserialize(Utils.scala:148){noformat}






[jira] [Updated] (SPARK-44300) SparkConnectArtifactManager#cleanUpResources deletes all artifacts instead of session-specific artifacts

2023-07-04 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-44300:
-
Epic Link: SPARK-42554

> SparkConnectArtifactManager#cleanUpResources deletes all artifacts instead of 
> session-specific artifacts
> 
>
> Key: SPARK-44300
> URL: https://issues.apache.org/jira/browse/SPARK-44300
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> _SparkConnectArtifactManager#cleanUpResources_ deletes all resources instead 
> of session-specific resources. 
> This method is triggered through the _userSessionMapping_ cache when an entry 
> is removed 
> ([code|https://github.com/apache/spark/blob/b02ea4cd370ce6a066561dfde9d517ea70805a2b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala#L304]).
>  Once triggered, further artifact transfers and existing artifact usage would 
> fail. 






[jira] [Created] (SPARK-44300) SparkConnectArtifactManager#cleanUpResources deletes all artifacts instead of session-specific artifacts

2023-07-04 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44300:


 Summary: SparkConnectArtifactManager#cleanUpResources deletes all 
artifacts instead of session-specific artifacts
 Key: SPARK-44300
 URL: https://issues.apache.org/jira/browse/SPARK-44300
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


_SparkConnectArtifactManager#cleanUpResources_ deletes all resources instead of 
session-specific resources. 

This method is triggered through the _userSessionMapping_ cache when an entry 
is removed 
([code|https://github.com/apache/spark/blob/b02ea4cd370ce6a066561dfde9d517ea70805a2b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala#L304]).
 Once triggered, further artifact transfers and existing artifact usage would 
fail. 
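
A sketch of the intended scoping (illustrative; the method shape and {{artifactRootPath}} are hypothetical):
{code:scala}
import org.apache.commons.io.FileUtils

// Delete only this session's artifact directory rather than the shared root,
// so other sessions' artifacts remain usable after cleanup.
// artifactRootPath: hypothetical java.nio.file.Path of the shared artifact root.
def cleanUpResources(sessionId: String): Unit = {
  val sessionDir = artifactRootPath.resolve(sessionId)
  FileUtils.deleteDirectory(sessionDir.toFile)
}
{code}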






[jira] [Created] (SPARK-44293) Task failures during custom JAR fetch in executors

2023-07-04 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44293:


 Summary: Task failures during custom JAR fetch in executors
 Key: SPARK-44293
 URL: https://issues.apache.org/jira/browse/SPARK-44293
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


When attempting to use a custom JAR in a Spark Connect session, the tasks fail 
due to the following error:
{code:java}
23/07/03 17:00:15 INFO Executor: Fetching 
spark://ip-10-110-22-170.us-west-2.compute.internal:43743/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar
 with timestamp 0 23/07/03 17:00:15 ERROR Executor: Exception in task 6.0 in 
stage 4.0 (TID 55) java.lang.RuntimeException: Stream 
'/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar'
 was not found.     at 
org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260)
     at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
     at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
     at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
 {code}
 

*Root Cause: The URI for the JAR file is invalid.* (Instead of the URI being in 
the form of {_}/artifacts//jars/{_}, it is instead 
{_}/artifacts//{_}.)






[jira] [Created] (SPARK-44246) Follow-ups for Jar/Classfile Isolation

2023-06-29 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44246:


 Summary: Follow-ups for Jar/Classfile Isolation
 Key: SPARK-44246
 URL: https://issues.apache.org/jira/browse/SPARK-44246
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


Related to https://issues.apache.org/jira/browse/SPARK-44146 
([PR|https://github.com/apache/spark/pull/41701]), this ticket is for the 
general follow-ups mentioned by [~hvanhovell] 
[here|https://github.com/apache/spark/pull/41701#issuecomment-1608577372].






[jira] [Updated] (SPARK-44146) Isolate Spark Connect session/artifacts

2023-06-29 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-44146:
-
Epic Link: SPARK-42554

> Isolate Spark Connect session/artifacts
> ---
>
> Key: SPARK-44146
> URL: https://issues.apache.org/jira/browse/SPARK-44146
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.5.0
>
>
> Following up on https://issues.apache.org/jira/browse/SPARK-44078, with the 
> support for classloader isolation implemented, we can now utilise it to 
> isolate Spark Connect sessions from each other. Here, isolation refers to 
> isolation of artifacts from each Spark Connect session which enables us to 
> have multi-user UDFs.






[jira] [Created] (SPARK-44146) Isolate Spark Connect session/artifacts

2023-06-22 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44146:


 Summary: Isolate Spark Connect session/artifacts
 Key: SPARK-44146
 URL: https://issues.apache.org/jira/browse/SPARK-44146
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


Following up on https://issues.apache.org/jira/browse/SPARK-44078, with the 
support for classloader isolation implemented, we can now utilise it to isolate 
Spark Connect sessions from each other. Here, isolation refers to isolating 
each Spark Connect session's artifacts, which enables multi-user UDFs.






[jira] [Created] (SPARK-44078) Add support for classloader/resource isolation

2023-06-16 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44078:


 Summary: Add support for classloader/resource isolation
 Key: SPARK-44078
 URL: https://issues.apache.org/jira/browse/SPARK-44078
 Project: Spark
  Issue Type: New Feature
  Components: Connect, Spark Core
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


A current limitation of Scala UDFs is that a Spark cluster can only support a 
single REPL at a time, because the classloaders of different Spark Sessions 
(and therefore, Spark Connect sessions) aren't isolated from each other. 
Without isolation, REPL-generated class files as well as user-added JARs may 
conflict when there are multiple users of the cluster.

Thus, we need a mechanism to support isolated sessions (i.e. isolated 
resources/classloaders) so that each REPL user does not conflict with other 
users on the same cluster.
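
A minimal sketch of the core idea (illustrative; {{sessionJarUrls}} stands in for a per-session list of artifact URLs):
{code:scala}
import java.net.{URL, URLClassLoader}

// Each session resolves its UDF classes through its own classloader layered
// on top of the shared Spark classloader, so sessions cannot see each
// other's REPL class files or JARs.
val sessionJarUrls: Seq[URL] = Seq.empty // hypothetical: this session's artifacts
val sessionClassLoader = new URLClassLoader(sessionJarUrls.toArray, getClass.getClassLoader)
{code}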






[jira] [Created] (SPARK-44016) Artifacts with name as an absolute path may overwrite other files

2023-06-09 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-44016:


 Summary: Artifacts with name as an absolute path may overwrite 
other files 
 Key: SPARK-44016
 URL: https://issues.apache.org/jira/browse/SPARK-44016
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa
 Fix For: 3.5.0


In `SparkConnectAddArtifactsHandler`, an artifact being moved to a staging 
location may overwrite another file when the `name`/`path` of the artifact is 
an absolute path.

This happens because the 
[stagedPath|https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala#L172]
 is computed with the `.resolve(...)` method, and `resolve` returns the 
`other` path (in this case, the name of the artifact) unchanged when the 
`other` path is absolute.
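
This follows directly from the `java.nio.file.Path#resolve` semantics:
{code:scala}
import java.nio.file.Paths

Paths.get("/tmp/staging").resolve("jars/foo.jar") // => /tmp/staging/jars/foo.jar
Paths.get("/tmp/staging").resolve("/etc/passwd")  // => /etc/passwd (escapes the staging dir)
{code}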






[jira] [Created] (SPARK-43998) Add support for UDAF

2023-06-07 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43998:


 Summary: Add support for UDAF
 Key: SPARK-43998
 URL: https://issues.apache.org/jira/browse/SPARK-43998
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


Reference: 
[https://github.com/apache/spark/blob/4547c9c90e3d35436afe89b10c794050ed8d04d7/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala#L136-L175]
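
For context, the referenced sql/core API registers an {{Aggregator}} through {{functions.udaf}}; a small example of the behaviour this ticket would bring to Connect:
{code:scala}
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// A trivial sum aggregator over Longs.
val sumAgg = new Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(buf: Long, a: Long): Long = buf + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(reduction: Long): Long = reduction
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}
val sumUdaf = udaf(sumAgg)
{code}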






[jira] [Updated] (SPARK-43995) Implement UDFRegistration

2023-06-07 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-43995:
-
Description: 
Reference file - 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala]

API to be implemented:
 * {noformat}def register(name: String, udf: UserDefinedFunction): UserDefinedFunction{noformat}
 ** [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123]
 * {noformat}def register[RT: TypeTag](name: String, func: Function0[RT]): UserDefinedFunction{noformat}
 ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642]
 * {noformat}def register(name: String, f: UDF0[_], returnType: DataType): Unit{noformat}
 ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076]

 

We currently do not support UDAFs so the relevant UDAF APIs may be skipped as 
well as the python/pyspark (in the context of the scala client) related APIs.

  was:
Reference file - 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala]

API to be implemented:
 * def register(name: String, udf: UserDefinedFunction): UserDefinedFunction

 ** 
[Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123]

 * def register[RT: TypeTag](name: String, func: Function0[RT]): 
UserDefinedFunction

 ** From [0 to 22 
arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642]

 * def register(name: String, f: UDF0[_], returnType: DataType): Unit

 ** From [0 to 22 
arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076]

 

We currently do not support UDAFs so the relevant UDAF APIs may be skipped as 
well as the python/pyspark (in the context of the scala client) related APIs.


> Implement UDFRegistration
> -
>
> Key: SPARK-43995
> URL: https://issues.apache.org/jira/browse/SPARK-43995
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Reference file - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala]
> API to be implemented:
>  * 
> {noformat}
> def register(name: String, udf: UserDefinedFunction): 
> UserDefinedFunction{noformat}
>  * 
>  ** 
> [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123]
>  * 
> {noformat}
> def register[RT: TypeTag](name: String, func: Function0[RT]): 
> UserDefinedFunction{noformat}
>  * 
>  ** From [0 to 22 
> arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642]
>  * 
> {noformat}
> def register(name: String, f: UDF0[_], returnType: DataType): Unit{noformat}
>  * 
>  ** From [0 to 22 
> arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076]
>  
> We currently do not support UDAFs so the relevant UDAF APIs may be skipped as 
> well as the python/pyspark (in the context of the scala client) related APIs.






[jira] [Updated] (SPARK-43996) Implement call_udf function

2023-06-07 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-43996:
-
Description: 
Reference: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5824-L5842]

API: 
{noformat}
def call_udf(udfName: String, cols: Column*): Column{noformat}

  was:
Reference:[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5824-L5842]

API: def call_udf(udfName: String, cols: Column*): Column


> Implement call_udf function
> ---
>
> Key: SPARK-43996
> URL: https://issues.apache.org/jira/browse/SPARK-43996
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Reference:[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5824-L5842]
> API: 
> {noformat}
> def call_udf(udfName: String, cols: Column*): Column{noformat}






[jira] [Updated] (SPARK-43997) Add support for Java UDFs

2023-06-07 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-43997:
-
Description: 
API: 
{code:java}
def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction{code}
For 0 - 10 arguments, 
[reference|https://github.com/apache/spark/blob/747db6675da86e79d04e2fcc531b3b72b22ebf04/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5624-L5780]

  was:
API: def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction

For 0 - 10 arguments, 
[reference|https://github.com/apache/spark/blob/747db6675da86e79d04e2fcc531b3b72b22ebf04/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5624-L5780]


> Add support for Java UDFs
> -
>
> Key: SPARK-43997
> URL: https://issues.apache.org/jira/browse/SPARK-43997
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> API: 
> {code:java}
> def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction{code}
> For 0 - 10 arguments, 
> [reference|https://github.com/apache/spark/blob/747db6675da86e79d04e2fcc531b3b72b22ebf04/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5624-L5780]






[jira] [Created] (SPARK-43997) Add support for Java UDFs

2023-06-07 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43997:


 Summary: Add support for Java UDFs
 Key: SPARK-43997
 URL: https://issues.apache.org/jira/browse/SPARK-43997
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


API: def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction

For 0 - 10 arguments, 
[reference|https://github.com/apache/spark/blob/747db6675da86e79d04e2fcc531b3b72b22ebf04/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5624-L5780]
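
A small example of the referenced sql/core API (the model for the Connect version; assumes an active SparkSession {{spark}}):
{code:scala}
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.LongType

val plusOne = udf(new UDF1[Long, Long] {
  override def call(x: Long): Long = x + 1
}, LongType)
spark.range(3).select(plusOne(col("id"))).show()
{code}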






[jira] [Created] (SPARK-43996) Implement call_udf function

2023-06-07 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43996:


 Summary: Implement call_udf function
 Key: SPARK-43996
 URL: https://issues.apache.org/jira/browse/SPARK-43996
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


Reference: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5824-L5842]

API: def call_udf(udfName: String, cols: Column*): Column
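
Usage example, following the existing sql/core semantics (assumes an active SparkSession {{spark}}):
{code:scala}
import org.apache.spark.sql.functions.{call_udf, col}

spark.udf.register("plusOne", (x: Long) => x + 1)
spark.range(3).select(call_udf("plusOne", col("id"))).show()
{code}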






[jira] [Created] (SPARK-43995) Implement UDFRegistration

2023-06-07 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43995:


 Summary: Implement UDFRegistration
 Key: SPARK-43995
 URL: https://issues.apache.org/jira/browse/SPARK-43995
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


Reference file - 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala]

API to be implemented:
 * def register(name: String, udf: UserDefinedFunction): UserDefinedFunction
 ** [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123]
 * def register[RT: TypeTag](name: String, func: Function0[RT]): UserDefinedFunction
 ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642]
 * def register(name: String, f: UDF0[_], returnType: DataType): Unit
 ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076]

 

We currently do not support UDAFs so the relevant UDAF APIs may be skipped as 
well as the python/pyspark (in the context of the scala client) related APIs.
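
A short usage example of the first API above, per the existing sql/core behaviour (assumes an active SparkSession {{spark}}):
{code:scala}
import org.apache.spark.sql.functions.{col, udf}

val myUdf = udf((x: Long) => x * 2)
spark.udf.register("double", myUdf)             // register(name, udf)
spark.sql("SELECT double(21)").show()           // callable from SQL once registered
spark.range(3).select(myUdf(col("id"))).show()
{code}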






[jira] [Updated] (SPARK-43285) ReplE2ESuite consistently fails with JDK 17

2023-04-25 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-43285:
-
Description: 
[[Comment|https://github.com/apache/spark/pull/40675#discussion_r1174696470] 
from [~gurwls223]]

This test consistently fails with JDK 17:
{code:java}
[info] ReplE2ESuite:
[info] - Simple query *** FAILED *** (10 seconds, 4 milliseconds)
[info] java.lang.RuntimeException: REPL Timed out while running command: 
[info] spark.sql("select 1").collect()
[info] 
[info] Console output: 
[info] Error output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
[info] at 
org.apache.spark.sql.application.ReplE2ESuite.runCommandsInShell(ReplE2ESuite.scala:87)
[info] at 
org.apache.spark.sql.application.ReplE2ESuite.$anonfun$new$1(ReplE2ESuite.scala:102)
[info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer.apply(Transformer.scala:20)
[info] at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info] at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info] at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224){code}

[https://github.com/apache/spark/actions/runs/4780630672/jobs/8498505928#step:9:4647]
[https://github.com/apache/spark/actions/runs/4774942961/jobs/8488946907]
[https://github.com/apache/spark/actions/runs/4769162286/jobs/8479293802]
[https://github.com/apache/spark/actions/runs/4759278349/jobs/8458399201]
[https://github.com/apache/spark/actions/runs/4748319019/jobs/8434392414]

  was:
[[Comment|https://github.com/apache/spark/pull/40675#discussion_r1174696470] 
from [~gurwls223]]

This test consistently fails with JDK 17:
[info] ReplE2ESuite:
[info] - Simple query *** FAILED *** (10 seconds, 4 milliseconds)
[info]   java.lang.RuntimeException: REPL Timed out while running command: 
[info] spark.sql("select 1").collect()
[info]   
[info] Console output: 
[info] Error output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
[info]   at 
org.apache.spark.sql.application.ReplE2ESuite.runCommandsInShell(ReplE2ESuite.scala:87)
[info]   at 
org.apache.spark.sql.application.ReplE2ESuite.$anonfun$new$1(ReplE2ESuite.scala:102)
[info]   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info]   at 
org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[https://github.com/apache/spark/actions/runs/4780630672/jobs/8498505928#step:9:4647]
[https://github.com/apache/spark/actions/runs/4774942961/jobs/8488946907]
[https://github.com/apache/spark/actions/runs/4769162286/jobs/8479293802]
[https://github.com/apache/spark/actions/runs/4759278349/jobs/8458399201]
[https://github.com/apache/spark/actions/runs/4748319019/jobs/8434392414]


> ReplE2ESuite consistently fails with JDK 17
> ---
>
> Key: SPARK-43285
> URL: https://issues.apache.org/jira/browse/SPARK-43285
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> [[Comment|https://github.com/apache/spark/pull/40675#discussion_r1174696470] 
> from [~gurwls223]]
> This test consistently fails with JDK 17:
> {code:java}
> [info] ReplE2ESuite:
> [info] - Simple query *** FAILED *** (10 seconds, 4 milliseconds)
> [info] java.lang.RuntimeException: REPL Timed out while running command: 
> [info] spark.sql("select 1").collect()
> [info] 
> [info] Console output: 
> [info] Error output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
> [info] at 
> org.apache.spark.sql.application.ReplE2ESuite.runCommandsInShell(ReplE2ESuite.scala:87)
> [info] at 
> 

[jira] [Created] (SPARK-43285) ReplE2ESuite consistently fails with JDK 17

2023-04-25 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43285:


 Summary: ReplE2ESuite consistently fails with JDK 17
 Key: SPARK-43285
 URL: https://issues.apache.org/jira/browse/SPARK-43285
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


[[Comment|https://github.com/apache/spark/pull/40675#discussion_r1174696470] 
from [~gurwls223]]

This test consistently fails with JDK 17:
[info] ReplE2ESuite:
[info] - Simple query *** FAILED *** (10 seconds, 4 milliseconds)
[info]   java.lang.RuntimeException: REPL Timed out while running command: 
[info] spark.sql("select 1").collect()
[info]   
[info] Console output: 
[info] Error output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
[info]   at 
org.apache.spark.sql.application.ReplE2ESuite.runCommandsInShell(ReplE2ESuite.scala:87)
[info]   at 
org.apache.spark.sql.application.ReplE2ESuite.$anonfun$new$1(ReplE2ESuite.scala:102)
[info]   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info]   at 
org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[https://github.com/apache/spark/actions/runs/4780630672/jobs/8498505928#step:9:4647]
[https://github.com/apache/spark/actions/runs/4774942961/jobs/8488946907]
[https://github.com/apache/spark/actions/runs/4769162286/jobs/8479293802]
[https://github.com/apache/spark/actions/runs/4759278349/jobs/8458399201]
[https://github.com/apache/spark/actions/runs/4748319019/jobs/8434392414]






[jira] [Created] (SPARK-43227) Fix deserialisation issue when UDFs contain a lambda expression

2023-04-20 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43227:


 Summary: Fix deserialisation issue when UDFs contain a lambda 
expression
 Key: SPARK-43227
 URL: https://issues.apache.org/jira/browse/SPARK-43227
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


The following code:
{code:java}
class A(x: Int) { def get = x * 20 + 5 }
val dummyUdf = (x: Int) => new A(x).get
val myUdf = udf(dummyUdf)
spark.range(5).select(myUdf(col("id"))).as[Int].collect() {code}
hits the following error:
{noformat}
io.grpc.StatusRuntimeException: INTERNAL: cannot assign instance of 
java.lang.invoke.SerializedLambda to field ammonite.$sess.cmd26$Helper.dummyUdf 
of type scala.Function1 in instance of ammonite.$sess.cmd26$Helper
  io.grpc.Status.asRuntimeException(Status.java:535)
  io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
  
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
  org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
  org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
  org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
  org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
  org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
  ammonite.$sess.cmd28$Helper.<init>(cmd28.sc:1)
  ammonite.$sess.cmd28$.<init>(cmd28.sc:7)
  ammonite.$sess.cmd28$.<clinit>(cmd28.sc){noformat}






[jira] [Updated] (SPARK-43198) Fix "Could not initialise class ammonite..." error when using filter

2023-04-19 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-43198:
-
Description: 
When
{code:java}
spark.range(10).filter(n => n % 2 == 0).collectAsList(){code}
 is run in the ammonite REPL (Spark Connect), the following error is thrown:
{noformat}
io.grpc.StatusRuntimeException: UNKNOWN: ammonite/repl/ReplBridge$
  io.grpc.Status.asRuntimeException(Status.java:535)
  io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
  
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
  org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
  org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
  org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
  org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
  org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
  org.apache.spark.sql.Dataset.collectAsList(Dataset.scala:2700)
  ammonite.$sess.cmd0$.<init>(cmd0.sc:1)
  ammonite.$sess.cmd0$.<clinit>(cmd0.sc){noformat}

  was:
When `spark.range(10).filter(n => n % 2 == 0).collectAsList()` is run in the 
ammonite REPL (Spark Connect), the following error is thrown:

```

io.grpc.StatusRuntimeException: UNKNOWN: ammonite/repl/ReplBridge$
  io.grpc.Status.asRuntimeException(Status.java:535)
  io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
  
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
  org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
  org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
  org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
  org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
  org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
  org.apache.spark.sql.Dataset.collectAsList(Dataset.scala:2700)
  ammonite.$sess.cmd0$.<init>(cmd0.sc:1)
  ammonite.$sess.cmd0$.<clinit>(cmd0.sc)

```


> Fix "Could not initialise class ammonite..." error when using filter
> 
>
> Key: SPARK-43198
> URL: https://issues.apache.org/jira/browse/SPARK-43198
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> When
> {code:java}
> spark.range(10).filter(n => n % 2 == 0).collectAsList()`{code}
>  is run in the ammonite REPL (Spark Connect), the following error is thrown:
> {noformat}
> io.grpc.StatusRuntimeException: UNKNOWN: ammonite/repl/ReplBridge$
>   io.grpc.Status.asRuntimeException(Status.java:535)
>   
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
>   
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
>   
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
>   org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
>   org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
>   org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
>   org.apache.spark.sql.Dataset.collectAsList(Dataset.scala:2700)
>   ammonite.$sess.cmd0$.(cmd0.sc:1)
>   ammonite.$sess.cmd0$.(cmd0.sc){noformat}






[jira] [Created] (SPARK-43198) Fix "Could not initialise class ammonite..." error when using filter

2023-04-19 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43198:


 Summary: Fix "Could not initialise class ammonite..." error when 
using filter
 Key: SPARK-43198
 URL: https://issues.apache.org/jira/browse/SPARK-43198
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


When `spark.range(10).filter(n => n % 2 == 0).collectAsList()` is run in the 
ammonite REPL (Spark Connect), the following error is thrown:

```

io.grpc.StatusRuntimeException: UNKNOWN: ammonite/repl/ReplBridge$
  io.grpc.Status.asRuntimeException(Status.java:535)
  io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
  
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
  org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
  org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
  org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
  org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
  org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
  org.apache.spark.sql.Dataset.collectAsList(Dataset.scala:2700)
  ammonite.$sess.cmd0$.<init>(cmd0.sc:1)
  ammonite.$sess.cmd0$.<clinit>(cmd0.sc)

```






[jira] [Created] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message

2023-03-15 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42812:


 Summary: client_type is missing from AddArtifactsRequest proto 
message
 Key: SPARK-42812
 URL: https://issues.apache.org/jira/browse/SPARK-42812
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


The client_type field is missing from the AddArtifactsRequest proto message.






[jira] [Created] (SPARK-42748) Server-side Artifact Management

2023-03-10 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42748:


 Summary: Server-side Artifact Management
 Key: SPARK-42748
 URL: https://issues.apache.org/jira/browse/SPARK-42748
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


https://issues.apache.org/jira/browse/SPARK-42653 implements the client-side 
transfer of artifacts to the server, but currently the server does not process 
these requests.

We need to implement a server-side management mechanism to handle storage of 
these artifacts on the driver, as well as to perform further processing (such 
as adding JARs and moving class files to the right directories).
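
For illustration only (this is not the actual server implementation), a 
minimal sketch of how such a mechanism might route an incoming artifact on the 
driver; the names (ArtifactRouter, store, process) and the layout are 
hypothetical:

{code:scala}
import java.nio.file.{Files, Path}

// Hypothetical sketch: persist an incoming artifact under a per-session
// directory on the driver, then dispatch JARs and class files differently.
object ArtifactRouter {
  def store(sessionDir: Path, relativePath: String, bytes: Array[Byte]): Path = {
    val target = sessionDir.resolve(relativePath).normalize()
    // Guard against path traversal escaping the session directory.
    require(target.startsWith(sessionDir), s"illegal artifact path: $relativePath")
    Files.createDirectories(target.getParent)
    Files.write(target, bytes)
  }

  def process(target: Path): Unit = {
    val name = target.getFileName.toString
    if (name.endsWith(".jar")) {
      // e.g. register the JAR with the session so executors can fetch it
    } else if (name.endsWith(".class")) {
      // e.g. move it under the directory served to executor classloaders
    }
  }
}
{code}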






[jira] [Updated] (SPARK-42658) Handle timeouts and CRC failures during artifact transfer

2023-03-02 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-42658:
-
Description: 
We would need a retry mechanism on the client side to handle CRC failures 
during artifact transfer, because the server discards data that fails the CRC 
check, which may lead to missing artifacts during UDF execution.

We also require a timeout policy to prevent waiting indefinitely for the 
server's reply.

  was:We would need a retry mechanism on the client side to handle CRC failures 
during artifact transfer. The server would discard data that fails CRC and 
hence, may lead to missing artifacts during UDF execution. 


> Handle timeouts and CRC failures during artifact transfer
> -
>
> Key: SPARK-42658
> URL: https://issues.apache.org/jira/browse/SPARK-42658
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> We would need a retry mechanism on the client side to handle CRC failures 
> during artifact transfer, because the server discards data that fails the CRC 
> check, which may lead to missing artifacts during UDF execution.
> We also require a timeout policy to prevent waiting indefinitely for the 
> server's reply.
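
For illustration, a minimal sketch of such a client-side policy; Chunk, 
sendWithRetry, and send are hypothetical names, and the real transfer runs 
over gRPC, which this deliberately abstracts away:

{code:scala}
import java.util.zip.CRC32
import scala.concurrent.duration._

// Hypothetical sketch: tag each chunk with its CRC, resend on a
// server-reported mismatch, and bound both the attempts and the wait.
final case class Chunk(data: Array[Byte]) {
  lazy val crc: Long = { val c = new CRC32; c.update(data); c.getValue }
}

def sendWithRetry(chunk: Chunk, maxAttempts: Int = 3, timeout: Duration = 30.seconds)(
    send: (Chunk, Duration) => Boolean): Unit = {
  var attempts = 0
  var accepted = false
  while (!accepted && attempts < maxAttempts) {
    attempts += 1
    accepted = send(chunk, timeout) // false signals a CRC failure reported by the server
  }
  if (!accepted) {
    throw new RuntimeException(s"chunk rejected after $attempts attempts (crc=${chunk.crc})")
  }
}
{code}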






[jira] [Updated] (SPARK-42658) Handle timeouts and CRC failures during artifact transfer

2023-03-02 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-42658:
-
Summary: Handle timeouts and CRC failures during artifact transfer  (was: 
Handle CRC failures during artifact transfer)

> Handle timeouts and CRC failures during artifact transfer
> -
>
> Key: SPARK-42658
> URL: https://issues.apache.org/jira/browse/SPARK-42658
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> We would need a retry mechanism on the client side to handle CRC failures 
> during artifact transfer. The server discards data that fails the CRC check, 
> which may lead to missing artifacts during UDF execution.






[jira] [Created] (SPARK-42658) Handle CRC failures during artifact transfer

2023-03-02 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42658:


 Summary: Handle CRC failures during artifact transfer
 Key: SPARK-42658
 URL: https://issues.apache.org/jira/browse/SPARK-42658
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


We would need a retry mechanism on the client side to handle CRC failures 
during artifact transfer. The server discards data that fails the CRC check, 
which may lead to missing artifacts during UDF execution.






[jira] [Created] (SPARK-42657) Support to find and transfer client-side REPL classfiles to server as artifacts

2023-03-02 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42657:


 Summary: Support to find and transfer client-side REPL classfiles 
to server as artifacts  
 Key: SPARK-42657
 URL: https://issues.apache.org/jira/browse/SPARK-42657
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


To run UDFs defined in the client-side REPL, we require a way to find the 
local REPL classfiles and then use the mechanism from 
https://issues.apache.org/jira/browse/SPARK-42653 to transfer them to the 
server as artifacts.
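
A minimal sketch of the discovery half, assuming the REPL writes generated 
classes to a known output directory (the directory path and helper name are 
hypothetical):

{code:scala}
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical sketch: collect every .class file the REPL has emitted so the
// artifact-transfer mechanism from SPARK-42653 can ship them to the server.
def replClassFiles(outputDir: Path): Seq[Path] =
  Files.walk(outputDir).iterator().asScala
    .filter(p => Files.isRegularFile(p) && p.toString.endsWith(".class"))
    .toSeq

val classFiles = replClassFiles(Paths.get("/tmp/repl-classes")) // assumed location
{code}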






[jira] [Updated] (SPARK-42657) Support to find and transfer client-side REPL classfiles to server as artifacts

2023-03-02 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-42657:
-
Epic Link: SPARK-42554

> Support to find and transfer client-side REPL classfiles to server as 
> artifacts  
> -
>
> Key: SPARK-42657
> URL: https://issues.apache.org/jira/browse/SPARK-42657
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> To run UDFs defined in the client-side REPL, we require a way to find the 
> local REPL classfiles and then use the mechanism from 
> https://issues.apache.org/jira/browse/SPARK-42653 to transfer them to the 
> server as artifacts.






[jira] [Updated] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-42653:
-
Epic Link: SPARK-42554

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", we need to 
> implement a mechanism to transfer artifacts from the client side over to the 
> server side as per the protocol defined in 
> https://github.com/apache/spark/pull/40147 






[jira] [Created] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42653:


 Summary: Artifact transfer from Scala/JVM client to Server
 Key: SPARK-42653
 URL: https://issues.apache.org/jira/browse/SPARK-42653
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


In the decoupled client-server architecture of Spark Connect, a remote client 
may use a local JAR or a new class in their UDF that may not be present on the 
server. To handle these cases of missing "artifacts", we need to implement a 
mechanism to transfer artifacts from the client side over to the server side as 
per the protocol defined in https://github.com/apache/spark/pull/40147 
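
For illustration (not the actual client code), the chunking step such a 
transfer needs; the path and chunk size are arbitrary assumptions:

{code:scala}
import java.nio.file.{Files, Paths}

// Hypothetical sketch: read a local JAR and split it into 32 KiB chunks, the
// unit a chunked artifact-transfer stream would send one message at a time.
val chunkSize = 32 * 1024
val jarBytes = Files.readAllBytes(Paths.get("/tmp/my-udf.jar")) // assumed local path
val chunks: Seq[Array[Byte]] = jarBytes.grouped(chunkSize).toSeq
{code}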






[jira] [Created] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client

2023-02-23 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42543:


 Summary: Specify protocol for UDF artifact transfer in JVM/Scala 
client 
 Key: SPARK-42543
 URL: https://issues.apache.org/jira/browse/SPARK-42543
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


An "artifact" is any file that may be used during the execution of a UDF.

In the decoupled client-server architecture of Spark Connect, a remote client 
may use a local JAR or a new class in their UDF that may not be present on the 
server. To handle these cases of missing "artifacts", a protocol for artifact 
transfer is needed to move the required artifacts from the client side over to 
the server side.
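
As an illustrative model only (deliberately not the actual protobuf 
definitions), the message shapes such a protocol needs: a single-shot form for 
small artifacts and a chunked form, with a CRC per chunk, for large ones:

{code:scala}
// Illustrative only: Scala case classes standing in for proto messages.
final case class ArtifactChunk(data: Array[Byte], crc: Long)

sealed trait ArtifactTransferMessage
// Small artifact: the name and the whole payload fit in one message.
final case class SingleChunkArtifact(name: String, payload: ArtifactChunk)
    extends ArtifactTransferMessage
// Large artifact: an opening message with the total size, then continuations.
final case class BeginChunkedArtifact(name: String, totalBytes: Long, initial: ArtifactChunk)
    extends ArtifactTransferMessage
final case class ArtifactChunkContinuation(chunk: ArtifactChunk)
    extends ArtifactTransferMessage
{code}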






[jira] [Updated] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-42283:
-
Description: “Simple” here refers to UDFs that utilize no client-specific 
class files (e.g. REPL-generated) or JARs. Essentially, a “simple” UDF may only 
reference in-built libraries and classes defined within the scope of the UDF.  
(was: “Simple” here refers to UDFs that utilize no client-specific class files 
(e.g REPL-generated) and JARs. Essentially, a “vanilla” UDF may only reference 
in-built libraries and classes defined within the scope of the UDF.)

> Add Simple Scala UDFs to Scala/JVM Client
> -
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
> REPL-generated) or JARs. Essentially, a “simple” UDF may only reference 
> in-built libraries and classes defined within the scope of the UDF.






[jira] [Created] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42283:


 Summary: Add Simple Scala UDFs to Scala/JVM Client
 Key: SPARK-42283
 URL: https://issues.apache.org/jira/browse/SPARK-42283
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


“Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
REPL-generated) or JARs. Essentially, a “vanilla” UDF may only reference 
in-built libraries and classes defined within the scope of the UDF.
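
For example, assuming an active SparkSession named spark, a UDF that qualifies 
as "simple" under this definition:

{code:scala}
import org.apache.spark.sql.functions.{col, udf}

// "Simple": the closure references only its own parameter and the standard
// library -- no REPL-generated classes and no external JARs.
val plusOne = udf((x: Long) => x + 1)
val df = spark.range(5).select(plusOne(col("id")))
{code}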






[jira] [Created] (SPARK-42133) Add basic Dataset API methods to Spark Connect Scala Client

2023-01-20 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42133:


 Summary: Add basic Dataset API methods to Spark Connect Scala 
Client
 Key: SPARK-42133
 URL: https://issues.apache.org/jira/browse/SPARK-42133
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


Add basic DataFrame API methods (such as project, filter, and limit) as well as 
range() support in SparkSession.
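
A small usage sketch of the methods named above, assuming a connected 
SparkSession named spark:

{code:scala}
import org.apache.spark.sql.functions.col

// Chain the basic methods this ticket covers: range, filter, select, limit.
val df = spark.range(100)
  .filter(col("id") % 2 === 0)
  .select(col("id"))
  .limit(5)
{code}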






[jira] [Created] (SPARK-41967) SBT unable to resolve particular packages from the imported maven build

2023-01-10 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-41967:


 Summary: SBT unable to resolve particular packages from the 
imported maven build
 Key: SPARK-41967
 URL: https://issues.apache.org/jira/browse/SPARK-41967
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


An SBT issue causes resolution of particular packages from the imported Maven 
build to fail for an unknown reason. This affects Spark-Connect-related 
projects (see 
[here|https://github.com/apache/spark/blob/6cae6aa5156655c79eb3f20292ccec6c479c3b1b/project/SparkBuild.scala#L667-L668]
 and 
[here|https://github.com/apache/spark/blob/6cae6aa5156655c79eb3f20292ccec6c479c3b1b/project/SparkBuild.scala#L902-L904]
 for example) by forcing duplicate dependency declarations.

The Maven (pom) build works fine when the affected dependency (Guava, for 
example) is removed, but the SBT build then fails. Thus, we are forced to 
explicitly specify the versions of the affected packages so that SBT can parse 
and include them manually (the packages are also added as dependencies in 
Maven to keep the versions consistent with SBT).
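
A sketch of the workaround in build.sbt; the Guava coordinates and version 
here are purely illustrative:

{code:scala}
// Pin the affected package explicitly so SBT does not rely on the broken
// resolution from the imported Maven build (version kept in sync with the pom).
libraryDependencies += "com.google.guava" % "guava" % "31.1-jre"
{code}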






[jira] [Created] (SPARK-41917) Support SSL and Auth token in connection channel for JVM/Scala Client

2023-01-05 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-41917:


 Summary: Support SSL and Auth token in connection channel for 
JVM/Scala Client
 Key: SPARK-41917
 URL: https://issues.apache.org/jira/browse/SPARK-41917
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa









[jira] [Updated] (SPARK-41822) Setup Scala/JVM Client Connection

2023-01-02 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-41822:
-
Summary: Setup Scala/JVM Client Connection  (was: Setup Scala Client 
Connection)

> Setup Scala/JVM Client Connection
> -
>
> Key: SPARK-41822
> URL: https://issues.apache.org/jira/browse/SPARK-41822
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Set up the gRPC connection for the Scala/JVM client to enable communication 
> with the Spark Connect server. 






[jira] [Created] (SPARK-41822) Setup Scala Client Connection

2023-01-02 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-41822:


 Summary: Setup Scala Client Connection
 Key: SPARK-41822
 URL: https://issues.apache.org/jira/browse/SPARK-41822
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Venkata Sai Akhil Gudesa


Set up the gRPC connection for the Scala/JVM client to enable communication 
with the Spark Connect server. 
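
A minimal sketch of what this entails, using the standard grpc-java channel 
builder; the host, port, and plaintext setting are assumptions for a local, 
unsecured server:

{code:scala}
import io.grpc.ManagedChannelBuilder

// Build the client-side gRPC channel that the Spark Connect stubs would use.
// 15002 is the conventional Spark Connect port; plaintext is for local testing.
val channel = ManagedChannelBuilder
  .forAddress("localhost", 15002)
  .usePlaintext()
  .build()
{code}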






[jira] [Created] (SPARK-41534) Setup initial client module for Spark Connect

2022-12-15 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-41534:


 Summary: Setup initial client module for Spark Connect
 Key: SPARK-41534
 URL: https://issues.apache.org/jira/browse/SPARK-41534
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.3.2
Reporter: Venkata Sai Akhil Gudesa


In https://issues.apache.org/jira/browse/SPARK-41369, the connect module was 
split into server/common to extract dependencies for the Scala client. 
With this extraction complete, the client module can be set up in preparation 
for the Scala client.






[jira] [Created] (SPARK-41369) Refactor connect directory structure

2022-12-02 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-41369:


 Summary: Refactor connect directory structure
 Key: SPARK-41369
 URL: https://issues.apache.org/jira/browse/SPARK-41369
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.3.2, 3.4.0
Reporter: Venkata Sai Akhil Gudesa


Currently, `spark/connector/connect/` is a single module that contains both the 
"server" service and the protobuf definitions.


However, this module can be split into two modules, "server" and "common", 
which has the advantage of separating the protobuf generation from the core 
"server" module for efficient reuse.






[jira] [Commented] (SPARK-36677) NestedColumnAliasing pushes down aggregate functions into projections

2021-09-06 Thread Venkata Sai Akhil Gudesa (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410734#comment-17410734
 ] 

Venkata Sai Akhil Gudesa commented on SPARK-36677:
--

I have a fix for this and have a PR on the way.

> NestedColumnAliasing pushes down aggregate functions into projections
> -
>
> Key: SPARK-36677
> URL: https://issues.apache.org/jira/browse/SPARK-36677
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Aggregate functions are being pushed down into projections when nested 
> columns are accessed, causing the following error:
> {code:java}
> Caused by: UnsupportedOperationException: Cannot generate code for 
> expression: ...{code}
> Reproduction:
> {code:java}
> spark.sql("drop table if exists test_aggregates")
> spark.sql("create table if not exists test_aggregates(a STRUCT<c: STRUCT<e: string>, d: int>, b string)")
> val df = sql("select max(a).c.e from (select a, b from test_aggregates) group by b")
> println(df.queryExecution.optimizedPlan)
> {code}
>  
> The output of the above code:
> {noformat}
> 'Aggregate [b#1], [_extract_e#5 AS max(a).c.e#3]
> +- 'Project [max(a#0).c.e AS _extract_e#5, b#1]
>+- Relation default.test_aggregates[a#0,b#1] parquet
> {noformat}
> The error message when the dataframe is executed:
> {noformat}
> java.lang.UnsupportedOperationException: Cannot generate code for expression: 
> max(input[0, struct<c:struct<e:string>,d:int>, true])
>   at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.cannotGenerateCodeForExpressionError(QueryExecutionErrors.scala:83)
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:312)
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:311)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:99)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525)
>   at 
> org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146)
>   at 
> org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:171)
>   at 
> org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$2(basicPhysicalOperators.scala:73)
>   at scala.collection.immutable.List.map(List.scala:293)
>   at 
> org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:73)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.withSubExprEliminationExprs(CodeGenerator.scala:1039)
>   at 
> org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:73)
>   at 
> org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:195)
>   at 
> org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:150)
>   at 
> org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:497)
>   at 
> org.apache.spark.sql.execution.InputRDDCodegen.doProduce(WholeStageCodegenExec.scala:484)
>   at 
> org.apache.spark.sql.execution.InputRDDCodegen.doProduce$(WholeStageCodegenExec.scala:457)
>   at 
> org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:497)
>   at 
> org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:96)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
>   at 
> org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:91)
>   at 

[jira] [Created] (SPARK-36677) NestedColumnAliasing pushes down aggregate functions into projections

2021-09-06 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-36677:


 Summary: NestedColumnAliasing pushes down aggregate functions into 
projections
 Key: SPARK-36677
 URL: https://issues.apache.org/jira/browse/SPARK-36677
 Project: Spark
  Issue Type: Bug
  Components: Optimizer, SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Venkata Sai Akhil Gudesa


Aggregate functions are being pushed down into projections when nested columns 
are accessed, causing the following error:
{code:java}
Caused by: UnsupportedOperationException: Cannot generate code for expression: 
...{code}
Reproduction:
{code:java}
spark.sql("drop table if exists test_aggregates")
spark.sql("create table if not exists test_aggregates(a STRUCT<c: STRUCT<e: string>, d: int>, b string)")
val df = sql("select max(a).c.e from (select a, b from test_aggregates) group by b")
println(df.queryExecution.optimizedPlan)
{code}

The output of the above code:
{noformat}
'Aggregate [b#1], [_extract_e#5 AS max(a).c.e#3]
+- 'Project [max(a#0).c.e AS _extract_e#5, b#1]
   +- Relation default.test_aggregates[a#0,b#1] parquet
{noformat}
The error message when the dataframe is executed:
{noformat}
java.lang.UnsupportedOperationException: Cannot generate code for expression: 
max(input[0, struct<c:struct<e:string>,d:int>, true])
  at 
org.apache.spark.sql.errors.QueryExecutionErrors$.cannotGenerateCodeForExpressionError(QueryExecutionErrors.scala:83)
  at 
org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:312)
  at 
org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:311)
  at 
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:99)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151)
  at scala.Option.getOrElse(Option.scala:189)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525)
  at 
org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151)
  at scala.Option.getOrElse(Option.scala:189)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525)
  at 
org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151)
  at scala.Option.getOrElse(Option.scala:189)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146)
  at 
org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:171)
  at 
org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$2(basicPhysicalOperators.scala:73)
  at scala.collection.immutable.List.map(List.scala:293)
  at 
org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:73)
  at 
org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.withSubExprEliminationExprs(CodeGenerator.scala:1039)
  at 
org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:73)
  at 
org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:195)
  at 
org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:150)
  at 
org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:497)
  at 
org.apache.spark.sql.execution.InputRDDCodegen.doProduce(WholeStageCodegenExec.scala:484)
  at 
org.apache.spark.sql.execution.InputRDDCodegen.doProduce$(WholeStageCodegenExec.scala:457)
  at 
org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:497)
  at 
org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:96)
  at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
  at 
org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:91)
  at 
org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:91)
  at 
org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:497)
  at 
org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:54)
  at 
org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:96)
  at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
  at