[jira] [Created] (SPARK-47341) Replace commands with relations in a few tests in SparkConnectClientSuite
Venkata Sai Akhil Gudesa created SPARK-47341: Summary: Replace commands with relations in a few tests in SparkConnectClientSuite Key: SPARK-47341 URL: https://issues.apache.org/jira/browse/SPARK-47341 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Venkata Sai Akhil Gudesa A few [tests|https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala#L481-L527] in SparkConnectClientSuite attempt to test the result collection of a reattachable execution through the use of a SQL command. On a real server, the SQL command is not executed eagerly (since it is a select statement), so the test is not entirely accurate. The test itself is non-problematic since a dummy server with dummy responses is used, but a small improvement here would be to construct a relation rather than a command. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46660) ReattachExecute requests do not refresh aliveness of SessionHolder
Venkata Sai Akhil Gudesa created SPARK-46660: Summary: ReattachExecute requests do not refresh aliveness of SessionHolder Key: SPARK-46660 URL: https://issues.apache.org/jira/browse/SPARK-46660 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Venkata Sai Akhil Gudesa In the first executePlan request, creating the {{ExecuteHolder}} triggers {{getOrCreateIsolatedSession}}, which refreshes the aliveness of {{SessionHolder}}. However, in {{ReattachExecute}}, we fetch the {{ExecuteHolder}} directly without going through the {{SessionHolder}} (hence making it seem like the {{SessionHolder}} is idle). This would result in long-running queries (which do not send release execute requests, which would otherwise refresh aliveness) failing because the {{SessionHolder}} would expire during active query execution.
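The fix the ticket implies, routing ReattachExecute through the session so that fetching an execution also refreshes the session's last-access time, can be sketched as follows. All names here ({{SessionRegistry}}, {{lastAccessMs}}, the constructor shapes) are hypothetical; the real classes live in org.apache.spark.sql.connect.service and differ in detail.

```scala
import scala.collection.mutable

// Hypothetical sketch only: these are illustrative stand-ins, not the real
// Spark Connect classes.
final case class SessionHolder(sessionId: String, var lastAccessMs: Long)

final class SessionRegistry(expiryMs: Long, now: () => Long) {
  private val sessions = mutable.Map.empty[String, SessionHolder]

  // The first ExecutePlan goes through here, refreshing aliveness.
  def getOrCreateIsolatedSession(id: String): SessionHolder = {
    val h = sessions.getOrElseUpdate(id, SessionHolder(id, now()))
    h.lastAccessMs = now()
    h
  }

  // The bug: fetching an execution directly skips this refresh. The fix is
  // to touch the session's last-access time on ReattachExecute as well.
  def reattachExecute(id: String): Option[SessionHolder] =
    sessions.get(id).map { h => h.lastAccessMs = now(); h }

  def isExpired(id: String): Boolean =
    sessions.get(id).forall(h => now() - h.lastAccessMs > expiryMs)
}
```

With this shape, a long-running query that only sends reattach requests keeps its session alive past the expiry window.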
[jira] [Created] (SPARK-46202) Expose new ArtifactManager API to support in-memory artifacts and sub-directory structure
Venkata Sai Akhil Gudesa created SPARK-46202: Summary: Expose new ArtifactManager API to support in-memory artifacts and sub-directory structure Key: SPARK-46202 URL: https://issues.apache.org/jira/browse/SPARK-46202 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Venkata Sai Akhil Gudesa Currently, without the use of a REPL/Class finder, there is no API to support adding in-memory artifacts to the remote Spark Connect session. Further, there is currently no API to preserve/impose a sub-directory structure on the files we send over the wire.
[jira] [Created] (SPARK-45155) Add API Docs
Venkata Sai Akhil Gudesa created SPARK-45155: Summary: Add API Docs Key: SPARK-45155 URL: https://issues.apache.org/jira/browse/SPARK-45155 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Similar to the pages listed here for regular Spark - [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/index.html]
[jira] [Commented] (SPARK-44851) Update SparkConnectClientParser usage() method to match implementation
[ https://issues.apache.org/jira/browse/SPARK-44851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763343#comment-17763343 ] Venkata Sai Akhil Gudesa commented on SPARK-44851: -- [~harry] Go ahead :) > Update SparkConnectClientParser usage() method to match implementation > -- > > Key: SPARK-44851 > URL: https://issues.apache.org/jira/browse/SPARK-44851 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Several missing options as well as inconsistent ones (`enable-ssl` vs > `use_ssl`) > https://github.com/apache/spark/blob/7af4e358f3f4902cc9601e56c2662b8921a925d6/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L31-L42
[jira] [Created] (SPARK-44867) Refactor Spark Connect Docs to incorporate Scala setup
Venkata Sai Akhil Gudesa created SPARK-44867: Summary: Refactor Spark Connect Docs to incorporate Scala setup Key: SPARK-44867 URL: https://issues.apache.org/jira/browse/SPARK-44867 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa The current Spark Connect [overview|https://spark.apache.org/docs/latest/spark-connect-overview.html] does not include instructions to set up the Scala REPL or to use the Scala client in applications.
[jira] [Updated] (SPARK-44851) Update SparkConnectClientParser usage() method to match implementation
[ https://issues.apache.org/jira/browse/SPARK-44851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-44851: - Epic Link: SPARK-42554 > Update SparkConnectClientParser usage() method to match implementation > -- > > Key: SPARK-44851 > URL: https://issues.apache.org/jira/browse/SPARK-44851 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Several missing options as well as inconsistent ones (`enable-ssl` vs > `use_ssl`) > https://github.com/apache/spark/blob/7af4e358f3f4902cc9601e56c2662b8921a925d6/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L31-L42
[jira] [Created] (SPARK-44851) Update SparkConnectClientParser usage() method to match implementation
Venkata Sai Akhil Gudesa created SPARK-44851: Summary: Update SparkConnectClientParser usage() method to match implementation Key: SPARK-44851 URL: https://issues.apache.org/jira/browse/SPARK-44851 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Several missing options as well as inconsistent ones (`enable-ssl` vs `use_ssl`) https://github.com/apache/spark/blob/7af4e358f3f4902cc9601e56c2662b8921a925d6/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L31-L42
[jira] [Created] (SPARK-44829) Expose uploadAllArtifactClasses in ArtifactManager to `sql` package
Venkata Sai Akhil Gudesa created SPARK-44829: Summary: Expose uploadAllArtifactClasses in ArtifactManager to `sql` package Key: SPARK-44829 URL: https://issues.apache.org/jira/browse/SPARK-44829 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.1 Reporter: Venkata Sai Akhil Gudesa Currently, the [uploadAllClassFilesArtifacts|https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala#L144-L146] method is private[client], but this limits the ability of non-client features to use UDFs (which require the class files). Currently, this is not an issue because classfiles are uploaded in all analyze/execute operations. However, any new code paths would suffer from a ClassNotFoundException (CNFE) if they are not able to call this method.
[jira] [Created] (SPARK-44657) Incorrect limit handling and config parsing in Arrow collect
Venkata Sai Akhil Gudesa created SPARK-44657: Summary: Incorrect limit handling and config parsing in Arrow collect Key: SPARK-44657 URL: https://issues.apache.org/jira/browse/SPARK-44657 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.1, 3.4.0, 3.4.2, 3.5.0 Reporter: Venkata Sai Akhil Gudesa In the arrow writer [code|https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L154-L163], the conditions do not match what the documentation says regarding "{_}maxBatchSize and maxRecordsPerBatch, respect whatever smaller{_}": due to the {_}||{_} operator, the code actually respects whichever conf is "larger" (i.e. less restrictive). Further, when the {_}CONNECT_GRPC_ARROW_MAX_BATCH_SIZE{_} conf is read, the value is not converted to bytes from MiB ([example|https://github.com/apache/spark/blob/3e5203c64c06cc8a8560dfa0fb6f52e74589b583/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala#L103]).
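Both problems can be illustrated with a minimal sketch (not Spark's actual code; the function names are made up for illustration). With `||`, the writer keeps appending while EITHER limit still has room, so the larger (less restrictive) limit effectively wins; with `&&`, appending stops as soon as either limit is hit, respecting whichever is smaller, as the documentation intends. The second issue is a unit mismatch: a conf expressed in MiB must be scaled to bytes before being compared against a byte count.

```scala
// Buggy condition: continues while EITHER limit has room, so the batch can
// blow past one limit as long as the other is unmet.
def shouldAppendBuggy(rows: Long, bytes: Long, maxRecords: Long, maxBytes: Long): Boolean =
  rows < maxRecords || bytes < maxBytes

// Intended condition: stops once EITHER limit is reached, respecting the
// smaller of the two.
def shouldAppendFixed(rows: Long, bytes: Long, maxRecords: Long, maxBytes: Long): Boolean =
  rows < maxRecords && bytes < maxBytes

// A size conf read in MiB must be converted before byte comparisons.
def mibToBytes(mib: Long): Long = mib * 1024L * 1024L
```

For example, at 10 rows with a 5-row limit and plenty of byte headroom, the buggy check still says "append" while the fixed one stops.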
[jira] [Created] (SPARK-44584) AddArtifactsRequest and ArtifactStatusesRequest do not set client_type information
Venkata Sai Akhil Gudesa created SPARK-44584: Summary: AddArtifactsRequest and ArtifactStatusesRequest do not set client_type information Key: SPARK-44584 URL: https://issues.apache.org/jira/browse/SPARK-44584 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa
[jira] [Created] (SPARK-44476) JobArtifactSet is populated with all artifacts if it is not associated with an artifact
Venkata Sai Akhil Gudesa created SPARK-44476: Summary: JobArtifactSet is populated with all artifacts if it is not associated with an artifact Key: SPARK-44476 URL: https://issues.apache.org/jira/browse/SPARK-44476 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0, 4.0.0 Reporter: Venkata Sai Akhil Gudesa Fix For: 3.5.0, 4.0.0 Consider each artifact type - files/jars/archives. For each artifact type, the following bug exists:
# Initialise a `JobArtifactState` with no artifacts added to it.
# Create a `JobArtifactSet` from the `JobArtifactState`.
# Add an artifact with the same active `JobArtifactState`.
# Create another `JobArtifactSet`.
In the current behaviour, the set created in step 2 contains all the artifacts (through `sc.allAddedFiles` for example) while the set created in step 4 contains only the single artifact added in step 3.
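The behaviour described above is consistent with a fallback of the following shape. This is a hypothetical sketch; the names ({{snapshotBuggy}}, {{snapshotFixed}}, the simplified case class) are illustrative, not the real JobArtifactSet implementation. If the snapshot falls back to "all added files" whenever the active state has no artifacts of its own, then the first snapshot captures everything, while a later snapshot sees only the state's own artifacts.

```scala
// Hypothetical stand-in for the real state: just a uuid plus its own files.
final case class JobArtifactState(uuid: String, files: Set[String])

// Buggy shape: an empty-but-present state falls back to ALL added files.
def snapshotBuggy(state: JobArtifactState, allAddedFiles: Set[String]): Set[String] =
  if (state.files.nonEmpty) state.files else allAddedFiles

// Intended shape: an active state only ever exposes its own artifacts.
def snapshotFixed(state: JobArtifactState, allAddedFiles: Set[String]): Set[String] =
  state.files
```

Under this sketch, a state with zero artifacts produces a snapshot containing every globally added file in the buggy variant and an empty set in the fixed one.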
[jira] [Updated] (SPARK-44388) Using an updated instance of ScalarUserDefinedFunction causes protobuf cast failures on server
[ https://issues.apache.org/jira/browse/SPARK-44388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-44388: - Epic Link: SPARK-42554 > Using an updated instance of ScalarUserDefinedFunction causes protobuf cast > failures on server > -- > > Key: SPARK-44388 > URL: https://issues.apache.org/jira/browse/SPARK-44388 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > When running the following code- > {code:java} > class A(x: Int) { def get = x * 7 } > val myUdf = udf((x: Int) => new A(x).get) > val modifiedUdf = myUdf.withName("myUdf") > spark.range(5).select(modifiedUdf(col("id"))).as[Int].collect(){code} > which modifies the original myUdf instance through the `withName` method > causes the following error to occur during execution: > {noformat} > java.lang.ClassCastException: org.apache.spark.connect.proto.ScalarScalaUDF > cannot be cast to com.google.protobuf.MessageLite > at > com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:1462) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2285) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) > at >
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148){noformat}
[jira] [Created] (SPARK-44388) Using an updated instance of ScalarUserDefinedFunction causes protobuf cast failures on server
Venkata Sai Akhil Gudesa created SPARK-44388: Summary: Using an updated instance of ScalarUserDefinedFunction causes protobuf cast failures on server Key: SPARK-44388 URL: https://issues.apache.org/jira/browse/SPARK-44388 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa When running the following code- {code:java} class A(x: Int) { def get = x * 7 } val myUdf = udf((x: Int) => new A(x).get) val modifiedUdf = myUdf.withName("myUdf") spark.range(5).select(modifiedUdf(col("id"))).as[Int].collect(){code} which modifies the original myUdf instance through the `withName` method causes the following error to occur during execution: {noformat} java.lang.ClassCastException: org.apache.spark.connect.proto.ScalarScalaUDF cannot be cast to com.google.protobuf.MessageLite at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:1462) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2285) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187) at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.util.Utils$.deserialize(Utils.scala:148){noformat}
[jira] [Updated] (SPARK-44300) SparkConnectArtifactManager#cleanUpResources deletes all artifacts instead of session-specific artifacts
[ https://issues.apache.org/jira/browse/SPARK-44300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-44300: - Epic Link: SPARK-42554 > SparkConnectArtifactManager#cleanUpResources deletes all artifacts instead of > session-specific artifacts > > > Key: SPARK-44300 > URL: https://issues.apache.org/jira/browse/SPARK-44300 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > _SparkConnectArtifactManager#cleanUpResources_ deletes all resources instead > of session-specific resources. > This method is triggered through the _userSessionMapping_ cache when an entry > is removed > ([code|https://github.com/apache/spark/blob/b02ea4cd370ce6a066561dfde9d517ea70805a2b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala#L304]). > Once triggered, further artifact transfers and existing artifact usage would > fail.
[jira] [Created] (SPARK-44300) SparkConnectArtifactManager#cleanUpResources deletes all artifacts instead of session-specific artifacts
Venkata Sai Akhil Gudesa created SPARK-44300: Summary: SparkConnectArtifactManager#cleanUpResources deletes all artifacts instead of session-specific artifacts Key: SPARK-44300 URL: https://issues.apache.org/jira/browse/SPARK-44300 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa _SparkConnectArtifactManager#cleanUpResources_ deletes all resources instead of session-specific resources. This method is triggered through the _userSessionMapping_ cache when an entry is removed ([code|https://github.com/apache/spark/blob/b02ea4cd370ce6a066561dfde9d517ea70805a2b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala#L304]). Once triggered, further artifact transfers and existing artifact usage would fail.
[jira] [Created] (SPARK-44293) Task failures during custom JAR fetch in executors
Venkata Sai Akhil Gudesa created SPARK-44293: Summary: Task failures during custom JAR fetch in executors Key: SPARK-44293 URL: https://issues.apache.org/jira/browse/SPARK-44293 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa When attempting to use a custom JAR in a Spark Connect session, the tasks fail due to the following error: {code:java} 23/07/03 17:00:15 INFO Executor: Fetching spark://ip-10-110-22-170.us-west-2.compute.internal:43743/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar with timestamp 0 23/07/03 17:00:15 ERROR Executor: Exception in task 6.0 in stage 4.0 (TID 55) java.lang.RuntimeException: Stream '/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar' was not found. at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) {code} *Root Cause: The URI for the JAR file is invalid.* (Instead of the URI being in the form of {_}/artifacts//jars/{_}, it is instead {_}/artifacts//{_}.)
[jira] [Created] (SPARK-44246) Follow-ups for Jar/Classfile Isolation
Venkata Sai Akhil Gudesa created SPARK-44246: Summary: Follow-ups for Jar/Classfile Isolation Key: SPARK-44246 URL: https://issues.apache.org/jira/browse/SPARK-44246 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Related to https://issues.apache.org/jira/browse/SPARK-44146 ([PR|https://github.com/apache/spark/pull/41701]), this ticket is for the general follow-ups mentioned by [~hvanhovell] [here.|https://github.com/apache/spark/pull/41701#issuecomment-1608577372]
[jira] [Updated] (SPARK-44146) Isolate Spark Connect session/artifacts
[ https://issues.apache.org/jira/browse/SPARK-44146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-44146: - Epic Link: SPARK-42554 > Isolate Spark Connect session/artifacts > --- > > Key: SPARK-44146 > URL: https://issues.apache.org/jira/browse/SPARK-44146 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Venkata Sai Akhil Gudesa >Priority: Major > Fix For: 3.5.0 > > > Following up on https://issues.apache.org/jira/browse/SPARK-44078, with the > support for classloader isolation implemented, we can now utilise it to > isolate Spark Connect sessions from each other. Here, isolation refers to > isolation of artifacts from each Spark Connect session which enables us to > have multi-user UDFs.
[jira] [Created] (SPARK-44146) Isolate Spark Connect session/artifacts
Venkata Sai Akhil Gudesa created SPARK-44146: Summary: Isolate Spark Connect session/artifacts Key: SPARK-44146 URL: https://issues.apache.org/jira/browse/SPARK-44146 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Following up on https://issues.apache.org/jira/browse/SPARK-44078, with the support for classloader isolation implemented, we can now utilise it to isolate Spark Connect sessions from each other. Here, isolation refers to isolation of artifacts from each Spark Connect session which enables us to have multi-user UDFs.
[jira] [Created] (SPARK-44078) Add support for classloader/resource isolation
Venkata Sai Akhil Gudesa created SPARK-44078: Summary: Add support for classloader/resource isolation Key: SPARK-44078 URL: https://issues.apache.org/jira/browse/SPARK-44078 Project: Spark Issue Type: New Feature Components: Connect, Spark Core Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa A current limitation of Scala UDFs is that a Spark cluster would only be able to support a single REPL at a time due to the fact that classloaders of different Spark Sessions (and therefore, Spark Connect sessions) aren't isolated from each other. Without isolation, REPL-generated classfiles as well as user-added JARs may conflict if there are multiple users of the cluster. Thus, we need a mechanism to support isolated sessions (i.e isolated resources/classloader) so that each REPL user does not conflict with other users on the same cluster.
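The core idea of the isolation mechanism can be sketched with nothing more than the standard JDK {{URLClassLoader}}: each session resolves its own JARs and REPL classfiles on top of a shared parent loader, so one session's generated classes never shadow another's. This is a minimal illustrative sketch, not Spark's actual implementation, which is considerably more involved.

```scala
import java.net.{URL, URLClassLoader}

// Each session gets its own loader over its own artifact URLs, delegating to
// a shared parent for common classes. Classes defined only in one session's
// URLs are invisible to every other session's loader.
def newSessionClassLoader(sessionJars: Seq[URL]): URLClassLoader =
  new URLClassLoader(sessionJars.toArray, Thread.currentThread().getContextClassLoader)
```

Two sessions created this way hold distinct loaders, which is precisely what allows two REPL users to define conflicting class names without interfering with each other.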
[jira] [Created] (SPARK-44016) Artifacts with name as an absolute path may overwrite other files
Venkata Sai Akhil Gudesa created SPARK-44016: Summary: Artifacts with name as an absolute path may overwrite other files Key: SPARK-44016 URL: https://issues.apache.org/jira/browse/SPARK-44016 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Fix For: 3.5.0 In `SparkConnectAddArtifactsHandler`, an artifact being moved to a staging location may overwrite another file when the `name`/`path` of the artifact is an `absolute` path. This happens when the [stagedPath|https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala#L172] is being computed with the help of the `.resolve(...)` method where the `resolve` method returns the `other` path (in this case, the name of the artifact) if the `other` path is an absolute path.
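The underlying behaviour is standard `java.nio.file`: `Path#resolve` returns the `other` path unchanged whenever it is absolute, so an artifact whose name is an absolute path escapes the staging directory entirely. The paths below are illustrative only.

```scala
import java.nio.file.Paths

val stagingDir = Paths.get("/tmp/staging")

// A relative artifact name stays under the staging directory...
val safe = stagingDir.resolve("jars/my.jar")

// ...but an absolute name is returned as-is, escaping the staging directory
// and potentially overwriting an arbitrary file on the server.
val escaped = stagingDir.resolve("/etc/passwd")
```

This is why artifact names need to be validated (or relativized) before being resolved against the staging root.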
[jira] [Created] (SPARK-43998) Add support for UDAF
Venkata Sai Akhil Gudesa created SPARK-43998: Summary: Add support for UDAF Key: SPARK-43998 URL: https://issues.apache.org/jira/browse/SPARK-43998 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Reference: [https://github.com/apache/spark/blob/4547c9c90e3d35436afe89b10c794050ed8d04d7/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala#L136-L175]
[jira] [Updated] (SPARK-43995) Implement UDFRegistration
[ https://issues.apache.org/jira/browse/SPARK-43995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-43995: - Description: Reference file - [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala] API to be implemented: * {noformat} def register(name: String, udf: UserDefinedFunction): UserDefinedFunction{noformat} * ** [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123] * {noformat} def register[RT: TypeTag](name: String, func: Function0[RT]): UserDefinedFunction{noformat} * ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642] * {noformat} def register(name: String, f: UDF0[_], returnType: DataType): Unit{noformat} * ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076] We currently do not support UDAFs so the relevant UDAF APIs may be skipped as well as the python/pyspark (in the context of the scala client) related APIs.
was: Reference file - [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala] API to be implemented: * def register(name: String, udf: UserDefinedFunction): UserDefinedFunction ** [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123] * def register[RT: TypeTag](name: String, func: Function0[RT]): UserDefinedFunction ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642] * def register(name: String, f: UDF0[_], returnType: DataType): Unit ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076] We currently do not support UDAFs so the relevant UDAF APIs may be skipped as well as the python/pyspark (in the context of the scala client) related APIs. > Implement UDFRegistration > - > > Key: SPARK-43995 > URL: https://issues.apache.org/jira/browse/SPARK-43995 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Reference file - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala] > API to be implemented: > * > {noformat} > def register(name: String, udf: UserDefinedFunction): > UserDefinedFunction{noformat} > * > ** > [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123] > * > {noformat} > def register[RT: TypeTag](name: String, func: Function0[RT]): > UserDefinedFunction{noformat} > * > ** From [0 to 22 > arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642] > * > {noformat} > def register(name: String, f: UDF0[_], returnType: DataType): 
Unit{noformat} > * > ** From [0 to 22 > arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076] > > We currently do not support UDAFs so the relevant UDAF APIs may be skipped as > well as the python/pyspark (in the context of the scala client) related APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
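For orientation, the `register` overloads listed above mirror the existing (non-Connect) `UDFRegistration` surface. A minimal usage sketch against that API, assuming a running `SparkSession` (illustrative only, not client code from this ticket):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Sketch only: assumes a local SparkSession is available.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// register[RT: TypeTag](name, func): register a Scala closure by name.
spark.udf.register("plusOne", (x: Int) => x + 1)

// register(name, udf): register a pre-built UserDefinedFunction.
val timesTwo = udf((x: Int) => x * 2)
spark.udf.register("timesTwo", timesTwo)

// Registered UDFs become callable from SQL.
spark.sql("SELECT plusOne(1), timesTwo(3)").show()
```

The Connect client is expected to expose the same call shapes, which is why the reference links above point at the server-side `UDFRegistration.scala`.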
[jira] [Updated] (SPARK-43996) Implement call_udf function
[ https://issues.apache.org/jira/browse/SPARK-43996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-43996:

Description:

Reference: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5824-L5842]

API:
{noformat}def call_udf(udfName: String, cols: Column*): Column{noformat}
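A brief usage sketch of `call_udf` on the existing Spark SQL API (illustrative only; assumes a running `SparkSession`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{call_udf, col}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Register a UDF under a name...
spark.udf.register("squared", (x: Long) => x * x)

// ...then invoke it by name with call_udf, without holding a
// reference to the UserDefinedFunction object itself.
spark.range(1, 4).select(call_udf("squared", col("id"))).show()
```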
[jira] [Updated] (SPARK-43997) Add support for Java UDFs
[ https://issues.apache.org/jira/browse/SPARK-43997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-43997:

Description:

API:
{code:java}def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction{code}

For 0 - 10 arguments, [reference|https://github.com/apache/spark/blob/747db6675da86e79d04e2fcc531b3b72b22ebf04/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5624-L5780]
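For context, the Java-style `udf` overloads take an explicit `DataType` because Java erasure leaves no `TypeTag` to infer the return type from. A hedged sketch using the existing `functions.udf(UDF1, DataType)` overload (illustrative only; assumes a running `SparkSession`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// A Java-style UDF with its return type supplied explicitly.
val plusTen = udf(new UDF1[Int, Int] {
  override def call(x: Int): Int = x + 10
}, IntegerType)

spark.range(3).select(plusTen(col("id").cast(IntegerType))).show()
```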
[jira] [Created] (SPARK-43997) Add support for Java UDFs
Venkata Sai Akhil Gudesa created SPARK-43997: Summary: Add support for Java UDFs Key: SPARK-43997 URL: https://issues.apache.org/jira/browse/SPARK-43997 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa API: def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction For 0 - 10 arguments, [reference|https://github.com/apache/spark/blob/747db6675da86e79d04e2fcc531b3b72b22ebf04/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5624-L5780]
[jira] [Created] (SPARK-43996) Implement call_udf function
Venkata Sai Akhil Gudesa created SPARK-43996: Summary: Implement call_udf function Key: SPARK-43996 URL: https://issues.apache.org/jira/browse/SPARK-43996 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Reference: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L5824-L5842] API: def call_udf(udfName: String, cols: Column*): Column
[jira] [Created] (SPARK-43995) Implement UDFRegistration
Venkata Sai Akhil Gudesa created SPARK-43995: Summary: Implement UDFRegistration Key: SPARK-43995 URL: https://issues.apache.org/jira/browse/SPARK-43995 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa Reference file - [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala] API to be implemented: * def register(name: String, udf: UserDefinedFunction): UserDefinedFunction ** [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123] * def register[RT: TypeTag](name: String, func: Function0[RT]): UserDefinedFunction ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642] * def register(name: String, f: UDF0[_], returnType: DataType): Unit ** From [0 to 22 arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076] We currently do not support UDAFs so the relevant UDAF APIs may be skipped as well as the python/pyspark (in the context of the scala client) related APIs.
[jira] [Updated] (SPARK-43285) ReplE2ESuite consistently fails with JDK 17
[ https://issues.apache.org/jira/browse/SPARK-43285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-43285:

Description:

[[Comment|https://github.com/apache/spark/pull/40675#discussion_r1174696470] from [~gurwls223]]

This test consistently fails with JDK 17:
{code:java}
[info] ReplE2ESuite:
[info] - Simple query *** FAILED *** (10 seconds, 4 milliseconds)
[info] java.lang.RuntimeException: REPL Timed out while running command:
[info] spark.sql("select 1").collect()
[info]
[info] Console output:
[info] Error output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
[info] at org.apache.spark.sql.application.ReplE2ESuite.runCommandsInShell(ReplE2ESuite.scala:87)
[info] at org.apache.spark.sql.application.ReplE2ESuite.$anonfun$new$1(ReplE2ESuite.scala:102)
[info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer.apply(Transformer.scala:20)
[info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info] at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224){code}
Failing runs:
[https://github.com/apache/spark/actions/runs/4780630672/jobs/8498505928#step:9:4647]
[https://github.com/apache/spark/actions/runs/4774942961/jobs/8488946907]
[https://github.com/apache/spark/actions/runs/4769162286/jobs/8479293802]
[https://github.com/apache/spark/actions/runs/4759278349/jobs/8458399201]
[https://github.com/apache/spark/actions/runs/4748319019/jobs/8434392414]
[jira] [Created] (SPARK-43285) ReplE2ESuite consistently fails with JDK 17
Venkata Sai Akhil Gudesa created SPARK-43285: Summary: ReplE2ESuite consistently fails with JDK 17 Key: SPARK-43285 URL: https://issues.apache.org/jira/browse/SPARK-43285 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa [[Comment|https://github.com/apache/spark/pull/40675#discussion_r1174696470] from [~gurwls223]] This test consistently fails with JDK 17: [info] ReplE2ESuite: [info] - Simple query *** FAILED *** (10 seconds, 4 milliseconds) [info] java.lang.RuntimeException: REPL Timed out while running command: [info] spark.sql("select 1").collect() [info] [info] Console output: [info] Error output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc [info] at org.apache.spark.sql.application.ReplE2ESuite.runCommandsInShell(ReplE2ESuite.scala:87) [info] at org.apache.spark.sql.application.ReplE2ESuite.$anonfun$new$1(ReplE2ESuite.scala:102) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) [info] at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) [https://github.com/apache/spark/actions/runs/4780630672/jobs/8498505928#step:9:4647] [https://github.com/apache/spark/actions/runs/4774942961/jobs/8488946907] [https://github.com/apache/spark/actions/runs/4769162286/jobs/8479293802] 
[https://github.com/apache/spark/actions/runs/4759278349/jobs/8458399201] [https://github.com/apache/spark/actions/runs/4748319019/jobs/8434392414]
[jira] [Created] (SPARK-43227) Fix deserialisation issue when UDFs contain a lambda expression
Venkata Sai Akhil Gudesa created SPARK-43227: Summary: Fix deserialisation issue when UDFs contain a lambda expression Key: SPARK-43227 URL: https://issues.apache.org/jira/browse/SPARK-43227 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa

The following code:
{code:java}
class A(x: Int) { def get = x * 20 + 5 }
val dummyUdf = (x: Int) => new A(x).get
val myUdf = udf(dummyUdf)
spark.range(5).select(myUdf(col("id"))).as[Int].collect()
{code}
hits the following error:
{noformat}
io.grpc.StatusRuntimeException: INTERNAL: cannot assign instance of java.lang.invoke.SerializedLambda to field ammonite.$sess.cmd26$Helper.dummyUdf of type scala.Function1 in instance of ammonite.$sess.cmd26$Helper
io.grpc.Status.asRuntimeException(Status.java:535)
io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
ammonite.$sess.cmd28$Helper.<init>(cmd28.sc:1)
ammonite.$sess.cmd28$.<init>(cmd28.sc:7)
ammonite.$sess.cmd28$.<init>(cmd28.sc){noformat}
[jira] [Updated] (SPARK-43198) Fix "Could not initialise class ammonite..." error when using filter
[ https://issues.apache.org/jira/browse/SPARK-43198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-43198:

Description:

When
{code:java}
spark.range(10).filter(n => n % 2 == 0).collectAsList(){code}
is run in the ammonite REPL (Spark Connect), the following error is thrown:
{noformat}
io.grpc.StatusRuntimeException: UNKNOWN: ammonite/repl/ReplBridge$
io.grpc.Status.asRuntimeException(Status.java:535)
io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
org.apache.spark.sql.Dataset.collectAsList(Dataset.scala:2700)
ammonite.$sess.cmd0$.<init>(cmd0.sc:1)
ammonite.$sess.cmd0$.<init>(cmd0.sc){noformat}
[jira] [Created] (SPARK-43198) Fix "Could not initialise class ammonite..." error when using filter
Venkata Sai Akhil Gudesa created SPARK-43198: Summary: Fix "Could not initialise class ammonite..." error when using filter Key: SPARK-43198 URL: https://issues.apache.org/jira/browse/SPARK-43198 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa When `spark.range(10).filter(n => n % 2 == 0).collectAsList()` is run in the ammonite REPL (Spark Connect), the following error is thrown: ``` io.grpc.StatusRuntimeException: UNKNOWN: ammonite/repl/ReplBridge$ io.grpc.Status.asRuntimeException(Status.java:535) io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62) org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114) org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131) org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687) org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088) org.apache.spark.sql.Dataset.collect(Dataset.scala:2686) org.apache.spark.sql.Dataset.collectAsList(Dataset.scala:2700) ammonite.$sess.cmd0$.<init>(cmd0.sc:1) ammonite.$sess.cmd0$.<init>(cmd0.sc) ```
[jira] [Created] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message
Venkata Sai Akhil Gudesa created SPARK-42812: Summary: client_type is missing from AddArtifactsRequest proto message Key: SPARK-42812 URL: https://issues.apache.org/jira/browse/SPARK-42812 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa The client_type field is missing from the AddArtifactsRequest proto message.
[jira] [Created] (SPARK-42748) Server-side Artifact Management
Venkata Sai Akhil Gudesa created SPARK-42748: Summary: Server-side Artifact Management Key: SPARK-42748 URL: https://issues.apache.org/jira/browse/SPARK-42748 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa https://issues.apache.org/jira/browse/SPARK-42653 implements the client-side transfer of artifacts to the server but, currently, the server does not process these requests. We need to implement a server-side management mechanism that handles storage of these artifacts on the driver and performs further processing (such as adding jars and moving class files to the right directories).
[jira] [Updated] (SPARK-42658) Handle timeouts and CRC failures during artifact transfer
[ https://issues.apache.org/jira/browse/SPARK-42658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-42658:

Description:

We would need a retry mechanism on the client side to handle CRC failures during artifact transfer, because the server discards any data that fails the CRC check, which may otherwise lead to missing artifacts during UDF execution. We also require a timeout policy to prevent waiting indefinitely for the server reply.
[jira] [Updated] (SPARK-42658) Handle timeouts and CRC failures during artifact transfer
[ https://issues.apache.org/jira/browse/SPARK-42658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-42658: Summary: Handle timeouts and CRC failures during artifact transfer (was: Handle CRC failures during artifact transfer)
[jira] [Created] (SPARK-42658) Handle CRC failures during artifact transfer
Venkata Sai Akhil Gudesa created SPARK-42658: Summary: Handle CRC failures during artifact transfer Key: SPARK-42658 URL: https://issues.apache.org/jira/browse/SPARK-42658 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa We would need a retry mechanism on the client side to handle CRC failures during artifact transfer: the server discards any data that fails the CRC check, which may lead to missing artifacts during UDF execution.
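Conceptually, the client-side behaviour described above is a checksum-verify-retry loop. A minimal sketch of the idea (illustrative only, not the actual Spark Connect client code; `send` is a hypothetical transport hook that returns the server-acknowledged checksum):

```scala
import java.util.zip.CRC32

// Compute the CRC32 checksum for a chunk of artifact data.
def crc32(data: Array[Byte]): Long = {
  val crc = new CRC32()
  crc.update(data)
  crc.getValue
}

// Retry transmission until the server acknowledges a matching checksum,
// since chunks failing the CRC check are discarded server-side.
def sendWithRetry(chunk: Array[Byte], send: Array[Byte] => Long, maxRetries: Int = 3): Unit = {
  val expected = crc32(chunk)
  var attempt = 0
  while (attempt < maxRetries) {
    if (send(chunk) == expected) return // server confirmed matching checksum
    attempt += 1
  }
  throw new RuntimeException(s"Chunk failed CRC check after $maxRetries attempts")
}
```

The real client would additionally apply the timeout policy mentioned in the updated description so that a lost acknowledgement does not block forever.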
[jira] [Created] (SPARK-42657) Support to find and transfer client-side REPL classfiles to server as artifacts
Venkata Sai Akhil Gudesa created SPARK-42657: Summary: Support to find and transfer client-side REPL classfiles to server as artifacts Key: SPARK-42657 URL: https://issues.apache.org/jira/browse/SPARK-42657 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa To run UDFs which are defined on the client-side REPL, we require a mechanism that can find the local REPL classfiles and then utilise the mechanism from https://issues.apache.org/jira/browse/SPARK-42653 to transfer them to the server as artifacts.
[jira] [Updated] (SPARK-42657) Support to find and transfer client-side REPL classfiles to server as artifacts
[ https://issues.apache.org/jira/browse/SPARK-42657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-42657: Epic Link: SPARK-42554
[jira] [Updated] (SPARK-42653) Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-42653: Epic Link: SPARK-42554
[jira] [Created] (SPARK-42653) Artifact transfer from Scala/JVM client to Server
Venkata Sai Akhil Gudesa created SPARK-42653: Summary: Artifact transfer from Scala/JVM client to Server Key: SPARK-42653 URL: https://issues.apache.org/jira/browse/SPARK-42653 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa In the decoupled client-server architecture of Spark Connect, a remote client may use a local JAR or a new class in their UDF that may not be present on the server. To handle these cases of missing "artifacts", we need to implement a mechanism to transfer artifacts from the client side over to the server side, as per the protocol defined in https://github.com/apache/spark/pull/40147.
[jira] [Created] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client
Venkata Sai Akhil Gudesa created SPARK-42543: Summary: Specify protocol for UDF artifact transfer in JVM/Scala client Key: SPARK-42543 URL: https://issues.apache.org/jira/browse/SPARK-42543 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa An "artifact" is any file that may be used during the execution of a UDF. In the decoupled client-server architecture of Spark Connect, a remote client may use a local JAR or a new class in their UDF that may not be present on the server. To handle these cases of missing "artifacts", a protocol for artifact transfer is needed to move the required artifacts from the client side over to the server side.
[jira] [Updated] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
[ https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-42283:

Description:

“Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Created] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
Venkata Sai Akhil Gudesa created SPARK-42283: Summary: Add Simple Scala UDFs to Scala/JVM Client Key: SPARK-42283 URL: https://issues.apache.org/jira/browse/SPARK-42283 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “vanilla” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
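As a hedged illustration of the constraint described above (not code from the actual client), a "simple" UDF references only in-built libraries and names defined inside its own scope, so no client-side classfiles or JARs need to be shipped:

```scala
import org.apache.spark.sql.functions.udf

// "Simple" UDF: the helper class is local to the UDF body, so the
// closure carries no dependency on REPL-generated classfiles or JARs.
val formatId = udf { (id: Long) =>
  case class Tag(v: Long) { def render: String = s"id-$v" } // class local to the UDF
  Tag(id).render
}
```

By contrast, a UDF referencing a class defined at the REPL top level (outside the UDF body) would require the artifact-transfer mechanism tracked in SPARK-42653/SPARK-42657.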
[jira] [Created] (SPARK-42133) Add basic Dataset API methods to Spark Connect Scala Client
Venkata Sai Akhil Gudesa created SPARK-42133: Summary: Add basic Dataset API methods to Spark Connect Scala Client Key: SPARK-42133 URL: https://issues.apache.org/jira/browse/SPARK-42133 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa Add basic Dataset API methods (such as project, filter and limit) as well as range() support in SparkSession.
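The operations named above behave much like their Scala-collections counterparts; the following plain-Scala analogue (deliberately Spark-free, illustrative only) sketches the intended semantics of range, filter and limit:

```scala
// Plain-Scala analogue of the Dataset operations mentioned above.
// This is NOT the client API -- just a semantic sketch on a List.
val ids = (0L until 10L).toList          // ~ spark.range(10)
val even = ids.filter(_ % 2 == 0)        // ~ ds.filter(...)
val firstThree = even.take(3)            // ~ ds.limit(3)

println(firstThree)  // List(0, 2, 4)
```

The key difference in the client is that each method builds up an unresolved plan (a protobuf Relation) rather than computing eagerly as collections do.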
[jira] [Created] (SPARK-41967) SBT unable to resolve particular packages from the imported maven build
Venkata Sai Akhil Gudesa created SPARK-41967: Summary: SBT unable to resolve particular packages from the imported maven build Key: SPARK-41967 URL: https://issues.apache.org/jira/browse/SPARK-41967 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa An SBT issue causes resolution of particular packages from the imported maven build to fail for an unknown reason. This affects Spark-Connect-related projects (see [here|https://github.com/apache/spark/blob/6cae6aa5156655c79eb3f20292ccec6c479c3b1b/project/SparkBuild.scala#L667-L668] and [here|https://github.com/apache/spark/blob/6cae6aa5156655c79eb3f20292ccec6c479c3b1b/project/SparkBuild.scala#L902-L904] for example) by forcing duplicate dependencies. The pom build works fine when the affected dependency (such as guava) is removed, but the sbt build then fails. Thus, we are forced to explicitly specify the versions of the affected packages so that SBT can parse the version(s) and include them manually (they are also added as dependencies in maven to ensure version consistency with sbt).
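A minimal sketch of the workaround described above, written as an sbt settings fragment (the coordinates are illustrative and the version is a placeholder, not the actual entries in SparkBuild.scala):

```scala
// Hypothetical SparkBuild.scala fragment: explicitly state the version of
// an affected package so sbt can resolve it directly instead of relying
// on the (mis-resolved) imported maven build. The version must mirror the
// one declared in the maven pom to keep the two builds consistent.
libraryDependencies ++= Seq(
  "com.google.guava" % "guava" % "x.y.z"  // placeholder version
)
```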
[jira] [Created] (SPARK-41917) Support SSL and Auth token in connection channel for JVM/Scala Client
Venkata Sai Akhil Gudesa created SPARK-41917: Summary: Support SSL and Auth token in connection channel for JVM/Scala Client Key: SPARK-41917 URL: https://issues.apache.org/jira/browse/SPARK-41917 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa
[jira] [Updated] (SPARK-41822) Setup Scala/JVM Client Connection
[ https://issues.apache.org/jira/browse/SPARK-41822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-41822: - Summary: Setup Scala/JVM Client Connection (was: Setup Scala Client Connection) > Setup Scala/JVM Client Connection > - > > Key: SPARK-41822 > URL: https://issues.apache.org/jira/browse/SPARK-41822 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Set up the gRPC connection for the Scala/JVM client to enable communication > with the Spark Connect server.
[jira] [Created] (SPARK-41822) Setup Scala Client Connection
Venkata Sai Akhil Gudesa created SPARK-41822: Summary: Setup Scala Client Connection Key: SPARK-41822 URL: https://issues.apache.org/jira/browse/SPARK-41822 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Venkata Sai Akhil Gudesa Set up the gRPC connection for the Scala/JVM client to enable communication with the Spark Connect server.
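Conceptually, the client needs a host/port target plus, once SPARK-41917 lands, optional credentials carried as request metadata. The sketch below shows only that plumbing in plain Scala; the names are hypothetical and the real client wires these values into a gRPC channel and its metadata rather than a case class:

```scala
// Hypothetical connection parameters for a Spark Connect client.
// (15002 is shown as an assumed default port; verify against the docs.)
case class ConnectConfig(host: String, port: Int, token: Option[String]) {
  // The target string a gRPC channel builder would be pointed at.
  def target: String = s"$host:$port"
  // A bearer token, when present, would travel as an Authorization header.
  def authHeader: Option[(String, String)] =
    token.map(t => "Authorization" -> s"Bearer $t")
}

val cfg = ConnectConfig("localhost", 15002, Some("secret"))
println(cfg.target)      // localhost:15002
println(cfg.authHeader)  // Some((Authorization,Bearer secret))
```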
[jira] [Created] (SPARK-41534) Setup initial client module for Spark Connect
Venkata Sai Akhil Gudesa created SPARK-41534: Summary: Setup initial client module for Spark Connect Key: SPARK-41534 URL: https://issues.apache.org/jira/browse/SPARK-41534 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.3.2 Reporter: Venkata Sai Akhil Gudesa In https://issues.apache.org/jira/browse/SPARK-41369, the connect module was split into server/common to extract dependencies for the Scala client. With this extraction completed, the client module can be set up in preparation for the Scala client.
[jira] [Created] (SPARK-41369) Refactor connect directory structure
Venkata Sai Akhil Gudesa created SPARK-41369: Summary: Refactor connect directory structure Key: SPARK-41369 URL: https://issues.apache.org/jira/browse/SPARK-41369 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.3.2, 3.4.0 Reporter: Venkata Sai Akhil Gudesa Currently, `spark/connector/connect/` is a single module that contains both the "server"/service as well as the protobuf definitions. However, this module can be split into multiple modules - "server" and "common". This brings the advantage of separating out the protobuf generation from the core "server" module for efficient reuse.
[jira] [Commented] (SPARK-36677) NestedColumnAliasing pushes down aggregate functions into projections
[ https://issues.apache.org/jira/browse/SPARK-36677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410734#comment-17410734 ] Venkata Sai Akhil Gudesa commented on SPARK-36677: -- I have a fix for this and have a PR on the way. > NestedColumnAliasing pushes down aggregate functions into projections > - > > Key: SPARK-36677 > URL: https://issues.apache.org/jira/browse/SPARK-36677 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Aggregate functions are being pushed down into projections when nested > columns are accessed causing the following error: > {code:java} > Caused by: UnsupportedOperationException: Cannot generate code for > expression: ...{code} > Reproduction: > > {code:java} > spark.sql("drop table if exists test_aggregates") > spark.sql("create table if not exists test_aggregates(a STRUCT<c: STRUCT<e: string>, d: int>, b string)") > val df = sql("select max(a).c.e from (select a, b from test_aggregates) group > by b") > println(df.queryExecution.optimizedPlan) > {code} > > The output of the above code: > {noformat} > 'Aggregate [b#1], [_extract_e#5 AS max(a).c.e#3] > +- 'Project [max(a#0).c.e AS _extract_e#5, b#1] >+- Relation default.test_aggregates[a#0,b#1] parquet > {noformat} > The error message when the dataframe is executed: > {noformat} > java.lang.UnsupportedOperationException: Cannot generate code for expression: > max(input[0, struct<c:struct<e:string>,d:int>, true]) > at > org.apache.spark.sql.errors.QueryExecutionErrors$.cannotGenerateCodeForExpressionError(QueryExecutionErrors.scala:83) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:312) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:311) > at > org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:99) > at > 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525) > at > org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126) > at > org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525) > at > org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126) > at > org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146) > at > org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:171) > at > org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$2(basicPhysicalOperators.scala:73) > at scala.collection.immutable.List.map(List.scala:293) > at > org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:73) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.withSubExprEliminationExprs(CodeGenerator.scala:1039) > at > org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:73) > at > org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:195) > at > org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:150) > at > org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:497) > at > 
org.apache.spark.sql.execution.InputRDDCodegen.doProduce(WholeStageCodegenExec.scala:484) > at > org.apache.spark.sql.execution.InputRDDCodegen.doProduce$(WholeStageCodegenExec.scala:457) > at > org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:497) > at > org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:96) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at > org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:91) > at
[jira] [Created] (SPARK-36677) NestedColumnAliasing pushes down aggregate functions into projections
Venkata Sai Akhil Gudesa created SPARK-36677: Summary: NestedColumnAliasing pushes down aggregate functions into projections Key: SPARK-36677 URL: https://issues.apache.org/jira/browse/SPARK-36677 Project: Spark Issue Type: Bug Components: Optimizer, SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Venkata Sai Akhil Gudesa Aggregate functions are being pushed down into projections when nested columns are accessed causing the following error: {code:java} Caused by: UnsupportedOperationException: Cannot generate code for expression: ...{code} Reproduction: {code:java} spark.sql("drop table if exists test_aggregates") spark.sql("create table if not exists test_aggregates(a STRUCT<c: STRUCT<e: string>, d: int>, b string)") val df = sql("select max(a).c.e from (select a, b from test_aggregates) group by b") println(df.queryExecution.optimizedPlan) {code} The output of the above code: {noformat} 'Aggregate [b#1], [_extract_e#5 AS max(a).c.e#3] +- 'Project [max(a#0).c.e AS _extract_e#5, b#1] +- Relation default.test_aggregates[a#0,b#1] parquet {noformat} The error message when the dataframe is executed: {noformat} java.lang.UnsupportedOperationException: Cannot generate code for expression: max(input[0, struct<c:struct<e:string>,d:int>, true]) at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotGenerateCodeForExpressionError(QueryExecutionErrors.scala:83) at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:312) at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:311) at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:99) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525) at 
org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:525) at org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:151) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:146) at org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:171) at org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$2(basicPhysicalOperators.scala:73) at scala.collection.immutable.List.map(List.scala:293) at org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:73) at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.withSubExprEliminationExprs(CodeGenerator.scala:1039) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:73) at org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:195) at org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:150) at org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:497) at org.apache.spark.sql.execution.InputRDDCodegen.doProduce(WholeStageCodegenExec.scala:484) at org.apache.spark.sql.execution.InputRDDCodegen.doProduce$(WholeStageCodegenExec.scala:457) at org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:497) at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:96) at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:91) at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:91) at org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:497) at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:54) at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:96) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at