[jira] [Created] (SPARK-49263) Spark Connect python client: Consistently handle boolean Dataframe reader options
Juliusz Sompolski created SPARK-49263:
--------------------------------------

Summary: Spark Connect python client: Consistently handle boolean Dataframe reader options
Key: SPARK-49263
URL: https://issues.apache.org/jira/browse/SPARK-49263
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

The Python Spark Connect client's spark.read.option should use to_str, so that boolean option values are converted consistently.
[jira] [Created] (SPARK-48196) Make lazy val plans in QueryExecution Trys to not re-execute on error
Juliusz Sompolski created SPARK-48196:
--------------------------------------

Summary: Make lazy val plans in QueryExecution Trys to not re-execute on error
Key: SPARK-48196
URL: https://issues.apache.org/jira/browse/SPARK-48196
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

To achieve more consistency, capture errors from analysis/optimization/execution in Trys, so that if the lazy val evaluation fails, the captured error is kept and the plan is not re-executed.
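A minimal sketch of the idea, with hypothetical simplified names (`QueryExecutionSketch` and `runAnalysis` stand in for a QueryExecution phase): a lazy val holding a Try caches the failure, so the phase is not re-run on later accesses.
{code:java}
import scala.util.Try

// Simplified stand-in for QueryExecution: the Try is evaluated once and
// caches either the result or the original error.
class QueryExecutionSketch(runAnalysis: () => String) {
  private lazy val analyzedTry: Try[String] = Try(runAnalysis())

  // Re-throws the same captured error on every access; runAnalysis is
  // never executed a second time.
  def analyzed: String = analyzedTry.get
}
{code}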
[jira] [Created] (SPARK-48195) Keep the RDD created by SparkPlan doExecute
Juliusz Sompolski created SPARK-48195:
--------------------------------------

Summary: Keep the RDD created by SparkPlan doExecute
Key: SPARK-48195
URL: https://issues.apache.org/jira/browse/SPARK-48195
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

For more consistency, don't make SparkPlan execute generate a new RDD each time; reuse the RDD created by doExecute once.
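A hedged sketch of the memoization described above, using a simplified stand-in for SparkPlan (`RDDLike` is hypothetical, standing in for RDD[InternalRow]):
{code:java}
// Simplified stand-in for SparkPlan: execute() must return the same object
// that doExecute() produced, instead of building a new RDD on every call.
trait SparkPlanSketch {
  type RDDLike
  protected def doExecute(): RDDLike

  // lazy val: doExecute runs at most once; all execute() calls reuse it.
  private lazy val executedRDD: RDDLike = doExecute()
  final def execute(): RDDLike = executedRDD
}
{code}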
[jira] [Created] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started
Juliusz Sompolski created SPARK-46186:
--------------------------------------

Summary: Invalid Spark Connect execution state transition if interrupted before thread started
Key: SPARK-46186
URL: https://issues.apache.org/jira/browse/SPARK-46186
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

Fix an edge case where interrupting an execution before the ExecuteThreadRunner has started could lead to an illegal state transition.
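One way such an edge case can be avoided, sketched with hypothetical state names (not the actual ExecuteThreadRunner states): make the interrupt a compare-and-set that is legal from both the not-yet-started and the started state.
{code:java}
import java.util.concurrent.atomic.AtomicReference

object ExecState extends Enumeration { val Pending, Started, Canceled = Value }

class ExecStateMachineSketch {
  private val state = new AtomicReference(ExecState.Pending)

  // Only a Pending execution may start.
  def start(): Boolean = state.compareAndSet(ExecState.Pending, ExecState.Started)

  // Interrupt is legal from Pending (thread not started yet) or Started,
  // so an early interrupt no longer causes an illegal transition.
  def interrupt(): Boolean =
    state.compareAndSet(ExecState.Pending, ExecState.Canceled) ||
      state.compareAndSet(ExecState.Started, ExecState.Canceled)
}
{code}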
[jira] [Updated] (SPARK-46011) Spark Connect session heartbeat / keepalive
[ https://issues.apache.org/jira/browse/SPARK-46011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-46011:
--------------------------------------
Epic Link: SPARK-43754

> Spark Connect session heartbeat / keepalive
>
> Key: SPARK-46011
> URL: https://issues.apache.org/jira/browse/SPARK-46011
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46011) Spark Connect session heartbeat / keepalive
[ https://issues.apache.org/jira/browse/SPARK-46011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski resolved SPARK-46011.
---------------------------------------
Resolution: Won't Fix

Decided not to add this at this point.

> Spark Connect session heartbeat / keepalive
>
> Key: SPARK-46011
> URL: https://issues.apache.org/jira/browse/SPARK-46011
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46075) Refactor SparkConnectSessionManager to not use guava cache
Juliusz Sompolski created SPARK-46075:
--------------------------------------

Summary: Refactor SparkConnectSessionManager to not use guava cache
Key: SPARK-46075
URL: https://issues.apache.org/jira/browse/SPARK-46075
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

Guava cache gives limited control over session eviction; for example, more complex eviction policies cannot be expressed. Refactor it to be more similar to SparkConnectExecutionManager.
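A hedged sketch of that direction, with hypothetical names (`SessionManagerSketch`, `SessionHolderSketch`): a plain concurrent map plus an explicit maintenance pass, so the eviction policy is ordinary code rather than fixed cache configuration.
{code:java}
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

final case class SessionHolderSketch(sessionId: String, lastAccessMs: Long)

class SessionManagerSketch(timeoutMs: Long) {
  private val sessions = new ConcurrentHashMap[String, SessionHolderSketch]()

  def put(holder: SessionHolderSketch): Unit = sessions.put(holder.sessionId, holder)

  // Custom eviction: any predicate can be applied here, unlike the fixed
  // time/size policies of a Guava cache.
  def evictExpired(nowMs: Long): Unit =
    sessions.asScala.values
      .filter(h => nowMs - h.lastAccessMs > timeoutMs)
      .foreach(h => sessions.remove(h.sessionId))
}
{code}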
[jira] [Created] (SPARK-46011) Spark Connect session heartbeat / keepalive
Juliusz Sompolski created SPARK-46011:
--------------------------------------

Summary: Spark Connect session heartbeat / keepalive
Key: SPARK-46011
URL: https://issues.apache.org/jira/browse/SPARK-46011
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-45780) Propagate all Spark Connect client threadlocal in InheritableThread
Juliusz Sompolski created SPARK-45780:
--------------------------------------

Summary: Propagate all Spark Connect client threadlocal in InheritableThread
Key: SPARK-45780
URL: https://issues.apache.org/jira/browse/SPARK-45780
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

Propagate all thread locals that can be set in SparkConnectClient, not only 'tags'.
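A hedged sketch of the propagation pattern; the thread locals here (`tags`, `extraContext`) are hypothetical examples, and the point is to capture every client thread local in the parent thread and restore all of them in the child.
{code:java}
object ClientLocalsSketch {
  val tags = new ThreadLocal[Set[String]] {
    override def initialValue(): Set[String] = Set.empty
  }
  val extraContext = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
}

def inheritableThread(body: => Unit): Thread = {
  // Capture *all* thread locals at creation time, not only tags.
  val capturedTags = ClientLocalsSketch.tags.get()
  val capturedCtx = ClientLocalsSketch.extraContext.get()
  new Thread(() => {
    // Restore them inside the child thread before running the body.
    ClientLocalsSketch.tags.set(capturedTags)
    ClientLocalsSketch.extraContext.set(capturedCtx)
    body
  })
}
{code}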
[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-43754:
--------------------------------------
Affects Version/s: 4.0.0

> Spark Connect Session & Query lifecycle
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
> Issue Type: Epic
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so that we can make the interface more flexible:
> * maintain long running sessions, independent of an unbroken GRPC channel
> * be able to cancel queries
> * have different interfaces to query results than push from server
[jira] [Created] (SPARK-45680) ReleaseSession to close Spark Connect session
Juliusz Sompolski created SPARK-45680:
--------------------------------------

Summary: ReleaseSession to close Spark Connect session
Key: SPARK-45680
URL: https://issues.apache.org/jira/browse/SPARK-45680
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-45647) Spark Connect API to propagate per request context
Juliusz Sompolski created SPARK-45647:
--------------------------------------

Summary: Spark Connect API to propagate per request context
Key: SPARK-45647
URL: https://issues.apache.org/jira/browse/SPARK-45647
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

There is an extension point to pass arbitrary proto extensions in the Spark Connect UserContext, but there is no API to do this in the client. Add a SparkSession API to attach extra protos that will be sent with all requests.
[jira] [Updated] (SPARK-45435) Document that lazy checkpoint may not be a consistent snapshot
[ https://issues.apache.org/jira/browse/SPARK-45435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-45435:
--------------------------------------
Summary: Document that lazy checkpoint may not be a consistent snapshot (was: Document that lazy checkpoint may cause nondeterminism)

> Document that lazy checkpoint may not be a consistent snapshot
>
> Key: SPARK-45435
> URL: https://issues.apache.org/jira/browse/SPARK-45435
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core, SQL
> Affects Versions: 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
> Labels: pull-request-available
>
> Some people may want to use checkpoint to get a consistent snapshot of the Dataset / RDD. Warn that this is not the case with lazy checkpoint, because the checkpoint is computed only at the end of the first action, and the data used during the first action may be different because of non-determinism and retries.
[jira] [Created] (SPARK-45435) Document that lazy checkpoint may cause nondeterminism
Juliusz Sompolski created SPARK-45435:
--------------------------------------

Summary: Document that lazy checkpoint may cause nondeterminism
Key: SPARK-45435
URL: https://issues.apache.org/jira/browse/SPARK-45435
Project: Spark
Issue Type: Documentation
Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

Some people may want to use checkpoint to get a consistent snapshot of the Dataset / RDD. Warn that this is not the case with lazy checkpoint, because the checkpoint is computed only at the end of the first action, and the data used during the first action may be different because of non-determinism and retries.
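To illustrate the caveat with the public Dataset API (assuming an existing SparkSession `spark` with a checkpoint directory configured via spark.sparkContext.setCheckpointDir):
{code:java}
// A non-deterministic column: rand() may produce different values if a
// partition is recomputed, e.g. on task retry.
val df = spark.range(0, 1000).selectExpr("id", "rand() AS r")

// Lazy checkpoint: nothing is materialized yet.
val lazyCp = df.checkpoint(eager = false)

// The first action materializes the checkpoint, but retries during this
// action can recompute rand(), so the checkpointed snapshot may differ
// from the data the action itself observed.
lazyCp.count()

// Eager checkpoint (the default): materialized right here, before any
// other action runs against it.
val eagerCp = df.checkpoint()
{code}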
[jira] [Created] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order
Juliusz Sompolski created SPARK-45416:
--------------------------------------

Summary: Sanity check that Spark Connect returns arrow batches in order
Key: SPARK-45416
URL: https://issues.apache.org/jira/browse/SPARK-45416
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-45206) Shutting down SparkConnectClient can have outstanding ReleaseExecute
Juliusz Sompolski created SPARK-45206:
--------------------------------------

Summary: Shutting down SparkConnectClient can have outstanding ReleaseExecute
Key: SPARK-45206
URL: https://issues.apache.org/jira/browse/SPARK-45206
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

In the Spark Connect scala client, there can be outstanding asynchronous ReleaseExecute calls when the client is shut down. In ExecutePlanResponseReattachableIterator we (ab)use a grpc thread in createRetryingReleaseExecuteResponseObserver to run it. When we do {{SparkConnectClient.shutdown}}, which does {{channel.shutdownNow()}}, that thread is killed before it finishes the ReleaseExecute. Maybe a more graceful shutdown would work, or we should move to a more explicit threadpool that we manage, like in python? See discussion in https://github.com/apache/spark/pull/42929/files#r1329071826
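A hedged sketch of the "more graceful shutdown" option, using the standard grpc-java ManagedChannel API: drain in-flight RPCs (such as an asynchronous ReleaseExecute) before forcing termination.
{code:java}
import java.util.concurrent.TimeUnit
import io.grpc.ManagedChannel

def shutdownGracefully(channel: ManagedChannel, graceSeconds: Long): Unit = {
  channel.shutdown() // stop accepting new calls; let in-flight calls finish
  if (!channel.awaitTermination(graceSeconds, TimeUnit.SECONDS)) {
    channel.shutdownNow() // force-cancel only after the grace period expires
    channel.awaitTermination(1, TimeUnit.SECONDS)
  }
}
{code}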
[jira] [Updated] (SPARK-45167) Python Spark Connect client does not call `releaseAll`
[ https://issues.apache.org/jira/browse/SPARK-45167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-45167:
--------------------------------------
Epic Link: SPARK-43754 (was: SPARK-39375)

> Python Spark Connect client does not call `releaseAll`
>
> Key: SPARK-45167
> URL: https://issues.apache.org/jira/browse/SPARK-45167
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Martin Grund
> Priority: Major
>
> The Python client does not release all previous responses on the server and thus does not properly close the queries.
[jira] [Created] (SPARK-45133) Mark Spark Connect queries as finished when all result tasks are finished, not sent
Juliusz Sompolski created SPARK-45133:
--------------------------------------

Summary: Mark Spark Connect queries as finished when all result tasks are finished, not sent
Key: SPARK-45133
URL: https://issues.apache.org/jira/browse/SPARK-45133
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-44872) Testing reattachable execute
Juliusz Sompolski created SPARK-44872:
--------------------------------------

Summary: Testing reattachable execute
Key: SPARK-44872
URL: https://issues.apache.org/jira/browse/SPARK-44872
Project: Spark
Issue Type: Task
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-44855) Small tweaks to attaching ExecuteGrpcResponseSender to ExecuteResponseObserver
Juliusz Sompolski created SPARK-44855:
--------------------------------------

Summary: Small tweaks to attaching ExecuteGrpcResponseSender to ExecuteResponseObserver
Key: SPARK-44855
URL: https://issues.apache.org/jira/browse/SPARK-44855
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

Small improvements can be made to the way a new ExecuteGrpcResponseSender is attached to the observer:
* Since we now have addGrpcResponseSender in ExecuteHolder, it should be ExecuteHolder's responsibility to interrupt the old sender and to ensure there is only one at a time, not ExecuteResponseObserver's responsibility.
* executeObserver is used as a lock for synchronization. An explicit lock object would be better. (See the sketch below.)
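A hedged sketch of the second bullet (a hypothetical simplified class): synchronizing on a dedicated lock object rather than on the observer instance itself, so external code holding a reference to the observer cannot contend on the same monitor.
{code:java}
class ResponseObserverSketch {
  // Explicit lock object instead of synchronizing on `this`.
  private val lock = new Object

  private var currentSenderId: Option[Long] = None

  def attachSender(senderId: Long): Unit = lock.synchronized {
    // Attaching a new sender implicitly detaches the previous one.
    currentSenderId = Some(senderId)
  }

  def detachSender(senderId: Long): Unit = lock.synchronized {
    if (currentSenderId.contains(senderId)) currentSenderId = None
  }
}
{code}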
[jira] [Resolved] (SPARK-44730) Spark Connect: Cleaner thread not stopped when SparkSession stops
[ https://issues.apache.org/jira/browse/SPARK-44730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski resolved SPARK-44730.
---------------------------------------
Resolution: Not A Problem

> Spark Connect: Cleaner thread not stopped when SparkSession stops
>
> Key: SPARK-44730
> URL: https://issues.apache.org/jira/browse/SPARK-44730
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Minor
>
> The Spark Connect scala client SparkSession has a cleaner, which starts a daemon thread to clean up Closeable objects after GC. This daemon thread is never stopped, and every SparkSession creates a new one.
> Cleaner implements a stop() function, but no one ever calls it. Possibly because even after SparkSession.stop(), the cleaner may still be needed when remaining references are GCed... For this reason it seems that the Cleaner should rather be a global singleton than live within a session.
[jira] [Commented] (SPARK-44730) Spark Connect: Cleaner thread not stopped when SparkSession stops
[ https://issues.apache.org/jira/browse/SPARK-44730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755613#comment-17755613 ]

Juliusz Sompolski commented on SPARK-44730:
-------------------------------------------
Silly me, the Cleaner is already global; I was looking at the wrong thing.

> Spark Connect: Cleaner thread not stopped when SparkSession stops
>
> Key: SPARK-44730
> URL: https://issues.apache.org/jira/browse/SPARK-44730
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Minor
>
> The Spark Connect scala client SparkSession has a cleaner, which starts a daemon thread to clean up Closeable objects after GC. This daemon thread is never stopped, and every SparkSession creates a new one.
> Cleaner implements a stop() function, but no one ever calls it. Possibly because even after SparkSession.stop(), the cleaner may still be needed when remaining references are GCed... For this reason it seems that the Cleaner should rather be a global singleton than live within a session.
[jira] [Created] (SPARK-44849) Expose SparkConnectExecutionManager.listActiveExecutions
Juliusz Sompolski created SPARK-44849:
--------------------------------------

Summary: Expose SparkConnectExecutionManager.listActiveExecutions
Key: SPARK-44849
URL: https://issues.apache.org/jira/browse/SPARK-44849
Project: Spark
Issue Type: Task
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-44835) SparkConnect ReattachExecute could raise before ExecutePlan even attaches.
Juliusz Sompolski created SPARK-44835:
--------------------------------------

Summary: SparkConnect ReattachExecute could raise before ExecutePlan even attaches.
Key: SPARK-44835
URL: https://issues.apache.org/jira/browse/SPARK-44835
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

If a ReattachExecute is sent very quickly after ExecutePlan, the following could happen:
* ExecutePlan hasn't reached executeHolder.runGrpcResponseSender(responseSender) in SparkConnectExecutePlanHandler yet.
* ReattachExecute races around it and reaches executeHolder.runGrpcResponseSender(responseSender) in SparkConnectReattachExecuteHandler first.
* When ExecutePlan reaches executeHolder.runGrpcResponseSender(responseSender), and executionObserver.attachConsumer(this) is called in the ExecuteGrpcResponseSender of ExecutePlan, it will kick out the ExecuteGrpcResponseSender of ReattachExecute.

So even though ReattachExecute came later, it will get interrupted by the earlier ExecutePlan and finish with a SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED", Map.empty), which was assumed to happen only when a stale hanging RPC is replaced by a reconnection.

This would be very unlikely to happen in practice, because ExecutePlan shouldn't be abandoned so fast, but because of https://issues.apache.org/jira/browse/SPARK-44833 it is slightly more likely (though there is also a 50ms sleep before retry, which again makes it unlikely).
[jira] [Created] (SPARK-44833) Spark Connect reattach when initial ExecutePlan didn't reach server doing too eager Reattach
Juliusz Sompolski created SPARK-44833:
--------------------------------------

Summary: Spark Connect reattach when initial ExecutePlan didn't reach server doing too eager Reattach
Key: SPARK-44833
URL: https://issues.apache.org/jira/browse/SPARK-44833
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

In
{code:java}
case ex: StatusRuntimeException
    if Option(StatusProto.fromThrowable(ex))
      .exists(_.getMessage.contains("INVALID_HANDLE.OPERATION_NOT_FOUND")) =>
  if (lastReturnedResponseId.isDefined) {
    throw new IllegalStateException(
      "OPERATION_NOT_FOUND on the server but responses were already received from it.",
      ex)
  }
  // Try a new ExecutePlan, and throw upstream for retry.
->iter = rawBlockingStub.executePlan(initialRequest)
->throw new GrpcRetryHandler.RetryException
{code}
we call executePlan and throw a RetryException to have the exception handled upstream. Then it goes to
{code:java}
retry {
  if (firstTry) {
    // on first try, we use the existing iter.
    firstTry = false
  } else {
    // on retry, the iter is borked, so we need a new one
->  iter = rawBlockingStub.reattachExecute(createReattachExecuteRequest())
  }
{code}
and because it's not firstTry, it immediately does a reattach. This causes no failure - the reattach will work and attach to the query, and the original executePlan will get detached - but it could be improved. The same issue is also present in the python reattach.py.
[jira] [Resolved] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism
[ https://issues.apache.org/jira/browse/SPARK-44765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski resolved SPARK-44765.
---------------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

> Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism
>
> Key: SPARK-44765
> URL: https://issues.apache.org/jira/browse/SPARK-44765
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Assignee: Juliusz Sompolski
> Priority: Major
> Fix For: 3.5.0
[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754182#comment-17754182 ]

Juliusz Sompolski commented on SPARK-43754:
-------------------------------------------
Needed for testing of this: https://issues.apache.org/jira/browse/SPARK-44806

> Spark Connect Session & Query lifecycle
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
> Issue Type: Epic
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so that we can make the interface more flexible:
> * maintain long running sessions, independent of an unbroken GRPC channel
> * be able to cancel queries
> * have different interfaces to query results than push from server
[jira] [Created] (SPARK-44806) Separate connect-client-jvm-internal
Juliusz Sompolski created SPARK-44806:
--------------------------------------

Summary: Separate connect-client-jvm-internal
Key: SPARK-44806
URL: https://issues.apache.org/jira/browse/SPARK-44806
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski
[jira] [Commented] (SPARK-44784) Failure in testing `SparkSessionE2ESuite` using Maven
[ https://issues.apache.org/jira/browse/SPARK-44784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753974#comment-17753974 ]

Juliusz Sompolski commented on SPARK-44784:
-------------------------------------------
This is a classloader issue: the serialization of a UDF pulls in classes from the client's class context that it doesn't necessarily need, which then results in a ClassNotFoundException on the server:
org.apache.spark.SparkException: org/apache/spark/sql/connect/client/SparkResult
It's not specifically related to the tests in that suite; it can happen anywhere with UDFs.

> Failure in testing `SparkSessionE2ESuite` using Maven
>
> Key: SPARK-44784
> URL: https://issues.apache.org/jira/browse/SPARK-44784
> Project: Spark
> Issue Type: Bug
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Blocker
>
> https://github.com/apache/spark/actions/runs/5832898984/job/15819181762
>
> The following failures exist in the daily Maven tests, we should fix them before the Apache Spark 3.5.0 release:
> {code:java}
> SparkSessionE2ESuite:
> - interrupt all - background queries, foreground interrupt *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 30 times over 20.092924822 seconds. Last failure message: Some("unexpected failure in q1: org.apache.spark.SparkException: org/apache/spark/sql/connect/client/SparkResult") was not empty Error not empty: Some(unexpected failure in q1: org.apache.spark.SparkException: org/apache/spark/sql/connect/client/SparkResult). (SparkSessionE2ESuite.scala:71)
> - interrupt all - foreground queries, background interrupt *** FAILED ***
>   "org/apache/spark/sql/connect/client/SparkResult" did not contain "OPERATION_CANCELED" Unexpected exception: org.apache.spark.SparkException: org/apache/spark/sql/connect/client/SparkResult (SparkSessionE2ESuite.scala:105)
> - interrupt tag *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 30 times over 20.069445587 seconds. Last failure message: ListBuffer() had length 0 instead of expected length 2 Interrupted operations: ListBuffer().. (SparkSessionE2ESuite.scala:199)
> - interrupt operation *** FAILED ***
>   org.apache.spark.SparkException: org/apache/spark/sql/connect/client/SparkResult
>   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:89)
>   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:38)
>   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:46)
>   at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:83)
>   at org.apache.spark.sql.connect.client.SparkResult.operationId(SparkResult.scala:174)
>   at org.apache.spark.sql.SparkSessionE2ESuite.$anonfun$new$31(SparkSessionE2ESuite.scala:243)
>   at org.apache.spark.sql.connect.client.util.RemoteSparkSession.$anonfun$test$1(RemoteSparkSession.scala:243)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   ...
> {code}
[jira] [Created] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism
Juliusz Sompolski created SPARK-44765:
--------------------------------------

Summary: Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism
Key: SPARK-44765
URL: https://issues.apache.org/jira/browse/SPARK-44765
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-44762) Add more documentation and examples for using job tags for interrupt
Juliusz Sompolski created SPARK-44762:
--------------------------------------

Summary: Add more documentation and examples for using job tags for interrupt
Key: SPARK-44762
URL: https://issues.apache.org/jira/browse/SPARK-44762
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Add documentation to spark.addTag with examples and an explanation similar to SparkContext.setJobGroup.
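A hedged usage sketch of the tag APIs in the Spark Connect Scala client (assumes a connected SparkSession `spark`; the tag name is illustrative), mirroring how SparkContext.setJobGroup is typically documented:
{code:java}
// Tag the queries submitted from this thread.
spark.addTag("nightly-etl")
try {
  spark.range(0, 1000000000L).count() // runs carrying tag "nightly-etl"
} finally {
  spark.removeTag("nightly-etl") // stop tagging further work
}

// From another thread, e.g. a watchdog: interrupt everything with the tag.
spark.interruptTag("nightly-etl")
{code}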
[jira] [Updated] (SPARK-44738) Spark Connect Reattach misses metadata propagation
[ https://issues.apache.org/jira/browse/SPARK-44738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44738:
--------------------------------------
Epic Link: SPARK-43754

> Spark Connect Reattach misses metadata propagation
>
> Key: SPARK-44738
> URL: https://issues.apache.org/jira/browse/SPARK-44738
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Martin Grund
> Assignee: Martin Grund
> Priority: Blocker
> Fix For: 3.5.0, 4.0.0
>
> Currently, client metadata is not propagated in the Spark Connect Reattach handler.
[jira] [Created] (SPARK-44730) Spark Connect: Cleaner thread not stopped when SparkSession stops
Juliusz Sompolski created SPARK-44730:
--------------------------------------

Summary: Spark Connect: Cleaner thread not stopped when SparkSession stops
Key: SPARK-44730
URL: https://issues.apache.org/jira/browse/SPARK-44730
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski

The Spark Connect scala client SparkSession has a cleaner, which starts a daemon thread to clean up Closeable objects after GC. This daemon thread is never stopped, and every SparkSession creates a new one.

Cleaner implements a stop() function, but no one ever calls it. Possibly because even after SparkSession.stop(), the cleaner may still be needed when remaining references are GCed... For this reason it seems that the Cleaner should rather be a global singleton than live within a session.
[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752130#comment-17752130 ]

Juliusz Sompolski commented on SPARK-43754:
-------------------------------------------
Not in the epic, but a nice-to-have refactoring: https://issues.apache.org/jira/browse/SPARK-43756

> Spark Connect Session & Query lifecycle
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
> Issue Type: Epic
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so that we can make the interface more flexible:
> * maintain long running sessions, independent of an unbroken GRPC channel
> * be able to cancel queries
> * have different interfaces to query results than push from server
[jira] [Updated] (SPARK-43756) Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more
[ https://issues.apache.org/jira/browse/SPARK-43756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-43756:
--------------------------------------
Epic Link: (was: SPARK-43754)

> Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more
>
> Key: SPARK-43756
> URL: https://issues.apache.org/jira/browse/SPARK-43756
> Project: Spark
> Issue Type: Task
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Right now, we pass around individual things like sessionId, userId etc. in multiple places. This leads to the need to thread additional parameters through quite often when something new is added.
> Better to pass around SessionHolder and ExecutePlanHolder, where accessor and utility functions can be added.
[jira] [Created] (SPARK-44722) reattach.py: AttributeError: 'NoneType' object has no attribute 'message'
Juliusz Sompolski created SPARK-44722:
--------------------------------------

Summary: reattach.py: AttributeError: 'NoneType' object has no attribute 'message'
Key: SPARK-44722
URL: https://issues.apache.org/jira/browse/SPARK-44722
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-44709) Fix flow control in ExecuteGrpcResponseSender
Juliusz Sompolski created SPARK-44709:
--------------------------------------

Summary: Fix flow control in ExecuteGrpcResponseSender
Key: SPARK-44709
URL: https://issues.apache.org/jira/browse/SPARK-44709
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-44676) Ensure Spark Connect scala client CloseableIterator is closed in all cases where exception could be thrown
Juliusz Sompolski created SPARK-44676:
--------------------------------------

Summary: Ensure Spark Connect scala client CloseableIterator is closed in all cases where exception could be thrown
Key: SPARK-44676
URL: https://issues.apache.org/jira/browse/SPARK-44676
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski

We currently already ensure that the CloseableIterator is consumed in all places, and in ExecutePlanResponseReattachableIterator we ensure that the iterator is closed in case of a GRPC error. We should also ensure that all places in the client that use a CloseableIterator close it gracefully in case of another exception being thrown, including InterruptedException. Some try { } finally { iterator.close() } blocks may be needed for that.
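A hedged sketch of such a block (`CloseableIteratorLike` is a hypothetical stand-in for the client's CloseableIterator type): the finally clause guarantees close() runs whether the consumption succeeds, fails, or is interrupted.
{code:java}
trait CloseableIteratorLike[A] extends Iterator[A] with AutoCloseable

def consumeAll[A](it: CloseableIteratorLike[A])(f: A => Unit): Unit =
  try {
    while (it.hasNext) f(it.next())
  } finally {
    it.close() // runs on success, on any exception, and on interrupt
  }
{code}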
[jira] [Updated] (SPARK-44656) Close dangling iterators in SparkResult too (Spark Connect Scala)
[ https://issues.apache.org/jira/browse/SPARK-44656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44656:
--------------------------------------
Epic Link: SPARK-43754

> Close dangling iterators in SparkResult too (Spark Connect Scala)
>
> Key: SPARK-44656
> URL: https://issues.apache.org/jira/browse/SPARK-44656
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Alice Sayutina
> Priority: Major
>
> SPARK-44636 followup. We didn't address iterators grabbed in SparkResult there.
[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-43754:
--------------------------------------
Epic Name: connect-query-lifecycle (was: sc-query-lifecycle)

> Spark Connect Session & Query lifecycle
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
> Issue Type: Epic
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so that we can make the interface more flexible:
> * maintain long running sessions, independent of an unbroken GRPC channel
> * be able to cancel queries
> * have different interfaces to query results than push from server
[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-43754:
--------------------------------------
Epic Name: sc-query-lifecycle (was: sc-session-lifecycle)

> Spark Connect Session & Query lifecycle
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
> Issue Type: Epic
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so that we can make the interface more flexible:
> * maintain long running sessions, independent of an unbroken GRPC channel
> * be able to cancel queries
> * have different interfaces to query results than push from server
[jira] [Created] (SPARK-44642) ExecutePlanResponseReattachableIterator should release all after error
Juliusz Sompolski created SPARK-44642:
--------------------------------------

Summary: ExecutePlanResponseReattachableIterator should release all after error
Key: SPARK-44642
URL: https://issues.apache.org/jira/browse/SPARK-44642
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski
[jira] [Updated] (SPARK-44642) ExecutePlanResponseReattachableIterator should release all after error
[ https://issues.apache.org/jira/browse/SPARK-44642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44642:
--------------------------------------
Epic Link: SPARK-43754

> ExecutePlanResponseReattachableIterator should release all after error
>
> Key: SPARK-44642
> URL: https://issues.apache.org/jira/browse/SPARK-44642
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
[jira] [Updated] (SPARK-44636) Leave no dangling iterators in Spark Connect Scala
[ https://issues.apache.org/jira/browse/SPARK-44636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44636:
--------------------------------------
Epic Link: SPARK-43754

> Leave no dangling iterators in Spark Connect Scala
>
> Key: SPARK-44636
> URL: https://issues.apache.org/jira/browse/SPARK-44636
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Alice Sayutina
> Priority: Major
[jira] [Updated] (SPARK-44637) ExecuteRelease needs to synchronize
[ https://issues.apache.org/jira/browse/SPARK-44637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44637:
--------------------------------------
Epic Link: SPARK-43754

> ExecuteRelease needs to synchronize
>
> Key: SPARK-44637
> URL: https://issues.apache.org/jira/browse/SPARK-44637
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
[jira] [Created] (SPARK-44637) ExecuteRelease needs to synchronize
Juliusz Sompolski created SPARK-44637:
--------------------------------------

Summary: ExecuteRelease needs to synchronize
Key: SPARK-44637
URL: https://issues.apache.org/jira/browse/SPARK-44637
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski
[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server
[ https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44624:
--------------------------------------
Epic Link: SPARK-43754

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach server
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send ExecutePlan again.
[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server
[ https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44624:
--------------------------------------
Description: If the ExecutePlan never reached the server, a ReattachExecute will fail with INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send ExecutePlan again. (was: Even though we empirically observed that the error is thrown only from the first next() or hasNext() of the response StreamObserver, wrap the initial call in retries as well, to not depend on that behaviour in case it's just a quirk that's not fully dependable.)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach server
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send ExecutePlan again.
[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server
[ https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44624:
--------------------------------------
Summary: Spark Connect reattachable Execute when initial ExecutePlan didn't reach server (was: Wrap retries around initial streaming GRPC call in connect)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach server
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Even though we empirically observed that the error is thrown only from the first next() or hasNext() of the response StreamObserver, wrap the initial call in retries as well, to not depend on that behaviour in case it's just a quirk that's not fully dependable.
[jira] [Created] (SPARK-44625) Spark Connect clean up abandoned executions
Juliusz Sompolski created SPARK-44625:
--------------------------------------

Summary: Spark Connect clean up abandoned executions
Key: SPARK-44625
URL: https://issues.apache.org/jira/browse/SPARK-44625
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski

With reattachable executions, some executions might get orphaned when the ReattachExecute and ReleaseExecute never come. Add a mechanism to track that and to clean them up.
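A hedged sketch of one such mechanism, with hypothetical names (not the actual SparkConnectExecutionManager code): record when each execution was last attached to an RPC, and periodically remove executions that have been detached longer than a timeout.
{code:java}
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import scala.jdk.CollectionConverters._

final class ExecutionEntry {
  @volatile var lastAttachedMs: Long = 0L
}

class AbandonedExecutionCleaner(timeoutMs: Long, intervalMs: Long) {
  private val executions = new ConcurrentHashMap[String, ExecutionEntry]()
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Called whenever an ExecutePlan/ReattachExecute attaches to the execution.
  def touch(operationId: String): Unit =
    executions.computeIfAbsent(operationId, _ => new ExecutionEntry)
      .lastAttachedMs = System.currentTimeMillis()

  // Periodic maintenance: drop executions nobody has reattached to in time.
  scheduler.scheduleAtFixedRate(() => {
    val now = System.currentTimeMillis()
    executions.asScala.foreach { case (id, entry) =>
      if (now - entry.lastAttachedMs > timeoutMs) executions.remove(id)
    }
  }, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
}
{code}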
[jira] [Created] (SPARK-44624) Wrap retries around initial streaming GRPC call in connect
Juliusz Sompolski created SPARK-44624:
--------------------------------------

Summary: Wrap retries around initial streaming GRPC call in connect
Key: SPARK-44624
URL: https://issues.apache.org/jira/browse/SPARK-44624
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski

Even though we empirically observed that the error is thrown only from the first next() or hasNext() of the response StreamObserver, wrap the initial call in retries as well, to not depend on that behaviour in case it's just a quirk that's not fully dependable.
[jira] [Commented] (SPARK-44599) Python client for reattaching to existing execute in Spark Connect (server mechanism)
[ https://issues.apache.org/jira/browse/SPARK-44599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749370#comment-17749370 ]

Juliusz Sompolski commented on SPARK-44599:
-------------------------------------------
Duplicate of https://issues.apache.org/jira/browse/SPARK-44424

> Python client for reattaching to existing execute in Spark Connect (server mechanism)
>
> Key: SPARK-44599
> URL: https://issues.apache.org/jira/browse/SPARK-44599
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Python client for https://issues.apache.org/jira/browse/SPARK-44421
[jira] [Updated] (SPARK-43923) [CONNECT] Post listenerBus events during ExecutePlanRequest
[ https://issues.apache.org/jira/browse/SPARK-43923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-43923:
--------------------------------------
Issue Type: New Feature (was: Bug)

> [CONNECT] Post listenerBus events during ExecutePlanRequest
>
> Key: SPARK-43923
> URL: https://issues.apache.org/jira/browse/SPARK-43923
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Jean-Francois Desjeans Gauthier
> Priority: Major
>
> Post events SparkListenerConnectOperationStarted, SparkListenerConnectOperationParsed, SparkListenerConnectOperationCanceled, SparkListenerConnectOperationFailed, SparkListenerConnectOperationFinished, SparkListenerConnectOperationClosed & SparkListenerConnectSessionClosed.
> Mirror what is currently available in HiveThriftServer2EventManager.
[jira] [Created] (SPARK-44425) Validate that session_id is a UUID
Juliusz Sompolski created SPARK-44425:
--------------------------------------

Summary: Validate that session_id is a UUID
Key: SPARK-44425
URL: https://issues.apache.org/jira/browse/SPARK-44425
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Add validation that session_id is a UUID. This is currently the case in the clients, so we could make it a requirement.
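A minimal sketch of such a server-side check using the standard JDK API (`validateSessionId` is a hypothetical helper): UUID.fromString throws IllegalArgumentException on malformed input.
{code:java}
import java.util.UUID

def validateSessionId(sessionId: String): Unit =
  try {
    UUID.fromString(sessionId) // accepts only a syntactically valid UUID
  } catch {
    case e: IllegalArgumentException =>
      throw new IllegalArgumentException(s"session_id must be a UUID: $sessionId", e)
  }
{code}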
[jira] [Created] (SPARK-44424) Reattach to existing execute in Spark Connect (python client)
Juliusz Sompolski created SPARK-44424:
--------------------------------------

Summary: Reattach to existing execute in Spark Connect (python client)
Key: SPARK-44424
URL: https://issues.apache.org/jira/browse/SPARK-44424
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Python client for https://issues.apache.org/jira/browse/SPARK-44421
[jira] [Created] (SPARK-44423) Reattach to existing execute in Spark Connect (scala client)
Juliusz Sompolski created SPARK-44423:
--------------------------------------

Summary: Reattach to existing execute in Spark Connect (scala client)
Key: SPARK-44423
URL: https://issues.apache.org/jira/browse/SPARK-44423
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Scala client for https://issues.apache.org/jira/browse/SPARK-44421
[jira] [Created] (SPARK-44422) Fine grained interrupt in Spark Connect
Juliusz Sompolski created SPARK-44422:
--------------------------------------

Summary: Fine grained interrupt in Spark Connect
Key: SPARK-44422
URL: https://issues.apache.org/jira/browse/SPARK-44422
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Next to SparkSession.interruptAll, provide mechanisms for interrupting (see the usage sketch below):
* individual queries
* user defined groups of queries in a session (by a tag)
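A hedged sketch of what the finer-grained calls look like from the Scala client, as added for Spark 3.5 (assumes a connected SparkSession `spark`; the operation id and tag are illustrative):
{code:java}
// Interrupt one query by its operation id.
spark.interruptOperation("00000000-0000-0000-0000-000000000000")

// Interrupt a user-defined group: everything started while the tag was set.
spark.addTag("report-jobs")
// ... launch queries from this thread ...
spark.interruptTag("report-jobs")

// The pre-existing coarse-grained variant interrupts everything.
spark.interruptAll()
{code}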
[jira] [Updated] (SPARK-44421) Reattach to existing execute in Spark Connect (server mechanism)
[ https://issues.apache.org/jira/browse/SPARK-44421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-44421:
--------------------------------------
Summary: Reattach to existing execute in Spark Connect (server mechanism) (was: Reattach to existing execute in Spark Connect)

> Reattach to existing execute in Spark Connect (server mechanism)
>
> Key: SPARK-44421
> URL: https://issues.apache.org/jira/browse/SPARK-44421
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> If the ExecutePlan response stream gets broken, provide a mechanism to reattach to the execution with a new RPC.
[jira] [Created] (SPARK-44421) Reattach to existing execute in Spark Connect
Juliusz Sompolski created SPARK-44421:
--------------------------------------

Summary: Reattach to existing execute in Spark Connect
Key: SPARK-44421
URL: https://issues.apache.org/jira/browse/SPARK-44421
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

If the ExecutePlan response stream gets broken, provide a mechanism to reattach to the execution with a new RPC.
[jira] [Created] (SPARK-44195) Add JobTag APIs to SparkR SparkContext
Juliusz Sompolski created SPARK-44195:
--------------------------------------

Summary: Add JobTag APIs to SparkR SparkContext
Key: SPARK-44195
URL: https://issues.apache.org/jira/browse/SPARK-44195
Project: Spark
Issue Type: New Feature
Components: SparkR
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Add the APIs added in https://issues.apache.org/jira/browse/SPARK-43952 to SparkR:
* {{SparkContext.addJobTag(tag: String): Unit}}
* {{SparkContext.removeJobTag(tag: String): Unit}}
* {{SparkContext.getJobTags(): Set[String]}}
* {{SparkContext.clearJobTags(): Unit}}
* {{SparkContext.cancelJobsWithTag(tag: String): Unit}}
[jira] [Created] (SPARK-44194) Add JobTag APIs to Pyspark SparkContext
Juliusz Sompolski created SPARK-44194:
--------------------------------------

Summary: Add JobTag APIs to Pyspark SparkContext
Key: SPARK-44194
URL: https://issues.apache.org/jira/browse/SPARK-44194
Project: Spark
Issue Type: New Feature
Components: PySpark
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Add the APIs added in https://issues.apache.org/jira/browse/SPARK-43952 to PySpark (a sketch of the underlying Scala APIs follows below):
* {{SparkContext.addJobTag(tag: String): Unit}}
* {{SparkContext.removeJobTag(tag: String): Unit}}
* {{SparkContext.getJobTags(): Set[String]}}
* {{SparkContext.clearJobTags(): Unit}}
* {{SparkContext.cancelJobsWithTag(tag: String): Unit}}
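For reference, a short sketch of how the Scala SparkContext APIs from SPARK-43952 compose (assumes an existing SparkContext `sc`; the tag name is illustrative). The PySpark and SparkR additions mirror these:
{code:java}
// Tag jobs submitted from this thread.
sc.addJobTag("batch-2023")
try {
  sc.parallelize(1 to 1000000).map(_ * 2).count() // job carries "batch-2023"
} finally {
  sc.removeJobTag("batch-2023")
}

// From another thread: cancel only the tagged jobs.
sc.cancelJobsWithTag("batch-2023")

// Inspect and reset the tags set on this thread.
val tags: Set[String] = sc.getJobTags()
sc.clearJobTags()
{code}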
[jira] [Updated] (SPARK-43744) Spark Connect scala UDF serialization pulling in unrelated classes not available on server
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juliusz Sompolski updated SPARK-43744:
--------------------------------------
Summary: Spark Connect scala UDF serialization pulling in unrelated classes not available on server (was: Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite)

> Spark Connect scala UDF serialization pulling in unrelated classes not available on server
>
> Key: SPARK-43744
> URL: https://issues.apache.org/jira/browse/SPARK-43744
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
> Labels: SPARK-43745
>
> https://github.com/apache/spark/pull/41487 moved the "interrupt all - background queries, foreground interrupt" and "interrupt all - foreground queries, background interrupt" tests from ClientE2ETestSuite into a new isolated suite SparkSessionE2ESuite to avoid an inexplicable UDF serialization issue.
>
> When these tests are moved back to ClientE2ETestSuite and tested with
> {code:java}
> build/mvn clean install -DskipTests -Phive
> build/mvn test -pl connector/connect/client/jvm -Dtest=none -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code}
> the tests fail with
> {code:java}
> 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
> java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
> at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
> at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
> at org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
> at org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
> at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
> at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
> at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
> at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
> at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.ap
[jira] [Updated] (SPARK-43744) Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43744: -- Description: [https://github.com/apache/spark/pull/41487] moved "interrupt all - background queries, foreground interrupt" and "interrupt all - foreground queries, background interrupt" tests from ClientE2ETestSuite into a new isolated suite SparkSessionE2ESuite to avoid an inexplicable UDF serialization issue. When these tests are moved back to ClientE2ETestSuite and tested with {code:java} build/mvn clean install -DskipTests -Phive build/mvn test -pl connector/connect/client/jvm -Dtest=none -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} the tests fail with {code:java} 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.getDeclaredMethod(Class.java:2128) at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) at org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) at org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:166) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:611) at org.sparkproject.connect.grpc.st
[jira] [Reopened] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski reopened SPARK-43744: --- [https://github.com/apache/spark/pull/41487] has swept the issue under the rug by moving the tests to a different suite. Reopening this to get to the bottom of it, as I believe the underlying issue is real and remains unresolved. > Maven test failure of ClientE2ETestSuite "interrupt all" tests > -- > > Key: SPARK-43744 > URL: https://issues.apache.org/jira/browse/SPARK-43744 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > Labels: SPARK-43745 > > When testing with > {code:java} > build/mvn clean install -DskipTests -Phive > build/mvn test -pl connector/connect/client/jvm -Dtest=none > -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} > > the tests fail with > {code:java} > 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . > SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. > java.lang.NoClassDefFoundError: > org/apache/spark/sql/connect/client/SparkResult > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) > at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) > at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) > at > 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) > at > org.apache.spark.sql.connect.artifact.S
[jira] [Updated] (SPARK-43744) Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43744: -- Summary: Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite (was: Maven test failure of ClientE2ETestSuite "interrupt all" tests) > Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite > - > > Key: SPARK-43744 > URL: https://issues.apache.org/jira/browse/SPARK-43744 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > Labels: SPARK-43745 > > When testing with > {code:java} > build/mvn clean install -DskipTests -Phive > build/mvn test -pl connector/connect/client/jvm -Dtest=none > -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} > > the tests fail with > {code:java} > 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . > SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. > java.lang.NoClassDefFoundError: > org/apache/spark/sql/connect/client/SparkResult > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) > at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) > at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) > at > 
org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) > at > org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoa
[jira] [Commented] (SPARK-43952) Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags"
[ https://issues.apache.org/jira/browse/SPARK-43952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728791#comment-17728791 ] Juliusz Sompolski commented on SPARK-43952: --- Indirectly related to https://issues.apache.org/jira/browse/SPARK-43754: this allows Spark Connect query cancellation to avoid conflicting with other places that set job groups. > Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job > tags" > > > Key: SPARK-43952 > URL: https://issues.apache.org/jira/browse/SPARK-43952 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, the only way to cancel running Spark Jobs is by using > SparkContext.cancelJobGroup, using a job group name that was previously set > using SparkContext.setJobGroup. This is problematic if multiple different > parts of the system want to do cancellation, and set their own ids. > For example, > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133] > sets its own job group, which may override the job group set by the user. This way, > if the user cancels the job group they set, it will not cancel these broadcast > jobs launched from within their jobs... > As a solution, consider adding SparkContext.addJobTag / > SparkContext.removeJobTag, which would allow multiple "tags" on the > jobs, and introduce SparkContext.cancelJobsByTag to allow more flexible > cancellation of jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728790#comment-17728790 ] Juliusz Sompolski commented on SPARK-43754: --- Indirectly related to https://issues.apache.org/jira/browse/SPARK-43952 (allowing Spark Connect query cancellation to avoid conflicting with other places that set job groups) > Spark Connect Session & Query lifecycle > --- > > Key: SPARK-43754 > URL: https://issues.apache.org/jira/browse/SPARK-43754 > Project: Spark > Issue Type: Epic > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, queries in Spark Connect are executed within the RPC handler. > We want to detach the RPC interface from actual sessions and execution, so > that we can make the interface more flexible: > * maintain long-running sessions, independent of an unbroken GRPC channel > * be able to cancel queries > * have different interfaces to query results than push from the server -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43952) Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags"
Juliusz Sompolski created SPARK-43952: - Summary: Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags" Key: SPARK-43952 URL: https://issues.apache.org/jira/browse/SPARK-43952 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.0 Reporter: Juliusz Sompolski Currently, the only way to cancel running Spark Jobs is by using SparkContext.cancelJobGroup, using a job group name that was previously set using SparkContext.setJobGroup. This is problematic if multiple different parts of the system want to do cancellation, and set their own ids. For example, [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133] sets its own job group, which may override the job group set by the user. This way, if the user cancels the job group they set, it will not cancel these broadcast jobs launched from within their jobs... As a solution, consider adding SparkContext.addJobTag / SparkContext.removeJobTag, which would allow multiple "tags" on the jobs, and introduce SparkContext.cancelJobsByTag to allow more flexible cancellation of jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
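For illustration, a minimal sketch of how the proposed tag API could be used. The method names follow the proposal above; the exact signatures are an assumption, and this is a usage sketch rather than the final API.
{code:scala}
// Sketch only: assumes SparkContext gains addJobTag / removeJobTag /
// cancelJobsByTag taking plain String tags, as proposed in this ticket.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val sc = spark.sparkContext

sc.addJobTag("my-query")          // tag every job started by this thread
try {
  spark.range(1000000L).count()   // this job now carries the "my-query" tag
} finally {
  sc.removeJobTag("my-query")     // stop tagging further jobs
}

// From a different thread: cancel only the tagged jobs. Jobs tagged (or
// job-grouped) by other components, e.g. broadcast jobs, are unaffected.
sc.cancelJobsByTag("my-query")
{code}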
[jira] [Created] (SPARK-43756) Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more
Juliusz Sompolski created SPARK-43756: - Summary: Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more Key: SPARK-43756 URL: https://issues.apache.org/jira/browse/SPARK-43756 Project: Spark Issue Type: Task Components: Connect Affects Versions: 3.5.0 Reporter: Juliusz Sompolski Right now, we pass around individual things like sessionId, userId etc. in multiple places. This leads to the need to thread additional parameters through many call sites whenever something new is added. It is better to pass around SessionHolder and ExecutePlanHolder, to which accessor and utility functions can be added. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
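As an illustration of the refactoring direction, a hypothetical sketch; the simplified SessionHolder shown here and its fields are assumptions, not the actual Spark Connect classes (the real holders live in org.apache.spark.sql.connect.service and carry more state).
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical, simplified holder for per-session context.
case class SessionHolder(userId: String, sessionId: String, session: SparkSession) {
  // Utility accessors are added here once, instead of threading
  // new parameters through every method signature.
  def jobTag: String = s"connect-$userId-$sessionId"
}

// Before: adding any new piece of per-session context changes every signature.
def handlePlanOld(userId: String, sessionId: String, session: SparkSession): Unit = ()

// After: one holder travels through the call chain.
def handlePlan(holder: SessionHolder): Unit = ()
{code}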
[jira] [Created] (SPARK-43755) Spark Connect - decouple query execution from RPC handler
Juliusz Sompolski created SPARK-43755: - Summary: Spark Connect - decouple query execution from RPC handler Key: SPARK-43755 URL: https://issues.apache.org/jira/browse/SPARK-43755 Project: Spark Issue Type: Story Components: Connect Affects Versions: 3.5.0 Reporter: Juliusz Sompolski Move actual query execution out of the RPC handler callback. This allows: * (immediately) better control over query cancellation, by interrupting the execution thread. * design changes to the RPC interface to allow different execution models than stream-push from server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
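A rough sketch of the execution model this points at; all class and method names below are hypothetical, and the only point is that a dedicated, interruptible thread replaces running the query inside the RPC callback.
{code:scala}
// Hypothetical sketch: the server owns an execution thread per query, so an
// interrupt RPC can cancel it, and the RPC handler no longer blocks on execution.
class ExecutionRunner(execute: () => Unit) {
  private val thread = new Thread(() => execute(), "connect-execute-thread")
  def start(): Unit = thread.start()
  def interrupt(): Unit = thread.interrupt() // called by a cancellation RPC
  def join(): Unit = thread.join()
}

val runner = new ExecutionRunner(() => {
  // ... transform the proto plan, execute it, and push result batches to a
  // queue that the (detachable) response stream drains ...
})
runner.start()
{code}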
[jira] [Updated] (SPARK-43331) Spark Connect - SparkSession interruptAll
[ https://issues.apache.org/jira/browse/SPARK-43331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43331: -- Epic Link: SPARK-43754 > Spark Connect - SparkSession interruptAll > - > > Key: SPARK-43331 > URL: https://issues.apache.org/jira/browse/SPARK-43331 > Project: Spark > Issue Type: Story > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.5.0 > > > Add an "interruptAll" API to the client SparkSession, to allow interrupting all > running executions in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
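A minimal usage sketch of interruptAll from the Spark Connect Scala client; the remote() builder call and the connection string are illustrative assumptions for this sketch.
{code:scala}
import org.apache.spark.sql.SparkSession

// Spark Connect Scala client; "sc://localhost" is an illustrative endpoint.
val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()

// Start a long-running query on a background thread...
new Thread(() => spark.range(Long.MaxValue).count()).start()

// ...then interrupt every execution currently running in this session.
spark.interruptAll()
{code}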
[jira] [Created] (SPARK-43754) Spark Connect Session & Query lifecycle
Juliusz Sompolski created SPARK-43754: - Summary: Spark Connect Session & Query lifecycle Key: SPARK-43754 URL: https://issues.apache.org/jira/browse/SPARK-43754 Project: Spark Issue Type: Epic Components: Connect Affects Versions: 3.5.0 Reporter: Juliusz Sompolski Currently, queries in Spark Connect are executed within the RPC handler. We want to detach the RPC interface from actual sessions and execution, so that we can make the interface more flexible: * maintain long-running sessions, independent of an unbroken GRPC channel * be able to cancel queries * have different interfaces to query results than push from the server -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski resolved SPARK-43744. --- Resolution: Duplicate > Maven test failure of ClientE2ETestSuite "interrupt all" tests > -- > > Key: SPARK-43744 > URL: https://issues.apache.org/jira/browse/SPARK-43744 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > > When testing with > {code:java} > build/mvn clean install -DskipTests -Phive > build/mvn test -pl connector/connect/client/jvm -Dtest=none > -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} > > the tests fail with > {code:java} > 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . > SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. > java.lang.NoClassDefFoundError: > org/apache/spark/sql/connect/client/SparkResult > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) > at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) > at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) > at > 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) > at > org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) > at > org.apache.spark.sql.connect.service
[jira] [Commented] (SPARK-43648) Maven test failed: `interrupt all - background queries, foreground interrupt` and `interrupt all - foreground queries, background interrupt`
[ https://issues.apache.org/jira/browse/SPARK-43648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725320#comment-17725320 ] Juliusz Sompolski commented on SPARK-43648: --- Duplicated by https://issues.apache.org/jira/browse/SPARK-43744; closing the duplicate. See [https://github.com/apache/spark/pull/41005/files/35e500d4cb72f8d3bee21a7f86ee16cbbc8a936c#r1200551487] thread for some debugging info. It seems it may be related to https://issues.apache.org/jira/browse/SPARK-43227 > Maven test failed: `interrupt all - background queries, foreground interrupt` > and `interrupt all - foreground queries, background interrupt` > > > Key: SPARK-43648 > URL: https://issues.apache.org/jira/browse/SPARK-43648 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > run > {code:java} > build/mvn clean install -DskipTests -Phive > build/mvn test -pl connector/connect/client/jvm -Dtest=none > -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite {code} > `interrupt all - background queries, foreground interrupt` and `interrupt all > - foreground queries, background interrupt` failed as follows: > {code:java} > 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . > SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. > java.lang.NoClassDefFoundError: > org/apache/spark/sql/connect/client/SparkResult > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at > java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) > at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) > at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at 
java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:
[jira] [Updated] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests
[ https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43744: -- Description: When testing with {code:java} build/mvn clean install -DskipTests -Phive build/mvn test -pl connector/connect/client/jvm -Dtest=none -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code} the tests fail with {code:java} 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.getDeclaredMethod(Class.java:2128) at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) at org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) at org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:166) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:611) at org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) at org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352) at org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) at
[jira] [Commented] (SPARK-43227) Fix deserialisation issue when UDFs contain a lambda expression
[ https://issues.apache.org/jira/browse/SPARK-43227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725315#comment-17725315 ] Juliusz Sompolski commented on SPARK-43227: --- Possibly related to https://issues.apache.org/jira/browse/SPARK-43744? > Fix deserialisation issue when UDFs contain a lambda expression > --- > > Key: SPARK-43227 > URL: https://issues.apache.org/jira/browse/SPARK-43227 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > The following code: > {code:java} > class A(x: Int) { def get = x * 20 + 5 } > val dummyUdf = (x: Int) => new A(x).get > val myUdf = udf(dummyUdf) > spark.range(5).select(myUdf(col("id"))).as[Int].collect() {code} > hits the following error: > {noformat} > io.grpc.StatusRuntimeException: INTERNAL: cannot assign instance of > java.lang.invoke.SerializedLambda to field > ammonite.$sess.cmd26$Helper.dummyUdf of type scala.Function1 in instance of > ammonite.$sess.cmd26$Helper > io.grpc.Status.asRuntimeException(Status.java:535) > > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > > org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62) > > org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114) > > org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131) > org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687) > org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088) > org.apache.spark.sql.Dataset.collect(Dataset.scala:2686) > ammonite.$sess.cmd28$Helper.(cmd28.sc:1) > ammonite.$sess.cmd28$.(cmd28.sc:7) > ammonite.$sess.cmd28$.(cmd28.sc){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests
Juliusz Sompolski created SPARK-43744: - Summary: Maven test failure of ClientE2ETestSuite "interrupt all" tests Key: SPARK-43744 URL: https://issues.apache.org/jira/browse/SPARK-43744 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Juliusz Sompolski When testing with {code:java} build/mvn clean install -DskipTests -Phive build/mvn test -pl connector/connect/client/jvm -Dtest=none -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite {code} the tests fail with {code:java} 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf. java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.getDeclaredMethod(Class.java:2128) at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643) at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520) at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.(ObjectStreamClass.java:494) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.util.Utils$.deserialize(Utils.scala:148) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353) at org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143) at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209) at org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) at org.apache.spark.sql.connect.service.SparkConne
[jira] [Created] (SPARK-43331) Spark Connect - SparkSession interruptAll
Juliusz Sompolski created SPARK-43331: - Summary: Spark Connect - SparkSession interruptAll Key: SPARK-43331 URL: https://issues.apache.org/jira/browse/SPARK-43331 Project: Spark Issue Type: Story Components: Connect Affects Versions: 3.5.0 Reporter: Juliusz Sompolski Add an "interruptAll" API to the client SparkSession, to allow interrupting all running executions in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache
[ https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653648#comment-17653648 ] Juliusz Sompolski edited comment on SPARK-41497 at 1/2/23 4:22 PM: --- Note that this issue leads to a correctness issue in Delta Merge, because it depends on a SetAccumulator as a side output channel for gathering files that need to be rewritten by the Merge: https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L445-L449 Delta assumes that Spark accumulators can overcount (in some cases where task retries update them in duplicate), but it was assumed that they should never undercount and lose output like that... Missing some files there can result in duplicate records being inserted instead of existing records being updated. was (Author: juliuszsompolski): Note that this issue leads to a correctness issue in Delta Merge, because it depends on a SetAccumulator as a side output channel for gathering files that need to be rewritten by the Merge: https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L445-L449 Missing some files there can result in duplicate records being inserted instead of existing records being updated. > Accumulator undercounting in the case of retry task with rdd cache > -- > > Key: SPARK-41497 > URL: https://issues.apache.org/jira/browse/SPARK-41497 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1 >Reporter: wuyi >Priority: Major > > Accumulator could be undercounted when the retried task has rdd cache. See > the example below and you could also find the completed and reproducible > example at > [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc] > > {code:scala} > test("SPARK-XXX") { > // Set up a cluster with 2 executors > val conf = new SparkConf() > .setMaster("local-cluster[2, 1, > 1024]").setAppName("TaskSchedulerImplSuite") > sc = new SparkContext(conf) > // Set up a custom task scheduler. The scheduler will fail the first task > attempt of the job > // submitted below. In particular, the failed first attempt task would > succeed on computation > // (accumulator accounting, result caching) but only fail to report its > success status due > // to the concurrent executor loss. The second task attempt would succeed. > taskScheduler = setupSchedulerWithCustomStatusUpdate(sc) > val myAcc = sc.longAccumulator("myAcc") > // Initialize an rdd with only one partition so there's only one task and > specify the storage level > // with MEMORY_ONLY_2 so that the rdd result will be cached on both > executors. > val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter => > myAcc.add(100) > iter.map(x => x + 1) > }.persist(StorageLevel.MEMORY_ONLY_2) > // This will pass since the second task attempt will succeed > assert(rdd.count() === 10) > // This will fail because `myAcc.add(100)` won't be executed during the > second task attempt's > // execution. Because the second task attempt will load the rdd cache > directly instead of > // executing the task function, `myAcc.add(100)` is skipped. > assert(myAcc.value === 100) > } {code} > > We could also hit this issue with decommission even if the rdd only has one > copy.
For example, decommission could migrate the rdd cache block to another > executor (the result is actually the same with 2 copies) and the > decommissioned executor is lost before the task reports its success status to > the driver. > > And the issue is a bit more complicated than expected to fix. I have tried to > give some fixes but none of them is ideal: > Option 1: Clean up any rdd cache related to the failed task: in practice, > this option can already fix the issue in most cases. However, theoretically, > rdd cache could be reported to the driver right after the driver cleans up > the failed task's caches due to asynchronous communication. So this option > can’t resolve the issue thoroughly; > Option 2: Disallow rdd cache reuse across the task attempts for the same > task: this option can 100% fix the issue. The problem is that this way can also > affect the case where rdd cache can be reused across the attempts (e.g., when > there is no accumulator operation in the task), which can have perf > regression; > Option 3: Introduce accumulator cache: first, this requires a new framework > for supporting accumulator cache; second, the driver should improve its logic > to distinguish whether the accumulator cache value should be reported to the > user to avoid overcounting.
[jira] [Commented] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache
[ https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653648#comment-17653648 ] Juliusz Sompolski commented on SPARK-41497: --- Note that this issue leads to a correctness issue in Delta Merge, because it depends on a SetAccumulator as a side output channel for gathering files that need to be rewritten by the Merge: https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L445-L449 Missing some files there can result in duplicate records being inserted instead of existing records being updated. > Accumulator undercounting in the case of retry task with rdd cache > -- > > Key: SPARK-41497 > URL: https://issues.apache.org/jira/browse/SPARK-41497 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1 >Reporter: wuyi >Priority: Major > > Accumulator could be undercounted when the retried task has rdd cache. See > the example below and you could also find the completed and reproducible > example at > [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc] > > {code:scala} > test("SPARK-XXX") { > // Set up a cluster with 2 executors > val conf = new SparkConf() > .setMaster("local-cluster[2, 1, > 1024]").setAppName("TaskSchedulerImplSuite") > sc = new SparkContext(conf) > // Set up a custom task scheduler. The scheduler will fail the first task > attempt of the job > // submitted below. In particular, the failed first attempt task would > succeed on computation > // (accumulator accounting, result caching) but only fail to report its > success status due > // to the concurrent executor loss. The second task attempt would succeed. > taskScheduler = setupSchedulerWithCustomStatusUpdate(sc) > val myAcc = sc.longAccumulator("myAcc") > // Initialize an rdd with only one partition so there's only one task and > specify the storage level > // with MEMORY_ONLY_2 so that the rdd result will be cached on both > executors. > val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter => > myAcc.add(100) > iter.map(x => x + 1) > }.persist(StorageLevel.MEMORY_ONLY_2) > // This will pass since the second task attempt will succeed > assert(rdd.count() === 10) > // This will fail because `myAcc.add(100)` won't be executed during the > second task attempt's > // execution. Because the second task attempt will load the rdd cache > directly instead of > executing the task function, `myAcc.add(100)` is skipped. > assert(myAcc.value === 100) > } {code} > > We could also hit this issue with decommission even if the rdd only has one > copy. For example, decommission could migrate the rdd cache block to another > executor (the result is actually the same with 2 copies) and the > decommissioned executor is lost before the task reports its success status to > the driver. > > And the issue is a bit more complicated than expected to fix. I have tried to > give some fixes but none of them is ideal: > Option 1: Clean up any rdd cache related to the failed task: in practice, > this option can already fix the issue in most cases. However, theoretically, > rdd cache could be reported to the driver right after the driver cleans up > the failed task's caches due to asynchronous communication. So this option > can’t resolve the issue thoroughly; > Option 2: Disallow rdd cache reuse across the task attempts for the same > task: this option can 100% fix the issue.
The problem is that this way can also > affect the case where rdd cache can be reused across the attempts (e.g., when > there is no accumulator operation in the task), which can have perf > regression; > Option 3: Introduce accumulator cache: first, this requires a new framework > for supporting accumulator cache; second, the driver should improve its logic > to distinguish whether the accumulator cache value should be reported to the > user to avoid overcounting. For example, in the case of task retry, the value > should be reported. However, in the case of rdd cache reuse, the value > shouldn’t be reported (should it?); > Option 4: Do task success validation when a task is trying to load the rdd > cache: this way defines that an rdd cache is only valid/accessible if the task has > succeeded. This way could be either overkill or a bit complex (because > currently Spark would clean up the task state once it’s finished. So we need > to maintain a structure to know if a task once succeeded or not. ) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spa
[jira] [Created] (SPARK-40918) Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata
Juliusz Sompolski created SPARK-40918: - Summary: Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata Key: SPARK-40918 URL: https://issues.apache.org/jira/browse/SPARK-40918 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Juliusz Sompolski _metadata columns are taken into account in FileSourceScanExec.supportColumnar, but not when the parquet reader is created. This can result in the Parquet reader outputting columnar batches (because it sees fewer columns than the WSCG.isTooManyFields limit), whereas FileSourceScanExec wants row output (because with the extra metadata columns it hits the isTooManyFields limit). I have a fix forthcoming. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
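A hedged repro sketch of the shape of this mismatch, assuming a spark-shell style `spark` session; the column count of 99 and the output path are illustrative assumptions. The idea is that the data columns alone stay under the codegen field limit (spark.sql.codegen.maxFields, 100 by default), while data columns plus the _metadata struct fields exceed it.
{code:scala}
import org.apache.spark.sql.functions.col

// Illustrative only: 99 data columns are under the default maxFields of 100,
// so the Parquet reader side may still plan columnar output, while
// FileSourceScanExec additionally counts the requested _metadata fields
// and decides on row output instead.
val wide = spark.range(10).select((0 until 99).map(i => col("id").as(s"c$i")): _*)
wide.write.mode("overwrite").parquet("/tmp/spark-40918-repro")

spark.read.parquet("/tmp/spark-40918-repro")
  .select(col("*"), col("_metadata"))
  .collect()
{code}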
[jira] [Commented] (SPARK-20624) SPIP: Add better handling for node shutdown
[ https://issues.apache.org/jira/browse/SPARK-20624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603577#comment-17603577 ] Juliusz Sompolski commented on SPARK-20624: --- [~holden] Are these new APIs documented? I can't seem to find them in the official Spark documentation. Should they be mentioned e.g. in https://spark.apache.org/docs/latest/job-scheduling.html#graceful-decommission-of-executors ? > SPIP: Add better handling for node shutdown > --- > > Key: SPARK-20624 > URL: https://issues.apache.org/jira/browse/SPARK-20624 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Holden Karau >Priority: Major > > While we've done some good work with better handling when Spark is choosing > to decommission nodes (SPARK-7955), it might make sense in environments where > we get preempted without our own choice (e.g. YARN over-commit, EC2 spot > instances, GCE Preemptiable instances, etc.) to do something for the data on > the node (or at least not schedule any new tasks). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski resolved SPARK-37090. --- Resolution: Duplicate > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432731#comment-17432731 ] Juliusz Sompolski commented on SPARK-37090: --- Duplicate of https://issues.apache.org/jira/browse/SPARK-36994 > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
Juliusz Sompolski created SPARK-37090: - Summary: Upgrade libthrift to resolve security vulnerabilities Key: SPARK-37090 URL: https://issues.apache.org/jira/browse/SPARK-37090 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0, 3.1.0, 3.0.0, 3.3.0 Reporter: Juliusz Sompolski Currently, Spark uses libthrift 0.12, which has reported high severity security vulnerabilities https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
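The fix itself is a dependency version bump. Purely for illustration (Spark's own build manages this through its Maven pom, not sbt), the coordinates would change like this:

{code:scala}
// Illustrative only: bump libthrift from the vulnerable 0.12.x line to 0.14.x.
libraryDependencies += "org.apache.thrift" % "libthrift" % "0.14.0"
{code}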
[jira] [Commented] (SPARK-34152) CreateViewStatement.child should be a real child
[ https://issues.apache.org/jira/browse/SPARK-34152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267166#comment-17267166 ] Juliusz Sompolski commented on SPARK-34152: --- Same applies to AlterViewStatement. > CreateViewStatement.child should be a real child > > > Key: SPARK-34152 > URL: https://issues.apache.org/jira/browse/SPARK-34152 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > > Similar to `CreateTableAsSelectStatement`, the input query of > `CreateViewStatement` should be a child and get analyzed during the analysis > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
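A minimal sketch of the structural point, using simplified stand-ins rather than Spark's real TreeNode classes: tree traversals such as the analyzer's resolution rules only visit plans reachable through children, so a query held in a plain field stays unanalyzed:

{code:scala}
// Simplified stand-in for a logical plan tree (not Spark's actual classes).
sealed trait Plan { def children: Seq[Plan] }
case class Relation(name: String) extends Plan { def children: Seq[Plan] = Nil }

// Query held in an ordinary field: transformUp-style traversals never see it,
// so it stays unanalyzed.
case class CreateViewStatementLike(view: String, query: Plan) extends Plan {
  def children: Seq[Plan] = Nil
}

// Query exposed as a real child: the analyzer resolves it like any other input.
case class CreateViewWithChild(view: String, query: Plan) extends Plan {
  def children: Seq[Plan] = Seq(query)
}
{code}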
[jira] [Commented] (SPARK-32132) Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0
[ https://issues.apache.org/jira/browse/SPARK-32132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148848#comment-17148848 ] Juliusz Sompolski commented on SPARK-32132: --- Also, 2.4 adds "interval" at the start, while 3.0 does not, e.g. "interval 3 days" in 2.4 vs "3 days" in 3.0. I actually think the new 3.0 results are better / more standard, and I haven't heard of anyone complaining that it broke the way they parse it. Edit: [~cloud_fan] posting the above comment now; I thought I had posted it yesterday, but it stayed unsent in an open tab. It causes some issues with unit tests, but I think it shouldn't cause real-world problems, and in any case the new format is likely better for the future. Thanks for explaining.
> Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0
> ---------------------------------------------------------------------------
>
> Key: SPARK-32132
> URL: https://issues.apache.org/jira/browse/SPARK-32132
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Juliusz Sompolski
> Priority: Minor
>
> In https://github.com/apache/spark/pull/26418, a setting spark.sql.dialect.intervalOutputStyle was implemented to control interval output style. That PR also removed toString from CalendarInterval. The change was reverted in https://github.com/apache/spark/pull/27304, and CalendarInterval.toString was brought back in https://github.com/apache/spark/pull/26572.
> But it behaves differently now: 2.4 returns "4 weeks 2 days", while 3.0 returns "30 days".
> Thriftserver uses HiveResult.toHiveString, which uses CalendarInterval.toString to return interval results as strings. The results are now different in 3.0.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32132) Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0
[ https://issues.apache.org/jira/browse/SPARK-32132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148086#comment-17148086 ] Juliusz Sompolski commented on SPARK-32132: --- cc [~hyukjin.kwon] [~cloud_fan] [~Qin Yao]
> Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0
> ---------------------------------------------------------------------------
>
> Key: SPARK-32132
> URL: https://issues.apache.org/jira/browse/SPARK-32132
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Juliusz Sompolski
> Priority: Minor
>
> In https://github.com/apache/spark/pull/26418, a setting spark.sql.dialect.intervalOutputStyle was implemented to control interval output style. That PR also removed toString from CalendarInterval. The change was reverted in https://github.com/apache/spark/pull/27304, and CalendarInterval.toString was brought back in https://github.com/apache/spark/pull/26572.
> But it behaves differently now: 2.4 returns "4 weeks 2 days", while 3.0 returns "30 days".
> Thriftserver uses HiveResult.toHiveString, which uses CalendarInterval.toString to return interval results as strings. The results are now different in 3.0.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32132) Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0
Juliusz Sompolski created SPARK-32132: - Summary: Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0 Key: SPARK-32132 URL: https://issues.apache.org/jira/browse/SPARK-32132 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski In https://github.com/apache/spark/pull/26418, a setting spark.sql.dialect.intervalOutputStyle was implemented to control interval output style. That PR also removed toString from CalendarInterval. The change was reverted in https://github.com/apache/spark/pull/27304, and CalendarInterval.toString was brought back in https://github.com/apache/spark/pull/26572. But it behaves differently now: 2.4 returns "4 weeks 2 days", while 3.0 returns "30 days". Thriftserver uses HiveResult.toHiveString, which uses CalendarInterval.toString to return interval results as strings. The results are now different in 3.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
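A sketch of how the difference surfaces through a Thriftserver JDBC client; it assumes a Thriftserver running on localhost:10000 and the Hive JDBC driver on the classpath, and the connection details are illustrative:

{code:scala}
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "user", "")
try {
  val rs = conn.createStatement().executeQuery("SELECT interval 30 days")
  rs.next()
  // Spark 2.4 prints "interval 4 weeks 2 days" (per this ticket and the
  // comment above); Spark 3.0 prints "30 days".
  println(rs.getString(1))
} finally {
  conn.close()
}
{code}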
[jira] [Updated] (SPARK-32021) make_interval does not accept seconds >100
[ https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-32021: -- Description: In make_interval(years, months, weeks, days, hours, mins, secs), secs is defined as Decimal(8, 6), so a seconds value of 100 or more overflows and turns into null. Larger seconds values should be allowed. This has been reported by Simba, who wants to use make_interval to implement the translation of the TIMESTAMP_ADD ODBC function in Spark 3.0. The ODBC escape {fn TIMESTAMPADD(SECOND, integer_exp, timestamp)} fails when integer_exp yields seconds values >= 100. was: In make_interval(years, months, weeks, days, hours, mins, secs), secs is defined as Decimal(8, 6), so a seconds value of 100 or more overflows and turns into null. Larger seconds values should be allowed. This has been reported by Simba, who wants to use make_interval to implement the translation of the TIMESTAMP_ADD ODBC function in Spark 3.0.
> make_interval does not accept seconds >100
> -------------------------------------------
>
> Key: SPARK-32021
> URL: https://issues.apache.org/jira/browse/SPARK-32021
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> In make_interval(years, months, weeks, days, hours, mins, secs), secs is defined as Decimal(8, 6), so a seconds value of 100 or more overflows and turns into null.
> Larger seconds values should be allowed.
> This has been reported by Simba, who wants to use make_interval to implement the translation of the TIMESTAMP_ADD ODBC function in Spark 3.0.
> The ODBC escape {fn TIMESTAMPADD(SECOND, integer_exp, timestamp)} fails when integer_exp yields seconds values >= 100.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
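A short illustration of the overflow, assuming a running SparkSession `spark` on Spark 3.0 with default (non-ANSI) overflow handling: Decimal(8, 6) keeps 8 digits with 6 after the point, so the largest representable seconds value is 99.999999:

{code:scala}
// Largest value that fits in Decimal(8, 6): returns an interval of ~100 seconds.
spark.sql("SELECT make_interval(0, 0, 0, 0, 0, 0, 99.999999)").show()

// 100 seconds no longer fits in Decimal(8, 6), so before the fix the result
// is NULL rather than a valid interval (or an error).
spark.sql("SELECT make_interval(0, 0, 0, 0, 0, 0, 100)").show()
{code}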
[jira] [Created] (SPARK-32021) make_interval does not accept seconds >100
Juliusz Sompolski created SPARK-32021: - Summary: make_interval does not accept seconds >100 Key: SPARK-32021 URL: https://issues.apache.org/jira/browse/SPARK-32021 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski In make_interval(years, months, weeks, days, hours, mins, secs), secs is defined as Decimal(8, 6), so a seconds value of 100 or more overflows and turns into null. Larger seconds values should be allowed. This has been reported by Simba, who wants to use make_interval to implement the translation of the TIMESTAMP_ADD ODBC function in Spark 3.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31863) Thriftserver not setting active SparkSession, SQLConf.get not getting session configs correctly
Juliusz Sompolski created SPARK-31863: - Summary: Thriftserver not setting active SparkSession, SQLConf.get not getting session configs correctly Key: SPARK-31863 URL: https://issues.apache.org/jira/browse/SPARK-31863 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski Thriftserver is not setting the active SparkSession. Because of that, the configuration obtained with SQLConf.get is not the session configuration. As a result, many configs set via "SET" in the session do not take effect correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
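A hedged sketch of the direction of a fix, not the actual patch: before running a statement, install the operation's session as the active one on the executing thread, so that SQLConf.get resolves to that session's conf. SparkSession.setActiveSession/getActiveSession/clearActiveSession are existing public APIs; the helper name is made up:

{code:scala}
import org.apache.spark.sql.SparkSession

// Run `body` with `session` installed as the thread's active session, then
// restore whatever was active before.
def withActiveSession[T](session: SparkSession)(body: => T): T = {
  val previous = SparkSession.getActiveSession
  SparkSession.setActiveSession(session)
  try body
  finally previous match {
    case Some(s) => SparkSession.setActiveSession(s)
    case None    => SparkSession.clearActiveSession()
  }
}
{code}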
[jira] [Commented] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true
[ https://issues.apache.org/jira/browse/SPARK-31859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119002#comment-17119002 ] Juliusz Sompolski commented on SPARK-31859: --- Actually, it's already in HiveResult.toHiveString... What happens is:
- the value is taken from the Spark Row in SparkExecuteStatementOperation.addNonNullColumnValue using "to += from.getAs[Timestamp](ordinal)". This doesn't complain that the value is in fact an Instant, not a Timestamp (likely because the cast is erased at runtime).
- in ColumnValue.timestampValue it gets turned into a String via value.toString(). That also doesn't check that the object is an Instant rather than a Timestamp.
- the String gets returned to the client, which then fails to parse it back into a Timestamp.
I will fix it together with https://issues.apache.org/jira/browse/SPARK-31861
> Thriftserver with spark.sql.datetime.java8API.enabled=true
> ----------------------------------------------------------
>
> Key: SPARK-31859
> URL: https://issues.apache.org/jira/browse/SPARK-31859
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> {code}
> test("spark.sql.datetime.java8API.enabled=true") {
>   withJdbcStatement() { st =>
>     st.execute("set spark.sql.datetime.java8API.enabled=true")
>     val rs = st.executeQuery("select timestamp '2020-05-28 00:00:00'")
>     rs.next()
>     // scalastyle:off
>     println(rs.getObject(1))
>   }
> }
> {code}
> fails with
> {code}
> HiveThriftBinaryServerSuite:
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
>   at java.sql.Timestamp.valueOf(Timestamp.java:204)
>   at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:444)
>   at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
>   at org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464)
> {code}
> It seems it might be needed in HiveResult.toHiveString?
> cc [~maxgekk]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
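The "doesn't complain" behavior in the first two steps comes from JVM type erasure. A self-contained sketch in plain Scala (not the actual Spark code) of why the misplaced Instant travels all the way to the client before anything fails:

{code:scala}
import java.sql.Timestamp
import java.time.Instant
import scala.collection.mutable.ArrayBuffer

object ErasureDemo {
  // Stand-in for Row.getAs[T]: the cast to T is erased to Object, so the
  // runtime class is never actually checked here.
  def getAs[T](value: Any): T = value.asInstanceOf[T]

  def main(args: Array[String]): Unit = {
    // With java8API enabled, the row holds an Instant, not a Timestamp...
    val from: Any = Instant.parse("2020-05-28T00:00:00Z")
    val to = ArrayBuffer.empty[Any]
    // ...yet this does not throw: the result is stored as Any, so no
    // checkcast to Timestamp is ever emitted.
    to += getAs[Timestamp](from)
    println(to.head.getClass) // class java.time.Instant
    // The failure only surfaces client-side, when the toString output is
    // parsed back with java.sql.Timestamp.valueOf.
  }
}
{code}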
[jira] [Created] (SPARK-31861) Thriftserver collecting timestamp not using spark.sql.session.timeZone
Juliusz Sompolski created SPARK-31861: - Summary: Thriftserver collecting timestamp not using spark.sql.session.timeZone Key: SPARK-31861 URL: https://issues.apache.org/jira/browse/SPARK-31861 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski If a JDBC client is in time zone PST, sets spark.sql.session.timeZone to PST, and sends the query "SELECT timestamp '2020-05-20 12:00:00'", while the JVM time zone of the Spark cluster is e.g. CET, then: - the timestamp literal in the query is interpreted as 12:00:00 PST, i.e. 21:00:00 CET - but when it is returned, the timestamps are collected from the query with a collect() in https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L299, and the Timestamps are then turned into strings using t.toString() in https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v2.3/src/main/java/org/apache/hive/service/cli/ColumnValue.java#L138. toString() uses the Spark cluster's JVM time zone, which results in "21:00:00" being returned to the JDBC application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
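A self-contained sketch of the rendering difference using plain Java time APIs (not Spark code): Timestamp.toString renders in the JVM default zone, while formatting through an explicit session zone yields what the client expects. The instant below is 12:00 Pacific on 2020-05-20, i.e. 19:00 UTC:

{code:scala}
import java.sql.Timestamp
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

val ts = Timestamp.from(Instant.parse("2020-05-20T19:00:00Z"))

// Renders in the JVM default time zone: "2020-05-20 21:00:00.0" on a CET box.
println(ts.toString)

// Rendering with an explicit session time zone instead gives 12:00:00.
val sessionZone = ZoneId.of("America/Los_Angeles")
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(sessionZone)
println(fmt.format(ts.toInstant))
{code}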
[jira] [Updated] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true
[ https://issues.apache.org/jira/browse/SPARK-31859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-31859: -- Issue Type: Bug (was: Improvement)
> Thriftserver with spark.sql.datetime.java8API.enabled=true
> ----------------------------------------------------------
>
> Key: SPARK-31859
> URL: https://issues.apache.org/jira/browse/SPARK-31859
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> {code}
> test("spark.sql.datetime.java8API.enabled=true") {
>   withJdbcStatement() { st =>
>     st.execute("set spark.sql.datetime.java8API.enabled=true")
>     val rs = st.executeQuery("select timestamp '2020-05-28 00:00:00'")
>     rs.next()
>     // scalastyle:off
>     println(rs.getObject(1))
>   }
> }
> {code}
> fails with
> {code}
> HiveThriftBinaryServerSuite:
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
>   at java.sql.Timestamp.valueOf(Timestamp.java:204)
>   at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:444)
>   at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
>   at org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464)
> {code}
> It seems it might be needed in HiveResult.toHiveString?
> cc [~maxgekk]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true
Juliusz Sompolski created SPARK-31859: - Summary: Thriftserver with spark.sql.datetime.java8API.enabled=true Key: SPARK-31859 URL: https://issues.apache.org/jira/browse/SPARK-31859 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski
{code}
test("spark.sql.datetime.java8API.enabled=true") {
  withJdbcStatement() { st =>
    st.execute("set spark.sql.datetime.java8API.enabled=true")
    val rs = st.executeQuery("select timestamp '2020-05-28 00:00:00'")
    rs.next()
    // scalastyle:off
    println(rs.getObject(1))
  }
}
{code}
fails with
{code}
HiveThriftBinaryServerSuite:
java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
  at java.sql.Timestamp.valueOf(Timestamp.java:204)
  at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:444)
  at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
  at org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464)
{code}
It seems it might be needed in HiveResult.toHiveString?
cc [~maxgekk]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31388) org.apache.spark.sql.hive.thriftserver.CliSuite result matching is flaky
Juliusz Sompolski created SPARK-31388: - Summary: org.apache.spark.sql.hive.thriftserver.CliSuite result matching is flaky Key: SPARK-31388 URL: https://issues.apache.org/jira/browse/SPARK-31388 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski CliSuite.runCliWithin result matching has issues. Will describe in PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31204) HiveResult compatibility for DatasourceV2 command
Juliusz Sompolski created SPARK-31204: - Summary: HiveResult compatibility for DatasourceV2 command Key: SPARK-31204 URL: https://issues.apache.org/jira/browse/SPARK-31204 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski HiveResult performs some compatibility matching and conversion for commands to be compatible with Hive output, e.g.:
{code}
case ExecutedCommandExec(_: DescribeCommandBase) =>
  // If it is a describe command for a Hive table, we want to have the output format
  // be similar with Hive.
  ...
// SHOW TABLES in Hive only output table names, while ours output database, table name, isTemp.
case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
{code}
The same is needed for DatasourceV2 commands as well (e.g. ShowTablesExec...).
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
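A sketch of the shape of the change, using simplified stand-in types; the real patch would pattern-match Spark's actual V2 exec nodes (e.g. ShowTablesExec) alongside the V1 command nodes:

{code:scala}
// Simplified stand-ins for physical plan nodes (not Spark's real classes).
sealed trait PlanNode
case class ShowTablesCommandV1(isExtended: Boolean) extends PlanNode
case class ExecutedCommandExecLike(cmd: PlanNode) extends PlanNode
case class ShowTablesExecLike() extends PlanNode // DataSourceV2 node

// Hive only outputs table names for SHOW TABLES, so both the V1 command and
// its V2 exec counterpart must be special-cased the same way.
def needsHiveCompatibleOutput(plan: PlanNode): Boolean = plan match {
  case ExecutedCommandExecLike(s: ShowTablesCommandV1) if !s.isExtended => true
  case _: ShowTablesExecLike                                            => true
  case _                                                                => false
}
{code}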
[jira] [Resolved] (SPARK-31051) Thriftserver operations other than SparkExecuteStatementOperation do not call onOperationClosed
[ https://issues.apache.org/jira/browse/SPARK-31051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski resolved SPARK-31051. --- Resolution: Not A Problem I'm just blind. They are there.
> Thriftserver operations other than SparkExecuteStatementOperation do not call onOperationClosed
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-31051
> URL: https://issues.apache.org/jira/browse/SPARK-31051
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> In Spark 3.0, onOperationClosed was implemented in HiveThriftServer2Listener to track the closing of an operation in the thriftserver (after the client finishes fetching).
> However, it seems that only SparkExecuteStatementOperation calls it in its close() function. Other operations need to do this as well.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31051) Thriftserver operations other than SparkExecuteStatementOperation do not call onOperationClosed
Juliusz Sompolski created SPARK-31051: - Summary: Thriftserver operations other than SparkExecuteStatementOperation do not call onOperationClosed Key: SPARK-31051 URL: https://issues.apache.org/jira/browse/SPARK-31051 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski In Spark 3.0, onOperationClosed was implemented in HiveThriftServer2Listener to track the closing of an operation in the thriftserver (after the client finishes fetching). However, it seems that only SparkExecuteStatementOperation calls it in its close() function. Other operations need to do this as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
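For reference, a minimal sketch of the pattern the ticket was auditing for, with simplified stand-ins rather than the actual Thriftserver classes: every operation type, not just statement execution, should notify the listener on close:

{code:scala}
// Simplified stand-ins for the Thriftserver listener pattern.
trait ThriftServerListener {
  def onOperationClosed(operationId: String): Unit
}

abstract class Operation(listener: ThriftServerListener, operationId: String) {
  def close(): Unit = {
    // ... release operation-specific resources here ...
    // Every concrete operation must report closure; otherwise the listener
    // keeps showing the operation as open after the client finished fetching.
    listener.onOperationClosed(operationId)
  }
}
{code}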