[jira] [Created] (SPARK-49263) Spark Connect python client: Consistently handle boolean Dataframe reader options

2024-08-16 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-49263:
-

 Summary: Spark Connect python client: Consistently handle boolean 
Dataframe reader options
 Key: SPARK-49263
 URL: https://issues.apache.org/jira/browse/SPARK-49263
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


The Spark Connect Python client's spark.read.option should use to_str so that boolean DataFrame reader options are handled consistently.






[jira] [Created] (SPARK-48196) Make lazy val plans in QueryExecution Trys to not re-execute on error

2024-05-08 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-48196:
-

 Summary: Make lazy val plans in QueryExecution Trys to not 
re-execute on error
 Key: SPARK-48196
 URL: https://issues.apache.org/jira/browse/SPARK-48196
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


To achieve more consistency, keep captured errors from analysis/optimization/execution if the lazy val evaluation fails, so the failing plan is not re-executed on error.
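
A minimal sketch of the idea (hypothetical class, not the actual QueryExecution code): wrapping the lazy val body in a Try caches the failure, so re-reading the field rethrows the original error instead of re-running the work.
{code:java}
import scala.util.Try

class QueryExecutionSketch(run: () => String) {
  // The Try is computed once; a failure is cached rather than re-executed.
  private lazy val planTry: Try[String] = Try(run())

  // Every access rethrows the same captured error (or returns the same value).
  def plan: String = planTry.get
}
{code}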






[jira] [Created] (SPARK-48195) Keep an RDD created by SparkPlan doExecute

2024-05-08 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-48195:
-

 Summary: Keep an RDD created by SparkPlan doExecute
 Key: SPARK-48195
 URL: https://issues.apache.org/jira/browse/SPARK-48195
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


For more consistency, don't have SparkPlan execute generate a new RDD each time; instead, reuse the RDD created by doExecute once.
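
A minimal sketch of the idea (hypothetical class; Seq[Int] stands in for the RDD to keep it self-contained): cache the result of doExecute so repeated execute calls return the same object.
{code:java}
class SparkPlanSketch(doExecute: () => Seq[Int]) {
  // Computed once on first call; subsequent execute() calls reuse the same result.
  private lazy val cachedResult: Seq[Int] = doExecute()

  def execute(): Seq[Int] = cachedResult
}
{code}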






[jira] [Created] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started

2023-11-30 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-46186:
-

 Summary: Invalid Spark Connect execution state transition if 
interrupted before thread started
 Key: SPARK-46186
 URL: https://issues.apache.org/jira/browse/SPARK-46186
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


Fix an edge case where interrupting an execution before the ExecuteThreadRunner has started could lead to an illegal state transition.
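
A minimal sketch of the kind of guard involved (hypothetical names, not the actual ExecuteThreadRunner): only perform the interrupt transition once the runner has started; otherwise record a pending interrupt instead of making an illegal transition.
{code:java}
class RunnerSketch {
  private var started = false
  private var interruptPending = false

  def start(): Unit = synchronized {
    started = true
    if (interruptPending) doInterrupt() // apply an interrupt that arrived early
  }

  def interrupt(): Unit = synchronized {
    if (started) doInterrupt()
    else interruptPending = true // defer instead of transitioning from a not-started state
  }

  private def doInterrupt(): Unit = println("interrupted")
}
{code}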






[jira] [Updated] (SPARK-46011) Spark Connect session heartbeat / keepalive

2023-11-23 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-46011:
--
Epic Link: SPARK-43754

> Spark Connect session heartbeat / keepalive
> ---
>
> Key: SPARK-46011
> URL: https://issues.apache.org/jira/browse/SPARK-46011
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46011) Spark Connect session heartbeat / keepalive

2023-11-23 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-46011.
---
Resolution: Won't Fix

Decided to not add this at this point.

> Spark Connect session heartbeat / keepalive
> ---
>
> Key: SPARK-46011
> URL: https://issues.apache.org/jira/browse/SPARK-46011
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46075) Refactor SparkConnectSessionManager to not use guava cache

2023-11-23 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-46075:
-

 Summary: Refactor SparkConnectSessionManager to not use guava cache
 Key: SPARK-46075
 URL: https://issues.apache.org/jira/browse/SPARK-46075
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


Guava cache gives limited control over session eviction; for example, more complex eviction policies cannot be expressed. Refactor it to be more similar to SparkConnectExecutionManager.
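
A hedged sketch of the direction (hypothetical types, not the actual SparkConnectSessionManager): keep sessions in a plain concurrent map and run a periodic maintenance pass that applies whatever eviction policy is needed, instead of relying on Guava's built-in expiry.
{code:java}
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.concurrent.TrieMap

case class SessionEntry(lastAccessMs: Long, closable: Boolean)

class SessionManagerSketch(idleTimeoutMs: Long) {
  private val sessions = TrieMap.empty[String, SessionEntry]
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def touch(sessionId: String, closable: Boolean = true): Unit =
    sessions.put(sessionId, SessionEntry(System.currentTimeMillis(), closable))

  // Custom eviction policy: evict idle sessions, but only if they are closable.
  private def maintenance(): Unit = {
    val now = System.currentTimeMillis()
    for ((id, entry) <- sessions.snapshot()
         if entry.closable && now - entry.lastAccessMs > idleTimeoutMs) {
      sessions.remove(id)
    }
  }

  def start(): Unit = scheduler.scheduleWithFixedDelay(
    new Runnable { def run(): Unit = maintenance() },
    idleTimeoutMs, idleTimeoutMs, TimeUnit.MILLISECONDS)
}
{code}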






[jira] [Created] (SPARK-46011) Spark Connect session heartbeat / keepalive

2023-11-20 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-46011:
-

 Summary: Spark Connect session heartbeat / keepalive
 Key: SPARK-46011
 URL: https://issues.apache.org/jira/browse/SPARK-46011
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-45780) Propagate all Spark Connect client threadlocal in InheritableThread

2023-11-03 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45780:
-

 Summary: Propagate all Spark Connect client threadlocal in 
InheritableThread
 Key: SPARK-45780
 URL: https://issues.apache.org/jira/browse/SPARK-45780
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


Propagate all thread locals that can be set in SparkConnectClient, not only 
'tags'
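
A minimal, self-contained sketch of the idea (hypothetical thread-local holder, not the actual client code): an InheritableThread-style wrapper snapshots all client thread-locals in the parent and restores them in the child, instead of copying only tags.
{code:java}
object ClientThreadLocals {
  val tags = new ThreadLocal[Set[String]] {
    override def initialValue(): Set[String] = Set.empty
  }
  val extraContext = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
}

class InheritingThread(body: => Unit) extends Thread {
  // Snapshot taken in the parent thread, at construction time.
  private val parentTags = ClientThreadLocals.tags.get()
  private val parentContext = ClientThreadLocals.extraContext.get()

  override def run(): Unit = {
    // Restore in the child thread before running the body.
    ClientThreadLocals.tags.set(parentTags)
    ClientThreadLocals.extraContext.set(parentContext)
    body
  }
}
{code}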






[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-10-26 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43754:
--
Affects Version/s: 4.0.0

> Spark Connect Session & Query lifecycle
> ---
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so 
> that we can make the interface more flexible
>  * maintain long running sessions, independent of unbroken GRPC channel
>  * be able to cancel queries
>  * have different interfaces to query results than push from server






[jira] [Created] (SPARK-45680) ReleaseSession to close Spark Connect session

2023-10-26 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45680:
-

 Summary: ReleaseSession to close Spark Connect session
 Key: SPARK-45680
 URL: https://issues.apache.org/jira/browse/SPARK-45680
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-45647) Spark Connect API to propagate per request context

2023-10-24 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45647:
-

 Summary: Spark Connect API to propagate per request context
 Key: SPARK-45647
 URL: https://issues.apache.org/jira/browse/SPARK-45647
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


There is an extension point to pass arbitrary proto extensions in the Spark Connect UserContext, but there is no API to do this in the client. Add a SparkSession API to attach extra protos that will be sent with all requests.
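
A hedged sketch of the client-side idea (hypothetical holder, not the actual SparkSession API): keep a list of protobuf Any extensions and copy them into the UserContext of every request the client builds.
{code:java}
import com.google.protobuf.{Any => ProtoAny}

class RequestExtensionsSketch {
  private var extensions: Seq[ProtoAny] = Seq.empty

  // Called once by user code; the extensions then ride along with every request.
  def addExtension(ext: ProtoAny): Unit = synchronized { extensions = extensions :+ ext }

  // Called by the request builder for each outgoing RPC, to copy into UserContext.
  def currentExtensions: Seq[ProtoAny] = synchronized(extensions)
}
{code}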






[jira] [Updated] (SPARK-45435) Document that lazy checkpoint may not be a consistent snapshot

2023-10-06 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-45435:
--
Summary: Document that lazy checkpoint may not be a consistent snapshot  (was: Document that lazy checkpoint may cause non-determinism)

> Document that lazy checkpoint may not be a consistent snapshot
> -
>
> Key: SPARK-45435
> URL: https://issues.apache.org/jira/browse/SPARK-45435
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>
> Some people may want to use checkpoint to get a consistent snapshot of the 
> Dataset / RDD. Warn that this is not the case with lazy checkpoint, because 
> checkpoint is computed only at the end of the first action, and the data used 
> during the first action may be different because of non-determinism and 
> retries.






[jira] [Created] (SPARK-45435) Document that lazy checkpoint may cause non-determinism

2023-10-06 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45435:
-

 Summary: Document that lazy checkpoint may cause non-determinism
 Key: SPARK-45435
 URL: https://issues.apache.org/jira/browse/SPARK-45435
 Project: Spark
  Issue Type: Documentation
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


Some people may want to use checkpoint to get a consistent snapshot of the 
Dataset / RDD. Warn that this is not the case with lazy checkpoint, because 
checkpoint is computed only at the end of the first action, and the data used 
during the first action may be different because of non-determinism and retries.
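
A minimal sketch of the behaviour being documented, using the standard Dataset.checkpoint API (the checkpoint directory and sizes are illustrative):
{code:java}
import org.apache.spark.sql.SparkSession

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("checkpoint-sketch").getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/checkpoint-sketch")

    val df = spark.range(0, 1000).toDF("id")
    val eagerSnapshot = df.checkpoint()              // materialized immediately: a stable snapshot
    val lazySnapshot  = df.checkpoint(eager = false) // only materialized by the first action on it

    lazySnapshot.count() // the checkpointed data is captured here, after any retries/recomputation
    spark.stop()
  }
}
{code}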






[jira] [Created] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order

2023-10-04 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45416:
-

 Summary: Sanity check that Spark Connect returns arrow batches in 
order
 Key: SPARK-45416
 URL: https://issues.apache.org/jira/browse/SPARK-45416
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-45206) Shutting down SparkConnectClient can have outstanding ReleaseExecute

2023-09-18 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45206:
-

 Summary: Shutting down SparkConnectClient can have outstanding 
ReleaseExecute
 Key: SPARK-45206
 URL: https://issues.apache.org/jira/browse/SPARK-45206
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


In the Spark Connect Scala client, there can be outstanding asynchronous ReleaseExecute calls when the client is shut down.

In ExecutePlanResponseReattachableIterator we (ab)use a gRPC thread in createRetryingReleaseExecuteResponseObserver to run it.

When we call {{SparkConnectClient.shutdown}}, which does {{channel.shutdownNow()}}, the channel is killed before the ReleaseExecute finishes. Maybe a more graceful shutdown would work, or we should move to an explicitly managed thread pool, like in Python?

See discussion in https://github.com/apache/spark/pull/42929/files#r1329071826
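
A hedged sketch of what a more graceful shutdown could look like (assumed helper, not the actual SparkConnectClient code), using the standard io.grpc.ManagedChannel API:
{code:java}
import java.util.concurrent.TimeUnit
import io.grpc.ManagedChannel

object GracefulShutdownSketch {
  def shutdownGracefully(channel: ManagedChannel, graceSeconds: Long = 5): Unit = {
    channel.shutdown() // stop accepting new calls, let outstanding ones (e.g. ReleaseExecute) finish
    if (!channel.awaitTermination(graceSeconds, TimeUnit.SECONDS)) {
      channel.shutdownNow() // force-cancel whatever is still running after the grace period
    }
  }
}
{code}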
 






[jira] [Updated] (SPARK-45167) Python Spark Connect client does not call `releaseAll`

2023-09-14 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-45167:
--
Epic Link: SPARK-43754  (was: SPARK-39375)

> Python Spark Connect client does not call `releaseAll`
> --
>
> Key: SPARK-45167
> URL: https://issues.apache.org/jira/browse/SPARK-45167
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Priority: Major
>
> The Python client does not call `releaseAll` to release previous responses on the
> server and thus does not properly close the queries.






[jira] [Created] (SPARK-45133) Mark Spark Connect queries as finished when all result tasks are finished, not sent

2023-09-12 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45133:
-

 Summary: Mark Spark Connect queries as finished when all result 
tasks are finished, not sent
 Key: SPARK-45133
 URL: https://issues.apache.org/jira/browse/SPARK-45133
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-44872) Testing reattachable execute

2023-08-18 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44872:
-

 Summary: Testing reattachable execute
 Key: SPARK-44872
 URL: https://issues.apache.org/jira/browse/SPARK-44872
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-44855) Small tweaks to attaching ExecuteGrpcResponseSender to ExecuteResponseObserver

2023-08-17 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44855:
-

 Summary: Small tweaks to attaching ExecuteGrpcResponseSender to 
ExecuteResponseObserver
 Key: SPARK-44855
 URL: https://issues.apache.org/jira/browse/SPARK-44855
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


Small improvements can be made to the way a new ExecuteGrpcResponseSender is attached to the observer.
 * Since we now have addGrpcResponseSender in ExecuteHolder, it should be ExecuteHolder's responsibility to interrupt the old sender and ensure there is only one at a time, rather than ExecuteResponseObserver's responsibility.
 * executeObserver is used as a lock for synchronization. An explicit lock object could be better.
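
A minimal sketch of the second point (hypothetical class, not the actual ExecuteResponseObserver): synchronize on a dedicated lock object rather than on the observer instance, so other code cannot accidentally contend on the same monitor, and keep at most one attached sender.
{code:java}
class ResponseObserverSketch {
  private val lock = new Object
  private var currentSenderId: Option[Long] = None

  def attachSender(senderId: Long): Unit = lock.synchronized {
    // Attaching a new sender implicitly replaces (kicks out) the previous one.
    currentSenderId = Some(senderId)
  }

  def detachSender(senderId: Long): Unit = lock.synchronized {
    if (currentSenderId.contains(senderId)) currentSenderId = None
  }
}
{code}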






[jira] [Resolved] (SPARK-44730) Spark Connect: Cleaner thread not stopped when SparkSession stops

2023-08-17 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-44730.
---
Resolution: Not A Problem

> Spark Connect: Cleaner thread not stopped when SparkSession stops
> -
>
> Key: SPARK-44730
> URL: https://issues.apache.org/jira/browse/SPARK-44730
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Minor
>
> Spark Connect scala client SparkSession has a cleaner, which starts a daemon 
> thread to clean up Closeable objects after GC. This daemon thread is never 
> stopped, and every SparkSession creates a new one.
> Cleaner implements a stop() function, but no-one ever calls it. Possibly 
> because even after SparkSession.stop(), the cleaner may still be needed when 
> remaining references are GCed... For this reason it seems that the Cleaner 
> should rather be a global singleton than within a session.






[jira] [Commented] (SPARK-44730) Spark Connect: Cleaner thread not stopped when SparkSession stops

2023-08-17 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755613#comment-17755613
 ] 

Juliusz Sompolski commented on SPARK-44730:
---

Silly me, Cleaner is already global and I looked wrong.

> Spark Connect: Cleaner thread not stopped when SparkSession stops
> -
>
> Key: SPARK-44730
> URL: https://issues.apache.org/jira/browse/SPARK-44730
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Minor
>
> Spark Connect scala client SparkSession has a cleaner, which starts a daemon 
> thread to clean up Closeable objects after GC. This daemon thread is never 
> stopped, and every SparkSession creates a new one.
> Cleaner implements a stop() function, but no-one ever calls it. Possibly 
> because even after SparkSession.stop(), the cleaner may still be needed when 
> remaining references are GCed... For this reason it seems that the Cleaner 
> should rather be a global singleton than within a session.






[jira] [Created] (SPARK-44849) Expose SparkConnectExecutionManager.listActiveExecutions

2023-08-17 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44849:
-

 Summary: Expose SparkConnectExecutionManager.listActiveExecutions
 Key: SPARK-44849
 URL: https://issues.apache.org/jira/browse/SPARK-44849
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-44835) SparkConnect ReattachExecute could raise before ExecutePlan even attaches.

2023-08-16 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44835:
-

 Summary: SparkConnect ReattachExecute could raise before 
ExecutePlan even attaches.
 Key: SPARK-44835
 URL: https://issues.apache.org/jira/browse/SPARK-44835
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


If a ReattachExecute is sent very quickly after ExecutePlan, the following could happen:
 * ExecutePlan didn't reach *executeHolder.runGrpcResponseSender(responseSender)* in SparkConnectExecutePlanHandler yet.
 * ReattachExecute races around and reaches *executeHolder.runGrpcResponseSender(responseSender)* in SparkConnectReattachExecuteHandler first.
 * When ExecutePlan reaches {*}executeHolder.runGrpcResponseSender(responseSender){*} and executionObserver.attachConsumer(this) is called in the ExecuteGrpcResponseSender of ExecutePlan, it will kick out the ExecuteGrpcResponseSender of ReattachExecute.

So even though ReattachExecute came later, it will get interrupted by the earlier ExecutePlan and finish with a *SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED", Map.empty)* (which was assumed to be a situation where a stale hanging RPC is replaced by a reconnection).

That would be very unlikely to happen in practice, because ExecutePlan shouldn't be abandoned so fast, but because of https://issues.apache.org/jira/browse/SPARK-44833 it is slightly more likely (though there is also a 50ms sleep before retry, which again makes it unlikely).






[jira] [Created] (SPARK-44833) Spark Connect reattach when initial ExecutePlan didn't reach server doing too eager Reattach

2023-08-16 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44833:
-

 Summary: Spark Connect reattach when initial ExecutePlan didn't 
reach server doing too eager Reattach
 Key: SPARK-44833
 URL: https://issues.apache.org/jira/browse/SPARK-44833
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


In
{code:java}
case ex: StatusRuntimeException
    if Option(StatusProto.fromThrowable(ex))
      .exists(_.getMessage.contains("INVALID_HANDLE.OPERATION_NOT_FOUND")) =>
  if (lastReturnedResponseId.isDefined) {
    throw new IllegalStateException(
      "OPERATION_NOT_FOUND on the server but responses were already received from it.",
      ex)
  }
  // Try a new ExecutePlan, and throw upstream for retry.
->  iter = rawBlockingStub.executePlan(initialRequest)
->  throw new GrpcRetryHandler.RetryException {code}
we call executePlan and throw a RetryException so the exception is handled upstream.

Then it goes to
{code:java}
retry {
  if (firstTry) {
    // on first try, we use the existing iter.
    firstTry = false
  } else {
    // on retry, the iter is borked, so we need a new one
->  iter = rawBlockingStub.reattachExecute(createReattachExecuteRequest())
  } {code}
and because it is not firstTry, it immediately does a reattach.

This causes no failure: the reattach will work and attach to the query, and the original ExecutePlan will get detached. But it could be improved.

The same issue is also present in Python reattach.py.






[jira] [Resolved] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism

2023-08-16 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-44765.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse 
> common mechanism
> ---
>
> Key: SPARK-44765
> URL: https://issues.apache.org/jira/browse/SPARK-44765
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-08-14 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754182#comment-17754182
 ] 

Juliusz Sompolski commented on SPARK-43754:
---

Needed for testing of this: https://issues.apache.org/jira/browse/SPARK-44806

> Spark Connect Session & Query lifecycle
> ---
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so 
> that we can make the interface more flexible
>  * maintain long running sessions, independent of unbroken GRPC channel
>  * be able to cancel queries
>  * have different interfaces to query results than push from server






[jira] [Created] (SPARK-44806) Separate connect-client-jvm-internal

2023-08-14 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44806:
-

 Summary: Separate connect-client-jvm-internal
 Key: SPARK-44806
 URL: https://issues.apache.org/jira/browse/SPARK-44806
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski









[jira] [Commented] (SPARK-44784) Failure in testing `SparkSessionE2ESuite` using Maven

2023-08-14 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753974#comment-17753974
 ] 

Juliusz Sompolski commented on SPARK-44784:
---

This is a classloader issue: the serialization of a UDF pulls in classes from the client's class context that it doesn't actually need, resulting in a ClassNotFoundException on the server:
org.apache.spark.SparkException: org/apache/spark/sql/connect/client/SparkResult
It's not specifically related to the tests in that suite, but can happen anywhere UDFs are used.

> Failure in testing `SparkSessionE2ESuite` using Maven
> -
>
> Key: SPARK-44784
> URL: https://issues.apache.org/jira/browse/SPARK-44784
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Blocker
>
> [https://github.com/apache/spark/actions/runs/5832898984/job/15819181762]
>  
> The following failures exist in the daily Maven tests, we should fix them 
> before Apache Spark 3.5.0 release:
> {code:java}
> SparkSessionE2ESuite:
> 4638- interrupt all - background queries, foreground interrupt *** FAILED ***
> 4639  The code passed to eventually never returned normally. Attempted 30 
> times over 20.092924822 seconds. Last failure message: Some("unexpected 
> failure in q1: org.apache.spark.SparkException: 
> org/apache/spark/sql/connect/client/SparkResult") was not empty Error not 
> empty: Some(unexpected failure in q1: org.apache.spark.SparkException: 
> org/apache/spark/sql/connect/client/SparkResult). 
> (SparkSessionE2ESuite.scala:71)
> 4640- interrupt all - foreground queries, background interrupt *** FAILED ***
> 4641  "org/apache/spark/sql/connect/client/SparkResult" did not contain 
> "OPERATION_CANCELED" Unexpected exception: org.apache.spark.SparkException: 
> org/apache/spark/sql/connect/client/SparkResult 
> (SparkSessionE2ESuite.scala:105)
> 4642- interrupt tag *** FAILED ***
> 4643  The code passed to eventually never returned normally. Attempted 30 
> times over 20.069445587 seconds. Last failure message: ListBuffer() had 
> length 0 instead of expected length 2 Interrupted operations: ListBuffer().. 
> (SparkSessionE2ESuite.scala:199)
> 4644- interrupt operation *** FAILED ***
> 4645  org.apache.spark.SparkException: 
> org/apache/spark/sql/connect/client/SparkResult
> 4646  at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:89)
> 4647  at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:38)
> 4648  at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:46)
> 4649  at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:83)
> 4650  at 
> org.apache.spark.sql.connect.client.SparkResult.operationId(SparkResult.scala:174)
> 4651  at 
> org.apache.spark.sql.SparkSessionE2ESuite.$anonfun$new$31(SparkSessionE2ESuite.scala:243)
> 4652  at 
> org.apache.spark.sql.connect.client.util.RemoteSparkSession.$anonfun$test$1(RemoteSparkSession.scala:243)
> 4653  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 4654  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 4655  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 4656  ... {code}






[jira] [Created] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism

2023-08-10 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44765:
-

 Summary: Make ReleaseExecute retry in 
ExecutePlanResponseReattachableIterator reuse common mechanism
 Key: SPARK-44765
 URL: https://issues.apache.org/jira/browse/SPARK-44765
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-44762) Add more documentation and examples for using job tags for interrupt

2023-08-10 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44762:
-

 Summary: Add more documentation and examples for using job tags 
for interrupt
 Key: SPARK-44762
 URL: https://issues.apache.org/jira/browse/SPARK-44762
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Add documentation to spark.addTag, with examples and an explanation similar to SparkContext.setJobGroup.
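
A hedged sketch of the kind of example such documentation could show, using the SparkContext job-tag APIs listed in SPARK-43952/SPARK-44194 (tag names and timings are illustrative; the Connect SparkSession tag APIs are analogous):
{code:java}
import org.apache.spark.sql.SparkSession

object JobTagSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("job-tag-sketch").getOrCreate()
    val sc = spark.sparkContext

    sc.addJobTag("nightly-report") // every job submitted from this thread carries the tag

    // Another thread (or another part of the application) can cancel by tag.
    new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(1000); sc.cancelJobsWithTag("nightly-report") }
    }).start()

    try spark.range(0, 10000000000L).count()            // long-running job to be cancelled
    catch { case e: Exception => println(s"cancelled: ${e.getMessage}") }
    finally { sc.removeJobTag("nightly-report"); spark.stop() }
  }
}
{code}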






[jira] [Updated] (SPARK-44738) Spark Connect Reattach misses metadata propagation

2023-08-09 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44738:
--
Epic Link: SPARK-43754

> Spark Connect Reattach misses metadata propagation
> --
>
> Key: SPARK-44738
> URL: https://issues.apache.org/jira/browse/SPARK-44738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Blocker
> Fix For: 3.5.0, 4.0.0
>
>
> Currently, in the Spark Connect Reattach handler, client metadata is not
> propagated.






[jira] [Created] (SPARK-44730) Spark Connect: Cleaner thread not stopped when SparkSession stops

2023-08-08 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44730:
-

 Summary: Spark Connect: Cleaner thread not stopped when 
SparkSession stops
 Key: SPARK-44730
 URL: https://issues.apache.org/jira/browse/SPARK-44730
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski


The Spark Connect Scala client SparkSession has a Cleaner, which starts a daemon thread to clean up Closeable objects after GC. This daemon thread is never stopped, and every SparkSession creates a new one.

Cleaner implements a stop() function, but no one ever calls it. Possibly because even after SparkSession.stop(), the cleaner may still be needed when remaining references are GCed... For this reason it seems that the Cleaner should be a global singleton rather than live within a session.






[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-08-08 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752130#comment-17752130
 ] 

Juliusz Sompolski commented on SPARK-43754:
---

Not in epic, but nice-to-have refactoring: 
https://issues.apache.org/jira/browse/SPARK-43756

> Spark Connect Session & Query lifecycle
> ---
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so 
> that we can make the interface more flexible
>  * maintain long running sessions, independent of unbroken GRPC channel
>  * be able to cancel queries
>  * have different interfaces to query results than push from server






[jira] [Updated] (SPARK-43756) Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more

2023-08-08 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43756:
--
Epic Link:   (was: SPARK-43754)

> Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more
> 
>
> Key: SPARK-43756
> URL: https://issues.apache.org/jira/browse/SPARK-43756
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Right now, we pass around individual things like sessionId, userId etc. in
> multiple places. This leads to the need for additional threading of parameters
> quite often when something new is added.
> Better to pass around SessionHolder and ExecutePlanHolder, where accessor and
> utility functions can be added.






[jira] [Created] (SPARK-44722) reattach.py: AttributeError: 'NoneType' object has no attribute 'message'

2023-08-08 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44722:
-

 Summary: reattach.py: AttributeError: 'NoneType' object has no 
attribute 'message'
 Key: SPARK-44722
 URL: https://issues.apache.org/jira/browse/SPARK-44722
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-44709) Fix flow control in ExecuteGrpcResponseSender

2023-08-07 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44709:
-

 Summary: Fix flow control in ExecuteGrpcResponseSender
 Key: SPARK-44709
 URL: https://issues.apache.org/jira/browse/SPARK-44709
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski









[jira] [Created] (SPARK-44676) Ensure Spark Connect scala client CloseableIterator is closed in all cases where exception could be thrown

2023-08-04 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44676:
-

 Summary: Ensure Spark Connect scala client CloseableIterator is 
closed in all cases where exception could be thrown
 Key: SPARK-44676
 URL: https://issues.apache.org/jira/browse/SPARK-44676
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


We currently already ensure that CloseableIterator is consumed in all places, and in ExecutePlanResponseReattachableIterator we ensure that the iterator is closed in case of a GRPC error.

We should also ensure that all places in the client that use a CloseableIterator close it gracefully, also in case of another exception being thrown, including InterruptedException.

Some try { ... } finally { iterator.close() } blocks may be needed for that.
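
A minimal sketch of the pattern (generic Closeable iterator, not the client's actual CloseableIterator type): the finally block guarantees the iterator is closed even when the body throws, including on InterruptedException.
{code:java}
import java.io.Closeable

object IteratorCloseSketch {
  def consumeAndClose[T](iter: Iterator[T] with Closeable)(body: Iterator[T] => Unit): Unit = {
    try {
      body(iter) // may throw, including InterruptedException
    } finally {
      iter.close() // always runs, so nothing is left dangling
    }
  }
}
{code}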






[jira] [Updated] (SPARK-44656) Close dangling iterators in SparkResult too (Spark Connect Scala)

2023-08-03 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44656:
--
Epic Link: SPARK-43754

> Close dangling iterators in SparkResult too (Spark Connect Scala)
> -
>
> Key: SPARK-44656
> URL: https://issues.apache.org/jira/browse/SPARK-44656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Priority: Major
>
> SPARK-44636 followup. We didn't address iterators grabbed in SparkResult 
> there.






[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-08-02 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43754:
--
Epic Name: connect-query-lifecycle  (was: sc-query-lifecycle)

> Spark Connect Session & Query lifecycle
> ---
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so 
> that we can make the interface more flexible
>  * maintain long running sessions, independent of unbroken GRPC channel
>  * be able to cancel queries
>  * have different interfaces to query results than push from server






[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-08-02 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43754:
--
Epic Name: sc-query-lifecycle  (was: sc-session-lifecycle)

> Spark Connect Session & Query lifecycle
> ---
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so 
> that we can make the interface more flexible
>  * maintain long running sessions, independent of unbroken GRPC channel
>  * be able to cancel queries
>  * have different interfaces to query results than push from server






[jira] [Created] (SPARK-44642) ExecutePlanResponseReattachableIterator should release all after error

2023-08-02 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44642:
-

 Summary: ExecutePlanResponseReattachableIterator should release 
all after error
 Key: SPARK-44642
 URL: https://issues.apache.org/jira/browse/SPARK-44642
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski









[jira] [Updated] (SPARK-44642) ExecutePlanResponseReattachableIterator should release all after error

2023-08-02 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44642:
--
Epic Link: SPARK-43754

> ExecutePlanResponseReattachableIterator should release all after error
> --
>
> Key: SPARK-44642
> URL: https://issues.apache.org/jira/browse/SPARK-44642
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>







[jira] [Updated] (SPARK-44636) Leave no dangling iterators in Spark Connect Scala

2023-08-02 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44636:
--
Epic Link: SPARK-43754

> Leave no dangling iterators in Spark Connect Scala
> --
>
> Key: SPARK-44636
> URL: https://issues.apache.org/jira/browse/SPARK-44636
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Priority: Major
>







[jira] [Updated] (SPARK-44637) ExecuteRelease needs to synchronize

2023-08-02 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44637:
--
Epic Link: SPARK-43754

> ExecuteRelease needs to synchronize
> ---
>
> Key: SPARK-44637
> URL: https://issues.apache.org/jira/browse/SPARK-44637
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>







[jira] [Created] (SPARK-44637) ExecuteRelease needs to synchronize

2023-08-02 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44637:
-

 Summary: ExecuteRelease needs to synchronize
 Key: SPARK-44637
 URL: https://issues.apache.org/jira/browse/SPARK-44637
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski









[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Epic Link: SPARK-43754

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with 
> INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send 
> ExecutePlan again.






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Description: If the ExecutePlan never reached the server, a ReattachExecute will fail with INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send ExecutePlan again.  (was: Even though we empirically observed that the error is thrown only from the first next() or hasNext() of the response StreamObserver, wrap the initial call in retries as well, to not depend on it in case it's just a quirk that's not fully dependable.)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with 
> INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send 
> ExecutePlan again.






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Summary: Spark Connect reattachable Execute when initial ExecutePlan didn't 
reach server  (was: Wrap retries around initial streaming GRPC call in connect)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Even though we empirically observed that the error is thrown only from the first
> next() or hasNext() of the response StreamObserver, wrap the initial call in
> retries as well, to not depend on it in case it's just a quirk that's not
> fully dependable.






[jira] [Created] (SPARK-44625) Spark Connect clean up abandoned executions

2023-08-01 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44625:
-

 Summary: Spark Connect clean up abandoned executions
 Key: SPARK-44625
 URL: https://issues.apache.org/jira/browse/SPARK-44625
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski


With reattachable executions, some executions might get orphaned when ReattachExecute and ReleaseExecute never come. Add a mechanism to track that and clean them up.






[jira] [Created] (SPARK-44624) Wrap retries around initial streaming GRPC call in connect

2023-08-01 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44624:
-

 Summary: Wrap retries around initial streaming GRPC call in connect
 Key: SPARK-44624
 URL: https://issues.apache.org/jira/browse/SPARK-44624
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski


Even though we empirically observed that the error is thrown only from the first next() or hasNext() of the response StreamObserver, wrap the initial call in retries as well, to not depend on it in case it's just a quirk that's not fully dependable.






[jira] [Commented] (SPARK-44599) Python client for reattaching to existing execute in Spark Connect (server mechanism)

2023-07-31 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749370#comment-17749370
 ] 

Juliusz Sompolski commented on SPARK-44599:
---

Duplicate of https://issues.apache.org/jira/browse/SPARK-44424

> Python client for reattaching to existing execute in Spark Connect (server 
> mechanism)
> -
>
> Key: SPARK-44599
> URL: https://issues.apache.org/jira/browse/SPARK-44599
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python client for https://issues.apache.org/jira/browse/SPARK-44421






[jira] [Updated] (SPARK-43923) [CONNECT] Post listenerBus events during ExecutePlanRequest

2023-07-14 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43923:
--
Issue Type: New Feature  (was: Bug)

> [CONNECT] Post listenerBus events during ExecutePlanRequest
> ---
>
> Key: SPARK-43923
> URL: https://issues.apache.org/jira/browse/SPARK-43923
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Jean-Francois Desjeans Gauthier
>Priority: Major
>
> Post events SparkListenerConnectOperationStarted, 
> SparkListenerConnectOperationParsed, SparkListenerConnectOperationCanceled,  
> SparkListenerConnectOperationFailed, SparkListenerConnectOperationFinished, 
> SparkListenerConnectOperationClosed & SparkListenerConnectSessionClosed.
> Mirror what is currently available in HiveThriftServer2EventManager






[jira] [Created] (SPARK-44425) Validate that session_id is a UUID

2023-07-14 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44425:
-

 Summary: Validate that session_id is a UUID
 Key: SPARK-44425
 URL: https://issues.apache.org/jira/browse/SPARK-44425
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Add validation that session_id is a UUID. This is currently the case in the clients, so we could make it a requirement.
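
A minimal sketch of the validation (standard java.util.UUID parsing, not the actual server-side check):
{code:java}
import java.util.UUID
import scala.util.Try

object SessionIdValidationSketch {
  def isValidSessionId(sessionId: String): Boolean =
    Try(UUID.fromString(sessionId)).isSuccess
}
{code}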






[jira] [Created] (SPARK-44424) Reattach to existing execute in Spark Connect (python client)

2023-07-14 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44424:
-

 Summary: Reattach to existing execute in Spark Connect (python 
client)
 Key: SPARK-44424
 URL: https://issues.apache.org/jira/browse/SPARK-44424
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Python client for https://issues.apache.org/jira/browse/SPARK-44421






[jira] [Created] (SPARK-44423) Reattach to existing execute in Spark Connect (scala client)

2023-07-14 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44423:
-

 Summary: Reattach to existing execute in Spark Connect (scala 
client)
 Key: SPARK-44423
 URL: https://issues.apache.org/jira/browse/SPARK-44423
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Scala client for https://issues.apache.org/jira/browse/SPARK-44421






[jira] [Created] (SPARK-44422) Fine grained interrupt in Spark Connect

2023-07-14 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44422:
-

 Summary: Fine grained interrupt in Spark Connect
 Key: SPARK-44422
 URL: https://issues.apache.org/jira/browse/SPARK-44422
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Next to SparkSession.interruptAll, provide a mechanism for interrupting
 * individual queries
 * user-defined groups of queries in a session (by a tag)






[jira] [Updated] (SPARK-44421) Reattach to existing execute in Spark Connect (server mechanism)

2023-07-14 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44421:
--
Summary: Reattach to existing execute in Spark Connect (server mechanism)  
(was: Reattach to existing execute in Spark Connect)

> Reattach to existing execute in Spark Connect (server mechanism)
> 
>
> Key: SPARK-44421
> URL: https://issues.apache.org/jira/browse/SPARK-44421
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> If the ExecutePlan response stream gets broken, provide a mechanism to 
> reattach to the execution with a new RPC.






[jira] [Created] (SPARK-44421) Reattach to existing execute in Spark Connect

2023-07-14 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44421:
-

 Summary: Reattach to existing execute in Spark Connect
 Key: SPARK-44421
 URL: https://issues.apache.org/jira/browse/SPARK-44421
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


If the ExecutePlan response stream gets broken, provide a mechanism to reattach 
to the execution with a new RPC.
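
A conceptual sketch of the client-side behaviour this enables (all names are illustrative, not the final protocol): keep consuming the response stream, and when it breaks, open a new RPC that reattaches to the same operation, resuming after the last response already received.
{code:scala}
import scala.util.control.NonFatal

final case class Response(id: String, payload: Array[Byte])

def consumeWithReattach(
    operationId: String,
    start: () => Iterator[Response],
    reattach: (String, Option[String]) => Iterator[Response],
    handle: Response => Unit): Unit = {
  var lastSeen: Option[String] = None
  var stream = start()
  var done = false
  while (!done) {
    try {
      while (stream.hasNext) {
        val r = stream.next()
        lastSeen = Some(r.id)
        handle(r)
      }
      done = true // stream completed normally
    } catch {
      case NonFatal(_) =>
        // Broken stream: reattach to the same execution with a new RPC,
        // asking the server to resume after the last response we received.
        stream = reattach(operationId, lastSeen)
    }
  }
}
{code}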



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44195) Add JobTag APIs to SparkR SparkContext

2023-06-26 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44195:
-

 Summary: Add JobTag APIs to SparkR SparkContext
 Key: SPARK-44195
 URL: https://issues.apache.org/jira/browse/SPARK-44195
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Add APIs added in https://issues.apache.org/jira/browse/SPARK-43952 to SparkR:
 * {{SparkContext.addJobTag(tag: String): Unit}}
 * {{SparkContext.removeJobTag(tag: String): Unit}}
 * {{SparkContext.getJobTags(): Set[String]}}
 * {{SparkContext.clearJobTags(): Unit}}
 * {{SparkContext.cancelJobsWithTag(tag: String): Unit}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44194) Add JobTag APIs to Pyspark SparkContext

2023-06-26 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44194:
-

 Summary: Add JobTag APIs to Pyspark SparkContext
 Key: SPARK-44194
 URL: https://issues.apache.org/jira/browse/SPARK-44194
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Add APIs added in https://issues.apache.org/jira/browse/SPARK-43952 to PySpark:
 * {{SparkContext.addJobTag(tag: String): Unit}}
 * {{SparkContext.removeJobTag(tag: String): Unit}}
 * {{SparkContext.getJobTags(): Set[String]}}
 * {{SparkContext.clearJobTags(): Unit}}
 * {{SparkContext.cancelJobsWithTag(tag: String): Unit}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43744) Spark Connect scala UDF serialization pulling in unrelated classes not available on server

2023-06-12 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43744:
--
Summary: Spark Connect scala UDF serialization pulling in unrelated classes 
not available on server  (was: Maven test failure when "interrupt all" tests 
are moved to ClientE2ETestSuite)

> Spark Connect scala UDF serialization pulling in unrelated classes not 
> available on server
> --
>
> Key: SPARK-43744
> URL: https://issues.apache.org/jira/browse/SPARK-43744
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: SPARK-43745
>
> [https://github.com/apache/spark/pull/41487] moved "interrupt all - 
> background queries, foreground interrupt" and "interrupt all - foreground 
> queries, background interrupt" tests from ClientE2ETestSuite into a new 
> isolated suite SparkSessionE2ESuite to avoid an inexplicable UDF 
> serialization issue.
>  
> When these tests are moved back to ClientE2ETestSuite and when testing with
> {code:java}
> build/mvn clean install -DskipTests -Phive
> build/mvn test -pl connector/connect/client/jvm -Dtest=none 
> -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code}
>  
> the tests fails with
> {code:java}
> 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
> SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
> java.lang.NoClassDefFoundError: 
> org/apache/spark/sql/connect/client/SparkResult
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
> at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
> at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.ap

[jira] [Updated] (SPARK-43744) Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite

2023-06-12 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43744:
--
Description: 
[https://github.com/apache/spark/pull/41487] moved "interrupt all - background 
queries, foreground interrupt" and "interrupt all - foreground queries, 
background interrupt" tests from ClientE2ETestSuite into a new isolated suite 
SparkSessionE2ESuite to avoid an inexplicable UDF serialization issue.

 

When these tests are moved back to ClientE2ETestSuite and when testing with
{code:java}
build/mvn clean install -DskipTests -Phive
build/mvn test -pl connector/connect/client/jvm -Dtest=none 
-DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code}
 
the tests fail with
{code:java}
23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209)
at 
org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48)
at 
org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:166)
at 
org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:611)
at 
org.sparkproject.connect.grpc.st

[jira] [Reopened] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests

2023-06-12 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski reopened SPARK-43744:
---

[https://github.com/apache/spark/pull/41487] has swept the issue under the rug 
by moving the tests around to a different suite.

Reopening this to get to the bottom of the issue, as I believe it remains a 
real issue that is not covered.

> Maven test failure of ClientE2ETestSuite "interrupt all" tests
> --
>
> Key: SPARK-43744
> URL: https://issues.apache.org/jira/browse/SPARK-43744
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: SPARK-43745
>
> When testing with
> {code:java}
> build/mvn clean install -DskipTests -Phive
> build/mvn test -pl connector/connect/client/jvm -Dtest=none 
> -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code}
>  
> the tests fails with
> {code:java}
> 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
> SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
> java.lang.NoClassDefFoundError: 
> org/apache/spark/sql/connect/client/SparkResult
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
> at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
> at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209)
> at 
> org.apache.spark.sql.connect.artifact.S

[jira] [Updated] (SPARK-43744) Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite

2023-06-12 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43744:
--
Summary: Maven test failure when "interrupt all" tests are moved to 
ClientE2ETestSuite  (was: Maven test failure of ClientE2ETestSuite "interrupt 
all" tests)

> Maven test failure when "interrupt all" tests are moved to ClientE2ETestSuite
> -
>
> Key: SPARK-43744
> URL: https://issues.apache.org/jira/browse/SPARK-43744
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: SPARK-43745
>
> When testing with
> {code:java}
> build/mvn clean install -DskipTests -Phive
> build/mvn test -pl connector/connect/client/jvm -Dtest=none 
> -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code}
>  
> the tests fails with
> {code:java}
> 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
> SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
> java.lang.NoClassDefFoundError: 
> org/apache/spark/sql/connect/client/SparkResult
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
> at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
> at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209)
> at 
> org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoa

[jira] [Commented] (SPARK-43952) Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags"

2023-06-02 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728791#comment-17728791
 ] 

Juliusz Sompolski commented on SPARK-43952:
---

Indirectly related to https://issues.apache.org/jira/browse/SPARK-43754, so that 
Spark Connect cancellation of queries does not conflict with other places 
setting job groups.

> Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job 
> tags"
> 
>
> Key: SPARK-43952
> URL: https://issues.apache.org/jira/browse/SPARK-43952
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, the only way to cancel running Spark Jobs is by using 
> SparkContext.cancelJobGroup, using a job group name that was previously set 
> using SparkContext.setJobGroup. This is problematic if multiple different 
> parts of the system want to do cancellation, and set their own ids.
> For example, 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133]
>  sets its own job group, which may override the job group set by the user. This 
> way, if the user cancels the job group they set, it will not cancel these 
> broadcast jobs launched from within their jobs...
> As a solution, consider adding SparkContext.addJobTag / 
> SparkContext.removeJobTag, which would allow having multiple "tags" on the 
> jobs, and introduce SparkContext.cancelJobsByTag to allow more flexible 
> cancelling of jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-06-02 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728790#comment-17728790
 ] 

Juliusz Sompolski commented on SPARK-43754:
---

Indirectly related to https://issues.apache.org/jira/browse/SPARK-43952 (so that 
Spark Connect cancellation of queries does not conflict with other places 
setting job groups)

> Spark Connect Session & Query lifecycle
> ---
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so 
> that we can make the interface more flexible
>  * maintain long running sessions, independent of unbroken GRPC channel
>  * be able to cancel queries
>  * have different interfaces to query results than push from server



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43952) Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags"

2023-06-02 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-43952:
-

 Summary: Cancel Spark jobs not only by a single "jobgroup", but 
allow multiple "job tags"
 Key: SPARK-43952
 URL: https://issues.apache.org/jira/browse/SPARK-43952
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Currently, the only way to cancel running Spark Jobs is by using 
SparkContext.cancelJobGroup, using a job group name that was previously set 
using SparkContext.setJobGroup. This is problematic if multiple different parts 
of the system want to do cancellation, and set their own ids.

For example, 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133]
 sets its own job group, which may override the job group set by the user. This way, 
if the user cancels the job group they set, it will not cancel these broadcast jobs 
launched from within their jobs...

As a solution, consider adding SparkContext.addJobTag / 
SparkContext.removeJobTag, which would allow having multiple "tags" on the 
jobs, and introduce SparkContext.cancelJobsByTag to allow more flexible 
cancelling of jobs.
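
A hedged usage sketch of the proposed API (names as proposed here; final signatures may differ). Because tags are additive, framework code can add its own tag without clobbering the one set by the user:
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("job-tags"))

val worker = new Thread(() => {
  sc.addJobTag("user-report-42")   // tag set by the user
  sc.addJobTag("framework-tag")    // a second tag, e.g. added by framework code
  sc.parallelize(1 to 100000000).map(_ * 2).count()
})
worker.start()

Thread.sleep(1000) // crude wait so the job has been submitted; illustration only
// Cancel only the jobs carrying this tag; jobs without it are unaffected.
sc.cancelJobsWithTag("user-report-42")
{code}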



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43756) Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more

2023-05-23 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-43756:
-

 Summary: Spark Connect - prefer to pass around SessionHolder / 
ExecuteHolder more
 Key: SPARK-43756
 URL: https://issues.apache.org/jira/browse/SPARK-43756
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Right now, we pass around individual things like sessionId, userId etc. in 
multiple places. This leads to the need to thread additional parameters through 
many call sites whenever something new is added.

It would be better to pass around SessionHolder and ExecutePlanHolder, where 
accessor and utility functions can be added.
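
A minimal sketch of the idea (field and method names are illustrative, not the actual Spark Connect classes): bundle per-session and per-execution state into holder objects, so adding a new field does not require threading yet another parameter through every layer.
{code:scala}
import org.apache.spark.sql.SparkSession

case class SessionHolder(userId: String, sessionId: String, session: SparkSession) {
  def key: (String, String) = (userId, sessionId)
}

case class ExecutePlanHolder(operationId: String, sessionHolder: SessionHolder) {
  // Accessors and utility functions live here instead of being passed separately.
  def jobTag: String =
    s"SparkConnect_Execute_${sessionHolder.sessionId}_$operationId"
}
{code}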



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43755) Spark Connect - decouple query execution from RPC handler

2023-05-23 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-43755:
-

 Summary: Spark Connect - decouple query execution from RPC handler
 Key: SPARK-43755
 URL: https://issues.apache.org/jira/browse/SPARK-43755
 Project: Spark
  Issue Type: Story
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Move actual query execution out of the RPC handler callback. This allows:
 * (immediately) better control over query cancellation, by interrupting the 
execution thread.
 * design changes to the RPC interface to allow different execution models than 
stream-push from the server (a conceptual sketch follows below).
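
An illustrative sketch of the decoupling (not the actual Spark Connect classes): the execution runs on its own thread and pushes results into a queue; the RPC handler drains the queue, and cancellation becomes an interrupt of the execution thread rather than of the RPC itself.
{code:scala}
import java.util.concurrent.LinkedBlockingQueue

final class ExecutionRunner[T](run: (T => Unit) => Unit) {
  private val results = new LinkedBlockingQueue[Option[T]]()
  private val thread = new Thread(() => {
    try run(t => results.put(Some(t)))
    catch { case _: InterruptedException => () } // interrupted = cancelled
    finally results.put(None) // sentinel: execution finished or was interrupted
  })

  def start(): Unit = thread.start()
  def interrupt(): Unit = thread.interrupt()

  // Called from the RPC handler: stream results to the client as they arrive.
  def foreach(send: T => Unit): Unit = {
    var next = results.take()
    while (next.isDefined) {
      send(next.get)
      next = results.take()
    }
  }
}
{code}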



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43331) Spark Connect - SparkSession interruptAll

2023-05-23 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43331:
--
Epic Link: SPARK-43754

> Spark Connect - SparkSession interruptAll
> -
>
> Key: SPARK-43331
> URL: https://issues.apache.org/jira/browse/SPARK-43331
> Project: Spark
>  Issue Type: Story
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 3.5.0
>
>
> Add an "interruptAll" Api to client SparkSession, to allow interrupting all 
> running executions in Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-05-23 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-43754:
-

 Summary: Spark Connect Session & Query lifecycle
 Key: SPARK-43754
 URL: https://issues.apache.org/jira/browse/SPARK-43754
 Project: Spark
  Issue Type: Epic
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Currently, queries in Spark Connect are executed within the RPC handler.

We want to detach the RPC interface from actual sessions and execution, so that 
we can make the interface more flexible:
 * maintain long-running sessions, independent of an unbroken GRPC channel
 * be able to cancel queries
 * have different interfaces to query results than push from the server



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests

2023-05-23 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-43744.
---
Resolution: Duplicate

> Maven test failure of ClientE2ETestSuite "interrupt all" tests
> --
>
> Key: SPARK-43744
> URL: https://issues.apache.org/jira/browse/SPARK-43744
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> When testing with
> {code:java}
> build/mvn clean install -DskipTests -Phive
> build/mvn test -pl connector/connect/client/jvm -Dtest=none 
> -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code}
>  
> the tests fails with
> {code:java}
> 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
> SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
> java.lang.NoClassDefFoundError: 
> org/apache/spark/sql/connect/client/SparkResult
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
> at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
> at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
> at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209)
> at 
> org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178)
> at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48)
> at 
> org.apache.spark.sql.connect.service

[jira] [Commented] (SPARK-43648) Maven test failed: `interrupt all - background queries, foreground interrupt` and `interrupt all - foreground queries, background interrupt`

2023-05-23 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725320#comment-17725320
 ] 

Juliusz Sompolski commented on SPARK-43648:
---

Duplicated by https://issues.apache.org/jira/browse/SPARK-43744; closing the 
duplicate.

See 
[https://github.com/apache/spark/pull/41005/files/35e500d4cb72f8d3bee21a7f86ee16cbbc8a936c#r1200551487]
 thread for some debugging info.
It seems it may be related to https://issues.apache.org/jira/browse/SPARK-43227

> Maven test failed: `interrupt all - background queries, foreground interrupt` 
> and `interrupt all - foreground queries, background interrupt`
> 
>
> Key: SPARK-43648
> URL: https://issues.apache.org/jira/browse/SPARK-43648
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> run 
> {code:java}
> build/mvn clean install -DskipTests -Phive
> build/mvn test -pl connector/connect/client/jvm -Dtest=none 
> -DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite {code}
> `interrupt all - background queries, foreground interrupt` and `interrupt all 
> - foreground queries, background interrupt` failed as follows:
> {code:java}
> 23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
> SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
> java.lang.NoClassDefFoundError: 
> org/apache/spark/sql/connect/client/SparkResult
>   at java.lang.Class.getDeclaredMethods0(Native Method)
>   at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>   at java.lang.Class.getDeclaredMethod(Class.java:2128)
>   at 
> java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
>   at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
>   at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
>   at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
>   at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
>   at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
>   at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
>   at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
>   at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
>   at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
>   at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
>   at 
> org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
>   at 
> org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:

[jira] [Updated] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests

2023-05-23 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43744:
--
Description: 
When testing with
{code:java}
build/mvn clean install -DskipTests -Phive
build/mvn test -pl connector/connect/client/jvm -Dtest=none 
-DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite{code}
 
the tests fail with


{code:java}
23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209)
at 
org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48)
at 
org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:166)
at 
org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:611)
at 
org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at 
org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352)
at 
org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
at 

[jira] [Commented] (SPARK-43227) Fix deserialisation issue when UDFs contain a lambda expression

2023-05-23 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725315#comment-17725315
 ] 

Juliusz Sompolski commented on SPARK-43227:
---

Possibly related to https://issues.apache.org/jira/browse/SPARK-43744?

> Fix deserialisation issue when UDFs contain a lambda expression
> ---
>
> Key: SPARK-43227
> URL: https://issues.apache.org/jira/browse/SPARK-43227
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> The following code:
> {code:java}
> class A(x: Int) { def get = x * 20 + 5 }
> val dummyUdf = (x: Int) => new A(x).get
> val myUdf = udf(dummyUdf)
> spark.range(5).select(myUdf(col("id"))).as[Int].collect() {code}
> hits the following error:
> {noformat}
> io.grpc.StatusRuntimeException: INTERNAL: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> ammonite.$sess.cmd26$Helper.dummyUdf of type scala.Function1 in instance of 
> ammonite.$sess.cmd26$Helper
>   io.grpc.Status.asRuntimeException(Status.java:535)
>   
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
>   
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
>   
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
>   org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
>   org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
>   org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
>   ammonite.$sess.cmd28$Helper.(cmd28.sc:1)
>   ammonite.$sess.cmd28$.(cmd28.sc:7)
>   ammonite.$sess.cmd28$.(cmd28.sc){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43744) Maven test failure of ClientE2ETestSuite "interrupt all" tests

2023-05-23 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-43744:
-

 Summary: Maven test failure of ClientE2ETestSuite "interrupt all" 
tests
 Key: SPARK-43744
 URL: https://issues.apache.org/jira/browse/SPARK-43744
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


When testing with
```
build/mvn clean install -DskipTests -Phive
build/mvn test -pl connector/connect/client/jvm -Dtest=none 
-DwildcardSuites=org.apache.spark.sql.ClientE2ETestSuite 
```
the tests fail with
```
23/05/22 15:44:11 ERROR SparkConnectService: Error during: execute. UserId: . 
SessionId: 0f4013ca-3af9-443b-a0e5-e339a827e0cf.
java.lang.NoClassDefFoundError: org/apache/spark/sql/connect/client/SparkResult
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at 
java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.(ObjectStreamClass.java:494)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
at org.apache.spark.util.Utils$.deserialize(Utils.scala:148)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:1353)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner$TypedScalaUdf$.apply(SparkConnectPlanner.scala:761)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformTypedMapPartitions(SparkConnectPlanner.scala:531)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformMapPartitions(SparkConnectPlanner.scala:495)
at 
org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:143)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:100)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$2(SparkConnectStreamHandler.scala:87)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:825)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:53)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209)
at 
org.apache.spark.sql.connect.artifact.SparkConnectArtifactManager$.withArtifactClassLoader(SparkConnectArtifactManager.scala:178)
at 
org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48)
at 
org.apache.spark.sql.connect.service.SparkConne

[jira] [Created] (SPARK-43331) Spark Connect - SparkSession interruptAll

2023-05-01 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-43331:
-

 Summary: Spark Connect - SparkSession interruptAll
 Key: SPARK-43331
 URL: https://issues.apache.org/jira/browse/SPARK-43331
 Project: Spark
  Issue Type: Story
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski


Add an "interruptAll" Api to client SparkSession, to allow interrupting all 
running executions in Spark Connect.
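
A hedged usage sketch (assuming a Spark Connect endpoint at sc://localhost; the method name follows this proposal):
{code:scala}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()

// Kick off a long-running query in the background...
val longRunning = Future { spark.range(10000000000L).count() }

// ...then interrupt every execution currently running in this session.
Thread.sleep(1000) // crude wait so the execution has started; illustration only
spark.interruptAll()
{code}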



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache

2023-01-02 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653648#comment-17653648
 ] 

Juliusz Sompolski edited comment on SPARK-41497 at 1/2/23 4:22 PM:
---

Note that this issue leads to a correctness issue in Delta Merge, because it 
depends on a SetAccumulator as a side output channel for gathering files that 
need to be rewritten by the Merge: 
https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L445-L449
Delta assumes that Spark accumulators can overcount (in some cases where task 
retries update them in duplicate), but it was assumed that they would never 
undercount and lose output like that...

Missing some files there can result in duplicate records being inserted instead 
of existing records being updated.


was (Author: juliuszsompolski):
Note that this issue leads to a correctness issue in Delta Merge, because it 
depends on a SetAccumulator as a side output channel for gathering files that 
need to be rewritten by the Merge: 
https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L445-L449

Missing some files there can result in duplicate records being inserted instead 
of existing records being updated.
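
An illustrative sketch of the side-output pattern described above (not Delta's actual code): files touched by matched rows are reported through an accumulator; if a retried task serves its partition from the RDD cache, the add() calls are skipped and the driver sees fewer files than were actually touched.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.CollectionAccumulator

val spark = SparkSession.builder()
  .master("local[2]").appName("acc-side-output").getOrCreate()
import spark.implicits._

val touchedFiles: CollectionAccumulator[String] =
  spark.sparkContext.collectionAccumulator[String]("touchedFiles")

val keys = Seq(("f1.parquet", 1L), ("f2.parquet", 2L)).toDF("file", "key")
  .map { row =>
    touchedFiles.add(row.getString(0)) // side output: record the source file
    row.getLong(1)
  }
keys.count()

// Driver side: the set of files to rewrite; undercounting here loses rewrites.
val filesToRewrite = touchedFiles.value
{code}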

> Accumulator undercounting in the case of retry task with rdd cache
> --
>
> Key: SPARK-41497
> URL: https://issues.apache.org/jira/browse/SPARK-41497
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Priority: Major
>
> Accumulator could be undercounted when the retried task has rdd cache.  See 
> the example below and you could also find the completed and reproducible 
> example at 
> [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc]
>   
> {code:scala}
> test("SPARK-XXX") {
>   // Set up a cluster with 2 executors
>   val conf = new SparkConf()
> .setMaster("local-cluster[2, 1, 
> 1024]").setAppName("TaskSchedulerImplSuite")
>   sc = new SparkContext(conf)
>   // Set up a custom task scheduler. The scheduler will fail the first task 
> attempt of the job
>   // submitted below. In particular, the failed first attempt task would 
> succeed on computation
>   // (accumulator accounting, result caching) but only fail to report its 
> success status due
>   // to the concurrent executor loss. The second task attempt would succeed.
>   taskScheduler = setupSchedulerWithCustomStatusUpdate(sc)
>   val myAcc = sc.longAccumulator("myAcc")
>   // Initiate a rdd with only one partition so there's only one task and 
> specify the storage level
>   // with MEMORY_ONLY_2 so that the rdd result will be cached on both two 
> executors.
>   val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter =>
> myAcc.add(100)
> iter.map(x => x + 1)
>   }.persist(StorageLevel.MEMORY_ONLY_2)
>   // This will pass since the second task attempt will succeed
>   assert(rdd.count() === 10)
>   // This will fail due to `myAcc.add(100)` won't be executed during the 
> second task attempt's
>   // execution. Because the second task attempt will load the rdd cache 
> directly instead of
>   // executing the task function so `myAcc.add(100)` is skipped.
>   assert(myAcc.value === 100)
> } {code}
>  
> We could also hit this issue with decommissioning even if the rdd has only one 
> copy. For example, decommissioning could migrate the rdd cache block to another 
> executor (the result is then effectively the same as having 2 copies) and the 
> decommissioned executor is lost before the task reports its success status to 
> the driver. 
>  
> The issue is a bit more complicated to fix than expected. I have tried several 
> fixes, but none of them is ideal:
> Option 1: Clean up any rdd cache related to the failed task: in practice, 
> this option already fixes the issue in most cases. However, theoretically, the 
> rdd cache could be reported to the driver right after the driver cleans up 
> the failed task's caches, due to asynchronous communication. So this option 
> can't resolve the issue thoroughly;
> Option 2: Disallow rdd cache reuse across the task attempts for the same 
> task: this option fixes the issue completely. The problem is that it also 
> affects cases where the rdd cache could safely be reused across attempts (e.g., 
> when there is no accumulator operation in the task), which can cause a perf 
> regression;
> Option 3: Introduce an accumulator cache: first, this requires a new framework 
> for supporting an accumulator cache; second, the driver would need logic 
> to distinguish whether the accumulator cache value should be reported to the 
> user, to avoid overcounting.

[jira] [Commented] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache

2023-01-02 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653648#comment-17653648
 ] 

Juliusz Sompolski commented on SPARK-41497:
---

Note that this issue leads to a correctness issue in Delta Merge, because it 
depends on a SetAccumulator as a side output channel for gathering files that 
need to be rewritten by the Merge: 
https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L445-L449

Missing some files there can result in duplicate records being inserted instead 
of existing records being updated.

> Accumulator undercounting in the case of retry task with rdd cache
> --
>
> Key: SPARK-41497
> URL: https://issues.apache.org/jira/browse/SPARK-41497
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Priority: Major
>
> Accumulator could be undercounted when the retried task has an rdd cache. See 
> the example below; a complete and reproducible example is at 
> [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc]
>   
> {code:scala}
> test("SPARK-XXX") {
>   // Set up a cluster with 2 executors
>   val conf = new SparkConf()
>     .setMaster("local-cluster[2, 1, 1024]").setAppName("TaskSchedulerImplSuite")
>   sc = new SparkContext(conf)
>   // Set up a custom task scheduler. The scheduler will fail the first task attempt
>   // of the job submitted below. In particular, the failed first attempt succeeds in
>   // its computation (accumulator accounting, result caching) but fails to report its
>   // success status to the driver because of the concurrent executor loss. The second
>   // task attempt succeeds.
>   taskScheduler = setupSchedulerWithCustomStatusUpdate(sc)
>   val myAcc = sc.longAccumulator("myAcc")
>   // Create an rdd with only one partition so there's only one task, and use storage
>   // level MEMORY_ONLY_2 so that the rdd result is cached on both executors.
>   val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter =>
>     myAcc.add(100)
>     iter.map(x => x + 1)
>   }.persist(StorageLevel.MEMORY_ONLY_2)
>   // This will pass since the second task attempt will succeed
>   assert(rdd.count() === 10)
>   // This will fail because `myAcc.add(100)` is not executed during the second task
>   // attempt: that attempt loads the rdd cache directly instead of running the task
>   // function, so `myAcc.add(100)` is skipped.
>   assert(myAcc.value === 100)
> } {code}
>  
> We could also hit this issue with decommissioning even if the rdd has only one 
> copy. For example, decommissioning could migrate the rdd cache block to another 
> executor (the result is then effectively the same as having 2 copies) and the 
> decommissioned executor is lost before the task reports its success status to 
> the driver. 
>  
> The issue is a bit more complicated to fix than expected. I have tried several 
> fixes, but none of them is ideal:
> Option 1: Clean up any rdd cache related to the failed task: in practice, 
> this option already fixes the issue in most cases. However, theoretically, the 
> rdd cache could be reported to the driver right after the driver cleans up 
> the failed task's caches, due to asynchronous communication. So this option 
> can't resolve the issue thoroughly;
> Option 2: Disallow rdd cache reuse across the task attempts for the same 
> task: this option fixes the issue completely. The problem is that it also 
> affects cases where the rdd cache could safely be reused across attempts (e.g., 
> when there is no accumulator operation in the task), which can cause a perf 
> regression;
> Option 3: Introduce an accumulator cache: first, this requires a new framework 
> for supporting an accumulator cache; second, the driver would need logic 
> to distinguish whether the accumulator cache value should be reported to the 
> user, to avoid overcounting. For example, in the case of a task retry, the value 
> should be reported; however, in the case of rdd cache reuse, the value 
> shouldn't be reported (should it?);
> Option 4: Validate task success when a task tries to load the rdd 
> cache: this defines an rdd cache as valid/accessible only if the task has 
> succeeded. This could be either overkill or a bit complex, because 
> currently Spark cleans up the task state once it's finished, so we would need 
> to maintain a structure recording whether the task once succeeded.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-40918) Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata

2022-10-26 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-40918:
-

 Summary: Mismatch between ParquetFileFormat and FileSourceScanExec 
in # columns for WSCG.isTooManyFields when using _metadata
 Key: SPARK-40918
 URL: https://issues.apache.org/jira/browse/SPARK-40918
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Juliusz Sompolski


_metadata columns are taken into account in FileSourceScanExec.supportsColumnar, 
but not when the parquet reader is created. This can result in the Parquet reader 
producing columnar output (because it sees fewer columns than the WSCG.isTooManyFields 
limit), whereas FileSourceScanExec expects row output (because with the extra metadata 
columns it exceeds the isTooManyFields limit).

I have a fix forthcoming.
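
As a rough repro sketch of the shape of the problem (the path and column counts are 
made up; spark.sql.codegen.maxFields is the limit behind WSCG.isTooManyFields, 100 by 
default):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("spark-40918-sketch").getOrCreate()

// Assume /tmp/wide_table is a Parquet table whose data columns alone stay just under
// the spark.sql.codegen.maxFields limit (100 by default).
val df = spark.read.parquet("/tmp/wide_table")

// Selecting the hidden _metadata column adds extra fields that FileSourceScanExec
// counts when deciding between row and columnar output, while the Parquet reader is
// created without them and may still hand back columnar batches -- the mismatch
// described above.
df.select(col("*"), col("_metadata")).collect()
{code}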



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20624) SPIP: Add better handling for node shutdown

2022-09-13 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603577#comment-17603577
 ] 

Juliusz Sompolski commented on SPARK-20624:
---

[~holden] Are these new APIs documented? I can't seem to find them in the 
official Spark documentation.
Should they be mentioned e.g. in 
https://spark.apache.org/docs/latest/job-scheduling.html#graceful-decommission-of-executors
 ?

> SPIP: Add better handling for node shutdown
> ---
>
> Key: SPARK-20624
> URL: https://issues.apache.org/jira/browse/SPARK-20624
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Holden Karau
>Priority: Major
>
> While we've done some good work with better handling when Spark is choosing 
> to decommission nodes (SPARK-7955), it might make sense in environments where 
> we get preempted without our own choice (e.g. YARN over-commit, EC2 spot 
> instances, GCE Preemptiable instances, etc.) to do something for the data on 
> the node (or at least not schedule any new tasks).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2021-10-21 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-37090.
---
Resolution: Duplicate

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2021-10-21 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432731#comment-17432731
 ] 

Juliusz Sompolski commented on SPARK-37090:
---

Duplicate of https://issues.apache.org/jira/browse/SPARK-36994

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high severity 
> security vulnerabilities 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2021-10-21 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-37090:
-

 Summary: Upgrade libthrift to resolve security vulnerabilities
 Key: SPARK-37090
 URL: https://issues.apache.org/jira/browse/SPARK-37090
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0, 3.1.0, 3.0.0, 3.3.0
Reporter: Juliusz Sompolski


Currently, Spark uses libthrift 0.12, which has reported high severity security 
vulnerabilities https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
Upgrade to 0.14 to get rid of vulnerabilities.
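
For builds that consume Spark and cannot wait for the upstream bump, a dependency 
override is the usual stopgap. A hedged sbt sketch (the exact patched version to pin 
is an assumption and should be checked against the Hive/Thriftserver modules in use):

{code:scala}
// build.sbt -- force a patched libthrift instead of the vulnerable 0.12.x
dependencyOverrides += "org.apache.thrift" % "libthrift" % "0.14.1"
{code}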



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34152) CreateViewStatement.child should be a real child

2021-01-18 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267166#comment-17267166
 ] 

Juliusz Sompolski commented on SPARK-34152:
---

The same applies to AlterViewStatement.

> CreateViewStatement.child should be a real child
> 
>
> Key: SPARK-34152
> URL: https://issues.apache.org/jira/browse/SPARK-34152
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>
> Similar to `CreateTableAsSelectStatement`, the input query of 
> `CreateViewStatement` should be a child and get analyzed during the analysis 
> phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32132) Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0

2020-06-30 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148848#comment-17148848
 ] 

Juliusz Sompolski commented on SPARK-32132:
---

Also, 2.4 adds "interval" at the start, while 3.0 does not: e.g. "interval 3 
days" in 2.4 and "3 days" in 3.0.
I actually think that the new 3.0 results are better / more standard, and I 
haven't heard of anyone complaining that it broke the way they parse it.

Edit: [~cloud_fan] posting now the above comment that I thought I had posted 
yesterday, but it stayed unsent in an open tab. The change causes some 
issues with unit tests, but I think it shouldn't cause real-world problems, and 
in any case the new format is likely better for the future. Thanks for 
explaining.

> Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0
> --
>
> Key: SPARK-32132
> URL: https://issues.apache.org/jira/browse/SPARK-32132
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Minor
>
> In https://github.com/apache/spark/pull/26418, a setting 
> spark.sql.dialect.intervalOutputStyle was implemented, to control interval 
> output style. This PR also removed "toString" from CalendarInterval. This 
> change got reverted in https://github.com/apache/spark/pull/27304, and the 
> CalendarInterval.toString got implemented back in 
> https://github.com/apache/spark/pull/26572.
> But it behaves differently now: In 2.4 "4 weeks 2 days" are returned, and 3.0 
> returns "30 days".
> Thriftserver uses HiveResults.toHiveString, which uses 
> CalendarInterval.toString to return interval results as string.  The results 
> are now different in 3.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32132) Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0

2020-06-29 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148086#comment-17148086
 ] 

Juliusz Sompolski commented on SPARK-32132:
---

cc [~hyukjin.kwon] [~cloud_fan] [~Qin Yao]

> Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0
> --
>
> Key: SPARK-32132
> URL: https://issues.apache.org/jira/browse/SPARK-32132
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Minor
>
> In https://github.com/apache/spark/pull/26418, a setting 
> spark.sql.dialect.intervalOutputStyle was implemented, to control interval 
> output style. This PR also removed "toString" from CalendarInterval. This 
> change got reverted in https://github.com/apache/spark/pull/27304, and the 
> CalendarInterval.toString got implemented back in 
> https://github.com/apache/spark/pull/26572.
> But it behaves differently now: In 2.4 "4 weeks 2 days" are returned, and 3.0 
> returns "30 days".
> Thriftserver uses HiveResults.toHiveString, which uses 
> CalendarInterval.toString to return interval results as string.  The results 
> are now different in 3.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32132) Thriftserver interval returns "4 weeks 2 days" in 2.4 and "30 days" in 3.0

2020-06-29 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-32132:
-

 Summary: Thriftserver interval returns "4 weeks 2 days" in 2.4 and 
"30 days" in 3.0
 Key: SPARK-32132
 URL: https://issues.apache.org/jira/browse/SPARK-32132
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


In https://github.com/apache/spark/pull/26418, a setting 
spark.sql.dialect.intervalOutputStyle was implemented to control interval 
output style. That PR also removed "toString" from CalendarInterval. The 
change was reverted in https://github.com/apache/spark/pull/27304, and 
CalendarInterval.toString was implemented again in 
https://github.com/apache/spark/pull/26572.

But it behaves differently now: in 2.4 "4 weeks 2 days" is returned, while 3.0 
returns "30 days".
Thriftserver uses HiveResult.toHiveString, which uses 
CalendarInterval.toString to return interval results as strings. The results 
are therefore different in 3.0.
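
A hedged sketch of how a JDBC client observes the difference (the connection URL is 
illustrative and assumes a Thriftserver running locally with the Hive JDBC driver on 
the classpath):

{code:scala}
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val rs = conn.createStatement().executeQuery("SELECT interval 30 days")
rs.next()
// Spark 2.4 Thriftserver returns the string "interval 4 weeks 2 days";
// Spark 3.0 returns "30 days" for the same query.
println(rs.getString(1))
conn.close()
{code}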



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32021) make_interval does not accept seconds >100

2020-06-18 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-32021:
--
Description: 
In make_interval(years, months, weeks, days, hours, mins, secs), secs are 
defined as Decimal(8, 6), which turns into null if the value of the expression 
overflows 100 seconds.
Larger seconds values should be allowed.

This has been reported by Simba, who wants to use make_interval to implement 
translation for TIMESTAMP_ADD ODBC function in Spark 3.0.
ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp)} fails when integer_exp 
returns seconds values >= 100.

  was:
In make_interval(years, months, weeks, days, hours, mins, secs), secs are 
defined as Decimal(8, 6), which turns into null if the value of the expression 
overflows 100 seconds.
Larger seconds values should be allowed.

This has been reported by Simba, who wants to use make_interval to implement 
translation for TIMESTAMP_ADD ODBC function in Spark 3.0.


> make_interval does not accept seconds >100
> --
>
> Key: SPARK-32021
> URL: https://issues.apache.org/jira/browse/SPARK-32021
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> In make_interval(years, months, weeks, days, hours, mins, secs), secs are 
> defined as Decimal(8, 6), which turns into null if the value of the 
> expression overflows 100 seconds.
> Larger seconds values should be allowed.
> This has been reported by Simba, who wants to use make_interval to implement 
> translation for TIMESTAMP_ADD ODBC function in Spark 3.0.
> ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp)} fails when integer_exp 
> returns seconds values >= 100.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32021) make_interval does not accept seconds >100

2020-06-18 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-32021:
-

 Summary: make_interval does not accept seconds >100
 Key: SPARK-32021
 URL: https://issues.apache.org/jira/browse/SPARK-32021
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


In make_interval(years, months, weeks, days, hours, mins, secs), secs are 
defined as Decimal(8, 6), which turns into null if the value of the expression 
overflows 100 seconds.
Larger seconds values should be allowed.

This has been reported by Simba, who wants to use make_interval to implement 
translation for TIMESTAMP_ADD ODBC function in Spark 3.0.
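
A minimal spark-shell sketch of the limit described above (observed behaviour on 
Spark 3.0; the exact rendering of the interval string aside):

{code:scala}
// secs is cast to DECIMAL(8, 6), which can represent at most 99.999999 seconds.
spark.sql("SELECT make_interval(0, 0, 0, 0, 0, 0, 99.5)").show(false)
// Anything >= 100 seconds overflows the cast and the whole interval becomes NULL,
// which breaks TIMESTAMPADD(SECOND, ...) translations that pass larger values.
spark.sql("SELECT make_interval(0, 0, 0, 0, 0, 0, 150.0)").show(false)
{code}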



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31863) Thriftserver not setting active SparkSession, SQLConf.get not getting session configs correctly

2020-05-28 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-31863:
-

 Summary: Thriftserver not setting active SparkSession, SQLConf.get 
not getting session configs correctly
 Key: SPARK-31863
 URL: https://issues.apache.org/jira/browse/SPARK-31863
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


Thriftserver is not setting the active SparkSession.
Because of that, the configuration obtained with SQLConf.get is not the session 
configuration.
As a result, many configs set via "SET" in the session do not take effect 
correctly.
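
A hedged sketch of the invariant the fix has to establish (SparkSession.setActiveSession 
and SQLConf.get are existing APIs; the wrapper below and where the Thriftserver would 
call it are assumptions):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

// Make the session current on the execution thread so that SQLConf.get resolves to the
// session's conf and session-level `SET key=value` options take effect.
def withActiveSession[T](session: SparkSession)(body: => T): T = {
  SparkSession.setActiveSession(session)
  try body
  finally SparkSession.clearActiveSession()
}

// Hypothetical use inside statement execution:
// withActiveSession(sparkSession) {
//   assert(SQLConf.get eq sparkSession.sessionState.conf) // session configs now visible
//   // ... run the query ...
// }
{code}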



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true

2020-05-28 Thread Juliusz Sompolski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119002#comment-17119002
 ] 

Juliusz Sompolski commented on SPARK-31859:
---

Actually, it's already in HiveResult.toHiveString...
What happens is:
- it's taken from the Spark Row in SparkExecuteStatementOperation.addNonNullColumnValue 
using "to += from.getAs[Timestamp](ordinal)". This does not complain even though the 
value is in fact an Instant, not a Timestamp.
- in ColumnValue.timestampValue it gets turned into a String via 
value.toString(). That somehow also doesn't complain that the object is an 
Instant, not a Timestamp?
- this gets returned to the client as a String, and the client complains that it 
cannot parse it back into a Timestamp.

I will fix it together with https://issues.apache.org/jira/browse/SPARK-31861
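
A small spark-shell sketch of the type flow described above (the output strings are 
examples and depend on the session timezone):

{code:scala}
// With the flag on, collected timestamp values are java.time.Instant, not java.sql.Timestamp.
spark.conf.set("spark.sql.datetime.java8API.enabled", "true")
val v = spark.sql("SELECT timestamp '2020-05-28 00:00:00'").collect()(0).get(0)
println(v.getClass) // class java.time.Instant
// getAs[Timestamp] does not check the runtime type, and value.toString() on an Instant
// yields an ISO-8601 string such as "2020-05-27T22:00:00Z", which the JDBC client's
// java.sql.Timestamp.valueOf cannot parse.
{code}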

> Thriftserver with spark.sql.datetime.java8API.enabled=true
> --
>
> Key: SPARK-31859
> URL: https://issues.apache.org/jira/browse/SPARK-31859
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> {code}
>   test("spark.sql.datetime.java8API.enabled=true") {
> withJdbcStatement() { st =>
>   st.execute("set spark.sql.datetime.java8API.enabled=true")
>   val rs = st.executeQuery("select timestamp '2020-05-28 00:00:00'")
>   rs.next()
>   // scalastyle:off
>   println(rs.getObject(1))
> }
>   }
> {code}
> fails with 
> {code}
> HiveThriftBinaryServerSuite:
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
> at java.sql.Timestamp.valueOf(Timestamp.java:204)
> at 
> org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:444)
> at 
> org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
> at 
> org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464
> {code}
> It seems it might be needed in HiveResult.toHiveString?
> cc [~maxgekk]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31861) Thriftserver collecting timestamp not using spark.sql.session.timeZone

2020-05-28 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-31861:
-

 Summary: Thriftserver collecting timestamp not using 
spark.sql.session.timeZone
 Key: SPARK-31861
 URL: https://issues.apache.org/jira/browse/SPARK-31861
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


If the JDBC client is in the PST time zone, sets spark.sql.session.timeZone to PST, 
and sends the query "SELECT timestamp '2020-05-20 12:00:00'", while the JVM 
timezone of the Spark cluster is e.g. CET, then
- the timestamp literal in the query is interpreted as 12:00:00 PST, i.e. 
21:00:00 CET
- but currently, when it is returned, the timestamps are collected from the query 
with a collect() in 
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L299,
 and in the end the Timestamps are turned into strings using t.toString() in 
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v2.3/src/main/java/org/apache/hive/service/cli/ColumnValue.java#L138.
 This uses the Spark cluster's JVM TimeZone, so "21:00:00" is returned 
to the JDBC application.
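
A rough sketch of the mismatch (the timezone ids are illustrative; in practice the 
cluster JVM timezone is fixed at startup rather than set like this):

{code:scala}
import java.util.TimeZone

// Cluster JVM timezone: CET; session timezone requested by the client: PST.
TimeZone.setDefault(TimeZone.getTimeZone("Europe/Paris"))
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

val ts = spark.sql("SELECT timestamp '2020-05-20 12:00:00'").collect()(0).getTimestamp(0)
// The literal is interpreted in the session timezone (12:00 PST == 21:00 CET), but
// java.sql.Timestamp.toString renders in the JVM default timezone, so the Thriftserver
// sends "2020-05-20 21:00:00.0" to the client instead of the expected "12:00:00".
println(ts.toString)
{code}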



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true

2020-05-28 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-31859:
--
Issue Type: Bug  (was: Improvement)

> Thriftserver with spark.sql.datetime.java8API.enabled=true
> --
>
> Key: SPARK-31859
> URL: https://issues.apache.org/jira/browse/SPARK-31859
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> {code}
>   test("spark.sql.datetime.java8API.enabled=true") {
> withJdbcStatement() { st =>
>   st.execute("set spark.sql.datetime.java8API.enabled=true")
>   val rs = st.executeQuery("select timestamp '2020-05-28 00:00:00'")
>   rs.next()
>   // scalastyle:off
>   println(rs.getObject(1))
> }
>   }
> {code}
> fails with 
> {code}
> HiveThriftBinaryServerSuite:
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
> at java.sql.Timestamp.valueOf(Timestamp.java:204)
> at 
> org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:444)
> at 
> org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
> at 
> org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464
> {code}
> It seems it might be needed in HiveResult.toHiveString?
> cc [~maxgekk]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true

2020-05-28 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-31859:
-

 Summary: Thriftserver with spark.sql.datetime.java8API.enabled=true
 Key: SPARK-31859
 URL: https://issues.apache.org/jira/browse/SPARK-31859
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


{code}
  test("spark.sql.datetime.java8API.enabled=true") {
withJdbcStatement() { st =>
  st.execute("set spark.sql.datetime.java8API.enabled=true")
  val rs = st.executeQuery("select timestamp '2020-05-28 00:00:00'")
  rs.next()
  // scalastyle:off
  println(rs.getObject(1))
}
  }
{code}
fails with 
{code}
HiveThriftBinaryServerSuite:
java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:204)
at 
org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:444)
at 
org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
at 
org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464
{code}

It seems it might be needed in HiveResult.toHiveString?
cc [~maxgekk]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31388) org.apache.spark.sql.hive.thriftserver.CliSuite result matching is flaky

2020-04-08 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-31388:
-

 Summary: org.apache.spark.sql.hive.thriftserver.CliSuite result 
matching is flaky
 Key: SPARK-31388
 URL: https://issues.apache.org/jira/browse/SPARK-31388
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


CliSuite.runCliWithin result matching has issues. Will describe in PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31204) HiveResult compatibility for DatasourceV2 command

2020-03-20 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-31204:
-

 Summary: HiveResult compatibility for DatasourceV2 command
 Key: SPARK-31204
 URL: https://issues.apache.org/jira/browse/SPARK-31204
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


HiveResult performs some compatibility matches and conversions for commands to 
be compatible with Hive output, e.g.:

{code}
case ExecutedCommandExec(_: DescribeCommandBase) =>
  // If it is a describe command for a Hive table, we want to have the 
output format
  // be similar with Hive.
...
// SHOW TABLES in Hive only output table names, while ours output database, 
table name, isTemp.
case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
{code}

The same is needed for DataSourceV2 commands as well (e.g. ShowTablesExec...).




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31051) Thriftserver operations other than SparkExecuteStatementOperation do not call onOperationClosed

2020-03-05 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-31051.
---
Resolution: Not A Problem

I'm just blind. They are there.

> Thriftserver operations other than SparkExecuteStatementOperation do not call 
> onOperationClosed
> ---
>
> Key: SPARK-31051
> URL: https://issues.apache.org/jira/browse/SPARK-31051
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> In Spark 3.0 onOperationClosed was implemented in HiveThriftServer2Listener 
> to track closing the operation in the thriftserver (after the client finishes 
> fetching).
> However, it seems that only SparkExecuteStatementOperation calls it in its 
> close() function. Other operations need to do this as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31051) Thriftserver operations other than SparkExecuteStatementOperation do not call onOperationClosed

2020-03-05 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-31051:
-

 Summary: Thriftserver operations other than 
SparkExecuteStatementOperation do not call onOperationClosed
 Key: SPARK-31051
 URL: https://issues.apache.org/jira/browse/SPARK-31051
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


In Spark 3.0 onOperationClosed was implemented in HiveThriftServer2Listener to 
track closing the operation in the thriftserver (after the client finishes 
fetching).
However, it seems that only SparkExecuteStatementOperation calls it in its 
close() function. Other operations need to do this as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


