[ https://issues.apache.org/jira/browse/SPARK-52228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-52228:
------------------------------------

    Assignee: Jungtaek Lim

> Introduce the benchmark setup (manual) for state interaction between TWS 
> state server and Python process
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-52228
>                 URL: https://issues.apache.org/jira/browse/SPARK-52228
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark, Structured Streaming
>    Affects Versions: 4.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>              Labels: pull-request-available
>
> We found that, in TWS PySpark, state interactions between the state server
> (in the JVM) and the Python worker take the majority of the execution time.
> We had been measuring this by running an E2E benchmark query and adding debug
> logs to drill down, but this requires a lot of bootstrapping work, and the
> debug logging is throwaway work.
> This ticket tracks the effort to come up with an approach to microbenchmark
> state interactions, so that understanding them does not require constructing
> an environment and running an E2E benchmark query.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
