Jungtaek Lim created SPARK-52228:
------------------------------------

             Summary: Introduce the benchmark setup (manual) for state 
interaction between TWS state server and Python process
                 Key: SPARK-52228
                 URL: https://issues.apache.org/jira/browse/SPARK-52228
             Project: Spark
          Issue Type: Task
          Components: PySpark, Structured Streaming
    Affects Versions: 4.1.0
            Reporter: Jungtaek Lim


We found that state interactions in TWS PySpark between the state server 
(in the JVM) and the Python worker take the majority of the execution time.

So far we have captured the time by running an E2E benchmark query and adding 
debug logs to drill down, but this requires a lot of bootstrapping work, and 
the debug logs are throwaway work.

This ticket tracks the effort to come up with an approach to microbenchmark 
state interactions, so that understanding them no longer requires constructing 
an environment and running an E2E benchmark query.
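
As a rough illustration of what such a microbenchmark could look like, here is 
a minimal sketch of a timing harness. It is not the setup this ticket will 
introduce. The names microbenchmark and fake_state_get are hypothetical, and 
fake_state_get is only an in-process stand-in for one state round trip; a real 
setup would instead call whatever client talks to the state server.

```python
import statistics
import time
from typing import Callable, Dict, List


def microbenchmark(
    op: Callable[[], object], iterations: int = 1000, warmup: int = 100
) -> Dict[str, float]:
    """Time repeated calls to `op`; return latency stats in microseconds."""
    # Warm-up calls are executed but not measured.
    for _ in range(warmup):
        op()

    samples: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        op()
        samples.append((time.perf_counter_ns() - start) / 1_000)

    samples.sort()
    return {
        "count": float(len(samples)),
        "mean_us": statistics.fmean(samples),
        "p50_us": samples[len(samples) // 2],
        "p99_us": samples[min(len(samples) - 1, int(len(samples) * 0.99))],
    }


# Hypothetical stand-in for a single state interaction (e.g. a value-state get).
# It only reads a local dict, so it measures harness overhead, not real IPC.
_fake_state = {"key": b"value"}


def fake_state_get() -> bytes:
    return _fake_state["key"]


if __name__ == "__main__":
    print(microbenchmark(fake_state_get, iterations=10_000))
```

Swapping fake_state_get for a callable that performs one real request against 
the state server would report per-call latency without running a full E2E query.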



