Jungtaek Lim created SPARK-52228:
------------------------------------

             Summary: Introduce the benchmark setup (manual) for state interaction between TWS state server and Python process
                 Key: SPARK-52228
                 URL: https://issues.apache.org/jira/browse/SPARK-52228
             Project: Spark
          Issue Type: Task
          Components: PySpark, Structured Streaming
    Affects Versions: 4.1.0
            Reporter: Jungtaek Lim

We found that state interactions in TWS PySpark between the state server (in the JVM) and the Python worker account for the majority of execution time. So far we have measured this by running an E2E benchmark query and adding debug logs to drill down, but that requires a lot of bootstrapping work, and the debug logging is throwaway work.

This ticket tracks the effort to come up with an approach to microbenchmark state interactions, so that understanding their cost no longer requires constructing an environment and running an E2E benchmark query.