[ https://issues.apache.org/jira/browse/SPARK-52228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim reassigned SPARK-52228:
------------------------------------

    Assignee: Jungtaek Lim

> Introduce the benchmark setup (manual) for state interaction between TWS
> state server and Python process
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-52228
>                 URL: https://issues.apache.org/jira/browse/SPARK-52228
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark, Structured Streaming
>    Affects Versions: 4.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>              Labels: pull-request-available
>
> We found that state interactions in TWS PySpark between the state server
> (in the JVM) and the Python worker take up the majority of the execution
> time.
> We have been measuring this time by running an E2E benchmark query and
> adding debug logs to drill down, but this requires a lot of bootstrapping
> work, and the added debug logs are throwaway work.
> This ticket tracks the effort to come up with an approach to microbenchmark
> state interactions, so that understanding them does not require constructing
> an environment and running an E2E benchmark query.