Hi Spark devs,

I would like to call for a vote on the SPIP: Asynchronous Metadata Resolution & Lazy Prefetching for Spark Connect (Phase 1: Client-Side Plan-ID Caching).

*Summary*: This proposal addresses the critical "Death by 1000 RPCs" performance regression in Spark Connect. Currently, interactive workloads suffer from blocking network latency during metadata resolution. The proposal introduces a Client-Side Plan-ID Cache to eliminate redundant RPCs for deterministic plan structures (e.g., select, withColumn), significantly improving interactive performance.

*Scope*: Based on the discussion feedback (special thanks to Herman, Erik, Ruifeng, and Holden), this SPIP has been narrowed to Phase 1 only, focusing strictly on the caching infrastructure and excluding the broader asynchronous API changes for now.

*Links*:
*SPIP Doc*: https://docs.google.com/document/d/1xTvL5YWnHu1jfXvjlKk2KeSv8JJC08dsD7mdbjjo9YE/edit?usp=sharing
*JIRA*: https://issues.apache.org/jira/browse/SPARK-55163
*Discussion Thread*: https://lists.apache.org/thread/wxj8mtopvm8bt959l58drzd4p90p6vn1

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because...

Regards,
Vaquar Khan

*LinkedIn* - https://www.linkedin.com/in/vaquar-khan-b695577/
*Book* - https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
*GitBook* - https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
*Stack Overflow* - https://stackoverflow.com/users/4812170/vaquar-khan
*GitHub* - https://github.com/vaquarkhan
