Hi Spark devs,

I would like to call for a vote on the SPIP: Asynchronous Metadata
Resolution & Lazy Prefetching for Spark Connect (Phase 1: Client-Side
Plan-ID Caching).

*Summary*:
This proposal addresses the critical "Death by 1000 RPCs" performance
regression in Spark Connect. Currently, interactive workloads suffer from
blocking network latency during metadata resolution. The proposal
introduces a Client-Side Plan-ID Cache to eliminate redundant RPCs for
deterministic plan structures (e.g., select, withColumn), significantly
improving interactive performance.
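
For anyone skimming without the doc open, here is a minimal sketch of the general idea behind plan-ID caching. It is an illustration only, not the design in the SPIP: `PlanSchemaCache`, `fake_analyze_rpc`, and the integer plan IDs are hypothetical names chosen for this example, and the fetch function merely stands in for a blocking metadata RPC.

```python
from typing import Callable, Dict, List


class PlanSchemaCache:
    """Hypothetical client-side cache mapping a plan ID to its column names."""

    def __init__(self, fetch_schema: Callable[[int], List[str]]) -> None:
        # fetch_schema stands in for a blocking metadata-resolution RPC.
        self._fetch_schema = fetch_schema
        self._schemas: Dict[int, List[str]] = {}
        self.rpc_count = 0  # counts simulated round trips, for illustration

    def schema(self, plan_id: int) -> List[str]:
        # Only a cache miss pays for the round trip.
        if plan_id not in self._schemas:
            self.rpc_count += 1
            self._schemas[plan_id] = self._fetch_schema(plan_id)
        # Return a copy so callers cannot mutate the cached entry.
        return list(self._schemas[plan_id])


def fake_analyze_rpc(plan_id: int) -> List[str]:
    # Assumption: the schema for a given plan ID is deterministic.
    return ["id", "name"]


cache = PlanSchemaCache(fake_analyze_rpc)
for _ in range(1000):
    cache.schema(42)  # repeated metadata access on the same plan
print(cache.rpc_count)  # prints 1
```

The sketch assumes that a plan ID always maps to the same schema, which is why the proposal limits caching to deterministic plan structures.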

*Scope*:
Based on the discussion feedback (special thanks to Herman, Erik, Ruifeng,
and Holden), this SPIP has been narrowed to Phase 1 only, focusing strictly
on the caching infrastructure and excluding the broader asynchronous API
changes for now.

*Links*:

*SPIP Doc*:
https://docs.google.com/document/d/1xTvL5YWnHu1jfXvjlKk2KeSv8JJC08dsD7mdbjjo9YE/edit?usp=sharing

*JIRA*: https://issues.apache.org/jira/browse/SPARK-55163

*Discussion Thread*:
https://lists.apache.org/thread/wxj8mtopvm8bt959l58drzd4p90p6vn1

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because...


Regards,
Vaquar Khan
*LinkedIn*: https://www.linkedin.com/in/vaquar-khan-b695577/
*Book*:
https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
*GitBook*: https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
*Stack Overflow*: https://stackoverflow.com/users/4812170/vaquar-khan
*GitHub*: https://github.com/vaquarkhan
