Re: [VOTE] SPIP: Asynchronous Metadata Resolution & Lazy Prefetching (Phase 1)

Mich Talebzadeh Tue, 17 Feb 2026 12:07:56 -0800

@Herman van Hovell <[email protected]>

Good point. I have added comments to the SPIP.


HTH
Dr Mich Talebzadeh,
Data Scientist | Distributed Systems (Spark) | Financial Forensics &
Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
Analytics

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Tue, 17 Feb 2026 at 19:18, Herman van Hovell via dev <
[email protected]> wrote:

> Hi All,
>
> While I think it is great that we are trying to address this issue in
> Connect, I have concerns about the current proposal (see the comments in
> the doc). I would like to discuss this more in detail before proceeding.
> Given that this is an official vote, I will cast a -1 for now.
>
> Cheers,
> Herman
>
> On Tue, Feb 17, 2026 at 2:39 PM Devin Petersohn via dev <
> [email protected]> wrote:
>
>> +1 (non-binding). We've encountered the patterns described here
>> repeatedly in user workflows, and this proposal will be a big step forward
>> in the Spark Connect user experience.
>>
>> On Tue, Feb 17, 2026 at 12:07 PM Mich Talebzadeh <
>> [email protected]> wrote:
>>
>>> +1 from me
>>>
>>> Dr Mich Talebzadeh,
>>> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
>>> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
>>> Analytics
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, 17 Feb 2026 at 17:54, Holden Karau <[email protected]>
>>> wrote:
>>>
>>>> +1, this fixes a key performance regression between regular Spark and
>>>> Spark Connect. Some users I talked with ended up having to implement
>>>> their own caching to work around the "death by 1k RPCs" issue called out here.
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
>>>>
>>>>
>>>> On Tue, Feb 17, 2026 at 8:28 AM vaquar khan <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Spark devs,
>>>>>
>>>>> I would like to call for a vote on the SPIP: Asynchronous Metadata
>>>>> Resolution & Lazy Prefetching for Spark Connect (Phase 1: Client-Side
>>>>> Plan-ID Caching).
>>>>>
>>>>> *Summary*:
>>>>> This proposal addresses the critical "Death by 1000 RPCs" performance
>>>>> regression in Spark Connect. Currently, interactive workloads suffer from
>>>>> blocking network latency during metadata resolution. The proposal
>>>>> introduces a Client-Side Plan-ID Cache to eliminate redundant RPCs for
>>>>> deterministic plan structures (e.g., select, withColumn), significantly
>>>>> improving interactive performance.
>>>>>
>>>>> *Scope*:
>>>>> Based on the discussion feedback (special thanks to Herman, Erik,
>>>>> Ruifeng, and Holden), this SPIP has been narrowed to Phase 1 only,
>>>>> focusing strictly on the caching infrastructure and excluding the broader
>>>>> asynchronous API changes for now.
>>>>>
>>>>> *Links*:
>>>>>
>>>>> *SPIP Doc*:
>>>>> https://docs.google.com/document/d/1xTvL5YWnHu1jfXvjlKk2KeSv8JJC08dsD7mdbjjo9YE/edit?usp=sharing
>>>>>
>>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-55163
>>>>>
>>>>> *Discussion Thread*:
>>>>> https://lists.apache.org/thread/wxj8mtopvm8bt959l58drzd4p90p6vn1
>>>>>
>>>>> Please vote on the SPIP for the next 72 hours:
>>>>>
>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>> [ ] +0
>>>>> [ ] -1: I don’t think this is a good idea because...
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vaquar Khan
>>>>> *LinkedIn*: https://www.linkedin.com/in/vaquar-khan-b695577/
>>>>> *Book*:
>>>>> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>>>>> *GitBook*:
>>>>> https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>>>>> *Stack*: https://stackoverflow.com/users/4812170/vaquar-khan
>>>>> *GitHub*: https://github.com/vaquarkhan
>>>>>
>>>>
