Hi Holden,

As promised, here is the additional document
<https://docs.google.com/document/d/1Dx9NqHRNuUpatH9DYoFF9cmvUl2fqHT4Rjbyw4EGLHs/edit?tab=t.0#heading=h.4h01j4b8rjzv>
describing the proposed worker specification design, which I have also
attached to the JIRA ticket: SPARK-55278
<https://issues.apache.org/jira/browse/SPARK-55278>.

As discussed, I hope this helps clarify the concerns raised earlier.

Thank you for your feedback.

Best,

Haiyang

On Thu, Feb 26, 2026 at 10:30 PM Holden Karau <[email protected]>
wrote:

> Looking forward to these additional docs :)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Thu, Feb 26, 2026 at 12:50 PM Haiyang Sun via dev <[email protected]>
> wrote:
>
>> Hi Holden,
>>
>> Thanks again for the detailed comments and suggestions.
>>
>> I’ve responded inline in the document and will revise the SPIP to make
>> several areas more explicit. For visibility, here is a short summary:
>>
>> 1) Security (new IPC mechanism)
>>
>> We will add a dedicated security section. Overall, the security posture
>> should be no worse than that of the current socket-based implementation.
>> Moving to gRPC may actually improve it by leveraging existing ecosystem
>> support for TLS, authentication, interceptors, and observability, all of
>> which are harder to standardize correctly on top of a raw socket protocol.
>>
>> 2) Performance assumptions
>>
>> Agreed — we should back claims with systematic benchmarking. We have an
>> early gRPC prototype with preliminary results comparable to the current
>> socket path, but we will avoid strong claims until properly benchmarked.
>> The existing Python/Scala paths will remain, and any default switch would
>> only happen after meeting explicit performance goals.
>>
>> 3) Fallback / migration strategy
>>
>> We will make this explicit in the SPIP. The plan is to separate the
>> transport layer from UDF processing logic in worker.py, allowing gRPC and
>> socket to share the same execution logic. This enables safe fallback and
>> reduces long-term dual-maintenance overhead.
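>> A minimal, hypothetical sketch of that separation (class and function
>> names here are illustrative, not the actual worker.py API):

```python
# Hedged sketch: socket and gRPC front ends share one execution path by
# implementing a common transport interface. All names are invented for
# illustration.
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List


def execute_udf_batch(udf: Callable, batch: Iterable) -> List:
    """Transport-agnostic UDF execution logic, shared by all transports."""
    return [udf(row) for row in batch]


class Transport(ABC):
    """Minimal transport interface; concrete classes only move batches."""

    @abstractmethod
    def receive_batch(self) -> List: ...

    @abstractmethod
    def send_batch(self, batch: List) -> None: ...


class InMemoryTransport(Transport):
    """Stand-in for either the socket or the gRPC transport."""

    def __init__(self, inbound: List) -> None:
        self.inbound = inbound
        self.outbound: List = []

    def receive_batch(self) -> List:
        return self.inbound

    def send_batch(self, batch: List) -> None:
        self.outbound = batch


def run_worker(transport: Transport, udf: Callable) -> None:
    # The worker loop never knows which transport backs it, which is what
    # makes safe fallback between gRPC and sockets possible.
    transport.send_batch(execute_udf_batch(udf, transport.receive_batch()))
```

>> With this split, a fallback simply swaps the transport object while the
>> execution path stays identical.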
>>
>> 4) Worker specification
>>
>> We do have a more detailed design and can publish it as a supporting
>> document. The SPIP will clarify the expected structure and required
>> metadata without going too deep into implementation detail.
>>
>> 5) Dependency management
>>
>> This will be defined in the worker specification. Each language
>> implementation defines its dependency requirements, and clusters are
>> expected to provision environments accordingly (as is already the case for
>> Python today).
>>
>> 6) Unified query planning concerns
>>
>> The intent is not to force identical planning behavior across languages.
>> The worker specification can expose metadata (e.g., pipelining support,
>> concurrency, memory characteristics, data format constraints), allowing the
>> planner to remain flexible and language-aware without hardcoding
>> per-language rules.
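>> As a rough illustration only (the field names are assumptions, not the
>> proposed schema), such capability metadata could look like:

```python
# Hypothetical worker capability metadata the planner could consume.
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerMetadata:
    """Capabilities a worker advertises to the planner (illustrative)."""
    language: str
    supports_pipelining: bool  # can this worker chain UDFs internally?
    max_concurrency: int       # parallel UDF invocations per worker
    data_format: str           # e.g. "arrow"


def can_fuse_udfs(meta: WorkerMetadata) -> bool:
    # The planner checks advertised capabilities rather than hardcoding
    # per-language rules.
    return meta.supports_pipelining


python_meta = WorkerMetadata("python", True, 1, "arrow")
```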
>>
>> 7) Inter-UDF pipelining
>>
>> Pipelining is supported by the protocol design (similar to PySpark). The
>> init message can declare multiple UDFs and define chaining and input
>> mappings. Whether a language supports this can be expressed in worker
>> metadata so planning can respect it.
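>> For illustration, a chained declaration in the init message might look
>> roughly like the following (the message shape is invented here, not the
>> actual wire format):

```python
# Hypothetical init message: u2 consumes u1's output, i.e. the two UDFs
# are pipelined inside one worker.
init_message = {
    "udfs": [
        {"id": "u1", "inputs": ["col_a"]},  # reads an input column
        {"id": "u2", "inputs": ["u1"]},     # consumes u1's output
    ]
}


def chained_udf_ids(msg: dict) -> list:
    """Return ids of UDFs whose inputs reference another UDF's output."""
    ids = {u["id"] for u in msg["udfs"]}
    return [u["id"] for u in msg["udfs"]
            if any(i in ids for i in u["inputs"])]
```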
>>
>> Hopefully this addresses the main concerns. I’ll update the SPIP to
>> reflect these clarifications more explicitly.
>>
>> Thanks again for the thoughtful review.
