Hi Holden,
As promised, here is the additional document
<https://docs.google.com/document/d/1Dx9NqHRNuUpatH9DYoFF9cmvUl2fqHT4Rjbyw4EGLHs/edit?tab=t.0#heading=h.4h01j4b8rjzv>
describing the proposed worker specification design, which I have also
attached to the JIRA ticket: SPARK-55278
<https://issues.apache.org/jira/browse/SPARK-55278>.

I hope this helps clarify the concerns raised earlier. Thank you for your
feedback.

Best,
Haiyang

On Thu, Feb 26, 2026 at 10:30 PM Holden Karau <[email protected]> wrote:

> Looking forward to these additional docs :)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Thu, Feb 26, 2026 at 12:50 PM Haiyang Sun via dev <[email protected]> wrote:
>
>> Hi Holden,
>>
>> Thanks again for the detailed comments and suggestions.
>>
>> I’ve responded inline in the document and will revise the SPIP to make
>> several areas more explicit. For visibility, here is a short summary:
>>
>> 1) Security (new IPC mechanism)
>>
>> We will add a dedicated security section. Overall, this should be no
>> worse than the current socket-based implementation. Moving to gRPC may
>> actually improve our position by leveraging existing ecosystem support
>> for TLS, authentication, interceptors, and observability — which are
>> harder to standardize correctly on top of a raw socket protocol.
>>
>> 2) Performance assumptions
>>
>> Agreed — we should back claims with systematic benchmarking. We have an
>> early gRPC prototype with preliminary results comparable to the current
>> socket path, but we will avoid strong claims until it is properly
>> benchmarked. The existing Python/Scala paths will remain, and any
>> default switch would only happen after meeting explicit performance
>> goals.
>>
>> 3) Fallback / migration strategy
>>
>> We will make this explicit in the SPIP. The plan is to separate the
>> transport layer from the UDF processing logic in worker.py, allowing
>> the gRPC and socket transports to share the same execution logic. This
>> enables safe fallback and reduces long-term dual-maintenance overhead.
>>
>> 4) Worker specification
>>
>> We do have a more detailed design and can publish it as a supporting
>> document. The SPIP will clarify the expected structure and required
>> metadata without going too deep into implementation detail.
>>
>> 5) Dependency management
>>
>> This will be defined in the worker specification. Each language
>> implementation defines its dependency requirements, and clusters are
>> expected to provision environments accordingly (as is already the case
>> for Python today).
>>
>> 6) Unified query planning concerns
>>
>> The intent is not to force identical planning behavior across
>> languages. The worker specification can expose metadata (e.g.,
>> pipelining support, concurrency, memory characteristics, data format
>> constraints), allowing the planner to remain flexible and
>> language-aware without hardcoding per-language rules.
>>
>> 7) Inter-UDF pipelining
>>
>> Pipelining is supported by the protocol design (similar to PySpark).
>> The init message can declare multiple UDFs and define chaining and
>> input mappings. Whether a language supports this can be expressed in
>> worker metadata so planning can respect it.
>>
>> Hopefully this addresses the main concerns. I’ll update the SPIP to
>> reflect these clarifications more explicitly.
>>
>> Thanks again for the thoughtful review.
>>
>>
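
P.S. For concreteness, the transport/execution separation mentioned in
point (3) of the quoted summary could be sketched roughly as below. All
class and method names here are illustrative, not taken from worker.py or
the SPIP; the point is only that the UDF execution loop is written once and
the transport (socket or gRPC) merely moves serialized batches in and out.

```python
# Hypothetical sketch of separating transport from UDF execution logic.
# Names (Transport, run_worker, etc.) are illustrative only.
from abc import ABC, abstractmethod


class Transport(ABC):
    """Moves raw request/response payloads; knows nothing about UDFs."""

    @abstractmethod
    def recv_batch(self) -> bytes: ...

    @abstractmethod
    def send_batch(self, payload: bytes) -> None: ...


class SocketTransport(Transport):
    """Wraps a socket-like connection object."""

    def __init__(self, conn):
        self.conn = conn

    def recv_batch(self) -> bytes:
        return self.conn.recv()

    def send_batch(self, payload: bytes) -> None:
        self.conn.send(payload)


class GrpcTransport(Transport):
    """Wraps a bidirectional gRPC stream (shape assumed for this sketch)."""

    def __init__(self, stream):
        self.stream = stream

    def recv_batch(self) -> bytes:
        return next(self.stream)

    def send_batch(self, payload: bytes) -> None:
        self.stream.write(payload)


def run_worker(transport: Transport, evaluate):
    """Shared execution loop: identical UDF logic for either transport."""
    while True:
        batch = transport.recv_batch()
        if not batch:  # empty payload signals end of stream in this sketch
            break
        transport.send_batch(evaluate(batch))
```

With this split, falling back from gRPC to sockets is a matter of swapping
the Transport instance, not maintaining two copies of the execution logic.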
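
P.P.S. The inter-UDF pipelining in point (7) — an init message declaring
multiple UDFs together with chaining and input mappings — could look
roughly like the toy example below. The message shape and field names are
invented for this sketch; the actual protocol is what the SPIP and worker
specification define.

```python
# Hypothetical init message declaring three chained UDFs. The "inputs"
# field of each UDF names either the raw input or another UDF's output.
init_message = {
    "udfs": {
        "parse": {"inputs": ["raw"]},     # consumes the raw input
        "enrich": {"inputs": ["parse"]},  # consumes parse's output
        "score": {"inputs": ["enrich"]},  # consumes enrich's output
    },
    "output": "score",
}


def run_pipeline(init, funcs, row):
    """Evaluate chained UDFs per the declared input mappings.

    Assumes the init message lists UDFs in topological order, so each
    UDF's inputs are already computed when it runs.
    """
    results = {"raw": row}
    for name, spec in init["udfs"].items():
        args = [results[dep] for dep in spec["inputs"]]
        results[name] = funcs[name](*args)
    return results[init["output"]]
```

A worker that cannot evaluate such a chain in one pass would simply omit
the pipelining capability from its metadata, and the planner would fall
back to one UDF per worker round trip.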
