Hi Spark community,

We’d like to propose a new SPIP to improve the experience of running Apache
Spark on laptops.

SPIP doc:

https://docs.google.com/document/d/1Nphejrf_vh4YRECn0JPgKClqxDS_lB6wufZFJQxyY98/edit?tab=t.0#heading=h.hj76akdx5ul

Summary:

Spark’s execution model is optimized for distributed workloads, but this
introduces noticeable overhead for small datasets (e.g., <100MB), where
even simple queries can take multiple seconds. This makes Spark less
suitable for interactive and exploratory use cases on laptops, and often
pushes users toward alternative single-node tools.

This proposal aims to reduce that overhead in local mode, improving latency
for small queries and making Spark a better entry point for new users and a
better fit for iterative workflows.

We’d appreciate your review and feedback.

Thanks,
Daniel Tenedorio and Liang-Chi Hsieh