Hi,

Whilst I concur that there is a need for client-server architecture, that technology has been around for over 30 years. Moreover, the current Spark has very efficient connections via JDBC to various databases. In some cases the API to a given database, for example Google BigQuery, is very efficient. I am not sure what this proposal is trying to address?
HTH

On Fri, 3 Jun 2022 at 18:46, Martin Grund <martin.gr...@dd.com.invalid> wrote:

> Hi Everyone,
>
> We would like to start a discussion on the "Spark Connect" proposal.
> Please find the links below:
>
> *JIRA* - https://issues.apache.org/jira/browse/SPARK-39375
> *SPIP Document* -
> https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj
>
> *Excerpt from the document: *
>
> We propose to extend Apache Spark by building on the DataFrame API and the
> underlying unresolved logical plans. The DataFrame API is widely used and
> makes it very easy to iteratively express complex logic. We will introduce
> Spark Connect, a remote option of the DataFrame API that separates the
> client from the Spark server. With Spark Connect, Spark will become
> decoupled, allowing for built-in remote connectivity: The decoupled client
> SDK can be used to run interactive data exploration and connect to the
> server for DataFrame operations.
>
> Spark Connect will benefit Spark developers in different ways: The
> decoupled architecture will result in improved stability, as clients are
> separated from the driver. From the Spark Connect client perspective, Spark
> will be (almost) versionless, and thus enable seamless upgradability, as
> server APIs can evolve without affecting the client API. The decoupled
> client-server architecture can be leveraged to build close integrations
> with local developer tooling. Finally, separating the client process from
> the Spark server process will improve Spark’s overall security posture by
> avoiding the tight coupling of the client inside the Spark runtime
> environment.
>
> Spark Connect will strengthen Spark’s position as the modern unified
> engine for large-scale data analytics and expand applicability to use cases
> and developers we could not reach with the current setup: Spark will become
> ubiquitously usable as the DataFrame API can be used with (almost) any
> programming language.
>
> We would like to start a discussion on the document and any feedback is
> welcome!
>
> Thanks a lot in advance,
> Martin
>

--
view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.