Alrighty, let me start the vote to make sure everybody is happy :-).

On Wed, 3 Jul 2024 at 09:55, Hyukjin Kwon <gurwls...@apache.org> wrote:
> It will be fine for non-connect users. When we actually move the client
> one, I think we should go with an SPIP because that might affect end
> users...
>
> On Tue, 2 Jul 2024 at 23:05, Holden Karau <holden.ka...@gmail.com> wrote:
>
>> I guess my one concern here would be: are we going to expand the
>> dependencies that are visible on the class path for non-connect users?
>>
>> One of the pain points that folks experienced with upgrading can be from
>> those changing.
>>
>> Otherwise this seems pretty reasonable.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> On Tue, Jul 2, 2024 at 5:36 AM Matthew Powers <matthewkevinpow...@gmail.com> wrote:
>>
>>> This is a great idea and would be a great quality-of-life improvement.
>>>
>>> +1 (non-binding)
>>>
>>> On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> > while leaving the connect jvm client in a separate folder looks weird
>>>>
>>>> I plan to actually put it at the top level together, but I feel like
>>>> that has to be done with an SPIP, so I am moving the internal server
>>>> side first, orthogonally.
>>>>
>>>> On Tue, 2 Jul 2024 at 17:54, Cheng Pan <pan3...@gmail.com> wrote:
>>>>
>>>>> Thanks for raising this discussion. I think putting the connect folder
>>>>> at the top level is a good idea to promote Spark Connect, while leaving
>>>>> the connect jvm client in a separate folder looks weird. I suppose there
>>>>> is no contract to leave all optional modules under `connector`? e.g.
>>>>> `resource-managers/kubernetes/{docker,integration-tests}`, `hadoop-cloud`.
>>>>> What about moving the whole `connect` folder to the top level?
>>>>>
>>>>> Thanks,
>>>>> Cheng Pan
>>>>>
>>>>> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to discuss moving the Spark Connect server to the builtin
>>>>> package. Right now, users have to specify --jars or --packages when they
>>>>> run the Spark Connect server script, for example:
>>>>>
>>>>> ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>>>>>
>>>>> or
>>>>>
>>>>> ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1
>>>>>
>>>>> which is a little bit odd, in that an sbin script should not need to be
>>>>> given jars in order to start.
>>>>>
>>>>> Moving it to the builtin package is pretty straightforward because most
>>>>> of the jars are shaded, and the impact would be minimal. I have a
>>>>> prototype here: apache/spark#47157 <https://github.com/apache/spark/pull/47157>.
>>>>> This also simplifies the Python local running logic a lot.
>>>>>
>>>>> The user-facing API layer, the Spark Connect client, stays external,
>>>>> but I would like the internal/admin server layer, the Spark Connect
>>>>> server, to be built into Spark.
>>>>>
>>>>> Please let me know if you have thoughts on this!
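[Editor's note: for readers following the thread, a short before/after sketch of the server startup being discussed. The --packages form is the one quoted in the proposal above for Spark 3.5.x; the bare invocation is an assumption about how startup would look if the prototype in apache/spark#47157 lands, not documented behavior.]

```shell
# Today (Spark 3.5.x): the Spark Connect server jar is not on the classpath
# by default, so the sbin script must be told where to get it, e.g. via
# --packages (note the Scala version suffix in the artifact id):
./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1

# After the proposed change (assumed, per the linked prototype): the server
# module ships inside the Spark distribution, so no extra flag is needed:
./sbin/start-connect-server.sh
```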