It will be fine for non-connect users. When we actually move the client, I think we should go with an SPIP because that might affect end users...
On Tue, 2 Jul 2024 at 23:05, Holden Karau <holden.ka...@gmail.com> wrote:

> I guess my one concern here would be: are we going to expand the
> dependencies that are visible on the classpath for non-connect users?
>
> One of the pain points that folks experience when upgrading can come
> from those changing.
>
> Otherwise this seems pretty reasonable.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Tue, Jul 2, 2024 at 5:36 AM Matthew Powers <matthewkevinpow...@gmail.com> wrote:
>
>> This is a great idea and would be a great quality-of-life improvement.
>>
>> +1 (non-binding)
>>
>> On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>>> > while leaving the connect jvm client in a separate folder looks weird
>>>
>>> I actually plan to put it at the top level as well, but I feel that has
>>> to be done with an SPIP, so I am moving the internal server side first,
>>> orthogonally.
>>>
>>> On Tue, 2 Jul 2024 at 17:54, Cheng Pan <pan3...@gmail.com> wrote:
>>>
>>>> Thanks for raising this discussion. I think putting the connect folder
>>>> at the top level is a good idea to promote Spark Connect, while leaving
>>>> the connect jvm client in a separate folder looks weird. I suppose there
>>>> is no contract that all optional modules must live under `connector`?
>>>> e.g. `resource-managers/kubernetes/{docker,integration-tests}`,
>>>> `hadoop-cloud`. What about moving the whole `connect` folder to the top
>>>> level?
>>>>
>>>> Thanks,
>>>> Cheng Pan
>>>>
>>>>
>>>> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I would like to discuss moving the Spark Connect server into the
>>>> builtin package. Right now, users have to specify --jars or --packages
>>>> when they run the Spark Connect server script, for example:
>>>>
>>>> ./sbin/start-connect-server.sh --jars `ls
>>>> connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>>>>
>>>> or
>>>>
>>>> ./sbin/start-connect-server.sh --packages
>>>> org.apache.spark:spark-connect_2.12:3.5.1
>>>>
>>>> which is a little odd: an sbin script should not need jars supplied
>>>> just to start.
>>>>
>>>> Moving it into the builtin package is pretty straightforward because
>>>> most of the jars are shaded, so the impact would be minimal. I have a
>>>> prototype here: apache/spark#47157
>>>> <https://github.com/apache/spark/pull/47157>. This also simplifies the
>>>> Python local running logic a lot.
>>>>
>>>> The user-facing API layer, the Spark Connect Client, stays external,
>>>> but I would like the internal/admin server layer, the Spark Connect
>>>> Server, to be built into Spark.
>>>>
>>>> Please let me know if you have thoughts on this!
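
For context, a minimal sketch of what startup could look like if the
prototype in apache/spark#47157 lands; the commands assume a built
distribution whose assembly already includes the server jar, and the port
is an assumption based on the current Spark Connect default rather than
anything settled in this thread:

# Start the Spark Connect server with no --jars/--packages; the server
# classes are assumed to ship in the builtin assembly after this change.
./sbin/start-connect-server.sh

# The client stays an external package, so connecting is unchanged,
# e.g. a local PySpark session against the assumed default port 15002:
./bin/pyspark --remote "sc://localhost:15002"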