It will be fine for non-connect users. When we actually move the client, I think we should go with an SPIP because that might affect end users...
On Tue, 2 Jul 2024 at 23:05, Holden Karau <holden.ka...@gmail.com> wrote:

> I guess my one concern here would be: are we going to expand the
> dependencies that are visible on the classpath for non-connect users?
>
> One of the pain points that folks experience when upgrading can come
> from those changing.
>
> Otherwise this seems pretty reasonable.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Tue, Jul 2, 2024 at 5:36 AM Matthew Powers <matthewkevinpow...@gmail.com> wrote:
>
>> This is a great idea and would be a great quality-of-life improvement.
>>
>> +1 (non-binding)
>>
>> On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>>> > while leaving the connect jvm client in a separate folder looks weird
>>>
>>> I actually plan to put it at the top level as well, but I feel that has
>>> to be done with an SPIP, so I am moving the internal server side first,
>>> orthogonally.
>>>
>>> On Tue, 2 Jul 2024 at 17:54, Cheng Pan <pan3...@gmail.com> wrote:
>>>
>>>> Thanks for raising this discussion. I think putting the connect folder
>>>> at the top level is a good idea to promote Spark Connect, while leaving
>>>> the connect jvm client in a separate folder looks weird. I suppose there
>>>> is no contract that all optional modules must live under `connector`?
>>>> e.g. `resource-managers/kubernetes/{docker,integration-tests}`,
>>>> `hadoop-cloud`. What about moving the whole `connect` folder to the top
>>>> level?
>>>>
>>>> Thanks,
>>>> Cheng Pan
>>>>
>>>>
>>>> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I would like to discuss moving the Spark Connect server into the
>>>> builtin package. Right now, users have to specify --jars or --packages
>>>> when they run the Spark Connect server script, for example:
>>>>
>>>> ./sbin/start-connect-server.sh --jars `ls
>>>> connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>>>>
>>>> or
>>>>
>>>> ./sbin/start-connect-server.sh --packages
>>>> org.apache.spark:spark-connect_2.12:3.5.1
>>>>
>>>> which is a little odd: an sbin script should not need jars supplied
>>>> just to start.
>>>>
>>>> Moving it into the builtin package is pretty straightforward because
>>>> most of the jars are shaded, so the impact would be minimal. I have a
>>>> prototype here: apache/spark#47157
>>>> <https://github.com/apache/spark/pull/47157>. This also simplifies the
>>>> Python local running logic a lot.
>>>>
>>>> The user-facing API layer, the Spark Connect Client, stays external,
>>>> but I would like the internal/admin server layer, the Spark Connect
>>>> Server, to be built into Spark.
>>>>
>>>> Please let me know if you have thoughts on this!
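
For context, a minimal sketch of what startup could look like if the
prototype in apache/spark#47157 lands; the commands assume a built
distribution whose assembly already includes the server jar, and the port
is an assumption based on the current Spark Connect default rather than
anything settled in this thread:

# Start the Spark Connect server with no --jars/--packages; the server
# classes are assumed to ship in the builtin assembly after this change.
./sbin/start-connect-server.sh

# The client stays an external package, so connecting is unchanged,
# e.g. a local PySpark session against the assumed default port 15002:
./bin/pyspark --remote "sc://localhost:15002"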