Alrighty, let me start the vote to make sure everybody is happy :-).

On Wed, 3 Jul 2024 at 09:55, Hyukjin Kwon <gurwls...@apache.org> wrote:
> It will be fine for non-connect users. When we actually move the client
> one, I think we should go with an SPIP because that might affect end
> users...
>
> On Tue, 2 Jul 2024 at 23:05, Holden Karau <holden.ka...@gmail.com> wrote:
>
>> I guess my one concern here would be: are we going to expand the
>> dependencies that are visible on the class path for non-connect users?
>>
>> One of the pain points that folks experienced with upgrading can be from
>> those changing.
>>
>> Otherwise this seems pretty reasonable.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> On Tue, Jul 2, 2024 at 5:36 AM Matthew Powers <matthewkevinpow...@gmail.com> wrote:
>>
>>> This is a great idea and would be a great quality-of-life improvement.
>>>
>>> +1 (non-binding)
>>>
>>> On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> > while leaving the connect jvm client in a separate folder looks weird
>>>>
>>>> I plan to actually put it at the top level together, but I feel like
>>>> that has to be done with an SPIP, so I am moving the internal server
>>>> side first, orthogonally.
>>>>
>>>> On Tue, 2 Jul 2024 at 17:54, Cheng Pan <pan3...@gmail.com> wrote:
>>>>
>>>>> Thanks for raising this discussion. I think putting the connect folder
>>>>> at the top level is a good idea to promote Spark Connect, while leaving
>>>>> the connect jvm client in a separate folder looks weird. I suppose there
>>>>> is no contract to leave all optional modules under `connector`? e.g.
>>>>> `resource-managers/kubernetes/{docker,integration-tests}`, `hadoop-cloud`.
>>>>> What about moving the whole `connect` folder to the top level?
>>>>>
>>>>> Thanks,
>>>>> Cheng Pan
>>>>>
>>>>> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to discuss moving the Spark Connect server to the builtin
>>>>> package. Right now, users have to specify --jars or --packages when they
>>>>> run the Spark Connect server script, for example:
>>>>>
>>>>> ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>>>>>
>>>>> or
>>>>>
>>>>> ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1
>>>>>
>>>>> which is a little bit odd, in that an sbin script should not need to be
>>>>> given jars in order to start.
>>>>>
>>>>> Moving it to the builtin package is pretty straightforward because most
>>>>> of the jars are shaded, and the impact would be minimal. I have a
>>>>> prototype here: apache/spark#47157 <https://github.com/apache/spark/pull/47157>.
>>>>> This also simplifies the Python local running logic a lot.
>>>>>
>>>>> The user-facing API layer, the Spark Connect client, stays external,
>>>>> but I would like the internal/admin server layer, the Spark Connect
>>>>> server, to be built into Spark.
>>>>>
>>>>> Please let me know if you have thoughts on this!
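[Editor's note: for readers following the thread, a short before/after sketch of the server startup being discussed. The --packages form is the one quoted in the proposal above for Spark 3.5.x; the bare invocation is an assumption about how startup would look if the prototype in apache/spark#47157 lands, not documented behavior.]

```shell
# Today (Spark 3.5.x): the Spark Connect server jar is not on the classpath
# by default, so the sbin script must be told where to get it, e.g. via
# --packages (note the Scala version suffix in the artifact id):
./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1

# After the proposed change (assumed, per the linked prototype): the server
# module ships inside the Spark distribution, so no extra flag is needed:
./sbin/start-connect-server.sh
```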