Support for UDFs would work in the same way as it does today: the
closures are serialized on the client and sent via the driver to the
workers.
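
As a rough sketch, this is what the existing behavior looks like with a
Scala UDF (the function and column names are illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder().appName("udf-example").getOrCreate()
    import spark.implicits._

    // The closure passed to udf() is serialized on the client and shipped
    // via the driver to the executors that evaluate it.
    val plusOne = udf((x: Int) => x + 1)

    Seq(1, 2, 3).toDF("value").select(plusOne($"value")).show()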

While there is no difference in how the UDF executes, there are
potential challenges around the dependencies required for execution.
This is true for both Python and Scala. I would like to keep dependency
management out of the scope of this SPIP; I believe it can be solved in
principle by explicitly adding the JARs for the dependency so that they
are available on the classpath.
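
For example, with the mechanisms that exist today (the JAR path below is
hypothetical):

    import org.apache.spark.sql.SparkSession

    // Hypothetical path: a JAR containing the classes the UDF closure
    // depends on, added explicitly so it is available on the classpath.
    val spark = SparkSession.builder()
      .appName("udf-deps")
      .config("spark.jars", "/path/to/udf-dependency.jar")
      .getOrCreate()

    // Equivalent after the session exists:
    spark.sparkContext.addJar("/path/to/udf-dependency.jar")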

In its current form, the SPIP does not propose to add new language
support for UDFs, but in theory this becomes possible as long as
closures can be serialized, either as source code or as a binary, and
dynamically loaded on the other side.
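
To illustrate the round trip (just a sketch, not something the SPIP
specifies): a closure serialized to bytes on one side can be deserialized
and invoked on the other, provided the classes behind it are on the
classpath there.

    import java.io._

    val f: Int => Int = x => x + 1 // Scala function literals are Serializable

    // "Client" side: serialize the closure to bytes.
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(f)
    oos.close()
    val bytes = bos.toByteArray

    // "Server" side: deserialize and invoke. This only works if the class
    // definitions behind the closure are available here, which is exactly
    // the dependency question above.
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
    val g = ois.readObject().asInstanceOf[Int => Int]
    println(g(41)) // prints 42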

I hope this answers the question.

Thanks
Martin

On Sat 4. Jun 2022 at 05:04 Koert Kuipers <ko...@tresata.com> wrote:

> How would Scala UDFs be supported in this?
>
> On Fri, Jun 3, 2022 at 1:52 PM Martin Grund
> <martin.gr...@databricks.com.invalid> wrote:
>
>> Hi Everyone,
>>
>> We would like to start a discussion on the "Spark Connect" proposal.
>> Please find the links below:
>>
>> *JIRA* - https://issues.apache.org/jira/browse/SPARK-39375
>> *SPIP Document* -
>> https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj
>>
>> *Excerpt from the document: *
>>
>> We propose to extend Apache Spark by building on the DataFrame API and
>> the underlying unresolved logical plans. The DataFrame API is widely used
>> and makes it very easy to iteratively express complex logic. We will
>> introduce Spark Connect, a remote option of the DataFrame API that
>> separates the client from the Spark server. With Spark Connect, Spark will
>> become decoupled, allowing for built-in remote connectivity: The decoupled
>> client SDK can be used to run interactive data exploration and connect to
>> the server for DataFrame operations.
>>
>> Spark Connect will benefit Spark developers in different ways: The
>> decoupled architecture will result in improved stability, as clients are
>> separated from the driver. From the Spark Connect client perspective, Spark
>> will be (almost) versionless, and thus enable seamless upgradability, as
>> server APIs can evolve without affecting the client API. The decoupled
>> client-server architecture can be leveraged to build close integrations
>> with local developer tooling. Finally, separating the client process from
>> the Spark server process will improve Spark’s overall security posture by
>> avoiding the tight coupling of the client inside the Spark runtime
>> environment.
>>
>> Spark Connect will strengthen Spark’s position as the modern unified
>> engine for large-scale data analytics and expand applicability to use cases
>> and developers we could not reach with the current setup: Spark will become
>> ubiquitously usable as the DataFrame API can be used with (almost) any
>> programming language.
>>
>> We would like to start a discussion on the document and any feedback is
>> welcome!
>>
>> Thanks a lot in advance,
>> Martin
>>
>
