Thanks for the reply! I had originally thought this would incur the cost of
spinning up a VM every time the UDF is called, but thinking about it again,
you might be right. I guess if I make the VM accessible via a transient
property on the UDF class, it would only be initialized once per executor,
right? Or would it be once per task?

I was also worried that you'd end up paying a lot in SerDe cost if each row
is sent over to the VM one by one.
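For what it's worth, here's a rough sketch of the pattern I had in mind: a
`@transient lazy val` on a singleton holds the GraalJS `Context`, so it's
created once per executor JVM rather than per row. This assumes the GraalJS
polyglot API (`org.graalvm.polyglot.Context`) on the classpath; the
`jsDouble` UDF and the doubling script are just made-up examples, and the
`synchronized` block is there because a single GraalJS `Context` is not
safe to use from multiple task threads concurrently.

```scala
import org.apache.spark.sql.functions.udf
import org.graalvm.polyglot.Context

// Holds one JS engine per executor JVM. The lazy val is initialized on
// first use after deserialization, not shipped from the driver.
object JsEngine extends Serializable {
  @transient lazy val context: Context = Context.create("js")
}

// Example UDF: evaluate a trivial JS expression against each row's value.
// Multiple tasks share the executor JVM, so guard the single Context.
val jsDouble = udf { (x: Long) =>
  JsEngine.synchronized {
    JsEngine.context.eval("js", s"$x * 2").asLong()
  }
}
```

This sidesteps the per-row SerDe cost of a separate process (like the
Python worker model) since the values cross the Java/JS boundary in-process,
though you'd want to benchmark the polyglot boundary crossing itself.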

On Mon, Jun 27, 2022 at 10:02 PM Sean Owen <sro...@gmail.com> wrote:

> Rather than reimplement a new UDF, why not indeed just use an embedded
> interpreter? if something can turn javascript into something executable you
> can wrap that in a normal Java/Scala UDF and go.
>
> On Mon, Jun 27, 2022 at 10:42 PM Matt Hawes <hawes.i...@gmail.com> wrote:
>
>> Hi all, I'm thinking about trying to implement the ability to write spark
>> UDFs using javascript.
>>
>> For the use case I have in mind, a lot of the code is already written in
>> javascript and so it would be very convenient to be able to call this
>> directly from spark.
>>
>> I wanted to post here first before I start digging into the UDF code to
>> see if anyone has attempted this already or if people have thoughts on it.
>> I couldn't find anything in the Jira. I'd be especially appreciative of any
>> pointers towards relevant sections of the code to get started!
>>
>> My rough plan is to do something similar to how python UDFs work (as I
>> understand them). I.e. call out to a javascript process, potentially just
>> something in GraalJs for example: https://github.com/oracle/graaljs.
>>
>> I understand that there's probably a long discussion to be had here with
>> regards to making this part of Spark core, but I wanted to start that
>> discussion. :)
>>
>> Best,
>> Matt
>>
>>
