Re: Javascript Based UDFs

2022-06-27 Thread Sean Owen
I don't know how these frameworks work, but I'd hope that one takes
JavaScript and gives you some invokable object. It's just Java executing
like anything else: nothing special, no special consideration for being in
a task. Why would it need another JVM?
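For anyone unfamiliar with these engines, here is a minimal sketch of what "gives you some invokable object" can look like with the GraalVM polyglot API. This assumes the `org.graalvm.js:js` dependency is on the classpath; the class name is made up for illustration.

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

// Sketch only: requires GraalJS (org.graalvm.js:js) on the classpath.
public class JsInvokableSketch {
    public static void main(String[] args) {
        try (Context context = Context.create("js")) {
            // Evaluating a JS function expression yields an invokable Value,
            // callable from plain Java like any other object.
            Value doubler = context.eval("js", "(x) => x * 2");
            System.out.println(doubler.execute(21).asInt()); // prints 42
        }
    }
}
```

No separate process or extra JVM is involved; the engine runs inside the host JVM.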



Re: Javascript Based UDFs

2022-06-27 Thread Matt Hawes
Thanks for the reply! I had originally thought that this would incur the
cost of spinning up a VM every time the UDF is called, but thinking about it
again, you might be right. I guess if I make the VM accessible via a
transient property on the UDF class, then it would only be initialized once
per executor, right? Or would it be once per task?
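The two scopes being asked about can be sketched with plain JDK code (no Spark or JS engine involved; `Engine` is a stand-in for the JavaScript VM). A transient field is dropped during serialization and rebuilt lazily after each deserialization, which in Spark is roughly once per deserialized task instance; a static holder is shared by everything in the same JVM, i.e. once per executor.

```java
import java.io.Serializable;

// Stdlib-only sketch of the two initialization scopes discussed above.
public class InitScopeSketch {

    static class Engine {                 // stand-in for the JS VM
        int eval(int x) { return x * 2; }
    }

    // Pattern 1: transient field, rebuilt lazily after deserialization
    // (roughly once per task in Spark).
    static class JsUdf implements Serializable {
        private transient Engine engine;  // not serialized with the UDF
        int call(int x) {
            if (engine == null) engine = new Engine(); // lazy init
            return engine.eval(x);
        }
    }

    // Pattern 2: a static holder shared by every task in the same JVM
    // (once per executor).
    static class SharedEngineHolder {
        private static final Engine ENGINE = new Engine();
        static Engine get() { return ENGINE; }
    }

    public static void main(String[] args) {
        JsUdf udf = new JsUdf();
        System.out.println(udf.call(21));                    // prints 42
        System.out.println(SharedEngineHolder.get().eval(2)); // prints 4
    }
}
```

Either way, engine startup cost is paid once per scope rather than once per row.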

I was also worried that this would mean paying a lot in SerDe cost if each
row is sent over to the VM one by one.
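The per-row concern can be made concrete with a toy stand-in: each call across the JVM/engine boundary carries fixed overhead, so batching rows amortizes it. The counting "engine" below is illustrative only.

```java
// Toy illustration of the SerDe concern above: each boundary crossing
// has fixed overhead, so batching rows amortizes it.
public class BatchingSketch {
    static int boundaryCrossings = 0;

    // Stand-in for invoking the JS engine on a single value.
    static int evalOne(int x) { boundaryCrossings++; return x * 2; }

    // Stand-in for invoking the JS engine on a whole batch at once.
    static int[] evalBatch(int[] xs) {
        boundaryCrossings++;
        int[] out = new int[xs.length];
        for (int i = 0; i < xs.length; i++) out[i] = xs[i] * 2;
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {1, 2, 3, 4, 5};
        for (int r : rows) evalOne(r);         // 5 crossings
        evalBatch(rows);                       // 1 crossing
        System.out.println(boundaryCrossings); // prints 6
    }
}
```

This is essentially why PySpark moved toward Arrow-batched (pandas) UDFs rather than pickling one row at a time.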



Re: Javascript Based UDFs

2022-06-27 Thread Sean Owen
Rather than implement a whole new UDF type, why not just use an embedded
interpreter? If something can turn JavaScript into something executable, you
can wrap that in a normal Java/Scala UDF and go.
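A sketch of that wrapping, using Spark's Java UDF registration API with GraalJS as the embedded engine. The UDF name `jsDouble` is made up, and this assumes GraalJS is on the executor classpath; a real implementation would cache the `Context` per executor (e.g. via the transient/singleton pattern) instead of creating one per call as done here for brevity.

```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import org.graalvm.polyglot.Context;

// Sketch: wrap an embedded JS engine in an ordinary Java UDF.
// Requires Spark and GraalJS (org.graalvm.js:js) on the classpath.
public class JsUdfRegistration {
    public static void register(SparkSession spark) {
        spark.udf().register("jsDouble", (UDF1<Integer, Integer>) x -> {
            // Simplification: creates a Context per call; cache it instead.
            try (Context ctx = Context.create("js")) {
                return ctx.eval("js", "(x) => x * 2").execute(x).asInt();
            }
        }, DataTypes.IntegerType);
    }
}
```

After registration it is callable like any other UDF, e.g. `spark.sql("SELECT jsDouble(id) FROM range(5)")`.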



Javascript Based UDFs

2022-06-27 Thread Matt Hawes
Hi all, I'm thinking about trying to implement the ability to write Spark
UDFs using JavaScript.

For the use case I have in mind, a lot of the code is already written in
JavaScript, so it would be very convenient to be able to call it directly
from Spark.

I wanted to post here first, before I start digging into the UDF code, to
see if anyone has attempted this already or has thoughts on it. I couldn't
find anything in Jira. I'd especially appreciate any pointers to relevant
sections of the code to get started!

My rough plan is to do something similar to how Python UDFs work (as I
understand them), i.e. call out to a JavaScript process, or potentially
something embedded like GraalJS: https://github.com/oracle/graaljs.
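For the out-of-process variant of that plan, PySpark-style workers talk to the JVM over a socket or pipe with a simple record protocol. A minimal stdlib-only sketch of such a line protocol is below; `cat` stands in for a Node.js worker (a real worker would apply the UDF to each incoming line instead of echoing it), and a real implementation would keep the worker alive across rows and batch records.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

// Sketch of a Python-UDF-style worker: a child process connected over
// stdin/stdout with one line per record. "cat" stands in for node here.
public class LineProtocolSketch {
    public static String roundTrip(String command, String record) throws IOException {
        Process worker = new ProcessBuilder(command).start();
        try (BufferedWriter toWorker = new BufferedWriter(
                 new OutputStreamWriter(worker.getOutputStream()));
             BufferedReader fromWorker = new BufferedReader(
                 new InputStreamReader(worker.getInputStream()))) {
            toWorker.write(record);    // send one record
            toWorker.newLine();
            toWorker.flush();
            return fromWorker.readLine(); // read one response line
        } finally {
            worker.destroy();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("cat", "{\"x\": 1}"));
    }
}
```

The trade-off versus an embedded engine like GraalJS is exactly the SerDe and process-management overhead discussed elsewhere in this thread.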

I understand that there's probably a long discussion to be had about making
this part of Spark core, but I wanted to start that discussion. :)

Best,
Matt


Docker images for Spark 3.3.0 release are now available

2022-06-27 Thread Gengliang Wang
Hi all,

The official Docker images for the Spark 3.3.0 release are now available!

   - To run Spark with Scala/Java API only:
   https://hub.docker.com/r/apache/spark
   - To run Python on Spark: https://hub.docker.com/r/apache/spark-py
   - To run R on Spark: https://hub.docker.com/r/apache/spark-r
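One way to try the Scala/Java image, sketched below. The exact tag name and entrypoint path are assumptions; check the image's Docker Hub page for the tags actually published.

```shell
# Pull the Scala/Java image and start an interactive spark-shell.
# Tag "v3.3.0" and the /opt/spark path are assumed; verify on Docker Hub.
docker pull apache/spark:v3.3.0
docker run -it apache/spark:v3.3.0 /opt/spark/bin/spark-shell
```

The `-py` and `-r` images work the same way with `pyspark` and `sparkR` respectively.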


Gengliang


Observed consistent test failure in master (ParquetIOSuite)

2022-06-27 Thread Jungtaek Lim
Hi,

I just observed a test failure in ParquetIOSuite, which I can consistently
reproduce in IntelliJ. I haven't had a chance to run the test with Maven or sbt yet.
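For reproducing outside the IDE, these are the usual incantations for running a single suite from the Spark repo root; the module name and fully-qualified suite path are my best guess and worth double-checking against the developer docs.

```shell
# Run just ParquetIOSuite with sbt (from the Spark repo root).
build/sbt "sql/testOnly *ParquetIOSuite"

# Roughly equivalent with Maven, scoped to the sql/core module:
build/mvn test -pl sql/core -Dtest=none \
  -DwildcardSuites=org.apache.spark.sql.execution.datasources.parquet.ParquetIOSuite
```
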

I filed SPARK-39622 for this failure.

It'd be great if someone with context could look into this soon.

Thanks!
Jungtaek Lim (HeartSaVioR)


[FINAL CALL] - Travel Assistance to ApacheCon New Orleans 2022

2022-06-27 Thread Gavin McDonald
To all committers and non-committers:

This is a final call to apply for travel and hotel assistance to get to,
and stay in, New Orleans for ApacheCon 2022.

Applications have been extended by one week, so the application deadline
is now the 8th of July 2022.

The rest of this email is a copy of what has been sent out previously.

We will be supporting ApacheCon North America in New Orleans, Louisiana,
on October 3rd through 6th, 2022.

TAC exists to help those who would like to attend ApacheCon events but
are unable to do so for financial reasons. This year, we are supporting
both committers and non-committers involved with projects at the
Apache Software Foundation, or with open source projects in general.

For more info on this year's applications and qualifying criteria, please
visit the TAC website at http://www.apache.org/travel/
Applications have been extended until the 8th of July 2022.

Important: applicants have until the closing date above to submit their
applications (which should contain as much supporting material as required
to process their request efficiently and accurately); this will enable TAC
to announce successful awards shortly afterwards.

As usual, TAC expects to receive applications from a diverse range of
backgrounds. We therefore encourage (as always) anyone thinking about
sending in an application to do so ASAP.

Why should you attend as a TAC recipient? We encourage you to read stories
from past recipients at https://apache.org/travel/stories/ . Also note that
previous TAC recipients have gone on to become committers, PMC members, ASF
Members, Directors of the ASF Board, and Infrastructure staff members.
Others have gone from committer to full-time open source developer!

How far can you go? Let TAC help get you there.


===

Gavin McDonald on behalf of the Travel Assistance Committee.