First we'll need to decide whether to target Wasm, WASI or Wagi.
WASI adds a lot of simple things, like access to a clock and a random
number generator, that would expand the scope of what transpiled code can do.
It is debatable whether we'll want the power to run the transpiled code as
a microservice. With UDFs for XLang and UDFs and UDAFs for SQL as our
expected use cases, WASI seems to be the best choice. The devil is in the
details, as there is a hodgepodge of what language runtimes support and what
the limits are when transpiling from a language to WebAssembly.

Assuming WASI, it breaks down into these two aspects:
1) Does the host language have a runtime?
Java: https://github.com/wasmerio/wasmer-java
Python: https://github.com/wasmerio/wasmer-python
Go: https://github.com/wasmerio/wasmer-go

2) How good is compilation from source language to WebAssembly
<https://github.com/appcypher/awesome-wasm-langs>?
Java (very limited):
Garbage collection, the need to transpile/replace much of the VM's
capabilities, and the large standard library that everyone uses all cause
a lot of challenges.
JWebAssembly can do simple things like basic classes, strings, and method
calls. It should be able to compile trivial lambdas to Wasm. There are other
choices but to my knowledge all are very limited.

Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
Features                  | CPython Emscripten browser | CPython Emscripten node | Pyodide
subprocess (fork, exec)   | no                         | no                      | no
threads                   | no                         | YES                     | WIP
file system               | no (only MEMFS)            | YES (Node raw FS)       | YES (IDB, Node, …)
shared extension modules  | WIP                        | WIP                     | YES
PyPI packages             | no                         | no                      | YES
sockets                   | ?                          | ?                       | ?
urllib, asyncio           | no                         | no                      | WebAPI fetch / WebSocket
signals                   | no                         | WIP                     | YES

Go (excellent): native support in the Go compiler
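
As a rough illustration of what the host side has to do for a single UDF
call, here is a minimal Python sketch of the host/guest exchange Steven
describes in the first message of this thread (quoted at the bottom): the
host copies the encoded element into guest memory, the guest copies those
bytes elsewhere and calls back with an offset and size, and the host decodes
from guest memory. Assumptions: a bytearray stands in for the module's linear
memory instead of a real Wasmer instance, the names GuestMemory,
encode_element, decode_element, guest_process and host_call are hypothetical,
and the length-prefixed encoding is a stand-in for RowCoder, not its actual
wire format.

```python
import struct


class GuestMemory:
    """Stand-in for a Wasm module's linear memory: one flat byte buffer."""

    def __init__(self, size=64 * 1024):
        self.buf = bytearray(size)
        self.next_free = 0  # trivial bump allocator, never frees

    def alloc(self, n):
        offset = self.next_free
        if offset + n > len(self.buf):
            raise MemoryError("guest linear memory exhausted")
        self.next_free += n
        return offset


def encode_element(name, count):
    # Hypothetical encoding: 4-byte length, UTF-8 string, 8-byte signed int.
    raw = name.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw + struct.pack(">q", count)


def decode_element(data):
    (n,) = struct.unpack_from(">I", data, 0)
    name = data[4:4 + n].decode("utf-8")
    (count,) = struct.unpack_from(">q", data, 4 + n)
    return name, count


def guest_process(mem, in_offset, in_size, emit):
    # "Guest" side: copy the input elsewhere in linear memory, then call back
    # to the host with the offset and size of the output.
    out_offset = mem.alloc(in_size)
    mem.buf[out_offset:out_offset + in_size] = mem.buf[in_offset:in_offset + in_size]
    emit(out_offset, in_size)


def host_call(mem, element):
    # Host side: encode, copy into guest memory, invoke the guest, and decode
    # whatever the guest hands back through the callback.
    encoded = encode_element(*element)
    in_offset = mem.alloc(len(encoded))
    mem.buf[in_offset:in_offset + len(encoded)] = encoded

    results = []

    def emit(offset, size):
        results.append(decode_element(bytes(mem.buf[offset:offset + size])))

    guest_process(mem, in_offset, len(encoded), emit)
    return results


if __name__ == "__main__":
    print(host_call(GuestMemory(), ("beam", 42)))
```

Every element is encoded once, copied twice and decoded once in this sketch,
which is the per-element overhead that batching and a shared in-memory format
would aim to reduce.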

On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

>
>
> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>
>> I have had interest in integrating Wasm within Beam as well as I have had
>> a lot of interest in improving language portability.
>>
>> Wasm has a lot of benefits over using docker containers to provide a
>> place for code to execute. From experience working on Beam's
>> portability layer and internal Flume knowledge:
>> * encoding and decoding data is expensive; anything which ensures that
>> in-memory representations of data can be transferred from the host to the
>> guest and back without transcoding/re-interpreting will be a big win.
>> * reducing the amount of times we need to pass data between guest and
>> host and back is important
>>   * fusing transforms reduces the number of data passing points
>>   * batching (row or columnar) data reduces the amount of times we need
>> to pass data at each data passing point
>> * there are enough complicated use cases (state & timers, large
>> iterables, side inputs) that handling only the trivial map/flatmap use
>> case will provide little value since it will prevent fusion
>>
>> I have been meaning to work on a prototype where we replace the current
>> gRPC + docker path with one in which we use Wasm to execute a fused graph
>> re-using large parts of the existing code base written to support
>> portability.
>>
>
> This sounds very interesting. Probably using Wasm to implement proper UDF
> support for x-lang (for example, executing Python timestamp/watermark
> functions provided through the Kafka Python x-lang wrapper on the Java
> Kafka transform) will be a good first target? My main question for this at
> this point is whether Wasm has adequate support for existing SDKs that use
> x-lang to implement this in a useful way.
>
> Thanks,
> Cham
>
>
>>
>>
>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bhule...@google.com>
>> wrote:
>>
>>> Re: Arrow - it's long been my dream to use Arrow for interchange in Beam
>>> [1]. I'm trying to move us in that direction with
>>> https://s.apache.org/batched-dofns (arrow is discussed briefly in the
>>> Future Work section). This gives the Python SDK a concept of batches of
>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>> Arrow RecordBatches.
>>>
>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut
>>> and I'm currently working on adding what I'm calling a "BatchConverter" [2]
>>> for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>>
>>> Brian
>>>
>>> [1]
>>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>>
>>>
>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <jenseng...@google.com>
>>> wrote:
>>>
>>>> Interesting.
>>>>
>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>> "golang wasm" :)
>>>>
>>>> The storage and execution grid systems are all embracing wasm in some
>>>> way.
>>>>
>>>> https://redpanda.com/
>>>> https://www.fluvio.io/
>>>> https://temporal.io/ (Cadence fork by the Cadence folks, I met Maxim
>>>> the lead at Temporal at the 2020 Wasm Summit)
>>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>>
>>>> Keep the Wasm+Beam demos coming.
>>>>
>>>> Sean
>>>>
>>>>
>>>>
>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>>> sjvanros...@google.com> wrote:
>>>>
>>>>> I caught up with all the replies through the web interface, but I
>>>>> didn't have my list subscription set up correctly so my reply (TL;DR
>>>>> sample code available at https://github.com/sjvanrossum/beam-wasm)
>>>>> didn't come through until a bit later yesterday I think.
>>>>>
>>>>> Sean, I agree with your suggestion of Arrow as the interchange format
>>>>> for Wasm transforms and it's something I thought about exploring when I
>>>>> was adding serialization/deserialization of complex (meaning anything
>>>>> that's not an integer or float in the context of Wasm) data types in the
>>>>> demo. It's an unfortunate bit of overhead which could very well be solved
>>>>> with Arrow and shared memory between Wasm modules.
>>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>>> streaming data platforms like Fluvio and Redpanda and they seem to incur
>>>>> the same overhead when moving data into and out of the guest context so
>>>>> maybe it's negligible, but I haven't done any serious benchmark yet to
>>>>> validate that.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Steve
>>>>>
>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <rob...@frantil.com>
>>>>> wrote:
>>>>>
>>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>>> well meaning compiler can target, eg the Go compiler
>>>>>>
>>>>>>
>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>
>>>>>> (Among many articles for the last few years)
>>>>>>
>>>>>> Robert Burke
>>>>>> Beam Go Busybody
>>>>>>
>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <jenseng...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Heh, my stage fright was so strong, I didn't realize that the talk
>>>>>>> was recorded. :)
>>>>>>>
>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>>> rough.
>>>>>>>
>>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>>> most compelling use is in the portability of logic between data
>>>>>>> processing systems, especially in the use of probabilistic data
>>>>>>> structures like Bloom filters, Count-Min Sketch and HyperLogLog, where
>>>>>>> it is nice to persist the data structure and use it on a different
>>>>>>> system. For example, generating a Bloom filter in Beam and using it
>>>>>>> inside of a BQ query w/o having to reimplement and test across many
>>>>>>> platforms.
>>>>>>>
>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>>> exists, Wasm support exists for free unless the embedder goes out of
>>>>>>> their way to disable it. So it is supported in Deno/Node as well. In
>>>>>>> Python, Wasm support via Wasmtime
>>>>>>> <https://github.com/bytecodealliance/wasmtime> is really good. There
>>>>>>> are *many* options for execution environments. One of the downsides of
>>>>>>> passing through a JS one is string and number support (float/int64)
>>>>>>> issues, afaik. I could be wrong, maybe JS has fixed all this by now.
>>>>>>>
>>>>>>> The qualities in order of importance (for me) are
>>>>>>>
>>>>>>>    1. Portability, run the same code everywhere
>>>>>>>    2. Security, memory safety for the caller. Running Wasm inside
>>>>>>>    of Python should never crash your Python interpreter. The capability
>>>>>>>    model ensures that the Wasm module can only do what you allow it to
>>>>>>>    3. Performance (portable), compile once and run everywhere
>>>>>>>    within some margin of native.  Python makes this look good :)
>>>>>>>
>>>>>>> I think something worth exploring is moving opaque-ish Arrow objects
>>>>>>> around via Beam, so that Beam is now mostly in the control plane and
>>>>>>> computation happens in Wasm. This should reduce the serialization
>>>>>>> overhead and also get Python out of the datapath.
>>>>>>>
>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>
>>>>>>> Another possibly interesting avenue to explore is compiling command
>>>>>>> line programs to WASI (WebAssembly System Interface), the POSIX-like
>>>>>>> shim, so that they can be run in-process without the fork/exec/pipe
>>>>>>> overhead of running a subprocess. A neat demo might be running
>>>>>>> something like Jq <https://stedolan.github.io/jq/> inside of a Beam job.
>>>>>>>
>>>>>>> Not to make Wasm sound like a Python only technology, it can be used
>>>>>>> via Java/JVM via
>>>>>>>
>>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>>
>>>>>>> Sean
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pabl...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>
>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <
>>>>>>>> dpcoll...@google.com> wrote:
>>>>>>>>
>>>>>>>>> If we ever do anything with the JS runtime, this would seem to be
>>>>>>>>> the best place to run WASM.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bhule...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> FYI: @Sean Jensen-Grey <jenseng...@google.com> gave a talk back
>>>>>>>>>> in 2020 where he had integrated Rust with the Python SDK. I thought
>>>>>>>>>> he used WebAssembly for that, but it looks like he used some other
>>>>>>>>>> approaches, and his talk mentioned WebAssembly as future work. Not
>>>>>>>>>> sure if that was ever explored.
>>>>>>>>>>
>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in
>>>>>>>>>>> the WebAssembly topic.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <
>>>>>>>>>>> pabl...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>>> branch? : )
>>>>>>>>>>>> Even if we don't want to merge it, it would be great to have a
>>>>>>>>>>>> PR as a way to showcase the work, its usefulness, and receive
>>>>>>>>>>>> comments on this thread once we can see something more specific.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>>> sjvanros...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a
>>>>>>>>>>>>> lightweight way to implement cross language transforms for
>>>>>>>>>>>>> languages which don't (yet) have an SDK implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK as
>>>>>>>>>>>>> a DoFn with Wasmer as the WebAssembly runtime and simple support
>>>>>>>>>>>>> for marshalling between the host and guest environment with the
>>>>>>>>>>>>> RowCoder. The module I've constructed is mostly useless, but
>>>>>>>>>>>>> demonstrates the host copying the encoded element into the
>>>>>>>>>>>>> guest's memory, the guest copying those bytes elsewhere in its
>>>>>>>>>>>>> linear memory buffer, the guest calling back to the host with the
>>>>>>>>>>>>> offset and size and the host copying and decoding from the
>>>>>>>>>>>>> guest's memory.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a
>>>>>>>>>>>>> Monday afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>
>>>>>>>>>>>>
