Re: Designing standards for "sandboxed" Arrow user-defined functions [was Re: User defined "Arrow Compute Function"]

David Li Mon, 25 Apr 2022 14:12:20 -0700

The WebAssembly documentation has a rundown of the techniques used: 
https://webassembly.org/docs/security/


I think usually you would run WASM in-process, though we could indeed also put 
it in a subprocess to further isolate things.
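
As a rough illustration of the subprocess option, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the names `WORKER` and `call_udf_isolated` are hypothetical, JSON over pipes stands in for Arrow IPC or Flight, and `os._exit(70)` stands in for a segfault. It only shows that a crash in the UDF stays in the child process; the child still has the parent's OS privileges, so this is crash isolation rather than a security sandbox.

```python
import json
import subprocess
import sys

# Hypothetical worker script: reads {"udf": <source>, "args": [...]} from
# stdin, applies the UDF to each argument, and writes the results as JSON.
WORKER = r"""
import json, sys
request = json.load(sys.stdin)
namespace = {}
exec(request["udf"], namespace)  # user code runs only in the child process
results = [namespace["udf"](x) for x in request["args"]]
json.dump(results, sys.stdout)
"""


def call_udf_isolated(udf_source, args, timeout=10.0):
    """Run udf_source in a child interpreter; return (ok, results_or_error)."""
    payload = json.dumps({"udf": udf_source, "args": args})
    try:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if proc.returncode != 0:
        # The child died, but the calling process keeps running.
        return False, f"worker exited with code {proc.returncode}"
    return True, json.loads(proc.stdout)


# A well-behaved UDF and one that kills its own process.
GOOD_UDF = "def udf(x):\n    return x * 2\n"
BAD_UDF = "import os\ndef udf(x):\n    os._exit(70)\n"

if __name__ == "__main__":
    print(call_udf_isolated(GOOD_UDF, [1, 2, 3]))
    print(call_udf_isolated(BAD_UDF, [1]))
```

Restricting what the child may do (filesystem, network, syscalls) would need OS-level mechanisms on top of this, which is the part WASM aims to provide in-process.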

It would be interesting to define the Flight "harness" protocol. Handling 
heterogeneous arguments may require some evolution in Flight (e.g. if the 
function is non-scalar and its arguments are of different lengths, we'd need 
something like the ColumnBag proposal, so this might be a good reason to revive 
that).

On Mon, Apr 25, 2022, at 16:35, Antoine Pitrou wrote:
> Le 25/04/2022 à 22:19, Wes McKinney a écrit :
>> I was going to reply to this e-mail thread on user@ but thought I
>> would start a new thread on dev@.
>> 
>> Executing user-defined functions in memory, especially untrusted
>> functions, in general is unsafe. For "trusted" functions, having an
>> in-memory API for writing them in user languages is very useful. I
>> remember tinkering with adding UDFs in Impala with LLVM IR, which
>> would allow UDFs to have performance consistent with built-ins
>> (because built-in functions are all inlined into code-generated
>> expressions), but segfaults would bring down the server, so only
>> admins could be trusted to add new UDFs.
>> 
>> However, I wonder if we should eventually define an "external UDF"
>> protocol and an example UDF "harness", using Flight to do RPC across
>> the process boundaries. So the idea is that an external local UDF
>> Flight execution service is spun up, and then data is sent to the UDF
>> in a DoExchange call.
>> 
>> As Jacques pointed out in an interview [1], a compelling solution to
>> the UDF sandboxing problem is WASM. This allows "untrusted" WASM
>> functions to be run safely in-process.
>
> How does the sandboxing work in this case? Is it simply executing in a 
> separate process with restricted capabilities, or are other mechanisms 
> put in place?
