Re: [I] C++/Rust UDF implementation [texera]

via GitHub Mon, 25 May 2026 20:28:43 -0700


carloea2 commented on issue #5162:
URL: https://github.com/apache/texera/issues/5162#issuecomment-4539728462


   Good question. In the closed prototype PR #5078, I did not use Arrow IPC 
yet. The MVP used a simple typed row-frame protocol over stdin/stdout:
   
   - The JVM executor selected the input columns, then wrote the column count, 
base64-encoded column names, type tags, and row data.
   - Each field was encoded as `<type-tag>:<is-null>:<payload>`; strings/binary 
used base64, while numeric/boolean/timestamp values used textual payloads.
   - The native side returned typed output rows on stdout, and the JVM 
parsed/enforced them against the declared output schema.
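
   As a rough illustration of the field encoding above (not the code from #5078), here is a minimal Python sketch. The tag names (`string`, `int`, ...), the `0`/`1` null flag, and the `true`/`false` boolean text are assumptions made for the example; the prototype's actual tags are not listed in this comment.

```python
import base64

# Hypothetical type tags for illustration only.
TEXT_TAGS = {"int", "long", "double", "bool", "timestamp"}


def encode_field(tag, value):
    """Encode one field as <type-tag>:<is-null>:<payload>."""
    if value is None:
        return f"{tag}:1:"  # null flag set, empty payload
    if tag == "string":
        payload = base64.b64encode(value.encode("utf-8")).decode("ascii")
    elif tag == "binary":
        payload = base64.b64encode(value).decode("ascii")
    elif tag == "bool":
        payload = "true" if value else "false"
    elif tag in TEXT_TAGS:
        payload = str(value)  # numeric/timestamp values travel as text
    else:
        raise ValueError(f"unknown type tag: {tag}")
    return f"{tag}:0:{payload}"


def decode_field(field):
    """Inverse of encode_field; returns (tag, value)."""
    # maxsplit=2 keeps any ':' inside the payload (e.g. timestamps) intact.
    tag, is_null, payload = field.split(":", 2)
    if is_null == "1":
        return tag, None
    if tag == "string":
        return tag, base64.b64decode(payload).decode("utf-8")
    if tag == "binary":
        return tag, base64.b64decode(payload)
    if tag == "bool":
        return tag, payload == "true"
    if tag in ("int", "long"):
        return tag, int(payload)
    if tag == "double":
        return tag, float(payload)
    if tag == "timestamp":
        return tag, payload  # left as text; the timestamp format is an assumption
    raise ValueError(f"unknown type tag: {tag}")
```

   Base64 output never contains `:`, so string and binary payloads cannot collide with the field separators, which is one reason a text framing like this stays easy to debug.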
   
   The prototype supported three execution APIs:
   
   - `process_tuple`: one-row frames, mainly for simple or low-latency cases.
   - `process_batch`: the default path, accumulating a configurable batch size 
before sending a frame.
   - `process_table`: collect input and send once on finish for whole-table 
algorithms.
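
   One way to picture the three APIs is as a single buffering policy that only differs in when a frame is flushed. The sketch below is hypothetical: `FrameBuffer`, `send_frame`, and the default `batch_size` are names and values invented for the example, not part of the prototype.

```python
class FrameBuffer:
    """Decides when buffered rows are sent as one frame (hypothetical helper)."""

    def __init__(self, mode, send_frame, batch_size=100):
        if mode not in ("tuple", "batch", "table"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.send_frame = send_frame  # callback that writes one frame
        self.batch_size = batch_size
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if self.mode == "tuple":
            self._flush()  # one-row frames
        elif self.mode == "batch" and len(self.rows) >= self.batch_size:
            self._flush()  # frame once the batch is full
        # table mode: keep collecting until finish()

    def finish(self):
        """Called at end of input; sends whatever is still buffered."""
        if self.rows:
            self._flush()

    def _flush(self):
        self.send_frame(self.rows)
        self.rows = []  # start a fresh list so sent frames are not mutated
```

   With `batch_size=2` and five input rows, batch mode would send frames of 2, 2, and 1 rows, tuple mode five one-row frames, and table mode a single five-row frame on `finish()`.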
   
   One thing I would change from #5078 is the lifecycle. The hackathon 
prototype was intentionally simple and launched the compiled executable anew on 
every flush. For the real sidecar design, I think the executor should compile/reuse 
the binary in `open()`, start one persistent native process, send the schema 
once, stream tuple/batch/table frames to it, and shut it down in `close()`. In 
that design, tuple mode does not mean spawning one process per tuple.
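
   To make the proposed lifecycle concrete, here is a small Python sketch of the open/stream/close pattern. It is only an illustration under stated assumptions: the real executor lives in the JVM, `SidecarExecutor` is a hypothetical name, the line-per-frame protocol is simplified, and an inline Python script stands in for the compiled native binary.

```python
import subprocess
import sys

# Stand-in for the compiled native UDF binary (assumption for this sketch):
# it reads the schema line once, then answers each frame line with one line.
CHILD = r"""
import sys
schema = sys.stdin.readline().strip()
while True:
    line = sys.stdin.readline()
    if not line:          # EOF: the parent closed stdin in close()
        break
    sys.stdout.write(schema + "|" + line.strip().upper() + "\n")
    sys.stdout.flush()
"""


class SidecarExecutor:
    def open(self, schema):
        # Start one persistent process and send the schema exactly once.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        self._write(schema)

    def process(self, frame):
        # Stream a frame to the already-running process and read its reply.
        self._write(frame)
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin signals EOF so the child can exit cleanly.
        self.proc.stdin.close()
        code = self.proc.wait(timeout=10)
        self.proc.stdout.close()
        return code

    def _write(self, line):
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
```

   Every `process()` call reuses the same child process, so even one-row frames only pay for a pipe write and read rather than a process launch.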
   
   So my preference for the first version is batch-based by default, while 
still exposing tuple and table APIs. Arrow IPC is a good candidate for a later 
transport, especially for larger batches, binary-heavy data, or columnar data. 
I would keep the transport pluggable: start with a simple debuggable framed 
protocol like the prototype, then add Arrow IPC after the API/lifecycle is 
settled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
