I'll note that PyArrow also allows defining user-defined functions and they are vectorized (the function arguments can be PyArrow arrays or scalars, depending on the context in which a function is being executed):
https://arrow.apache.org/docs/python/compute.html#user-defined-functions

My vote would be for a separate repository.

Regards

Antoine.


On 28/06/2024 at 16:06, Andrew Lamb wrote:
Hi Xuanwo,

Sorry for the delay in responding. I think the ability to easily write
functions that "feel" like native functions in whatever language, and to
generate Arrow / vectorized versions of them, is quite valuable.
This is my understanding of what this proposal is about.

I left some additional comments on the markdown.

One thing that might be worth doing is to articulate some other potential
locations for where the code might go. One option, as I think you propose,
is to make its own repository. Another option could be to donate the code
and put the various language bindings in the same repos as the Arrow
language implementations (e.g. arrow-rs, arrow for Python, etc.), which would
likely make it easier to maintain and discover.

I am curious about what other devs / users feel about this?

Andrew



On Thu, Jun 20, 2024 at 3:04 AM Xuanwo <xua...@apache.org> wrote:

Hello, everyone.

I am starting this thread to discuss the donation of a User-Defined Function
Framework for Apache Arrow.

Feel free to review and leave your comments here. For live review, please
visit:

https://hackmd.io/@xuanwo/apache-arrow-udf

The original content is also pasted here for quick reading:

------

## Abstract

Arrow UDF is a User-Defined Function Framework for Apache Arrow.

## Proposal

Arrow UDF allows users to easily create and run user-defined functions
(UDFs) in Rust, Python, Java or JavaScript based on Apache Arrow. The
functions can be executed natively, or in WebAssembly, or in a remote
server via Arrow Flight.

Arrow UDF was originally designed to be used by the RisingWave project but
is now being used by Databend and several database startups.

We believe that the Arrow UDF project will add diversity and value to the
entire Arrow community.

## Background

Arrow UDF has been developed by an open-source community from day one and
is owned by RisingWaveLabs. The project was launched in December 2023.

## Initial Goals

By transferring ownership of the project to Apache Arrow, Arrow UDF
expects to ensure its neutrality and to further encourage and facilitate
its adoption by the community.

## Current Status

Contributors: 5

Users:

-   [RisingWave]: A Distributed SQL Database for Stream Processing.
-   [Databend]: An open-source cloud data warehouse that serves as a
cost-effective alternative to Snowflake.

## Documentation

The documentation for Arrow UDF is hosted at
https://docs.rs/arrow-udf/latest/arrow_udf/.

## Initial Source

The project currently holds a GitHub repository and multiple packages:

- https://github.com/risingwavelabs/arrow-udf

Rust:

- https://crates.io/arrow-udf/
- https://crates.io/arrow-udf-python/
- https://crates.io/arrow-udf-js/
- https://crates.io/arrow-udf-js-deno/
- https://crates.io/arrow-udf-wasm/

Python:

- https://pypi.org/project/arrow-udf/

Those packages will retain their names, while the repository will be moved
to the apache org.

## Required Resources

### Mailing Lists

We can reuse the existing Arrow mailing lists.

### Git Repositories

From

- https://github.com/risingwavelabs/arrow-udf

To

- https://gitbox.apache.org/asf/repos/arrow-udf
- https://github.com/apache/arrow-udf

### Issue Tracking

The project would like to continue using GitHub Issues.

### Other Resources

The project has already chosen GitHub Actions as its continuous integration
tool.

## Initial Committers

- Runji Wang wangrunji0...@163.com
- Giovanny Gutiérrez
- sundy-li sund...@apache.org
- Xuanwo xua...@apache.org
- Max Justus Spransy maxjus...@gmail.com

[RisingWave]: https://github.com/risingwavelabs/risingwave
[Databend]: https://github.com/datafuselabs/databend

Xuanwo

