I wonder if the DataFusion project might be a more natural home for this 
functionality? UDFs are more of a query engine concept, whereas arrow-rs is 
more focused on purely physical execution?

On 28 June 2024 19:41:39 BST, Runji Wang <wangrunji0...@163.com> wrote:
>Hi Felipe,
>
>Vectorization will be applied whenever possible. When all input and output 
>types of a function are primitive (int16, int32, int64, float32, float64) and 
>do not involve any Option or Result, the macro will automatically generate 
>code based on unary <https://docs.rs/arrow/latest/arrow/compute/fn.unary.html> 
>or binary <https://docs.rs/arrow/latest/arrow/compute/fn.binary.html> kernels, 
>which potentially allows for vectorization.
>
>Neither of the examples you showed is vectorized: `div` because of its
>Result output, and `gcd` because of the loop in its implementation. However,
>if the function is simple enough, like an `add` function:
>
>#[function("add(int, int) -> int")]
>fn add(a: i32, b: i32) -> i32 {
>    a + b
>}
>
>It can be auto-vectorized by LLVM.
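
As a rough illustration of that distinction, here is a minimal sketch over plain slices rather than Arrow arrays. It is not the code the macro generates: `add_kernel` and `div_rows` are hypothetical names, and `wrapping_add` / `wrapping_div` are substituted for `+` and `/` only so that integer overflow cannot panic.

```rust
/// Scalar function with primitive inputs and output, like `add` above.
/// Assumption: wrapping arithmetic is used so overflow never panics.
fn add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Kernel-style loop (hypothetical name): every element is handled the same
/// way, with no null or error checks, so LLVM may be able to auto-vectorize it.
fn add_kernel(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b).map(|(&x, &y)| add(x, y)).collect()
}

/// Fallible scalar function, like `div` above.
fn div(x: i32, y: i32) -> Result<i32, &'static str> {
    if y == 0 {
        return Err("division by zero");
    }
    Ok(x.wrapping_div(y))
}

/// Row-by-row path (hypothetical name): each call can fail, so the loop
/// checks the result of every row and stops at the first error.
fn div_rows(a: &[i32], b: &[i32]) -> Result<Vec<i32>, &'static str> {
    a.iter().zip(b).map(|(&x, &y)| div(x, y)).collect()
}
```

The first loop is a straight element-wise map, which is the shape that tends to vectorize well. The second has to branch on the `Result` for every row, which generally gets in the way of that.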
>
>Runji
>
>
>On 2024/06/28 17:13:16 Felipe Oliveira Carvalho wrote:
>> On Fri, Jun 28, 2024 at 11:07 AM Andrew Lamb <al...@influxdata.com> wrote:
>> >
>> > Hi Xuanwo,
>> >
>> > Sorry for the delay in responding. I think the ability to easily write
>> > functions that "feel" like native functions in whatever language and be
>> > able to generate arrow / vectorized versions of them is quite valuable.
>> > This is my understanding of what this proposal is about.
>> 
>> My understanding is that it's not vectorized. From the examples in
>> risingwavelabs/arrow-udf, <https://github.com/risingwavelabs/arrow-udf> it
>> looks like the macros generate code that gathers values from columns into
>> local scalars that are passed as scalar parameters to user functions. Is
>> the hope here that rustc/llvm will auto-vectorize the code?
>> 
>> #[function("gcd(int, int) -> int")]
>> fn gcd(mut a: i32, mut b: i32) -> i32 {
>>     while b != 0 {
>>         (a, b) = (b, a % b);
>>     }
>>     a
>> }
>> 
>> #[function("div(int, int) -> int")]
>> fn div(x: i32, y: i32) -> Result<i32, &'static str> {
>>     if y == 0 {
>>         return Err("division by zero");
>>     }
>>     Ok(x / y)
>> }
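
The "gather into local scalars" pattern described above can be sketched roughly as follows. This is a simplified assumption about the shape of such code, using `Option<i32>` slices in place of nullable Arrow columns; `gcd_column` is a hypothetical name and is not taken from arrow-udf.

```rust
/// The user-written scalar function, as in the example above.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a
}

/// Hypothetical wrapper: reads one value per row from each input column,
/// passes the pair to the scalar function, and propagates nulls.
fn gcd_column(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match (x, y) {
            // Both inputs present: call the scalar function for this row.
            (Some(x), Some(y)) => Some(gcd(*x, *y)),
            // Any null input yields a null output.
            _ => None,
        })
        .collect()
}
```

Each row runs a data-dependent `while` loop, so the number of iterations differs from row to row.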
>> 
>> > I left some additional comments on the markdown.
>> >
>> > One thing that might be worth doing is articulate some other potential
>> > locations for where the code might go. One option, as I think you propose,
>> > is to make its own repository.  Another option could be to donate the code
>> > and put the various language bindings in the same repo as the arrow
>> > language implementations (e.g. arrow-rs, arrow for python, etc) which would
>> > likely make it easier to maintain and discover.
>> >
>> > I am curious about what other devs / users feel about this?
>> >
>> > Andrew
>> >
>> >
>> >
>> > On Thu, Jun 20, 2024 at 3:04 AM Xuanwo <xu...@apache.org> wrote:
>> >
>> > > Hello, everyone.
>> > >
>> > > I start this thread to discuss the donation of a User-Defined Function
>> > > Framework for Apache Arrow.
>> > >
>> > > Feel free to review and leave your comments here. For live review,
>> > > please visit:
>> > >
>> > > https://hackmd.io/@xuanwo/apache-arrow-udf
>> > >
>> > > The original content is also pasted here for quick reading:
>> > >
>> > > ------
>> > >
>> > > ## Abstract
>> > >
>> > > Arrow UDF is a User-Defined Function Framework for Apache Arrow.
>> > >
>> > > ## Proposal
>> > >
>> > > Arrow UDF allows users to easily create and run user-defined functions
>> > > (UDF) in Rust, Python, Java or JavaScript based on Apache Arrow. The
>> > > functions can be executed natively, or in WebAssembly, or in a remote
>> > > server via Arrow Flight.
>> > >
>> > > Arrow UDF was originally designed to be used by the RisingWave project
>> > > but is now being used by Databend and several database startups.
>> > >
>> > > We believe that the Arrow UDF project will provide diversity value to
>> > > the entire Arrow community.
>> > >
>> > > ## Background
>> > >
>> > > Arrow UDF has been developed by an open-source community from day one
>> > > and is owned by RisingWaveLabs. The project was launched in December
>> > > 2023.
>> > >
>> > > ## Initial Goals
>> > >
>> > > By transferring ownership of the project to Apache Arrow, Arrow UDF
>> > > expects to ensure its neutrality and further encourage and facilitate
>> > > the adoption of Arrow UDF by the community.
>> > >
>> > > ## Current Status
>> > >
>> > > Contributors: 5
>> > >
>> > > Users:
>> > >
>> > > -   [RisingWave]: A Distributed SQL Database for Stream Processing.
>> > > -   [Databend]: An open-source cloud data warehouse that serves as a
>> > > cost-effective alternative to Snowflake.
>> > >
>> > > ## Documentation
>> > >
>> > > The documentation for Arrow UDF is hosted at
>> > > https://docs.rs/arrow-udf/latest/arrow_udf/.
>> > >
>> > > ## Initial Source
>> > >
>> > > The project currently holds a GitHub repository and multiple packages:
>> > >
>> > > - https://github.com/risingwavelabs/arrow-udf
>> > >
>> > > Rust:
>> > >
>> > > - https://crates.io/arrow-udf/
>> > > - https://crates.io/arrow-udf-python/
>> > > - https://crates.io/arrow-udf-js/
>> > > - https://crates.io/arrow-udf-js-deno/
>> > > - https://crates.io/arrow-udf-wasm/
>> > >
>> > > Python:
>> > >
>> > > - https://pypi.org/project/arrow-udf/
>> > >
>> > > Those packages will retain their names, while the repository will be
>> > > moved to the apache org.
>> > >
>> > > ## Required Resources
>> > >
>> > > ### Mailing Lists
>> > >
>> > > We can reuse the existing Arrow mailing lists.
>> > >
>> > > ### Git Repositories
>> > >
>> > > From
>> > >
>> > > - https://github.com/risingwavelabs/arrow-udf
>> > >
>> > > To
>> > >
>> > > - https://gitbox.apache.org/asf/repos/arrow-udf
>> > > - https://github.com/apache/arrow-udf
>> > >
>> > > ### Issue Tracking
>> > >
>> > > The project would like to continue using GitHub Issues.
>> > >
>> > > ### Other Resources
>> > >
>> > > The project has already chosen GitHub Actions as its continuous
>> > > integration tool.
>> > >
>> > > ## Initial Committers
>> > >
>> > > - Runji Wang wangrunji0...@163.com
>> > > - Giovanny Gutiérrez
>> > > - sundy-li sund...@apache.org
>> > > - Xuanwo xua...@apache.org
>> > > - Max Justus Spransy maxjus...@gmail.com
>> > >
>> > > [RisingWave]: https://github.com/risingwavelabs/risingwave
>> > > [Databend]: https://github.com/datafuselabs/databend
>> > >
>> > > Xuanwo
>> > >
>> 
