[GitHub] [arrow] westonpace commented on issue #15103: Weighted stat aggregations in arrow-compute

2023-01-26 Thread via GitHub


westonpace commented on issue #15103:
URL: https://github.com/apache/arrow/issues/15103#issuecomment-1406042300

   > I am also curious how the development on Apache Arrow and [Apache Arrow 
Datafusion](https://arrow.apache.org/datafusion/) (or [python 
bindings](https://github.com/apache/arrow-datafusion-python)) work together.
   
   PyArrow, datasets, and the streaming execution engine (Acero) are all based 
on Arrow C++.  DataFusion is based on arrow-rs (Rust).  So they are two 
separate stacks.
   
   That being said, they should be very interoperable.  Both can import and 
export data through the C data interface for in-process communication.  Both 
can read and write the Arrow IPC format, which can be sent over something like 
Flight for inter-process communication.
   
   > Do the projects work closely to re-use/align overlapping functionality?
   
   Unfortunately, a universal UDF harness has not yet been created.  In 
general, DataFusion, being based on Rust, is not likely to trust C/C++ UDFs 
for safety reasons (at least, that is my understanding; I could be wrong).  
Going the other way (using DataFusion UDFs in PyArrow) should be possible, but 
no one has done it yet.  Some very interesting alternatives have been proposed 
(there was a discussion a while back on the mailing list).  Perhaps the most 
interesting to me is the idea of using WASM for UDFs.  LLVM has also been 
proposed, though it doesn't fix the safety concerns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] westonpace commented on issue #15103: Weighted stat aggregations in arrow-compute

2023-01-20 Thread via GitHub


westonpace commented on issue #15103:
URL: https://github.com/apache/arrow/issues/15103#issuecomment-1399100195

   There is a proposal to support aggregate UDFs but I don't know that it is 
high priority for anyone.  I agree this sounds like a nice feature.  Would you 
be interested in creating a PR?

