kinow commented on issue #35508:
URL: https://github.com/apache/arrow/issues/35508#issuecomment-1542123031

   So from what I could understand, in the `pyarrow` code:
   
   ```py
   from pyarrow._compute import function_registry

   reg = function_registry()
   'tdigest' in reg.list_functions()  # True
   ```
   
   And here, in [`_make_global_functions`](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/python/pyarrow/compute.py#L302C5-L331), which is called automatically when `pyarrow.compute` is imported, the registered functions are exposed in the Python module.
   
   That's how we get `pyarrow.compute.tdigest` (`tdigest` itself must be added to the function registry somewhere else).
   
   What's actually added is [a wrapper around the actual Arrow tdigest function](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/python/pyarrow/compute.py#L121-L127).
   
   ```py
   >>> import pyarrow.compute as pac
   >>> pac.tdigest.__arrow_compute_function__
   {'name': 'tdigest', 'arity': 1, 'options_class': 'TDigestOptions', 'options_required': False}
   ```
   
   I couldn't find a way to access any instance of TDigest. The C++ classes live in the `arrow::internal` namespace, so I guess the Arrow C++ implementation was intentionally kept private, meant to be used internally and exposed only via aggregation functions, or via a Python call that returns an array-like object. Does that sound correct, @westonpace?
   
   If that's the case, I guess our best option would be to create bindings for the Arrow C++ code ourselves, or to use another implementation.

