kinow commented on issue #35508: URL: https://github.com/apache/arrow/issues/35508#issuecomment-1542123031
So from what I could understand, in the `pyarrow` code:

```py
from pyarrow._compute import function_registry

reg = function_registry()
'tdigest' in reg.list_functions()  # True
```

And here, where [`_make_global_functions`](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/python/pyarrow/compute.py#L302C5-L331) is automatically called, the functions are exposed in the Python module. That's how we have `pyarrow.compute.tdigest` (`tdigest` must be added to the function registry somewhere else).

What's actually added is [a wrapper around the actual Arrow tdigest function](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/python/pyarrow/compute.py#L121-L127).

```py
>>> pac.tdigest.__arrow_compute_function__
{'name': 'tdigest', 'arity': 1, 'options_class': 'TDigestOptions', 'options_required': False}
```

I couldn't find a way to access an instance of TDigest. The C++ classes live in the `arrow::internal` namespace. So I guess the Arrow C++ implementation was intentionally hidden from the public API: it is meant to be used internally and exposed only through aggregation functions, or through a Python call that returns a pandas array-like object.

Does that sound correct @westonpace ? If that's the case, I guess our best option would be to write a binding for the Arrow C++ code ourselves, or to use another implementation.
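To make the registry-to-module pattern described above concrete, here is a toy sketch in plain Python. It is not pyarrow's code: `ToyRegistry`, `make_wrapper`, `make_global_functions` and the `__toy_compute_function__` attribute are all hypothetical names that only mimic the idea of generating one wrapper per registered function and attaching metadata to it.

```py
# Toy stand-in for a compute function registry. All names are hypothetical
# and only imitate the pattern; this is not pyarrow's implementation.
class ToyRegistry:
    def __init__(self):
        self._functions = {}

    def register(self, name, impl, arity):
        self._functions[name] = (impl, arity)

    def list_functions(self):
        return sorted(self._functions)

    def get_function(self, name):
        return self._functions[name]


def make_wrapper(name, impl, arity):
    # Build a plain Python function that forwards to the registered implementation.
    def wrapper(*args):
        if len(args) != arity:
            raise TypeError(f"{name} takes {arity} argument(s), got {len(args)}")
        return impl(*args)

    wrapper.__name__ = name
    # Attach metadata, similar in spirit to __arrow_compute_function__.
    wrapper.__toy_compute_function__ = {"name": name, "arity": arity}
    return wrapper


def make_global_functions(registry, namespace):
    # Expose one wrapper per registered function in the given namespace.
    for name in registry.list_functions():
        impl, arity = registry.get_function(name)
        namespace[name] = make_wrapper(name, impl, arity)


reg = ToyRegistry()
reg.register("median", lambda xs: sorted(xs)[len(xs) // 2], arity=1)

exposed = {}
make_global_functions(reg, exposed)

print("median" in reg.list_functions())              # True
print(exposed["median"].__toy_compute_function__)    # {'name': 'median', 'arity': 1}
print(exposed["median"]([3, 1, 2]))                  # 2
```

The point of the sketch is that callers only ever see the generated wrapper and its result. The object doing the work behind the registry entry is never handed out, which matches what I see with `tdigest`.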