[GitHub] [arrow] westonpace commented on issue #35508: [C++][Python] Adding data to tdigest in pyarrow

via GitHub Tue, 09 May 2023 14:47:26 -0700


westonpace commented on issue #35508:
URL: https://github.com/apache/arrow/issues/35508#issuecomment-1540932823


   > Is it possible to alter the scale function used in the implementation? I'm
   > not sure which scale function you have implemented but it would be nice to
   > have some control over this!
   
   I don't know enough about tdigest to answer this. Here are the current
   options:
   
   ```
   Help on function tdigest in module pyarrow.compute:
   
   tdigest(array, /, q=0.5, *, delta=100, buffer_size=500, skip_nulls=True, min_count=0, options=None, memory_pool=None)
       Approximate quantiles of a numeric array with T-Digest algorithm.
       
       By default, 0.5 quantile (median) is returned.
       Nulls and NaNs are ignored.
       An array of nulls is returned if there is no valid data point.
       
       Parameters
       ----------
       array : Array-like
           Argument to compute function.
       q : double or sequence of double, default 0.5
           Quantiles to approximate. All values must be in [0, 1].
       delta : int, default 100
           Compression parameter for the T-digest algorithm.
       buffer_size : int, default 500
           Buffer size for the T-digest algorithm.
       skip_nulls : bool, default True
           Whether to skip (ignore) nulls in the input.
           If False, any null in the input forces the output to null.
       min_count : int, default 0
           Minimum number of non-null values in the input.  If the number
           of non-null values is below `min_count`, the output is null.
       options : pyarrow.compute.TDigestOptions, optional
           Alternative way of passing options.
       memory_pool : pyarrow.MemoryPool, optional
           If not passed, will allocate memory from the default memory pool.
   ```
   
   If those options aren't enough, you could open a separate issue to
   request a configurable scale function.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
