westonpace commented on issue #35508: URL: https://github.com/apache/arrow/issues/35508#issuecomment-1540932823
> Is it possible to alter the scale function used in the implementation? I'm not sure which scale function you have implemented but it would be nice to have some control over this!

I don't know enough about tdigest to answer this. Here are the current options:

```
Help on function tdigest in module pyarrow.compute:

tdigest(array, /, q=0.5, *, delta=100, buffer_size=500, skip_nulls=True, min_count=0, options=None, memory_pool=None)
    Approximate quantiles of a numeric array with T-Digest algorithm.

    By default, 0.5 quantile (median) is returned.
    Nulls and NaNs are ignored.
    An array of nulls is returned if there is no valid data point.

    Parameters
    ----------
    array : Array-like
        Argument to compute function.
    q : double or sequence of double, default 0.5
        Quantiles to approximate. All values must be in
        [0, 1].
    delta : int, default 100
        Compression parameter for the T-digest algorithm.
    buffer_size : int, default 500
        Buffer size for the T-digest algorithm.
    skip_nulls : bool, default True
        Whether to skip (ignore) nulls in the input.
        If False, any null in the input forces the output to null.
    min_count : int, default 0
        Minimum number of non-null values in the input.  If the number
        of non-null values is below `min_count`, the output is null.
    options : pyarrow.compute.TDigestOptions, optional
        Alternative way of passing options.
    memory_pool : pyarrow.MemoryPool, optional
        If not passed, will allocate memory from the default memory pool.
```

If that isn't enough then you could probably open a separate issue to request a configurable scale.