Erm, just
https://spark.apache.org/docs/2.3.0/api/sql/index.html#approx_percentile ?
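
For illustration, a minimal sketch of how approx_percentile could be called from Scala through the Dataset API. The object name, the helper approxPercentiles, the column name "value", the cast to DOUBLE, and the accuracy of 10000 are assumptions made for this example, not details from the original question. Results are approximate, and a higher accuracy trades memory for precision.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object PercentileSketch {
  // Hypothetical helper: returns approximate percentiles of `column`,
  // in the same order as `probs` (each value in probs must be in [0, 1]).
  def approxPercentiles(df: DataFrame,
                        column: String,
                        probs: Seq[Double],
                        accuracy: Int = 10000): Seq[Double] = {
    // Build the SQL array literal, e.g. array(0.01, 0.25, 0.5)
    val probsSql = probs.mkString("array(", ", ", ")")
    // Assumption: casting the observations to DOUBLE is acceptable here
    df.select(expr(s"approx_percentile(CAST($column AS DOUBLE), $probsSql, $accuracy)"))
      .head()
      .getSeq[Double](0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("percentile-sketch")
      .master("local[*]") // local mode, for the example only
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the real observations
    val df = (1 to 1000).map(_.toDouble).toDF("value")

    val result = approxPercentiles(df, "value", Seq(0.01, 0.25, 0.5, 0.75, 0.95))
    println(result.mkString(", "))

    spark.stop()
  }
}
```

All five percentiles come out of a single aggregation pass, which matters at this data volume. A similar call should also be available as df.stat.approxQuantile if you prefer to stay out of SQL expressions.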
On Tue, Apr 27, 2021 at 3:52 AM Ivan Petrov wrote:
> Hi, I have billions, potentially dozens of billions of observations. Each
> observation is a decimal number.
> I need to calculate percentiles 1, 25, 50, 75,
Hi, I have billions, potentially tens of billions, of observations. Each
observation is a decimal number.
I need to calculate the 1st, 25th, 50th, 75th and 95th percentiles of these
observations using Spark with Scala. I can use either the RDD or the Dataset
API, whichever works better.
What I can do in terms of perf