Re: How to calculate percentiles in Scala Spark 2.4.x

2021-04-27 Thread Sean Owen
Erm, just https://spark.apache.org/docs/2.3.0/api/sql/index.html#approx_percentile ?

On Tue, Apr 27, 2021 at 3:52 AM Ivan Petrov wrote:
> Hi, I have billions, potentially dozens of billions of observations. Each
> observation is a decimal number.
> I need to calculate percentiles 1, 25, 50, 75,
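For reference, here is a minimal sketch of calling `approx_percentile` from Scala on Spark 2.4. It assumes the observations sit in a DataFrame column named `value` of type Double (the column name, the object name `ApproxPercentileSketch`, and the generated 1..1000 data are hypothetical stand-ins). In 2.4 the function is only exposed as a SQL aggregate (`percentile_approx` is an alias), so it is called through `expr`; the optional third argument is the accuracy, where `1.0 / accuracy` is the relative error.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object ApproxPercentileSketch {
  // Computes percentiles 1, 25, 50, 75, 95 of the (hypothetical) "value" column
  // in a single aggregation pass over the data.
  def computePercentiles(df: DataFrame): Seq[Double] = {
    // approx_percentile is a SQL aggregate in Spark 2.4, so it is invoked via expr().
    // Accuracy 10000 (the default) means a relative error of about 1.0 / 10000.
    val row = df.agg(
      expr("approx_percentile(value, array(0.01, 0.25, 0.5, 0.75, 0.95), 10000)").as("p")
    ).head()
    // With an array of percentages the result is an array column, one entry per percentage.
    row.getSeq[Double](0)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on a cluster, reuse the existing session.
    val spark = SparkSession.builder()
      .appName("approx-percentile-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the real observations: 1.0, 2.0, ..., 1000.0.
    val df = (1 to 1000).map(_.toDouble).toDF("value")
    println(computePercentiles(df).mkString(", "))
    spark.stop()
  }
}
```

Raising the accuracy tightens the approximation at the cost of more memory per aggregation buffer; the whole computation stays a single pass, which matters at tens of billions of rows.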

How to calculate percentiles in Scala Spark 2.4.x

2021-04-27 Thread Ivan Petrov
Hi, I have billions, potentially dozens of billions of observations. Each observation is a decimal number. I need to calculate percentiles 1, 25, 50, 75, 95 for these observations using Scala Spark. I can use either the RDD or the Dataset API, whichever works better. What I can do in terms of perf
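A Dataset-API alternative to the SQL function is `DataFrameStatFunctions.approxQuantile`, reached through `df.stat`. The sketch below assumes a DataFrame with a Double column named `value`; that name, the object name `ApproxQuantileSketch`, and the generated data are hypothetical. Per the Spark documentation, for a probability `p` and relative error `err` over `N` rows, the returned value `x` satisfies `floor((p - err) * N) <= rank(x) <= ceil((p + err) * N)`, and `err = 0.0` computes exact quantiles at much higher cost. An `RDD[Double]` would first need converting to a DataFrame (for example with `toDF("value")`).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ApproxQuantileSketch {
  // Probabilities matching the question: percentiles 1, 25, 50, 75, 95.
  val probabilities: Array[Double] = Array(0.01, 0.25, 0.5, 0.75, 0.95)

  // approxQuantile(col, probabilities, relativeError) returns one Double per probability.
  // Smaller relativeError gives tighter rank bounds but uses more memory.
  def quantiles(df: DataFrame, column: String, relativeError: Double): Array[Double] =
    df.stat.approxQuantile(column, probabilities, relativeError)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("approx-quantile-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: 1.0 to 100000.0, so each value equals its rank.
    val df = (1 to 100000).map(_.toDouble).toDF("value")
    println(quantiles(df, "value", 0.001).mkString(", "))
    spark.stop()
  }
}
```

With `N = 100000` and `err = 0.001`, the median returned here should fall between ranks 49900 and 50100, i.e. within about 100 of the exact value.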