Hello,
I am new to Spark and am trying to figure out how to compute a percentile over a
large data set. I searched for this topic but did not find any useful code
examples or explanations. It seems I can use the sortByKey transformation to put
my data set in order, but I am not sure how to then get the value at, for
example, the 66th percentile.
Should I use take() to pick out the value at the 66th percentile? I don't
believe any single machine can hold my data set in memory, so I believe there
must be a more efficient approach.
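
To make the question concrete, here is a rough sketch of the sort-based idea in PySpark. It is only an illustration under assumptions: the function names nearest_rank_index and percentile are hypothetical, the RDD is assumed to hold plain comparable values, and the "nearest-rank" definition of a percentile is my own choice (other definitions interpolate between neighbours). It sorts the data, numbers the sorted elements with zipWithIndex, and filters for the single target position, so only one value comes back to the driver.

```python
import math


def nearest_rank_index(n, p):
    """Zero-based position of the p-th percentile among n sorted values.

    Assumes the nearest-rank definition: rank = ceil(p / 100 * n).
    """
    if n <= 0:
        raise ValueError("data set is empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    return int(math.ceil(p * n / 100.0)) - 1


def percentile(rdd, p):
    """Return the p-th percentile of an RDD of comparable values.

    Hypothetical helper: `rdd` is assumed to be a PySpark RDD.
    """
    n = rdd.count()                      # first pass: number of elements
    target = nearest_rank_index(n, p)
    return (rdd.sortBy(lambda v: v)      # distributed sort
               .zipWithIndex()           # pairs of (value, position)
               .filter(lambda pair: pair[1] == target)
               .keys()                   # keep only the value
               .first())                 # a single element reaches the driver
```

With an existing SparkContext sc, the call would look like percentile(sc.parallelize(values), 66). This still needs a full distributed sort plus a separate count, so caching the input RDD first may help, and I do not know whether it is the most efficient option.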
Can anyone shed some light on this problem?
Regards