Depending on your tolerance for error, you could also use percentile_approx().
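For reference, here is a minimal plain-Python sketch of what the exact `percentile(col, p)` aggregate in Spark SQL (and Hive) computes. It sorts the values, takes position (n - 1) * p, and linearly interpolates between the two nearest ranks. It only illustrates the semantics. The helper name `exact_percentile` is hypothetical, and the SQL in the comments assumes a table `t` with a numeric column `value`.

```python
import math


def exact_percentile(values, p):
    """Hypothetical helper: exact percentile with linear interpolation.

    Illustrates the semantics of Spark SQL's exact aggregate, e.g.
      SELECT percentile(value, 0.5) FROM t           -- exact
      SELECT percentile_approx(value, 0.5, 10000) FROM t  -- approximate
    (table t and column value are assumptions for this example).
    """
    if not values:
        return None  # SQL aggregates return NULL on empty input
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p  # fractional rank, 0-based
    lower, upper = math.floor(pos), math.ceil(pos)
    if lower == upper:
        return float(ordered[lower])
    # Weight the two neighbouring values by their distance from pos.
    return (upper - pos) * ordered[lower] + (pos - lower) * ordered[upper]


print(exact_percentile([4, 1, 3, 2], 0.5))  # median of 1..4, interpolated
```

The exact aggregate needs each group's full value distribution in memory before it can sort it, which is what gets expensive at scale. `percentile_approx(value, p, accuracy)` keeps a bounded-size summary instead; per the Spark SQL docs, `1.0 / accuracy` is the relative error of the approximation (default accuracy 10000), and the result is a value taken from the input rather than an interpolated one.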
On Mon, Nov 11, 2019 at 10:14 AM Jerry Vinokurov <grapesmo...@gmail.com> wrote:

> Do you mean that you are trying to compute the percent rank of some data?
> You can use the Spark SQL percent_rank function for that, but I don't think
> that's going to give you any improvement over calling the percentRank
> function on the data frame. Are you currently using a user-defined function
> for this task? Because I bet that's what's slowing you down.
>
> On Mon, Nov 11, 2019 at 9:46 AM Tzahi File <tzahi.f...@ironsrc.com> wrote:
>
>> Hi,
>>
>> Currently, I'm using a huge Hive cluster (m5.24xl * 40 workers) to run a
>> percentile function. I'm trying to improve this job by moving it to
>> Spark SQL.
>>
>> Any suggestions on how to use a percentile function in Spark?
>>
>> Thanks,
>> --
>> Tzahi File
>> Data Engineer
>> ironSource
>
> --
> http://www.google.com/profiles/grapesmoker

--
Patrick McCarthy
Senior Data Scientist, Machine Learning Engineering
Dstillery
470 Park Ave South, 17th Floor, NYC 10016
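On Jerry's point about `percent_rank`: the window function assigns each row (rank - 1) / (n - 1), where tied values share the lowest rank and a single-row partition gets 0.0. The plain-Python sketch below mirrors those semantics for illustration only. `percent_rank` here is a hypothetical helper, and the commented PySpark line assumes a DataFrame `df` with a numeric column `value`.

```python
def percent_rank(values):
    """Hypothetical helper: percent rank of each element, in input order.

    Mirrors the SQL window function percent_rank():
      (rank - 1) / (n - 1), ties share the lowest rank, n == 1 gives 0.0.
    Assumed PySpark equivalent (df and "value" are placeholders):
      from pyspark.sql import functions as F, Window
      df.withColumn("pr", F.percent_rank().over(Window.orderBy("value")))
    """
    n = len(values)
    if n <= 1:
        return [0.0] * n
    ordered = sorted(values)
    # The first sorted index at which a value appears equals its rank - 1.
    first_index = {}
    for i, v in enumerate(ordered):
        first_index.setdefault(v, i)
    return [first_index[v] / (n - 1) for v in values]


print(percent_rank([10, 20, 20, 40]))
```

Note that a window with `orderBy` but no `partitionBy` moves all rows to a single partition (Spark logs a warning about this), so when you only need specific percentiles of a large dataset, the `percentile` / `percentile_approx` aggregates are usually the better fit.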