+ Liang-Chi and Herman. I think getting the top N records is a common requirement. For now we guarantee it via the `TakeOrderedAndProject` operator; however, that operator may not be used if the `spark.sql.execution.topKSortFallbackThreshold` config is set to a small value.
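Conceptually, the top-K path keeps only a size-bounded min-heap while scanning, instead of fully sorting the input. A plain-Scala sketch of that idea (illustrative only; the actual `TakeOrderedAndProject` implementation differs in detail, and `topK` is a hypothetical helper name):

```scala
import scala.collection.mutable.PriorityQueue

// Bounded top-K: keep at most k elements in a min-heap while scanning,
// so memory stays O(k) rather than O(input size) as with a full sort.
def topK(xs: Iterator[Int], k: Int): Seq[Int] = {
  // Reversed ordering makes this a min-heap: the smallest of the
  // current top-k candidates sits at the head.
  val heap = PriorityQueue.empty[Int](Ordering[Int].reverse)
  for (x <- xs) {
    if (heap.size < k) heap.enqueue(x)
    else if (x > heap.head) { heap.dequeue(); heap.enqueue(x) }
  }
  // Heap iteration order is unspecified, so sort the survivors descending.
  heap.toSeq.sorted(Ordering[Int].reverse)
}

val top3 = topK(Seq(5, 1, 9, 3, 7, 2, 8).iterator, 3)
// top3 == Seq(9, 8, 7), the same as sorting everything and taking 3
```

This is why the fallback threshold matters: below it, the planner can switch to a full sort followed by a limit, which no longer gives the bounded-memory top-K behavior.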
Shall we reconsider https://github.com/apache/spark/commit/5c27b0d4f8d378bd7889d26fb358f478479b9996 ? Or do we not expect users to set a small value for `spark.sql.execution.topKSortFallbackThreshold`?

On Wed, Sep 5, 2018 at 11:24 AM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> Thanks
>
> On Wed 5 Sep, 2018, 2:15 AM Russell Spitzer, <russell.spit...@gmail.com> wrote:
>> RDD: Top
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
>> which is pretty much what Sean suggested.
>>
>> For DataFrames, I think doing an order and limit would be equivalent after optimizations.
>>
>> On Tue, Sep 4, 2018 at 2:28 PM Sean Owen <sro...@gmail.com> wrote:
>>> Sort and take head(n)?
>>>
>>> On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>> Dear Spark dev, anything equivalent in Spark?
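For reference, `RDD.top` works by computing a per-partition top-N and then merging the candidates on the driver, so only N elements per partition ever travel over the network. A plain-Scala sketch of that merge step (no Spark here; `partitions` and `topNDesc` are illustrative names standing in for real partitions and the per-partition ordering logic):

```scala
// Descending top-N of a local collection: full sort then take.
def topNDesc(xs: Seq[Int], n: Int): Seq[Int] =
  xs.sorted(Ordering[Int].reverse).take(n)

// Simulated partitions of an RDD. Each contributes at most n candidates,
// and a driver-side merge picks the global top n from those candidates.
val partitions = Seq(Seq(4, 11, 2), Seq(9, 1, 15), Seq(7, 3))
val n = 3
val merged = topNDesc(partitions.flatMap(p => topNDesc(p, n)), n)
// merged == Seq(15, 11, 9), identical to sorting the whole dataset and taking 3
```

The result is the same as a global sort followed by a limit, which is why, after optimization, `orderBy(...).limit(n)` on a DataFrame and `rdd.top(n)` answer the same question.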