In ANSI SQL there is ORDER BY but no SORT BY. ORDER BY sorts the result set in ascending or descending order. In SQL, "sorting" is the general term and ORDER BY is the syntax that expresses it.
In the map-reduce paradigm, for example in HiveQL, SORT BY sorts data per reducer. As I understand it, the difference is that ORDER BY guarantees a total order in the output, while SORT BY only guarantees the ordering of rows within each reducer. If there is more than one reducer, SORT BY may give a partially ordered final result.

hive> select prod_id, quantity_sold, amount_sold from sales sort by quantity_sold asc, amount_sold desc

compared to

hive> select prod_id, quantity_sold, amount_sold from sales order by quantity_sold asc, amount_sold desc

Personally, I stick to order by.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 29 July 2016 at 17:49, Daniel Santana <dan...@everymundo.com> wrote:

> As far as I know *sort* is just an alias of *orderBy* (or vice-versa)
>
> And your last operation is taking longer because you are sorting it twice.
>
> --
> Daniel Santana
> Senior Software Engineer
> EveryMundo
>
> On Fri, Jul 29, 2016 at 12:20 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote:
>
>> Hi,
>>
>> In Spark programming I can use
>>
>> df.filter(col("transactiontype") === "DEB")
>>   .groupBy("transactiondate")
>>   .agg(sum("debitamount").cast("Float").as("Total Debit Card"))
>>   .orderBy("transactiondate")
>>   .show(5)
>>
>> or
>>
>> df.filter(col("transactiontype") === "DEB")
>>   .groupBy("transactiondate")
>>   .agg(sum("debitamount").cast("Float").as("Total Debit Card"))
>>   .sort("transactiondate")
>>   .show(5)
>>
>> I get the same results,
>>
>> and I can use both as well
>>
>> df.filter(col("transactiontype") === "DEB")
>>   .groupBy("transactiondate")
>>   .agg(sum("debitamount").cast("Float").as("Total Debit Card"))
>>   .orderBy("transactiondate")
>>   .sort("transactiondate")
>>   .show(5)
>>
>> but the last one takes more time.
>>
>> What is the use case for both of these, please? Does it make sense to use both?
>>
>> Thanks
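
The SORT BY versus ORDER BY distinction described in the reply at the top of this thread can be sketched in plain Scala, with no Hive or Spark involved. This is only a toy model: the object name, the row values, and the way rows are split between two "reducers" are all made up for illustration, and real Hive decides for itself which rows go to which reducer.

```scala
// Toy model: each inner Seq stands for the rows one reducer receives.
// Row = (prod_id, quantity_sold, amount_sold); all values are invented.
object SortByVsOrderBy {
  type Row = (Int, Int, Int)

  // quantity_sold asc, then amount_sold desc (negating the amount flips its order)
  val byQtyAscAmtDesc: Ordering[Row] =
    Ordering.by((r: Row) => (r._2, -r._3))

  // Hypothetical split of the sales rows across two reducers
  val sample: Seq[Seq[Row]] = Seq(
    Seq((1, 5, 10), (2, 1, 30)),
    Seq((3, 3, 20), (4, 1, 50))
  )

  // SORT BY: every reducer sorts only its own rows; outputs are concatenated
  def sortBy(reducers: Seq[Seq[Row]]): Seq[Row] =
    reducers.flatMap(_.sorted(byQtyAscAmtDesc))

  // ORDER BY: a single total order over all rows
  def orderBy(reducers: Seq[Seq[Row]]): Seq[Row] =
    reducers.flatten.sorted(byQtyAscAmtDesc)

  def main(args: Array[String]): Unit = {
    // prod_ids come out as 2, 1, 4, 3: ordered within each reducer only
    println(sortBy(sample).map(_._1))
    // prod_ids come out as 4, 2, 3, 1: ordered across the whole result
    println(orderBy(sample).map(_._1))
  }
}
```

With a single reducer the two functions return the same sequence, which matches the point that SORT BY only becomes partially ordered when more than one reducer is involved.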