Within the realm of ANSI SQL there is ORDER BY but no SORT BY.

ORDER BY sorts the result set in ascending or descending order. In SQL,
"sorting" is the operation and ORDER BY is the syntax that requests it.

In the MapReduce paradigm, for example in HiveQL, SORT BY sorts data per
reducer. As I understand it, the difference between ORDER BY and SORT BY is
that ORDER BY guarantees total order in the output, while SORT BY only
guarantees ordering of the rows within each reducer. If there is more than
one reducer, SORT BY may give partially ordered final results.
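The difference can be modelled without Hive at all. The Scala sketch below is a toy simulation, not Hive's implementation: each inner sequence stands for the rows one reducer receives, and the round-robin assignment of rows to reducers is an assumption made purely for illustration (in practice the engine decides where rows go).

```scala
// Toy model of SORT BY vs ORDER BY; not Hive's actual implementation.
object SortByVsOrderBy {
  // Assumption: rows are dealt to reducers round-robin by position.
  def toReducers[A](rows: Seq[A], reducers: Int): Seq[Seq[A]] =
    (0 until reducers).map { r =>
      rows.zipWithIndex.collect { case (row, i) if i % reducers == r => row }
    }

  // SORT BY: each reducer sorts only its own rows; outputs are concatenated.
  def sortByPerReducer[A: Ordering](rows: Seq[A], reducers: Int): Seq[A] =
    toReducers(rows, reducers).flatMap(_.sorted)

  // ORDER BY: one total order over all rows (conceptually a single reducer).
  def orderByTotal[A: Ordering](rows: Seq[A]): Seq[A] = rows.sorted

  def main(args: Array[String]): Unit = {
    val rows = Seq(5, 1, 4, 2, 6, 3)
    println(sortByPerReducer(rows, 2)) // two sorted runs, not globally sorted
    println(sortByPerReducer(rows, 1)) // one reducer: same as ORDER BY
    println(orderByTotal(rows))
  }
}
```

With two reducers the output is two individually sorted runs (4, 5, 6 then 1, 2, 3), which is the "partially ordered" result described above; with a single reducer SORT BY and ORDER BY coincide. In Spark's DataFrame API the corresponding operations are `orderBy`/`sort` for a total order and `sortWithinPartitions` for a per-partition order.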

hive> select prod_id, quantity_sold, amount_sold from sales sort by
    > quantity_sold asc, amount_sold desc;

compared to

hive> select prod_id, quantity_sold, amount_sold from sales order by
    > quantity_sold asc, amount_sold desc;
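Both queries use the same two-key ordering: quantity_sold ascending, with amount_sold descending breaking ties. A minimal Scala sketch of that comparator follows; the `Sale` case class and its field names are hypothetical stand-ins for the columns of the sales table.

```scala
// Hypothetical row type mirroring the columns selected in the queries.
final case class Sale(prodId: Int, quantitySold: Int, amountSold: BigDecimal)

object SalesOrdering {
  // quantity_sold ASC, then amount_sold DESC to break ties.
  val byQtyAscAmountDesc: Ordering[Sale] =
    Ordering.by((s: Sale) => (s.quantitySold, s.amountSold))(
      Ordering.Tuple2[Int, BigDecimal](Ordering.Int, Ordering.BigDecimal.reverse))

  def main(args: Array[String]): Unit = {
    val sales = Seq(
      Sale(1, 2, BigDecimal(10)),
      Sale(2, 1, BigDecimal(5)),
      Sale(3, 2, BigDecimal(30)))
    // Prints prod 2 (qty 1), then prod 3 and prod 1 (qty 2, amount descending).
    sales.sorted(byQtyAscAmountDesc).foreach(println)
  }
}
```

Under ORDER BY a comparator like this is applied across the whole result; under SORT BY each reducer applies it only to the rows it holds.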


Personally, I stick to ORDER BY.
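On the Spark question quoted below: as Daniel notes, the two names are interchangeable, since the Spark `Dataset` scaladoc describes `orderBy` as an alias of `sort`. Chaining them therefore requests the same sort twice. The Scala model below is a hypothetical toy, not Spark's API; it only mimics one method delegating to the other and counts the sort passes requested.

```scala
// Hypothetical toy model, not Spark's Dataset: it mirrors the idea that
// orderBy simply delegates to sort, and counts requested sort passes.
final class ToyFrame[A](val rows: Vector[A], val sortPasses: Int = 0) {
  def sort[K](key: A => K)(implicit ord: Ordering[K]): ToyFrame[A] =
    new ToyFrame(rows.sortBy(key), sortPasses + 1)

  // Alias: same result and same cost as sort.
  def orderBy[K](key: A => K)(implicit ord: Ordering[K]): ToyFrame[A] =
    sort(key)
}

object ToyFrameDemo {
  def main(args: Array[String]): Unit = {
    val frame = new ToyFrame(Vector(3, 1, 2))
    val once  = frame.orderBy(x => x)
    val twice = frame.orderBy(x => x).sort(x => x)
    // Same rows either way; the chained version asked for two passes.
    println(s"${once.rows} after ${once.sortPasses} pass(es)")
    println(s"${twice.rows} after ${twice.sortPasses} pass(es)")
  }
}
```

Whether Spark actually executes both sorts, or its optimizer collapses them, may depend on the Spark version; calling `.explain()` on the chained DataFrame shows the physical plan for a given setup.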

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





On 29 July 2016 at 17:49, Daniel Santana <dan...@everymundo.com> wrote:

> As far as I know, *sort* is just an alias of *orderBy* (or vice versa).
>
> And your last operation is taking longer because you are sorting it twice.
>
> --
> *Daniel Santana*
> Senior Software Engineer
>
> EVERY*MUNDO*
> 25 SE 2nd Ave., Suite 900
> Miami, FL 33131 USA
> main:+1 (305) 375-0045
> EveryMundo.com <http://www.everymundo.com/#whoweare>
>
>
> On Fri, Jul 29, 2016 at 12:20 PM, Ashok Kumar <
> ashok34...@yahoo.com.invalid> wrote:
>
>> Hi,
>>
>> In Spark programming I can use
>>
>> import org.apache.spark.sql.functions.{col, sum}
>> df.filter(col("transactiontype") === "DEB").groupBy("transactiondate")
>>   .agg(sum("debitamount").cast("Float").as("Total Debit Card"))
>>   .orderBy("transactiondate").show(5)
>>
>> or
>>
>> df.filter(col("transactiontype") === "DEB").groupBy("transactiondate")
>>   .agg(sum("debitamount").cast("Float").as("Total Debit Card"))
>>   .sort("transactiondate").show(5)
>>
>> I get the same results.
>>
>> And I can also chain both:
>>
>> df.filter(col("transactiontype") === "DEB").groupBy("transactiondate")
>>   .agg(sum("debitamount").cast("Float").as("Total Debit Card"))
>>   .orderBy("transactiondate").sort("transactiondate").show(5)
>>
>> But the last one takes more time.
>>
>> What is the use case for each of these? Does it make sense to use
>> both?
>>
>> Thanks
>>
>
>
