Re: [Questio]Which record does Hive give the bigger number when I use row_number

2017-10-11 Thread Furcy Pin
Hello,

Either one can receive the bigger row_num, in a nondeterministic fashion
(which is NOT equivalent to random).
Simply put, it will be whichever row Hive processes last, which you have no
way of knowing in advance.

If your two rows differ on other columns, you might want to add them to
your ORDER BY clause to ensure consistency.
If you do want to have them randomly shuffled, you can simply use "ORDER BY
cost, rand()".
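
As a rough sketch of both options, reusing the table and columns from the
original query: the first variant assumes cost_date differs between the tied
rows (that is an assumption; if it does not, add further columns until the
ordering is unique), and the second is the rand() variant mentioned above.

```sql
-- Deterministic: break ties on cost with cost_date.
-- Assumption: cost_date is distinct among rows sharing user_id and cost.
select
    user_id
  , cost_date
  , cost
  , row_number() over (partition by user_id order by cost, cost_date) as row_num
from table_A;

-- Shuffled: rows tied on cost are ordered by a random value,
-- so their row_num may change from one run to the next.
select
    user_id
  , cost_date
  , cost
  , row_number() over (partition by user_id order by cost, rand()) as row_num
from table_A;
```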

Finally, there are other variants of row_number that behave slightly
differently on ties; see this link:
https://blog.jooq.org/2014/08/12/the-difference-between-row_number-rank-and-dense_rank/
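
For illustration, here is a minimal sketch that puts the three functions side
by side on the table from the original question. The sample costs in the
comment are made up for the example.

```sql
select
    user_id
  , cost
  , row_number() over (partition by user_id order by cost) as row_num
  , rank()       over (partition by user_id order by cost) as rnk
  , dense_rank() over (partition by user_id order by cost) as dense_rnk
from table_A;

-- For a user with costs 10, 10, 20 (hypothetical data):
--   row_num   : 1, 2, 3   (which of the two 10s gets 1 is not guaranteed)
--   rnk       : 1, 1, 3   (ties share a rank, then a gap)
--   dense_rnk : 1, 1, 2   (ties share a rank, no gap)
```

With rank or dense_rank, tied rows always get the same number, so the result
does not depend on the order in which Hive processes them.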






On Wed, Oct 11, 2017 at 4:33 PM, 孙志禹 wrote:

> Dear all,
> Thanks; this is the first time I have had the honor of asking
> questions here.
> I used the HQL script below:
> -- -
> select
> user_id
> , cost_date  -- datetime
> , cost  -- int
> , row_number() over( partition by user_id order by cost  )
> as row_num
> from table_A
> -- -
> The question is: if for a particular user_id (e.g. user_id = '1')
> there are two records with the same cost in the table, I know that
> the row_number function will give the two records different
> row_nums, but which one will get the bigger row_num?
> Thanks! A web link that gives the answer would also be fine.
> 
> Anci Sun from China
>


[Questio]Which record does Hive give the bigger number when I use row_number

2017-10-11 Thread 孙志禹
Dear all,
Thanks; this is the first time I have had the honor of asking questions here.
I used the HQL script below:
    -- -
    select
        user_id
        , cost_date  -- datetime
        , cost  -- int
        , row_number() over( partition by user_id order by cost ) as row_num
    from table_A
    -- -
The question is: if for a particular user_id (e.g. user_id = '1') there are
two records with the same cost in the table, I know that the row_number
function will give the two records different row_nums, but which one will get
the bigger row_num?
Thanks! A web link that gives the answer would also be fine.

Anci Sun from China