Dataframe random permutation?

2015-06-01 Thread Cesar Flores
I would like to know the best approach to randomly permute a DataFrame. I have tried df.sample(false, 1.0, x).show(100), where x is the seed. However, it gives the same result no matter the value of x (it only gives different results when the fraction is smaller than 1.0). I have

Re: Dataframe random permutation?

2015-06-01 Thread Peter Rudenko
Hi Cesar, try: hc.createDataFrame(df.rdd.coalesce(NUM_PARTITIONS, shuffle = true), df.schema) It's a bit inefficient, but it should shuffle the whole DataFrame. Thanks, Peter Rudenko
On 2015-06-01 22:49, Cesar Flores wrote: I would like to know what will be the best approach to randomly
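
For comparison, here is a minimal sketch of another way to get a seed-dependent row order: sort by a seeded random column. This is not the approach suggested in the thread. It assumes a Spark version that provides SparkSession and org.apache.spark.sql.functions.rand, both newer than the hc-based API used above. The object name ShuffleRows and the helper shuffled are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.rand

object ShuffleRows {
  // Hypothetical helper: order the rows by a seeded pseudo-random key.
  // Different seeds give different sort keys, so the seed affects the
  // resulting order (unlike sample with fraction 1.0 in the question).
  def shuffled(df: DataFrame, seed: Long): DataFrame =
    df.orderBy(rand(seed))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for a demonstration.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("shuffle-rows")
      .getOrCreate()

    // Small example DataFrame with a single column "id" holding 0..19.
    val df = spark.range(0, 20).toDF("id")

    // Show the rows under two different seeds.
    shuffled(df, seed = 42L).show(20)
    shuffled(df, seed = 7L).show(20)

    spark.stop()
  }
}
```

A global sort is itself a shuffle, so this is not cheap either. The order obtained for a given seed may also change if the input is partitioned differently, because the random values are generated per partition.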