I would like to know the best approach to randomly permute a
DataFrame. I have tried:
df.sample(false, 1.0, x).show(100)
where x is the seed. However, it gives the same result no matter the value
of x (it only gives different results when the fraction is smaller than 1.0).
I have
Hi Cesar,
Try:
hc.createDataFrame(df.rdd.coalesce(NUM_PARTITIONS, shuffle = true), df.schema)
It's a bit inefficient, but it should shuffle the whole DataFrame.
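
For reference, sample without replacement at a fraction of 1.0 keeps every row, so the seed has nothing to vary. Another option, sketched here under the assumption that Spark 1.4 or later is available (where org.apache.spark.sql.functions.rand exists), is to sort on a random column. The helper name shuffleRows is hypothetical and only for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.rand

// Hypothetical helper, not part of Spark: returns the rows of df
// sorted by a pseudo-random key generated from the given seed.
// Assumes Spark 1.4+, where functions.rand(seed: Long) is available.
def shuffleRows(df: DataFrame, seed: Long): DataFrame =
  df.orderBy(rand(seed))
```

It could then be used as shuffleRows(df, x).show(100). This triggers a full sort, so it is also not cheap.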
Thanks,
Peter Rudenko
On 2015-06-01 22:49, Cesar Flores wrote:
I would like to know what will be the best approach to randomly