I would like to know the best approach for randomly permuting the rows of a DataFrame. I have tried:

    df.sample(false, 1.0, x).show(100)

where x is the seed. However, it gives the same result regardless of the value of x; the output only changes when the fraction is smaller than 1.0. I have also tried:

    hc.createDataFrame(df.rdd.repartition(100), df.schema)

which appears to produce a random permutation. Can someone confirm that this second approach is in fact a random permutation, or point me to a better approach?

Thanks!

-- Cesar Flores
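
For reference, the two attempts above can be written out as a small sketch, together with one alternative that is not from the original post: sorting by a seeded random column. The object and method names (PermuteSketch, sampleAll, viaRepartition, shuffleRows) are hypothetical, and the sketch assumes a Spark version where org.apache.spark.sql.functions.rand(seed) is available and where hc is an SQLContext (a HiveContext also qualifies in Spark 1.x). It is meant as an illustration under those assumptions, not a confirmed answer.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.rand

// Hypothetical helper object; the names are illustrative, not part of Spark.
object PermuteSketch {

  // Attempt 1 from the post: sample without replacement at fraction 1.0.
  // As observed in the post, the result did not change with the seed x.
  def sampleAll(df: DataFrame, x: Long): DataFrame =
    df.sample(false, 1.0, x)

  // Attempt 2 from the post: repartition the underlying RDD into 100
  // partitions and rebuild the DataFrame with the same schema.
  // Whether the resulting row order is a true random permutation is
  // exactly the open question in the post.
  def viaRepartition(hc: SQLContext, df: DataFrame): DataFrame =
    hc.createDataFrame(df.rdd.repartition(100), df.schema)

  // Alternative (an assumption, not from the post): order the rows by a
  // seeded random value, so that different seeds give different orderings.
  def shuffleRows(df: DataFrame, seed: Long): DataFrame =
    df.orderBy(rand(seed))
}
```

With a DataFrame df in scope, PermuteSketch.shuffleRows(df, 42L).show(100) would display the first 100 rows of one such ordering. The global sort triggers a full shuffle, so it may be expensive on large data.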