Re: Spark ML VarianceThresholdSelector Unexpected Results

2022-10-01 Thread 姜鑫
Thank you so much for the reply. You are right, and it might be better to mention this in the docs, because some other ML libraries, e.g. sklearn, use population variance.

> On Sep 30, 2022, at 10:49 AM, Sean Owen wrote:
>
> This is sample variance, not population (i.e. divide by n-1, not n). I

Re: Spark ML VarianceThresholdSelector Unexpected Results

2022-09-29 Thread Sean Owen
This is sample variance, not population (i.e. divide by n-1, not n). I think that's justified as the data are notionally a sample from a population.

On Thu, Sep 29, 2022 at 9:21 PM 姜鑫 wrote:
> Hi folks,
>
> Has anyone used VarianceThresholdSelector refer to
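
To make the n-1 versus n distinction concrete, here is a minimal plain-Python sketch. It is not Spark's or sklearn's implementation. The feature values and the threshold are made up for illustration, and the strict greater-than comparison used to decide whether a feature is kept is an assumption.

```python
import statistics

# Hypothetical feature column, chosen only for illustration.
values = [1.0, 2.0, 3.0, 4.0]

n = len(values)
mean = sum(values) / n
sum_sq_dev = sum((v - mean) ** 2 for v in values)

# Sample variance divides by n-1; population variance divides by n.
sample_var = sum_sq_dev / (n - 1)
population_var = sum_sq_dev / n

# Cross-check against the standard library's two variance functions.
assert abs(sample_var - statistics.variance(values)) < 1e-12
assert abs(population_var - statistics.pvariance(values)) < 1e-12

# Hypothetical threshold that falls between the two estimates.
# Assumption: a feature is kept when its variance is strictly greater.
threshold = 1.5
keep_with_sample_var = sample_var > threshold
keep_with_population_var = population_var > threshold

print("sample variance:", sample_var, "kept:", keep_with_sample_var)
print("population variance:", population_var, "kept:", keep_with_population_var)
```

With a threshold that falls between the two estimates, the same feature is kept under one definition and dropped under the other, which would explain a selector giving different results across libraries. The gap shrinks as n grows, since the two estimates differ by a factor of n/(n-1).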