The doc reports the sample variance, not the population variance (i.e. it
divides by n-1, not n). I think that's justified, as the data are notionally
a sample from a population.
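
To make the difference concrete, here is a rough Python sketch. It assumes
the doc example has n = 6 rows, which is inferred from the 5/6 ratio between
the two sets of numbers and is not stated in this thread. The column `col`
is just an illustrative set of values picked for the sketch.

```python
import statistics

# Illustrative column (an assumption for this sketch, not taken from the thread).
col = [6.0, 0.0, 0.0, 0.0, 8.0, 8.0]

sample = statistics.variance(col)       # divides by n - 1
population = statistics.pvariance(col)  # divides by n
print(round(sample, 2), round(population, 2))

# Sample variances as quoted from the doc in the message below.
sample_var = [16.67, 0.67, 8.17, 10.17, 5.07, 11.47]
n = 6  # assumed row count, inferred from the 5/6 ratio

# population variance = sample variance * (n - 1) / n
population_var = [v * (n - 1) / n for v in sample_var]
print([round(v, 3) for v in population_var])
```

Scaling the doc's figures by (n-1)/n gives, up to rounding, the numbers in
the original message, so the two sets of numbers differ only in the divisor.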

On Thu, Sep 29, 2022 at 9:21 PM 姜鑫 <jiangxin...@gmail.com> wrote:

> Hi folks,
>
> Has anyone used VarianceThresholdSelector, as described at
> https://spark.apache.org/docs/latest/ml-features.html#variancethresholdselector
> ?
> In the doc, an example is given that says `The variance for the 6 features
> are 16.67, 0.67, 8.17, 10.17, 5.07, and 11.47 respectively`, but after
> calculating I found that the variances should be 13.89, 0.56, 6.81, 8.47,
> 4.22, 9.56, and only 3 columns should be selected. Am I doing something
> wrong, or is this a bug?
>
>
> Regards,
> Xin
>
