Thank you for all your responses!

In the LSA case, what is equivalent, I think, is

 - to apply an L2 normalization (not the StandardScaler) after the LSA and
   then compute the cosine similarity between document vectors simply as a
   dot product, or
 - to not apply the L2 normalization and call the `cosine_similarity`
   function instead.
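
For reference, the equivalence of the two options can be checked with a
minimal sketch like the one below. The toy corpus, the TF-IDF step and
`n_components=2` are assumptions made only for illustration; they are not
the "previous example" from this thread.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

# Hypothetical toy corpus, used only to have something to decompose.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "the stock market fell today",
]

# LSA: TF-IDF followed by a truncated SVD (no centering of the data).
X = TfidfVectorizer().fit_transform(docs)
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Option 1: L2-normalize each document vector, then take plain dot products.
X_norm = normalize(X_lsa, norm="l2")
sim_dot = X_norm @ X_norm.T

# Option 2: skip the normalization and call cosine_similarity directly.
sim_cos = cosine_similarity(X_lsa)

# Both similarity matrices should agree up to floating-point error.
print(np.allclose(sim_dot, sim_cos))
```

Either way each document vector is divided by its own norm, so both routes
give the same similarities.
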
I have applied this normalization to the previous example, and it does
indeed produce equivalent results (i.e. it does not solve the problem).

Opening an issue on this for further discussion:
https://github.com/scikit-learn/scikit-learn/issues/7283

Thanks for your feedback!
--
Roman

On 28/08/16 18:20, Andy wrote:
> If you do "with_mean=False" it should be the same, right?
>
> On 08/27/2016 12:20 PM, Olivier Grisel wrote:
>> I am not sure this is exactly the same because we do not center the
>> data in the TruncatedSVD case (as opposed to the real PCA case where
>> whitening is the same as calling StandardScaler).
>>
>> Having an option to normalize the transformed data by sigma seems like
>> a good idea but we should probably not call that whitening.
>>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn