Re: [scikit-learn] Latent Semantic Analysis (LSA) and TruncatedSVD

Roman Yurchak Mon, 29 Aug 2016 03:41:51 -0700

Thank you for all your responses!

In LSA, I think the following two approaches are equivalent:
   - apply an L2 normalization (not the StandardScaler) after the LSA
and then compute the cosine similarity between document vectors simply
as a dot product;
   - skip the L2 normalization and call the `cosine_similarity`
function instead.
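
A minimal sketch of the two options above, to make the equivalence
concrete. The random matrix, its shape and n_components=5 are
assumptions made for illustration only; they are not the data from the
previous example in this thread.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

# Toy non-negative document-term matrix (assumption: stands in for
# TF-IDF features; 20 documents, 50 terms).
rng = np.random.RandomState(0)
X = rng.rand(20, 50)

# LSA projection; n_components=5 is an arbitrary choice for this sketch.
X_lsa = TruncatedSVD(n_components=5, random_state=0).fit_transform(X)

# Option 1: L2-normalize each document vector, then use plain dot products.
X_norm = normalize(X_lsa, norm="l2")
sim_dot = X_norm @ X_norm.T

# Option 2: skip the normalization and call cosine_similarity directly.
sim_cos = cosine_similarity(X_lsa)

# Both similarity matrices should agree up to floating-point error.
print(np.allclose(sim_dot, sim_cos))
```

Both routes divide each pairwise dot product by the product of the two
vector norms, so the only practical difference is whether the
normalized vectors are kept around for later reuse.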


I have applied this normalization to the previous example, and it does
indeed produce equivalent results (i.e. it does not solve the problem).
I have opened an issue for further discussion:
   https://github.com/scikit-learn/scikit-learn/issues/7283

Thanks for your feedback!
-- 
Roman

On 28/08/16 18:20, Andy wrote:
> If you do "with_mean=False" it should be the same, right?
> 
> On 08/27/2016 12:20 PM, Olivier Grisel wrote:
>> I am not sure this is exactly the same because we do not center the
>> data in the TruncatedSVD case (as opposed to the real PCA case where
>> whitening is the same as calling StandardScaler).
>>
>> Having an option to normalize the transformed data by sigma seems like
>> a good idea but we should probably not call that whitening.
>>
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

