I always normalize X prior to the random projection, as I have observed that
this consistently produces more accurate results (the same holds for LSA/SVD).
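For reference, a minimal sketch of what I mean (the toy corpus, the tiny
n_components, and random_state are just placeholders, not my real setup):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.random_projection import SparseRandomProjection

docs = ["some text", "more text", "other text entirely"]  # toy corpus for illustration
X = normalize(TfidfVectorizer().fit_transform(docs))      # L2-normalize the TF-IDF rows first
proj = SparseRandomProjection(n_components=2, random_state=42)  # tiny target dim for toy data
X2 = proj.fit_transform(X)                                # project the normalized matrix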
I have not tried increasing eps, as that would lead to far fewer features and
more error. I am also not sure how I should alter the density parameter; I feel
safer leaving it at the auto value, which computes it according to the Li et al.
paper. Could you recommend a value?
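To get a feeling for the eps trade-off, the JL helper shows how eps maps to the
number of components (the sample count here is made up, and density=0.01 is
only an example value, not a recommendation):

from sklearn.random_projection import (SparseRandomProjection,
                                       johnson_lindenstrauss_min_dim)

# larger eps -> fewer components, but a looser distance-preservation guarantee
for eps in (0.1, 0.3, 0.5):
    print(eps, johnson_lindenstrauss_min_dim(n_samples=10000, eps=eps))

# density='auto' uses 1 / sqrt(n_features) following Li et al.;
# setting it explicitly would look like this:
proj = SparseRandomProjection(density=0.01, random_state=42)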
I think I will be better off with LSA for now. Are there any specific
recommendations for the number of components? I chose 300 for now.
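This is roughly what I have in mind for the LSA variant (just a sketch; X would
be the TF-IDF matrix from the code in my first mail, and 300 is only my current
guess):

from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize

svd = TruncatedSVD(n_components=300, random_state=42)
X_lsa = normalize(svd.fit_transform(X))      # re-normalize rows to unit length
sim = X_lsa.dot(X_lsa.T)                     # cosine similarities (can still be slightly negative)
print(svd.explained_variance_ratio_.sum())   # how much variance the 300 components retain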
Best,
Philipp
On 08.08.2014 at 13:14, Arnaud Joly <[email protected]> wrote:
> Have you tried increasing the number of components, the epsilon parameter, or
> the density of the SparseRandomProjection?
> Have you tried normalising X prior to the random projection?
>
> Best regards,
> Arnaud
>
> On 08 Aug 2014, at 12:19, Philipp Singer <[email protected]> wrote:
>
>> Just another remark regarding this:
>>
>> I guess I cannot circumvent the negative cosine similarity values. Maybe
>> LSA (TruncatedSVD) is a better approach?
>>
>> On 08.08.2014 at 10:35, Philipp Singer <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I asked a question about the sparse random projection a few days ago, but
>>> thought I should start a new topic regarding my current problem.
>>>
>>> I am computing TF-IDF weights for my text documents and then calculating
>>> the cosine similarity between documents to determine how similar they are.
>>> For dimensionality reduction I am using the SparseRandomProjection class.
>>>
>>> My current process looks like the following:
>>>
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from sklearn.random_projection import SparseRandomProjection
>>> from sklearn.preprocessing import normalize
>>>
>>> docs = [text1, text2,…]
>>> vec = TfidfVectorizer(max_df=0.8)
>>> X = vec.fit_transform(docs)   # sparse TF-IDF matrix
>>> proj = SparseRandomProjection()
>>> X2 = proj.fit_transform(X)    # reduced-dimensionality representation
>>> X2 = normalize(X2)            # L2-normalize rows
>>> sim = X2 * X2.T               # dot products of unit rows = cosine similarities
>>>
>>> It works reasonably well. However, I found that the sparse random
>>> projection sets many weights to negative values, so many similarity scores
>>> also end up negative. Given the original intention of TF-IDF weights (which
>>> are never negative) and the corresponding cosine similarity scores (which
>>> should then range only between zero and one), I do not know whether this is
>>> an appropriate approach for my task.
>>>
>>> Hope someone has some advice. Maybe I am also doing something wrong here.
>>>
>>> Best,
>>> Philipp
>>>
>>
>