I read in a tutorial that I can download many Wikipedia articles and use them
with LSA. If I use 500k random Wikipedia articles plus the 2 documents whose
similarity score I want to compute, will I get results comparable to TF-IDF? Or
do the extra documents have to be related to my 2 documents?
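For concreteness, here is a minimal sketch of the setup I mean, assuming scikit-learn's TfidfVectorizer, TruncatedSVD and cosine_similarity. It computes the plain TF-IDF cosine similarity of the two documents and, separately, the cosine similarity after LSA (TF-IDF followed by truncated SVD fitted on a background corpus plus the two documents). BACKGROUND is a hypothetical toy stand-in for the downloaded Wikipedia articles, and the helper names tfidf_similarity / lsa_similarity are illustrative only. n_components=100 follows the value the TruncatedSVD documentation recommends for LSA; it is capped here so the toy corpus still works.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the ~500k random Wikipedia articles.
BACKGROUND = [
    "The river flows through the valley toward the sea.",
    "Football clubs compete in the national league every season.",
    "The telescope observed distant galaxies and nebulae.",
    "Parliament passed the new tax law after a long debate.",
    "The orchestra performed a symphony by a famous composer.",
    "Volcanoes erupt when magma rises through the crust.",
]


def tfidf_similarity(doc_a, doc_b, background):
    """Cosine similarity of the two documents in plain TF-IDF space."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # IDF weights are learned from the background corpus plus both documents.
    vectorizer.fit(background + [doc_a, doc_b])
    X = vectorizer.transform([doc_a, doc_b])
    return float(cosine_similarity(X[0], X[1])[0, 0])


def lsa_similarity(doc_a, doc_b, background, n_components=100, random_state=0):
    """Cosine similarity of the two documents after LSA (TF-IDF + truncated SVD)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(background + [doc_a, doc_b])
    # The SVD rank cannot exceed the corpus dimensions, so cap it for small corpora.
    k = max(1, min(n_components, X.shape[0] - 1, X.shape[1] - 1))
    svd = TruncatedSVD(n_components=k, random_state=random_state)
    # Learn the latent "topics" from the whole corpus...
    svd.fit(X)
    # ...then project only the two documents of interest into that space.
    Z = svd.transform(vectorizer.transform([doc_a, doc_b]))
    return float(cosine_similarity(Z[0:1], Z[1:2])[0, 0])


doc_a = "The symphony orchestra played music by a great composer."
doc_b = "A famous composer wrote music for the orchestra."
print("TF-IDF:", tfidf_similarity(doc_a, doc_b, BACKGROUND))
print("LSA:   ", lsa_similarity(doc_a, doc_b, BACKGROUND))
```

With a toy corpus like this the LSA number says little; the point of the background articles is to give the SVD enough co-occurrence statistics to learn meaningful latent dimensions, which is why corpus size matters.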
> From: [email protected]
> Date: Sun, 29 Sep 2013 12:42:50 +0200
> To: [email protected]
> Subject: Re: [Scikit-learn-general] LSA for documents similarity
>
> 2013/9/29 Tasos Ventouris <[email protected]>:
> > Thank you for your answer. I checked it with many documents, both totally
> > different and similar ones. You can see an example of the text I used
> > here https://dl.dropboxusercontent.com/u/37124455/documents.txt
> >
> > Another script I wrote with only tf-idf shows me 69% similarity on those
> > documents.
>
> Then I guess you should really try using more documents. LSA typically
> shines when the number of documents is on the order of 10k-1M.
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general