On Sep 25, 7:52 pm, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> "exhuma.twn" <[EMAIL PROTECTED]> writes:
> > Is it possible to calculate a distance between two chunks of text? I
> > suppose one could simply do a simple word-count on the chunks
> > (removing common noise words of course). And then go from there. Maybe
> > even assigning different weighting to words. But maybe there is a well-
> > tested and useful algorithm already available?
>
> There's a huge field of text mining that attempts to do things like
> this.  See http://en.wikipedia.org/wiki/Latent_semantic_analysis for some
> info about one approach.  Manning & Schütze's book "Foundations of Statistical
> Natural Language Processing" (http://nlp.stanford.edu/fsnlp/) is
> a standard reference about text processing.  They also have a
> new one about information retrieval (downloadable as a pdf) that
> looks very good: <http://informationretrieval.org>.

Thanks a lot. This gives me some bed-time reading.
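
For the archives, here is a rough sketch of the simple word-count idea
from the original question: count words, drop noise words, and compare
the two count vectors with cosine distance. It is not taken from either
of the books above. The names STOP_WORDS, word_counts and cosine_distance
are made up for this example, the stop-word list is only a tiny
placeholder, and returning 1.0 for empty text is an arbitrary convention.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative noise-word list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"})


def word_counts(text):
    """Return a Counter of lowercase words with noise words removed."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_distance(text_a, text_b):
    """Return 1 minus the cosine similarity of the two word-count vectors.

    0.0 means the same mix of words, 1.0 means no words in common.
    """
    a, b = word_counts(text_a), word_counts(text_b)
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)
```

Weighting words differently, as the question suggests, would amount to
multiplying each count by a per-word weight before taking the dot
product and norms.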

-- 
http://mail.python.org/mailman/listinfo/python-list
