Re: text clustering noob

Miles Osborne Wed, 04 Jun 2008 02:27:09 -0700

If you have:

--a set of snippets
--a set of articles


and for each snippet you want to find the "matching" set of articles, then
you could:

--treat this as an information retrieval (IR) task (each snippet becomes a query against the articles)

--treat this as co-clustering (e.g. http://citeseer.ist.psu.edu/447871.html)
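The IR route can be sketched in a few lines of Python. This is a minimal sketch that assumes plain tf-idf weighting with cosine similarity; it is not what Nutch does internally. Each article is indexed and each snippet is run as a query. A snippet whose best score falls below a cutoff is treated as noise, which also addresses the noise question in the quoted message. The tokenizer, the idf smoothing, the function names and the `min_score=0.1` cutoff are all illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def weigh(tf, idf):
    # tf-idf weights, L2-normalised so a dot product equals cosine similarity.
    # Terms unseen in the article collection are dropped.
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_index(articles):
    """Index a dict of article_id -> text; return (idf, article_vectors)."""
    docs = {aid: Counter(tokenize(text)) for aid, text in articles.items()}
    n = len(docs)
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())
    # Smoothed idf: terms that occur in every article still get a small positive weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {aid: weigh(tf, idf) for aid, tf in docs.items()}
    return idf, vectors

def query(snippet, idf, vectors, top_k=3, min_score=0.1):
    """Rank articles for one snippet; an empty result means 'noise' (hypothetical cutoff)."""
    q = weigh(Counter(tokenize(snippet)), idf)
    scored = []
    for aid, vec in vectors.items():
        s = sum(w * vec.get(t, 0.0) for t, w in q.items())
        if s >= min_score:
            scored.append((s, aid))
    scored.sort(reverse=True)
    return [(aid, round(s, 3)) for s, aid in scored[:top_k]]

if __name__ == "__main__":
    articles = {
        "a1": "the striker scored a late goal in the football match",
        "a2": "the central bank raised interest rates to curb inflation",
    }
    idf, vecs = build_index(articles)
    print(query("late goal wins the match", idf, vecs))
    print(query("xyz qwerty", idf, vecs))
```

On a real collection you would let a search engine handle indexing and scoring. The noise cutoff would need tuning, for example on the "complete" training articles.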

Nutch could do the first for you. Right now there is no support in Mahout
for co-clustering that I know of.
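As a rough illustration of co-clustering outside Mahout, here is a stdlib-only Python sketch of one well-known method: Dhillon's bipartite spectral co-clustering. It is not necessarily the approach in the linked paper. Rows of `A` could be snippets and columns terms. The matrix is normalised as A_n = D1^-1/2 A D2^-1/2, and the second left singular vector is found by power iteration. Rows and columns are then split by sign, which simplifies the k-means step of the original method. The sketch assumes no all-zero rows or columns and a genuine second singular direction, and it only produces two co-clusters. The function name is hypothetical, and pure Python is far too slow for real collections.

```python
import math

def spectral_cocluster_2way(A, iters=200):
    """Two-way co-clustering of a nonnegative matrix A (list of rows).

    Returns (row_labels, col_labels); rows and columns sharing a label
    form one co-cluster.
    """
    m, n = len(A), len(A[0])
    d1 = [sum(row) for row in A]                             # row degrees
    d2 = [sum(A[i][j] for i in range(m)) for j in range(n)]  # column degrees
    # Normalised matrix A_n = D1^-1/2 A D2^-1/2.
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n)] for i in range(m)]

    def times_An_T(x):  # A_n^T x, length n
        return [sum(An[i][j] * x[i] for i in range(m)) for j in range(n)]

    def times_An(y):    # A_n y, length m
        return [sum(An[i][j] * y[j] for j in range(n)) for i in range(m)]

    def normalise(v):
        s = math.sqrt(sum(t * t for t in v))
        return [t / s for t in v]

    # The top left singular vector of A_n is known in closed form: sqrt(d1), normalised.
    u1 = normalise([math.sqrt(d) for d in d1])
    # Power iteration on A_n A_n^T, removing the u1 component every step,
    # converges to the second left singular vector.
    u = normalise([(-1.0) ** i * (i + 1) for i in range(m)])
    for _ in range(iters):
        u = times_An(times_An_T(u))
        proj = sum(a * b for a, b in zip(u, u1))
        u = normalise([a - proj * b for a, b in zip(u, u1)])
    v = times_An_T(u)  # proportional to the second right singular vector

    # Scale by D^-1/2 as in the original method (this does not change signs,
    # but matters if you swap the sign split for k-means).
    row_labels = [0 if u[i] / math.sqrt(d1[i]) >= 0 else 1 for i in range(m)]
    col_labels = [0 if v[j] / math.sqrt(d2[j]) >= 0 else 1 for j in range(n)]
    return row_labels, col_labels

if __name__ == "__main__":
    # Two obvious blocks: rows 0-1 use columns 0-1, rows 2-3 use columns 2-3.
    A = [[2, 1, 0, 0],
         [1, 2, 0, 0],
         [0, 0, 1, 3],
         [0, 0, 2, 1]]
    print(spectral_cocluster_2way(A))
```

For more than two co-clusters, the original method uses several singular vectors and k-means rather than a sign split.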

Miles

2008/6/4 Marcus Persson Lindqvist <[EMAIL PROTECTED]>:

> Hi list!
>
> I've been looking at Mahout since the start and am very excited. However,
> I'm an ML noob and need some introductory pointers before I can start
> playing.
>
> What I want to do is fairly simple: I have a small set of text snippets
> which I know match a smaller set of articles, so that an article consists
> of one or more of the text snippets. So I need to group those snippets into
> articles. Preferably, I would also like to be able to detect "noise" (a
> snippet that has too little or dirty information and is not classified as
> an article).
>
> I have access to large training sets of "complete" articles.
>
> Now, has anyone got any tips on how to achieve this? Which of the algos
> discussed here would be sufficient?
>
> Any help much appreciated.
>
> /Marcus
>



-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.