I guess you should use some text mining tools. You can use Google to find them.
I remember UIUC recently released one such tool. It is very good.
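
Before reaching for a full text mining tool, the keyword lists themselves can drive a simple first-pass matcher. Here is a minimal sketch in plain Java, assuming the text has already been extracted from the html/pdf/doc files. It is not the Lucene API and not the UIUC tool; the class and method names (KeywordCategorizer, exampleCategories, countOccurrences, score) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordCategorizer {

    // Category -> keywords, copied from the example in the question.
    static Map<String, List<String>> exampleCategories() {
        Map<String, List<String>> categories = new LinkedHashMap<>();
        categories.put("spielberg", List.of("ET", "munich", "indiana jones"));
        categories.put("sport", List.of("football", "basket", "volley"));
        return categories;
    }

    // Counts case-insensitive, whole-word occurrences of keyword in text.
    static int countOccurrences(String text, String keyword) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns category -> total keyword hits, omitting categories with no hits.
    static Map<String, Integer> score(String text,
                                      Map<String, List<String>> categories) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : categories.entrySet()) {
            int hits = 0;
            for (String keyword : e.getValue()) {
                hits += countOccurrences(text, keyword);
            }
            if (hits > 0) {
                scores.put(e.getKey(), hits);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        String doc = "Indiana Jones plays football, then more football.";
        System.out.println(score(doc, exampleCategories()));
    }
}
```

A document can then be assigned to every category with a non-zero score, or only to the highest-scoring one. Whole-word matching keeps "ET" from firing inside "basket", but a short keyword like that will still match the Latin "et" in "et al.", so very short keywords probably need extra care.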

On 3/21/06, Valerio Schiavoni <[EMAIL PROTECTED]> wrote:
>
> Hello,
> not sure if the term 'cluster' is the correct one, but here is what I would
> like to do:
> I have a small set of categories, and I manually defined some keywords for
> each category.
> e.g.:
>
> -spielberg: ET, munich, indiana jones;
> -sport: football, basket, volley, etc etc;
>
> then, I have a quite large archive of documents (html, pdf, doc) (~5000,
> still growing), and I want to 'assign' each document
> to those categories, possibly using Lucene (if it can help!).
>
> What approach could I adopt?
>
> thanks,
> valerio
>
> --
> To Iterate is Human, to Recurse, Divine
> James O. Coplien, Bell Labs
> (how good is to be human indeed)
>
>
