I'd suggest using a text mining tool; you can find several with Google. I remember that UIUC recently released one, and it is very good.
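If plain keyword matching turns out to be enough, here is a minimal sketch of that baseline. It is not what a text mining tool or Lucene would do internally. It assumes the text has already been extracted from the html/pdf/doc files, and the names `CATEGORIES`, `score_document` and `assign_category` are made up for illustration. The keyword lists are copied from the examples in your message.

```python
import re

# Hypothetical category -> keyword lists, taken from the examples in the question.
CATEGORIES = {
    "spielberg": ["ET", "munich", "indiana jones"],
    "sport": ["football", "basket", "volley"],
}


def score_document(text, categories=CATEGORIES):
    """Count case-insensitive whole-word keyword hits for each category."""
    lowered = text.lower()
    scores = {}
    for category, keywords in categories.items():
        hits = 0
        for keyword in keywords:
            # \b keeps "basket" from matching inside "basketball".
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            hits += len(re.findall(pattern, lowered))
        scores[category] = hits
    return scores


def assign_category(text, categories=CATEGORIES):
    """Return the category with the most hits, or None if nothing matched."""
    scores = score_document(text, categories)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Note that a short keyword such as "ET" will also match the Latin "et" (as in "et al."), so raw counts like these are only a starting point. A trained classifier or weighted term scoring would likely be more robust on 5000+ documents.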
On 3/21/06, Valerio Schiavoni <[EMAIL PROTECTED]> wrote:
>
> Hello,
> I'm not sure 'cluster' is the correct term, but here is what I would
> like to do:
> I have a small set of categories, and I have manually defined some
> keywords for each category, e.g.:
>
> - spielberg: ET, munich, indiana jones;
> - sport: football, basket, volley, etc.;
>
> I also have a fairly large archive of documents (html, pdf, doc; ~5000
> and still growing), and I want to 'assign' each document to those
> categories, possibly using Lucene (if it can help!).
>
> What approach could I adopt?
>
> Thanks,
> valerio
>
> --
> To Iterate is Human, to Recurse, Divine
> James O. Coplien, Bell Labs
> (how good is to be human indeed)