This strikes me as a bit of an XY problem: http://people.apache.org/~hossman/#xyproblem
Perhaps it would be helpful if you could back up a little and describe the higher level problem you are trying to solve. You certainly can split up your documents and then cluster them, but I'm not sure that is actually going to give you what you need.

Cheers,
Grant

On Apr 30, 2010, at 5:29 AM, Bogdan Vatkov wrote:

> Hi,
>
> I would like to run some clustering for a single document but then I want
> that multiple clusters are extracted.
> I guess I have to find a way to split the doc into multiple docs / input
> vectors but I am wondering if there are any best practices on how to do the
> split then
> Should I derive vectors based on sentences or paragraphs? Is there a
> paragraph boundary detection tool around?
> Any recommendations will be appreciated.
>
> Best regards,
> Bogdan
