This strikes me a little bit as an XY problem:
http://people.apache.org/~hossman/#xyproblem

Perhaps it would be helpful if you could back up a little and describe the 
higher level problem you are trying to solve.  You certainly can split up your 
documents and then cluster them, but I'm not sure that is actually going to 
give you what you need.
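For reference, the "split, then cluster" step itself can be sketched in a few lines. This is a minimal sketch, not a recommended method. It assumes paragraphs are separated by blank lines and sentences end with '.', '!' or '?' followed by whitespace. Both are naive heuristics, not a real boundary-detection tool. The function names are hypothetical. Each segment becomes one bag-of-words term-frequency vector that could then be fed to a clustering step.

```python
import re
from collections import Counter

def split_paragraphs(text):
    """Hypothetical helper: split on one or more blank lines (naive paragraph heuristic)."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]

def split_sentences(paragraph):
    """Hypothetical helper: split after '.', '!' or '?' plus whitespace.
    Abbreviations such as 'e.g. x' will be split incorrectly."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [s for s in parts if s]

def term_vector(segment):
    """Bag-of-words term-frequency vector for one segment (lowercased word tokens)."""
    return Counter(re.findall(r"[a-z0-9']+", segment.lower()))

if __name__ == "__main__":
    doc = """Clustering groups similar items. It needs vectors.

Paragraphs can be split on blank lines.
Sentences are harder to detect reliably."""
    # One input vector per paragraph; swap in split_sentences for finer segments.
    for i, para in enumerate(split_paragraphs(doc)):
        print(i, dict(term_vector(para)))
```

Short segments such as single sentences yield very sparse vectors. In practice you would usually apply some weighting, such as TF-IDF, before clustering. Even then, the resulting clusters may not correspond to the "topics" you actually want, which is why the higher-level goal matters.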

Cheers,
Grant

On Apr 30, 2010, at 5:29 AM, Bogdan Vatkov wrote:

> Hi,
> 
> I would like to run clustering on a single document, but I want multiple
> clusters to be extracted.
> I guess I have to find a way to split the doc into multiple docs / input
> vectors, but I am wondering if there are any best practices on how to do the
> split.
> Should I derive vectors based on sentences or paragraphs? Is there a
> paragraph boundary detection tool around?
> Any recommendations will be appreciated.
> 
> Best regards,
> Bogdan

