Hi, I would like to run clustering on a single document and extract multiple clusters from it. I guess I have to split the document into multiple documents / input vectors, but I am wondering whether there are any best practices for doing the split. Should I derive the vectors from sentences or from paragraphs? Is there a paragraph boundary detection tool around? Any recommendations would be appreciated.
Best regards, Bogdan
