Add 3. Clustering would benefit from a plain text version.

Yes Dawid, but it is already committed => the clustering now uses the plain
text version returned by the toString() method.

Dawid, I have a question about clustering.
Currently, the clustering uses the summaries as input. I assume it would
provide better results if it took the whole document content, no?
I assume that clustering uses the summaries instead of the document content
for performance reasons.
But there is a (bad) side effect: since the size of the summaries is
configurable, the clustering "quality" will vary depending on the configured
summary size. I find this very confusing: when folks adjust this parameter,
it is only for front-end considerations (they want to display a long or a
short summary), and certainly not for clustering reasons.

What do you and others think about this?

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
