>>> stopwords inside) - maybe it is a question on its own - how can I easily go
>>> back from clusters->original docs (and not just vectors), I do not know,
>>> maybe some kind of mapper which maps vectors to the original documents
>>> somehow (e.g. sort of URL for a document based on the vector id/index or
>>> something?).
>>>
>>
>> To do this, you should use the document ID and just return the original
>> content from some other content store. Lucene or especially SOLR can help
>> with this.
> Right, Mahout's vector can take labels.

What do you mean by using the document ID, and by vectors taking labels? Is it
something I could use right away with the current cluster vectors, or should I
change some Mahout code to get to the document IDs?
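
To check that I understand the suggestion, here is roughly what I imagine, as a
sketch in plain Python rather than actual Mahout code. Everything in it is an
assumption for illustration: the names content_store, assignments and
docs_by_cluster are made up, the dict stands in for an external store such as
Lucene or Solr, and I am assuming each vector's label is simply the document ID.

```python
from collections import defaultdict

# Hypothetical content store: document ID -> original text.
# In practice this lookup would go to Lucene/Solr or a database.
content_store = {
    "doc-1": "Mahout clusters vectors built from documents.",
    "doc-2": "Lucene and Solr index and store the original text.",
    "doc-3": "k-means assigns every vector to its nearest centroid.",
}

# Hypothetical clustering output: (cluster ID, vector label) pairs,
# assuming each vector was labeled with its document ID before clustering.
assignments = [(0, "doc-1"), (1, "doc-2"), (0, "doc-3")]


def docs_by_cluster(assignments, store):
    """Group the original documents by cluster, using the label as the key."""
    clusters = defaultdict(list)
    for cluster_id, doc_id in assignments:
        clusters[cluster_id].append(store[doc_id])
    return dict(clusters)


clusters = docs_by_cluster(assignments, content_store)
```

If that is the idea, then the vectors never need to carry the text itself, only
an ID that survives the clustering step.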
