I'd also add that another thing that would be great is Tika integration for the 
DocumentVectorizer (which is seriously cool already!). That way, if I had a huge 
number of Word/HTML/PDF files on HDFS, I could run the DV over them and get 
Mahout vectors as the output.

On Feb 12, 2010, at 3:52 AM, Robin Anil wrote:

> Any more feedback on the original topic? i.e., "your" use of Mahout and your
> wishlist
> 
> Robin

