I'd also add that the other thing that would be great is Tika integration for the DocumentVectorizer (which is seriously cool already!). That way, if I had a huge number of Word/HTML/PDF files on HDFS, I could run the DV and get Mahout vectors as the output.
On Feb 12, 2010, at 3:52 AM, Robin Anil wrote:

> Any more feedback on the original topic ? i.e. "Your" use of mahout and your
> wishlist
>
> Robin
