Hi Robin,

In my work I have a lot of query logs produced by a search engine, and we use Hadoop to analyse that data. Sometimes I'd like to run data mining jobs on it, such as clustering similar queries or classifying them. At first I thought Mahout might be another option for this kind of work (as you know, Weka is my favorite data mining tool). But when I tried to integrate Mahout into my project, I found two major obstacles that kept me from going further:
First, my company provides Hadoop 0.19 as the platform for our daily jobs. As we know, Mahout depends on Hadoop 0.20 or above. This prevents me from using the functionality Mahout provides.

Secondly, the input data has to be indexed by Lucene first (right or wrong?) and only then imported into Mahout. This confuses me a lot, because we have so much data stored in HDFS. To use Mahout, I would first have to pull all of that data out of HDFS, index it with Lucene, and so on. That seems impractical to me.

So I haven't used Mahout in my daily work. However, I keep paying attention to Mahout; maybe someday I will benefit from it.

What do others think?

On Wed, Feb 10, 2010 at 6:19 PM, Robin Anil <[email protected]> wrote:

> Hi Mahouters
> I am trying to find out how you are using Mahout for your work or
> project, or which among the algorithms in Mahout are more important for
> you to do that work. And finally what do you expect to see in Mahout (A
> kind of a wish list). It wont take much of your time. Please reply with
> this details. It will help a great deal in figuring out where what we
> need to prioritize.
>
> Thanks
> Robin

--
http://anqiang1900.blog.163.com/
