Sorry for the late reply. The issue number is 99. Please have a look at https://issues.apache.org/jira/browse/MAHOUT-99
Thanks
Pallavi

-----Original Message-----
From: Grant Ingersoll [mailto:[email protected]]
Sent: Friday, December 05, 2008 7:56 AM
To: [email protected]
Subject: Re: mahout & hadoop compatibility

Yeah, it should work with 0.18, with a few patches to fix the Combiner issue, if you are using the k-Means clustering stuff. I committed one of them, but forgot the issue numbers (Pallavi?). Have a look in JIRA.

On Dec 4, 2008, at 8:11 PM, Pradhuman Jhala wrote:

> Just wondering if Mahout is compatible with hadoop-0.18 (and later)
> versions. From hadoop version 0.18 onwards, the combiner execution
> policy has changed and it now gets executed twice: first on the
> Mapper side (on the output of the Mapper) and then again on the
> Reducer side (on the output of the first Combiner).
>
> For more details: http://issues.apache.org/jira/browse/HADOOP-3226
>
> It seems to me that the k-means and canopy clustering in Mahout
> assume that the combiner gets executed on the Mapper side only. This
> is a major source of error: when the Combiner gets executed on the
> Reducer side, it cannot parse the output of the first Combiner
> correctly.
>
> To fix this, for hadoop-0.18.* only, if you want to use the combiner
> only on the output of the mapper (as in earlier hadoop versions), add
> the following to your job config:
>
> job.setCombineOnlyOnce(true);
>
> This method (setCombineOnlyOnce) is not available in the hadoop-0.19
> release, so I think the Mahout code needs to be changed to take care
> of this issue.
>
> Pradhuman

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
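To illustrate the combiner problem discussed in this thread, here is a minimal sketch in plain Java (no Hadoop dependency) of a combine step that is safe to run more than once. The idea is that the combiner's output has the same shape as its input (a point count plus a coordinate-wise sum), so a second pass can consume the output of the first. The names RepeatableCombinerSketch, Partial, and combine are hypothetical and chosen only for this example. This is not the Mahout code and not the patch attached to MAHOUT-99.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not Mahout or Hadoop API.
public class RepeatableCombinerSketch {

    /** Partial cluster statistics: number of points and their coordinate-wise sum. */
    static final class Partial {
        final long count;
        final double[] sum;

        Partial(long count, double[] sum) {
            this.count = count;
            this.sum = sum;
        }

        /** Wraps a single point as a partial result with count 1. */
        static Partial ofPoint(double[] point) {
            return new Partial(1, point.clone());
        }

        /** Final centroid, computed only once all partials have been merged. */
        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < sum.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    /**
     * Merges partial results for one cluster. Input and output have the same
     * type, so the output of one combine pass is valid input for another.
     */
    static Partial combine(List<Partial> values) {
        long count = 0;
        double[] sum = new double[values.get(0).sum.length];
        for (Partial p : values) {
            count += p.count;
            for (int i = 0; i < sum.length; i++) {
                sum[i] += p.sum[i];
            }
        }
        return new Partial(count, sum);
    }

    public static void main(String[] args) {
        Partial p1 = Partial.ofPoint(new double[] {1.0, 2.0});
        Partial p2 = Partial.ofPoint(new double[] {3.0, 4.0});
        Partial p3 = Partial.ofPoint(new double[] {5.0, 6.0});

        // Combiner applied once to all values.
        Partial once = combine(Arrays.asList(p1, p2, p3));

        // Combiner applied on the "map side" first, then again on its own output.
        Partial mapSideA = combine(Arrays.asList(p1, p2));
        Partial mapSideB = combine(Arrays.asList(p3));
        Partial twice = combine(Arrays.asList(mapSideA, mapSideB));

        System.out.println(Arrays.toString(once.centroid()));
        System.out.println(Arrays.toString(twice.centroid()));
    }
}
```

Both lines print the same centroid. A combiner that instead emitted an already-averaged centroid, or a value in a different format from its input, would give wrong results or fail to parse when run a second time, which is the failure described above.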
