I am not sure I understand the Hadoop SVN structure; however, I was able to make it work with Hadoop trunk (0.20.0-dev). It didn't work with hadoop/branch-0.18, with or without the HADOOP-4277 patch.
Here is a copy-paste of the steps, once Hadoop is built and installed. I am using the exact same "apache-mahout-examples-0.1-dev.job", not rebuilt with the 0.20.0-dev jars. It works! That would mean that the bug/feature is not related to HADOOP-4277 <http://issues.apache.org/jira/browse/HADOOP-4277>, and that it was reintroduced (or never taken away) in hadoop/trunk.

[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop namenode -format
08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = phil/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.0-dev
STARTUP_MSG: build = -r ; compiled by 'philippe' on Wed Oct 29 18:25:08 EDT 2008
************************************************************/
08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
08/10/29 18:28:00 INFO common.Storage: Storage directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully formatted.
08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
************************************************************/
[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop dfs -put /home/philippe/synthetic_control.data testdata
[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/29 18:28:47 INFO mapred.JobClient: Running job: job_200810291828_0002
08/10/29 18:28:48 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:28:54 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:28:55 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:28:56 INFO mapred.JobClient: Job complete: job_200810291828_0002
08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes written=323660
08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:28:56 INFO mapred.JobClient: Running job: job_200810291828_0003
08/10/29 18:28:57 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:29:03 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:29:05 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:29:10 INFO mapred.JobClient: map 100% reduce 100%
08/10/29 18:29:11 INFO mapred.JobClient: Job complete: job_200810291828_0003
08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:29:12 INFO mapred.JobClient: Running job: job_200810291828_0004
08/10/29 18:29:13 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:29:20 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:29:22 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:29:27 INFO mapred.JobClient: map 100% reduce 100%
08/10/29 18:29:28 INFO mapred.JobClient: Job complete: job_200810291828_0004
08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:29:28 INFO mapred.JobClient: Running job: job_200810291828_0005
08/10/29 18:29:29 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:29:35 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:29:37 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:29:41 INFO mapred.JobClient: Job complete: job_200810291828_0005
08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:29:42 INFO mapred.JobClient: Running job: job_200810291828_0006
08/10/29 18:29:43 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:29:50 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:29:51 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:29:55 INFO mapred.JobClient: map 100% reduce 100%
08/10/29 18:29:56 INFO mapred.JobClient: Job complete: job_200810291828_0006
08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:29:56 INFO mapred.JobClient: Running job: job_200810291828_0007
08/10/29 18:29:57 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:30:03 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:30:05 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:30:09 INFO mapred.JobClient: Job complete: job_200810291828_0007
08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:30:09 INFO mapred.JobClient: Running job: job_200810291828_0008
08/10/29 18:30:10 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:30:17 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:30:18 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:30:22 INFO mapred.JobClient: map 100% reduce 100%
08/10/29 18:30:23 INFO mapred.JobClient: Job complete: job_200810291828_0008
08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:30:24 INFO mapred.JobClient: Running job: job_200810291828_0009
08/10/29 18:30:25 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:30:31 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:30:33 INFO mapred.JobClient: map 100% reduce 0%
08/10/29 18:30:37 INFO mapred.JobClient: map 100% reduce 100%
08/10/29 18:30:38 INFO mapred.JobClient: Job complete: job_200810291828_0009
08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:30:38 INFO mapred.JobClient: Running job: job_200810291828_0010
08/10/29 18:30:39 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:30:45 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:30:47 INFO mapred.JobClient: Job complete: job_200810291828_0010
08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/29 18:30:48 INFO mapred.JobClient: Running job: job_200810291828_0011
08/10/29 18:30:49 INFO mapred.JobClient: map 0% reduce 0%
08/10/29 18:30:56 INFO mapred.JobClient: map 50% reduce 0%
08/10/29 18:30:57 INFO mapred.JobClient: Job complete: job_200810291828_0011
08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600

On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <[EMAIL PROTECTED]> wrote:

> I will!
>
>
> On 10/29/08, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>
>> Philippe, can you try the patch suggested by Arun Murthy on
>> [EMAIL PROTECTED]? See
>> http://issues.apache.org/jira/browse/HADOOP-4277
>>
>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but if
>> it does fix the issue, then maybe we should move forward to the 18.2
>> candidate (I don't think it has been released yet, those guys have a pretty
>> sophisticated build process going)
>>
>> -Grant
>>
>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>
>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>
>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>>
>>>> Just a single machine. I didn't think we were using features either. Are
>>>> you saying you can run the example using 0.18.1?
>>>>
>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>
>>>> -Grant
>>>>
>>>>
>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>> control example just fine on a single machine. That code is just trying to
>>>>> read a vector from a string. I'd be surprised if we were using any
>>>>> "features" but will watch the threads.
>>>>>
>>>>> Jeff
>>>>>
>>>>>
>>>>> Grant Ingersoll wrote:
>>>>>
>>>>>> I started a thread on [EMAIL PROTECTED]:
>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>
>>>>>>
>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>
>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not w/
>>>>>>> 0.18.1. So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>> incorrect behavior in Hadoop.
>>>>>>>
>>>>>>>
>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1. It was working on
>>>>>>>>> 0.17.2.
>>>>>>>>>
>>>>>>>>
>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or are
>>>>>>>> you referring to some older Mahout code that worked on 17.2?
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout from
>>>>>>>>>>> svn.
>>>>>>>>>>> However, I am having problems with KMeans that can be traced down to:
>>>>>>>>>>>
>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>     at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>     at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>     at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>     at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>     at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>     ... 1 more
>>>>>>>>>>>
>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is while running the synthetic_control.data example, but I have the
>>>>>>>>>>> same problems with any other input data.
>>>>>>>>>>>
>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>
>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>
>>>>>>>>>>> [EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient: map 0% reduce 0%
>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient: map 0% reduce 0%
>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient: map 100% reduce 0%
>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient: map 100% reduce 16%
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient: map 0% reduce 0%
>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient: map 100% reduce 0%
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient: map 0% reduce 0%
>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient: map 100% reduce 0%
>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>
>>>>>>>>>>> Philippe.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --------------------------
>>>>>>>>>> Grant Ingersoll
>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
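
A note on the quoted 0.18.1 stack trace: the innermost cause, java.lang.NumberFormatException: For input string: "[", is what Double.parseDouble throws when it is handed part of a formatted vector instead of a number, and the frames above it show KMeansCombiner.reduce being called from ReduceTask$ReduceCopier.combineAndSpill, i.e. a combiner running on the reduce side during the in-memory merge. The 0.18.1 counters point the same way: in job_200810251826_0011, Combine input records=1760 exceeds Map output records=1732 by 28, while in the 0.20.0-dev run (job_200810291828_0003) the two are equal and the combiner emitted exactly 28 records. One hypothesis, not confirmed anywhere in this thread, is that a combiner was fed its own output in a format it could not parse. The usual MapReduce contract is that a combiner may run zero, one, or several times, so it must accept its own output. The standalone Java sketch below illustrates that contract without Hadoop or Mahout; the "[x, y]" and "n|[x, y]" encodings and all method names are hypothetical and are not Mahout's actual formats or API.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, Hadoop-free sketch of the combiner contract: a combiner may be
 * run zero, one, or several times (the quoted stack trace shows one running on
 * the reduce side, from ReduceCopier.combineAndSpill), so it must accept its own
 * output as input. The "[x, y]" and "n|[x, y]" encodings are invented for this
 * illustration and are NOT Mahout's actual formats.
 */
public class CombinerSketch {

    /** Encodes a vector as "[x, y, ...]". */
    static String encode(double[] v) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(v[i]);
        }
        return sb.append(']').toString();
    }

    /** Decodes "[x, y, ...]" by blindly dropping the first and last character. */
    static double[] decode(String s) {
        String[] parts = s.substring(1, s.length() - 1).split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Throws NumberFormatException if the token is not a plain number.
            v[i] = Double.parseDouble(parts[i].trim());
        }
        return v;
    }

    /** Adds v element-wise into acc, allocating acc on first use. */
    static double[] accumulate(double[] acc, double[] v) {
        if (acc == null) {
            acc = new double[v.length];
        }
        for (int i = 0; i < v.length; i++) {
            acc[i] += v[i];
        }
        return acc;
    }

    /** BROKEN: reads bare points "[x, y]" but emits "n|[sum]", a different format. */
    static String brokenCombine(List<String> values) {
        double[] sum = null;
        for (String value : values) {
            sum = accumulate(sum, decode(value));
        }
        return values.size() + "|" + encode(sum);
    }

    /** FIXED: reads and emits the same "n|[sum]" format, so re-running it is safe. */
    static String fixedCombine(List<String> values) {
        long count = 0;
        double[] sum = null;
        for (String value : values) {
            int bar = value.indexOf('|');
            count += Long.parseLong(value.substring(0, bar));
            sum = accumulate(sum, decode(value.substring(bar + 1)));
        }
        return count + "|" + encode(sum);
    }

    public static void main(String[] args) {
        // First (map-side) pass over raw points: the broken design still works here.
        String partialA = brokenCombine(Arrays.asList("[1.0, 2.0]", "[3.0, 4.0]"));
        String partialB = brokenCombine(Arrays.asList("[5.0, 6.0]"));
        System.out.println("broken, first pass: " + partialA + " and " + partialB);

        // Second (reduce-side) pass: the combiner now receives its own output.
        try {
            brokenCombine(Arrays.asList(partialA, partialB));
        } catch (NumberFormatException e) {
            System.out.println("broken, second pass failed: " + e.getMessage());
        }

        // Fixed design: the mapper is assumed to emit "1|[x, y]", so every pass shares one format.
        String fa = fixedCombine(Arrays.asList("1|[1.0, 2.0]", "1|[3.0, 4.0]"));
        String fb = fixedCombine(Arrays.asList("1|[5.0, 6.0]"));
        System.out.println("fixed, second pass: " + fixedCombine(Arrays.asList(fa, fb)));
    }
}
```

The design choice in fixedCombine is to carry the point count alongside the partial sum in every record, so the combiner's input and output formats are identical and a final reducer could divide the sum by the count no matter how many combine passes ran.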
