org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
08/10/27 15:34:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/27 15:34:55 INFO mapred.JobClient: Running job: job_200810271532_0001
08/10/27 15:34:56 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:34:59 INFO mapred.JobClient: Job complete: job_200810271532_0001
08/10/27 15:34:59 INFO mapred.JobClient: Counters: 7
08/10/27 15:34:59 INFO mapred.JobClient: File Systems
08/10/27 15:34:59 INFO mapred.JobClient: HDFS bytes read=291644
08/10/27 15:34:59 INFO mapred.JobClient: HDFS bytes written=323660
08/10/27 15:34:59 INFO mapred.JobClient: Job Counters
08/10/27 15:34:59 INFO mapred.JobClient: Launched map tasks=2
08/10/27 15:34:59 INFO mapred.JobClient: Data-local map tasks=2
08/10/27 15:34:59 INFO mapred.JobClient: Map-Reduce Framework
08/10/27 15:34:59 INFO mapred.JobClient: Map input records=600
08/10/27 15:34:59 INFO mapred.JobClient: Map input bytes=288374
08/10/27 15:34:59 INFO mapred.JobClient: Map output records=600
08/10/27 15:34:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:00 INFO mapred.JobClient: Running job: job_200810271532_0002
08/10/27 15:35:01 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:35:10 INFO mapred.JobClient: map 100% reduce 0%
08/10/27 15:35:16 INFO mapred.JobClient: Job complete: job_200810271532_0002
08/10/27 15:35:16 INFO mapred.JobClient: Counters: 16
08/10/27 15:35:16 INFO mapred.JobClient: File Systems
08/10/27 15:35:16 INFO mapred.JobClient: HDFS bytes read=323660
08/10/27 15:35:16 INFO mapred.JobClient: HDFS bytes written=1447
08/10/27 15:35:16 INFO mapred.JobClient: Local bytes read=1389
08/10/27 15:35:16 INFO mapred.JobClient: Local bytes written=37878
08/10/27 15:35:16 INFO mapred.JobClient: Job Counters
08/10/27 15:35:16 INFO mapred.JobClient: Launched reduce tasks=1
08/10/27 15:35:16 INFO mapred.JobClient: Launched map tasks=2
08/10/27 15:35:16 INFO mapred.JobClient: Data-local map tasks=2
08/10/27 15:35:16 INFO mapred.JobClient: Map-Reduce Framework
08/10/27 15:35:16 INFO mapred.JobClient: Reduce input groups=1
08/10/27 15:35:16 INFO mapred.JobClient: Combine output records=29
08/10/27 15:35:16 INFO mapred.JobClient: Map input records=600
08/10/27 15:35:16 INFO mapred.JobClient: Reduce output records=1
08/10/27 15:35:16 INFO mapred.JobClient: Map output bytes=943020
08/10/27 15:35:16 INFO mapred.JobClient: Map input bytes=323660
08/10/27 15:35:16 INFO mapred.JobClient: Combine input records=1760
08/10/27 15:35:16 INFO mapred.JobClient: Map output records=1732
08/10/27 15:35:16 INFO mapred.JobClient: Reduce input records=1
08/10/27 15:35:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:16 INFO mapred.JobClient: Running job: job_200810271532_0003
08/10/27 15:35:17 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:35:24 INFO mapred.JobClient: map 100% reduce 0%
08/10/27 15:35:28 INFO mapred.JobClient: Job complete: job_200810271532_0003
08/10/27 15:35:28 INFO mapred.JobClient: Counters: 16
08/10/27 15:35:28 INFO mapred.JobClient: File Systems
08/10/27 15:35:28 INFO mapred.JobClient: HDFS bytes read=326554
08/10/27 15:35:28 INFO mapred.JobClient: HDFS bytes written=1137260
08/10/27 15:35:28 INFO mapred.JobClient: Local bytes read=1147358
08/10/27 15:35:28 INFO mapred.JobClient: Local bytes written=2304490
08/10/27 15:35:28 INFO mapred.JobClient: Job Counters
08/10/27 15:35:28 INFO mapred.JobClient: Launched reduce tasks=1
08/10/27 15:35:28 INFO mapred.JobClient: Launched map tasks=2
08/10/27 15:35:28 INFO mapred.JobClient: Data-local map tasks=2
08/10/27 15:35:28 INFO mapred.JobClient: Map-Reduce Framework
08/10/27 15:35:28 INFO mapred.JobClient: Reduce input groups=1
08/10/27 15:35:28 INFO mapred.JobClient: Combine output records=0
08/10/27 15:35:28 INFO mapred.JobClient: Map input records=600
08/10/27 15:35:28 INFO mapred.JobClient: Reduce output records=600
08/10/27 15:35:28 INFO mapred.JobClient: Map output bytes=1139660
08/10/27 15:35:28 INFO mapred.JobClient: Map input bytes=323660
08/10/27 15:35:28 INFO mapred.JobClient: Combine input records=0
08/10/27 15:35:28 INFO mapred.JobClient: Map output records=600
08/10/27 15:35:28 INFO mapred.JobClient: Reduce input records=600
08/10/27 15:35:28 INFO kmeans.KMeansDriver: Iteration 0
08/10/27 15:35:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:29 INFO mapred.JobClient: Running job: job_200810271532_0004
08/10/27 15:35:30 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:35:37 INFO mapred.JobClient: map 100% reduce 0%
08/10/27 15:35:45 INFO mapred.JobClient: Task Id : attempt_200810271532_0004_r_000000_0, Status : FAILED
java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
The failed attempt's logs contain this:
2008-10-27 15:35:40,133 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 2524 bytes (2524 raw bytes) into RAM from attempt_200810271532_0004_m_000000_0
2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Read 2524 bytes from map-output for attempt_200810271532_0004_m_000000_0
2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200810271532_0004_m_000000_0 -> (1358, 1158) from phil
2008-10-27 15:35:41,110 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2008-10-27 15:35:41,125 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2008-10-27 15:35:41,173 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 2 segments...
2008-10-27 15:35:41,177 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2008-10-27 15:35:41,178 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
2008-10-27 15:35:41,197 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810271532_0004_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.NumberFormatException: For input string: "["
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
    at java.lang.Double.parseDouble(Double.java:510)
    at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
    at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
    ... 1 more
2008-10-27 15:35:41,197 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
2008-10-27 15:35:41,198 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
However, I can run the org.apache.mahout.clustering.kmeans unit tests without problems.
I truly do not understand where the problem lies.
Thanks for the help.
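One possible reading of the trace above, offered as an assumption rather than a confirmed diagnosis: the NumberFormatException is raised inside KMeansCombiner.reduce, which is being called from ReduceTask$ReduceCopier.combineAndSpill, that is, on the reduce side during the in-memory merge. If a combiner emits values in a different text format than it consumes, running it again over its own output would fail in a similar way. The Java sketch below illustrates that failure mode only. The class CombinerFormatSketch and its decode, encode, and combine helpers are hypothetical and are not Mahout's actual code or formats.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerFormatSketch {

    // Hypothetical decoder: accepts only the form "[x1, x2, ...]".
    public static double[] decode(String text) {
        String[] parts = text.substring(1, text.length() - 1).split(", ");
        double[] vector = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Throws NumberFormatException when a token is not a plain number.
            vector[i] = Double.parseDouble(parts[i]);
        }
        return vector;
    }

    // Hypothetical encoder: the inverse of decode.
    public static String encode(double[] vector) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < vector.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(vector[i]);
        }
        return sb.append("]").toString();
    }

    // Hypothetical combiner: reads "[...]" values but writes "count, [...]",
    // so its output format differs from its input format.
    public static String combine(List<String> values) {
        double[] sum = null;
        for (String value : values) {
            double[] point = decode(value);
            if (sum == null) {
                sum = new double[point.length];
            }
            for (int i = 0; i < point.length; i++) {
                sum[i] += point[i];
            }
        }
        return values.size() + ", " + encode(sum);
    }

    public static void main(String[] args) {
        // First pass over mapper-style values works.
        String first = combine(Arrays.asList("[1.0, 2.0]", "[3.0, 4.0]"));
        System.out.println("first pass: " + first);
        try {
            // Second pass over the combiner's own output cannot be decoded.
            combine(Arrays.asList(first));
        } catch (NumberFormatException e) {
            System.out.println("second pass failed: " + e);
        }
    }
}
```

If this reading applies, a combiner whose output can be fed back in as its input (same key and value format on both sides) would not hit this error, which might also explain why unit tests that never re-run the combiner pass.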
On Sun, Oct 26, 2008 at 8:24 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Same Mahout code, though, right?
Can you provide details on how you were running it?
On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
Unfortunately, I went straight from 0.17.2 to 0.18.1. It was working on 0.17.2.
On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Did this work with 0.18.0 or other prior versions for you?
On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
Hi,
I just updated to Hadoop 0.18.1 and got a clean checkout of Mahout from svn.
However, I am having problems with KMeans, which can be traced down to:
2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.NumberFormatException: For input string: "["
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
    at java.lang.Double.parseDouble(Double.java:510)
    at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
    at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
    ... 1 more
2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
This happens while running the synthetic_control.data example, but I have the same problem with any other input data.
I am able to run other map-reduce jobs without problems.
Here is the output of the jar task:
[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
08/10/25 19:09:29 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:09:31 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
08/10/25 19:09:32 INFO mapred.JobClient: File Systems
08/10/25 19:09:32 INFO mapred.JobClient: HDFS bytes read=291644
08/10/25 19:09:32 INFO mapred.JobClient: HDFS bytes written=323660
08/10/25 19:09:32 INFO mapred.JobClient: Job Counters
08/10/25 19:09:32 INFO mapred.JobClient: Launched map tasks=2
08/10/25 19:09:32 INFO mapred.JobClient: Data-local map tasks=2
08/10/25 19:09:32 INFO mapred.JobClient: Map-Reduce Framework
08/10/25 19:09:32 INFO mapred.JobClient: Map input records=600
08/10/25 19:09:32 INFO mapred.JobClient: Map input bytes=288374
08/10/25 19:09:32 INFO mapred.JobClient: Map output records=600
08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
08/10/25 19:09:33 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:09:37 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:09:39 INFO mapred.JobClient: map 100% reduce 0%
08/10/25 19:09:44 INFO mapred.JobClient: map 100% reduce 16%
08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
08/10/25 19:09:52 INFO mapred.JobClient: File Systems
08/10/25 19:09:52 INFO mapred.JobClient: HDFS bytes read=323660
08/10/25 19:09:52 INFO mapred.JobClient: HDFS bytes written=1447
08/10/25 19:09:52 INFO mapred.JobClient: Local bytes read=1389
08/10/25 19:09:52 INFO mapred.JobClient: Local bytes written=37878
08/10/25 19:09:52 INFO mapred.JobClient: Job Counters
08/10/25 19:09:52 INFO mapred.JobClient: Launched reduce tasks=1
08/10/25 19:09:52 INFO mapred.JobClient: Launched map tasks=2
08/10/25 19:09:52 INFO mapred.JobClient: Data-local map tasks=2
08/10/25 19:09:52 INFO mapred.JobClient: Map-Reduce Framework
08/10/25 19:09:52 INFO mapred.JobClient: Reduce input groups=1
08/10/25 19:09:52 INFO mapred.JobClient: Combine output records=29
08/10/25 19:09:52 INFO mapred.JobClient: Map input records=600
08/10/25 19:09:52 INFO mapred.JobClient: Reduce output records=1
08/10/25 19:09:52 INFO mapred.JobClient: Map output bytes=943020
08/10/25 19:09:52 INFO mapred.JobClient: Map input bytes=323660
08/10/25 19:09:52 INFO mapred.JobClient: Combine input records=1760
08/10/25 19:09:52 INFO mapred.JobClient: Map output records=1732
08/10/25 19:09:52 INFO mapred.JobClient: Reduce input records=1
08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
08/10/25 19:09:54 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:09:56 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:09:58 INFO mapred.JobClient: map 100% reduce 0%
08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
08/10/25 19:10:02 INFO mapred.JobClient: File Systems
08/10/25 19:10:02 INFO mapred.JobClient: HDFS bytes read=326554
08/10/25 19:10:02 INFO mapred.JobClient: HDFS bytes written=1137260
08/10/25 19:10:02 INFO mapred.JobClient: Local bytes read=1147358
08/10/25 19:10:02 INFO mapred.JobClient: Local bytes written=2304490
08/10/25 19:10:02 INFO mapred.JobClient: Job Counters
08/10/25 19:10:02 INFO mapred.JobClient: Launched reduce tasks=1
08/10/25 19:10:02 INFO mapred.JobClient: Launched map tasks=2
08/10/25 19:10:02 INFO mapred.JobClient: Data-local map tasks=2
08/10/25 19:10:02 INFO mapred.JobClient: Map-Reduce Framework
08/10/25 19:10:02 INFO mapred.JobClient: Reduce input groups=1
08/10/25 19:10:02 INFO mapred.JobClient: Combine output records=0
08/10/25 19:10:02 INFO mapred.JobClient: Map input records=600
08/10/25 19:10:02 INFO mapred.JobClient: Reduce output records=600
08/10/25 19:10:02 INFO mapred.JobClient: Map output bytes=1139660
08/10/25 19:10:02 INFO mapred.JobClient: Map input bytes=323660
08/10/25 19:10:02 INFO mapred.JobClient: Combine input records=0
08/10/25 19:10:02 INFO mapred.JobClient: Map output records=600
08/10/25 19:10:02 INFO mapred.JobClient: Reduce input records=600
08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
08/10/25 19:10:04 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:10:08 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:10:09 INFO mapred.JobClient: map 100% reduce 0%
08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
I am not sure if I am doing something wrong here.
Thanks for the help,
Philippe.
--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ