Hi All,
My Hadoop streaming job (written in Python) runs to "completion": both map
and reduce report 100% complete. But when I look at the output directory in
HDFS, the part files are empty, and I don't know what might be causing this.
I understand that the percentages reflect the records that have been read in,
not necessarily processed.
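For context, Hadoop Streaming feeds each input line to the mapper on stdin and treats every line the mapper writes to stdout as one output record (by default, key and value separated by a tab). If nothing reaches stdout, the job can still succeed while every part file stays empty. A minimal sketch of that contract follows. It is not the actual CaidMatch mapper: the tab-separated input layout, the first-field match and the WANTED_IDS set are hypothetical placeholders.

```python
#!/usr/bin/env python
"""Hypothetical Hadoop Streaming mapper sketch (not the original CaidMatch code)."""
import sys

# Hypothetical ids to match; a real job would load its own id list.
WANTED_IDS = {"id1", "id2"}


def map_lines(lines, wanted_ids):
    """Yield "key<TAB>value" records for lines whose first tab field is wanted.

    Assumes tab-separated input with the id in the first field.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] in wanted_ids:
            yield fields[0] + "\t" + "\t".join(fields[1:])


def main():
    for record in map_lines(sys.stdin, WANTED_IDS):
        # Only stdout becomes job output; stderr ends up in the task logs.
        sys.stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

A quick local check is to pipe a few input lines through the script (for example: head input.txt | python mapper.py, with hypothetical file names) and confirm that something reaches stdout before submitting the job.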

Some of the logs follow. The detailed logs in Cloudera Manager say that there
were no map outputs, which is interesting. Any suggestions?

12/08/30 03:27:14 INFO streaming.StreamJob: To kill this job, run:
12/08/30 03:27:14 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job  -Dmapred.job.tracker=xxxxx.yyy.com:8021 -kill job_201208232245_3182
12/08/30 03:27:14 INFO streaming.StreamJob: Tracking URL:
12/08/30 03:27:15 INFO streaming.StreamJob:  map 0%  reduce 0%
12/08/30 03:27:20 INFO streaming.StreamJob:  map 33%  reduce 0%
12/08/30 03:27:23 INFO streaming.StreamJob:  map 67%  reduce 0%
12/08/30 03:27:29 INFO streaming.StreamJob:  map 100%  reduce 0%
12/08/30 03:27:33 INFO streaming.StreamJob:  map 100%  reduce 100%
12/08/30 03:27:35 INFO streaming.StreamJob: Job complete:
12/08/30 03:27:35 INFO streaming.StreamJob: Output: /user/GHU
Thu Aug 30 03:27:24 GMT 2012
*** END
bash-3.2$ hadoop fs -ls /user/ghu/
Found 5 items
-rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/_SUCCESS
drwxrwxrwx   - ghu hadoop          0 2012-08-30 03:27 /user/GHU/_logs
-rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00000
-rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00001
-rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00002

Metadata
  Status: Succeeded
  Type: MapReduce
  Id: job_201208232245_3182
  Name: CaidMatch
  User: srisrini
  Mapper class: PipeMapper
  Reducer class:
  Scheduler pool name: default
  Job input directory: hdfs://xxxxx.yyy.txt,hdfs://xxxx.yyyy.com/user/GHUcaidlist.txt
  Job output directory: hdfs://xxxx.yyyy.com/user/GHU/

Timing
  Duration: 20.977s
  Submit time: Wed, 29 Aug 2012 08:27 PM
  Start time: Wed, 29 Aug 2012 08:27 PM
  Finish time: Wed, 29 Aug 2012 08:27 PM

Progress and Scheduling
  Map Progress:
  Reduce Progress:
  Launched maps: 4
  Data-local maps: 3
  Rack-local maps: 1
  Other local maps:
  Desired maps: 3
  Launched reducers:
  Desired reducers: 0
  Fairscheduler running tasks:
  Fairscheduler minimum share:
  Fairscheduler demand:

Current Resource Usage
  Current User CPUs: 0
  Current System CPUs: 0
  Resident memory: 0 B
  Running maps: 0
  Running reducers: 0

Aggregate Resource Usage and Counters
  User CPU: 0s
  System CPU: 0s
  Map slot time: 12.135s
  Reduce slot time: 0s
  Cumulative disk reads:
  Cumulative disk writes: 155.0 KiB
  Cumulative HDFS reads: 3.6 KiB
  Cumulative HDFS writes:
  Map input bytes: 2.5 KiB
  Map input records: 45
  Map output records: 0
  Reducer input groups:
  Reducer input records:
  Reducer output records:
  Reducer shuffle bytes:
  Spilled records:
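The counters above show 45 map input records, 0 map output records and 0 desired reducers. With zero reducers the job is map-only, so each part file is written directly by a mapper, which matches the empty part files. One way to see where records are being dropped inside the mapper is Hadoop Streaming's counter protocol: a line written to stderr in the form reporter:counter:<group>,<counter>,<amount> increments a user-defined counter. The sketch below is a hypothetical helper, not part of the original job; the group name "CaidMatch", the counter names and the keep predicate are placeholders.

```python
from collections import Counter


def counter_line(group, name, amount):
    """Format one Hadoop Streaming counter update for stderr.

    Streaming treats stderr lines of the form
    reporter:counter:<group>,<counter>,<amount> as counter increments.
    """
    return "reporter:counter:%s,%s,%d\n" % (group, name, amount)


def map_with_counters(lines, keep, emit, report, group="CaidMatch"):
    """Pass lines accepted by keep() to emit(); report read/emitted/dropped totals.

    keep is a hypothetical predicate standing in for the real matching logic.
    emit and report are write functions, e.g. sys.stdout.write and sys.stderr.write.
    """
    counts = Counter()
    for line in lines:
        counts["lines_read"] += 1
        if keep(line):
            emit(line.rstrip("\n") + "\n")
            counts["lines_emitted"] += 1
        else:
            counts["lines_dropped"] += 1
    # Report totals once at the end instead of once per record.
    for name in sorted(counts):
        report(counter_line(group, name, counts[name]))
    return counts
```

Inside a mapper this would be called as map_with_counters(sys.stdin, keep, sys.stdout.write, sys.stderr.write). If lines_read is 45 but lines_emitted is 0, the matching logic is rejecting every record rather than the output being lost somewhere after the mapper.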
