Change streaming code to use the new MapReduce API
-----------------------------------------------
Key: MAPREDUCE-3619
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3619
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/streaming
Affects Versions: 0.23.1
Reporter: Liyin Liang

If we run a streaming job with the following Python script as the mapper or reducer, the job throws a NullPointerException.

{code}
#!/usr/bin/python
import sys

class MyTask:
    def __init__(self, file=sys.stdin):
        self.file = file
        print >>sys.stderr, "reporter:counter:spam,disp_flag_record,0"
        print >>sys.stderr, "reporter:counter:spam,spam_record,0"

    def process(self):
        while True:
            line = self.file.readline()
            if not line:
                break
            print line

if __name__ == "__main__":
    task = MyTask()
    task.process()
{code}

Here is the related NPE log:

2011-12-22 14:14:06,310 WARN org.apache.hadoop.streaming.PipeMapRed: java.lang.NullPointerException
	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.incrCounter(PipeMapRed.java:502)
	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:444)

This happens because the script's "print >>sys.stderr" statements run as soon as the script starts, so they invoke reporter.incrCounter() while PipeMapper|PipeReducer.configure() is still running. No reporter is available in configure(), so the counter update fails with the NPE above.

To fix this problem, we should change the streaming code to use the new API. Then we can call context.getCounter() in the Mapper|Reducer.setup() function.
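To make the failure mode concrete, here is a minimal Python model of what the streaming framework does with the script's stderr. It is not Hadoop code: `CounterSink`, `parse_counter_line` and `drain_stderr` are hypothetical names, loosely modeled on PipeMapRed$MRErrorThread, which reads the child's stderr and turns each `reporter:counter:<group>,<counter>,<amount>` line into an incrCounter() call. Passing `None` as the sink stands in for the missing reporter during configure(); passing a sink that already exists when the child starts stands in for a counter source such as the context available in setup().

```python
from typing import Dict, Iterable, Optional, Tuple

# Prefix of the stderr counter protocol used by the script above.
COUNTER_PREFIX = "reporter:counter:"


class CounterSink:
    """Hypothetical stand-in for a Reporter or a new-API Context's counters."""

    def __init__(self) -> None:
        self.counters: Dict[Tuple[str, str], int] = {}

    def incr_counter(self, group: str, name: str, amount: int) -> None:
        key = (group, name)
        self.counters[key] = self.counters.get(key, 0) + amount


def parse_counter_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (group, counter, amount) for a counter line, else None."""
    if not line.startswith(COUNTER_PREFIX):
        return None
    columns = line[len(COUNTER_PREFIX):].strip().split(",")
    if len(columns) != 3:
        return None  # malformed counter line; the real code only logs a warning
    group, name, amount = columns
    return group, name, int(amount)


def drain_stderr(lines: Iterable[str], sink: Optional[CounterSink]) -> None:
    """Model of the stderr-reading thread: forward counter lines to the sink.

    If sink is None (no reporter yet), the first counter line raises
    AttributeError, the Python analogue of the NullPointerException.
    """
    for line in lines:
        parsed = parse_counter_line(line)
        if parsed is not None:
            sink.incr_counter(*parsed)
```

With the stderr lines the script emits, `drain_stderr(lines, None)` fails on the first counter line, while the same lines against a `CounterSink()` record both `spam` counters at 0. In this model the fix is simply making sure the sink exists before the child process can write to stderr, which is what obtaining counters from the context in setup() would provide.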