problem with IdentityMapper

Mike Forrest Thu, 10 Jan 2008 14:51:36 -0800

Hi,

I'm running into a problem where IdentityMapper seems to produce way too much data. For example, I have a job that reads a sequence file using IdentityMapper and then uses IdentityReducer to write everything back out to another sequence file. My input is a ~60MB sequence file and after the map phase has completed, the job tracker UI reports about 10GB for "Map output bytes". It seems like the output collector does not get properly reset and so each map that gets emitted has the correct key but the value ends up being all the data you've encountered up to that point. I think this is a known issue but I can't seem to find any discussion about it right now. Has anyone else run into this, and if so, is there a solution? I'm using the latest code in the 0.15 branch.
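
To make the suspected behavior concrete, here is a minimal sketch in plain Java. It is not Hadoop code and is not taken from the actual output collector; the class BufferReuseSketch and the method emittedBytes are made up for illustration, and it assumes every value has the same size. It only shows how a shared buffer that is never reset between records would make the emitted byte count grow quadratically rather than linearly.

```java
import java.io.ByteArrayOutputStream;

// Plain-Java illustration only; none of these names come from Hadoop.
class BufferReuseSketch {

    // Returns the total number of bytes "emitted" for `records` values of
    // `valueSize` bytes each. When resetBetweenRecords is false, the shared
    // buffer keeps growing, so record i is emitted together with every
    // value seen before it.
    static long emittedBytes(int records, int valueSize, boolean resetBetweenRecords) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] value = new byte[valueSize];
        long total = 0;
        for (int i = 0; i < records; i++) {
            if (resetBetweenRecords) {
                buffer.reset(); // discard the previous record's bytes
            }
            buffer.write(value, 0, value.length);
            total += buffer.size(); // bytes that would be emitted for this record
        }
        return total;
    }

    public static void main(String[] args) {
        int records = 1000;
        int valueSize = 100;
        System.out.println("with reset:    " + emittedBytes(records, valueSize, true));
        System.out.println("without reset: " + emittedBytes(records, valueSize, false));
    }
}
```

Under that equal-size assumption, n records emitted without a reset come out (n + 1) / 2 times larger than the input, which is the kind of blow-up I seem to be seeing.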

Thanks
Mike
