Mark, thanks for the pointer. As far as I understand, you are not using Hadoop's default split; instead you use your own split of one record, where a record is everything between the start tag and the end tag in your XML. So in a way you have one map per record? In my case this would not be efficient, since my XML files are small. What I want is a split that includes multiple files, so that I can use one map for around 64 MB of data and do the parsing inside the map. I will update you once it makes more sense even to me.
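The multi-file split idea could be sketched roughly as below. This is plain Java with no Hadoop dependency, and the class and method names (SplitPacker, pack) are hypothetical, chosen only for illustration. It greedily groups file sizes into bundles of at most about 64 MB each; a real job would more likely build on Hadoop's CombineFileInputFormat, which is aimed at this small-files case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups small files (represented by their sizes in bytes)
// into bundles so that each bundle can be handled by a single map.
public class SplitPacker {

    /** Greedily packs file sizes into groups of at most maxSplitBytes each. */
    public static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;

        for (long size : fileSizes) {
            // Close the current group if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A file larger than the limit still ends up in a group of its own.
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> sizes = List.of(30 * mb, 30 * mb, 30 * mb, 5 * mb);
        System.out.println(pack(sizes, 64 * mb).size() + " splits");
    }
}
```

Each resulting group would correspond to one map, which then parses the XML of every file in its group.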
-- Vipul Sharma sharmavipul AT gmail DOT com