I've used the Mahout XMLInputFormat. It is the right tool if you have an XML file in which one type of section is repeated over and over, and you want to turn that into a SequenceFile where each repeated section is a value. I've found it helpful as a preprocessing step for converting raw XML input into something that other Hadoop jobs can handle.
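To make the "one repeated section per value" idea concrete, here is a plain-Java sketch of the splitting an input format like this performs: scan for a start tag and emit everything up to the next end tag as one record. This is not Mahout's code and does not use Hadoop. `XmlRecordSplitter` and `split` are hypothetical names for illustration. In the Mahout versions I've used, the tags are supplied through the job configuration keys `xmlinput.start` and `xmlinput.end`, but check that against the version you have.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of start-tag/end-tag record splitting.
 * Not Mahout's implementation; no Hadoop dependencies.
 */
public class XmlRecordSplitter {

    /** Returns every startTag...endTag span (inclusive) found in xml, in order. */
    public static List<String> split(String xml, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = xml.indexOf(startTag, pos);
            if (start < 0) {
                break; // no more start tags: done
            }
            int end = xml.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record at the end of input: skip it
            }
            int stop = end + endTag.length();
            records.add(xml.substring(start, stop)); // one repeated section = one value
            pos = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<root><page><id>1</id></page><page><id>2</id></page></root>";
        for (String record : split(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

The sketch assumes the repeated element is never nested inside itself and that the start tag appears exactly as given (for example, with no attributes). In a real job, each string it returns would become one value in the output SequenceFile.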
If you're worried about having lots of small files (specifically, about overwhelming your namenode because you have too many of them), XMLInputFormat won't help with that. However, you may be able to concatenate the small files into larger ones, then run a Hadoop job that uses XMLInputFormat to transform the large files into SequenceFiles.
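A minimal sketch of the concatenation step, assuming the small files sit on a local disk before being copied into HDFS; `SmallFileConcatenator` is a hypothetical name. The combined file is not a well-formed XML document (each input may carry its own XML declaration and root element), but that should not matter for start-tag/end-tag record splitting, which only looks for the repeated element's tags.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: appends many small files into one larger file. */
public class SmallFileConcatenator {

    /** Writes each input's bytes to output in order, each followed by a newline; returns bytes written. */
    public static long concatenate(List<Path> inputs, Path output) throws IOException {
        long total = 0;
        // newOutputStream creates or truncates the output file
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path in : inputs) {
                total += Files.copy(in, out); // copies the whole file, returns byte count
                out.write('\n');              // separator between files
                total += 1;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SmallFileConcatenator <input>... <output>");
            System.exit(1);
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            inputs.add(Paths.get(args[i]));
        }
        long written = concatenate(inputs, Paths.get(args[args.length - 1]));
        System.out.println("wrote " + written + " bytes");
    }
}
```

How many small files to group into each large file is a judgment call; the goal is simply to end up with far fewer files for the namenode to track, each large enough to be worth a map task.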