Hi Manoj,

If you are looking for a scheduler and workflow manager to carry out this task, you can have a look at Oozie.
If your XML files are small (smaller than the HDFS block size), then it is definitely better practice to combine them into larger files. Combining them into SequenceFiles should work well.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Manoj Babu <[email protected]>
Date: Fri, 13 Jul 2012 08:59:51
To: <[email protected]>
Reply-To: [email protected]
Subject: suggest Best way to upload xml files to HDFS

Hi,

I need to upload large XML files daily. Right now I have a small program that reads all the files from a local folder and writes them to HDFS as a single file. Is this the right way? If there are any best practices or an optimized way to achieve this, kindly let me know.

Thanks in advance!

Cheers!
Manoj.
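A minimal sketch of the "combine small files" advice, in Python with only the standard library. This is NOT Hadoop's SequenceFile format or API. It only mimics the idea: one record per XML file, keyed by its filename (with a real SequenceFile, a Text key and a BytesWritable value is one common choice), plus a greedy grouping step so each combined file stays within roughly one HDFS block. The 64 MB block size (the Hadoop 1.x default for `dfs.block.size`; check your cluster), the length-prefixed record layout, and the function names `plan_batches`, `pack_records`, `unpack_records` and `pack_directory` are all assumptions made for illustration.

```python
import struct
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed HDFS block size (Hadoop 1.x default); adjust to your cluster's setting.
BLOCK_SIZE = 64 * 1024 * 1024


def plan_batches(sizes: Iterable[Tuple[str, int]], block_size: int = BLOCK_SIZE) -> List[List[str]]:
    """Greedily group (name, size) pairs so each batch's total stays within block_size.

    A file at least as large as block_size gets a batch of its own,
    since it gains nothing from being combined.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sizes:
        if size >= block_size:
            batches.append([name])
            continue
        if current and current_size + size > block_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def pack_records(records: Iterable[Tuple[str, bytes]]) -> bytes:
    """Concatenate (filename, content) records in a hypothetical length-prefixed layout.

    Each record is: 4-byte key length, UTF-8 key, 8-byte value length, value bytes.
    This stands in for SequenceFile key/value records; it is not that format.
    """
    out = bytearray()
    for key, value in records:
        k = key.encode("utf-8")
        out += struct.pack(">I", len(k)) + k
        out += struct.pack(">Q", len(value)) + value
    return bytes(out)


def unpack_records(data: bytes) -> List[Tuple[str, bytes]]:
    """Inverse of pack_records: recover the (filename, content) pairs in order."""
    records: List[Tuple[str, bytes]] = []
    pos = 0
    while pos < len(data):
        (klen,) = struct.unpack_from(">I", data, pos)
        pos += 4
        key = data[pos:pos + klen].decode("utf-8")
        pos += klen
        (vlen,) = struct.unpack_from(">Q", data, pos)
        pos += 8
        records.append((key, data[pos:pos + vlen]))
        pos += vlen
    return records


def pack_directory(folder: str, pattern: str = "*.xml") -> bytes:
    """Read every matching file in a local folder and pack them into one buffer."""
    paths = sorted(Path(folder).glob(pattern))
    return pack_records((p.name, p.read_bytes()) for p in paths)
```

Keeping the filename as the key means each original XML document can still be identified after the files are merged. In practice, each batch's packed bytes would be written to one HDFS file (for example with `hadoop fs -put`), or, on the JVM, the same records would be written with Hadoop's `SequenceFile.Writer` instead of this stand-in layout.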
