Hi, I have a task that involves converting large quantities of files into other formats. Each file has to be processed as a whole and converted to a target format. Since there are hundreds of GB of data, I thought Hadoop might be suitable, but the problem is that I don't think the files can be split apart and processed in pieces. For example, how would MapReduce convert a Word document to PDF if the file is divided into blocks? I'm not sure that's possible, or is it?
Thanks for any advice,
Darren