Ideally you would want your data to be on HDFS and run your map/reduce jobs on that data. The Hadoop framework splits your data and feeds those splits to each map or reduce task. One problem with image files is that they cannot be split. What people have done instead is wrap the image files in XML, creating huge files that each contain many images. Hadoop offers something called streaming, which can split such a file at an XML boundary and feed the records to your map/reduce tasks. Streaming also lets you write your tasks in any language, such as Perl, PHP or C++. You can find information about streaming here http://hadoop.apache.org/core/docs/r0.17.0/streaming.html and information about parsing XML files in streaming here http://hadoop.apache.org/core/docs/r0.17.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
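For example, here is a rough, untested sketch of what such a job could look like. The input/output paths, the <image> wrapper tag, its name="..." attribute, and the mapper file name are all made up for illustration; the -inputreader option is the one described in the streaming docs linked above:

    hadoop jar contrib/streaming/hadoop-*-streaming.jar \
        -input /user/you/wrapped-images \
        -output /user/you/image-counts \
        -mapper image_mapper.php \
        -reducer /bin/cat \
        -file image_mapper.php \
        -inputreader "StreamXmlRecord,begin=<image,end=</image>"

And image_mapper.php could be something as simple as:

    #!/usr/bin/php
    <?php
    // Untested sketch of a streaming mapper. Assumes StreamXmlRecordReader
    // hands the mapper one <image ...>...</image> record per line on stdin
    // (i.e. each wrapped image record fits on one line). We just pull out
    // the hypothetical name attribute and emit "name<TAB>1", which a
    // reducer could count or simply pass through.
    while (($line = fgets(STDIN)) !== false) {
        if (preg_match('/<image[^>]*name="([^"]+)"/', $line, $m)) {
            echo $m[1] . "\t1\n";
        }
    }
    ?>

Your real mapper would of course do the actual image processing instead of just echoing the name, but the shape of the job stays the same.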
Thanks, Lohit

----- Original Message ----
From: Chanchal James <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, June 12, 2008 9:42:46 AM
Subject: Question about Hadoop

Hi,

I have a question about Hadoop. I am a beginner and just testing Hadoop. I would like to know how a PHP application would benefit from this, say an application that needs to work on a large number of image files.

Do I have to store the application in HDFS always, or do I just copy it to HDFS when needed, do the processing, and then copy it back to the local file system? Is that the case with the data files too? Once I have Hadoop running, do I keep all data & application files in HDFS always, and not use local file system storage?

Thank you.