Hi, esteemed group, how would I set up the Map tasks in MapReduce to recursively visit every file in a directory and do something with each file, such as produce a PDF or compute its hash?
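For context, here is roughly what I'd do outside of MapReduce: a plain-Java sketch (class and method names are just my own) that recursively walks a directory and prints a SHA-256 hash for each file. I'm essentially asking how to express this traversal as Map tasks.

```java
import java.io.IOException;
import java.nio.file.*;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.stream.Stream;

public class HashFiles {
    // Hex-encode a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Compute the SHA-256 hash of a single file's contents.
    static String hashFile(Path p) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return toHex(md.digest(Files.readAllBytes(p)));
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Recursively visit every regular file under root.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> {
                     try {
                         System.out.println(hashFile(p) + "  " + p);
                     } catch (Exception e) {
                         System.err.println("skipping " + p + ": " + e);
                     }
                 });
        }
    }
}
```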
For that matter, Google builds its index using MapReduce, or so the papers say, and the crawlers store all the files first. Are there papers on the architecture of this? How and where do the crawlers store the downloaded files? Thank you, Mark