Hi, I am very new to the MapReduce paradigm, so this could be a dumb question.
What do you do if your mapper functions need to know more than just the data being processed in order to do their job?

The simplest example I can think of is implementing a selective, phrase-based version of wordcount. Imagine you want to count the occurrences of all notable names (from the notable names database) in a large collection of news stories. You can't just count every phrase: the number of potential word combinations is ridiculously large, and the vast majority are irrelevant. You have a limited (large, but bounded) vocabulary of phrases you are interested in, namely this list of names. You want each mapper to be aware of it and to count only the relevant phrases.

Basically, you want to give each mapper read-only access to a HashSet of phrases as well as to the documents it should be counting over. How would you do that?

Cheers,
Dave

--
View this message in context: http://old.nabble.com/Context-needed-by-mapper-tp28532164p28532164.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
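For illustration, here is a minimal sketch of the selective counting step described in the post, written in plain Java with no Hadoop dependencies. It assumes the phrase set has already been loaded once per mapper (in Hadoop that would typically happen once per task, for example in the mapper's setup() method, which is not shown here). The class name `SelectivePhraseCounter`, the tokenization rule, and the sample names are hypothetical. The idea is that the mapper only checks word n-grams up to the length of the longest known phrase and emits a count only when the n-gram is in the read-only set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class: shows the per-mapper counting logic only, not Hadoop's API.
class SelectivePhraseCounter {
    // Read-only vocabulary of phrases of interest (e.g. notable names), normalized.
    private final Set<String> phrases;
    // Length in words of the longest phrase; bounds the n-grams checked per position.
    private final int maxWords;

    SelectivePhraseCounter(Set<String> vocabulary) {
        Set<String> normalized = new HashSet<>();
        int longest = 0;
        for (String p : vocabulary) {
            String[] words = tokenize(p);
            if (words.length == 0) continue; // skip blank entries
            normalized.add(String.join(" ", words));
            longest = Math.max(longest, words.length);
        }
        this.phrases = Collections.unmodifiableSet(normalized);
        this.maxWords = longest;
    }

    // Assumed tokenization: lower-case, split on runs of non-letter/non-digit characters.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase().replaceAll("[^\\p{L}\\p{Nd}]+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // The "map" step for one document: count only n-grams found in the vocabulary.
    // Overlapping matches are all counted ("obama" inside "barack obama" counts too).
    Map<String, Integer> map(String document) {
        String[] words = tokenize(document);
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder candidate = new StringBuilder();
            for (int len = 1; len <= maxWords && start + len <= words.length; len++) {
                if (len > 1) candidate.append(' ');
                candidate.append(words[start + len - 1]);
                String phrase = candidate.toString();
                if (phrases.contains(phrase)) {
                    counts.merge(phrase, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        SelectivePhraseCounter counter = new SelectivePhraseCounter(
            new HashSet<>(Arrays.asList("Barack Obama", "Angela Merkel", "Obama")));
        System.out.println(counter.map("Barack Obama met Angela Merkel; Obama spoke first."));
    }
}
```

Because the set is immutable and only read, every mapper can hold its own copy without any coordination; the reducer side is unchanged from ordinary wordcount (sum the counts per phrase).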