OK, that was a dumb question, sorry. If I had worked through to the end
of the tutorial instead of immediately trying to solve my problem, I
would have found out about the DistributedCache.
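
For anyone who finds this thread later, here is a rough sketch of the
per-mapper logic I was asking about, written in plain Java so it runs
without Hadoop on the classpath. The class PhraseCounter and its method
names are made up for illustration and are not part of any Hadoop API. In
a real job, as far as I understand it, the driver would register the
phrase file with DistributedCache.addCacheFile(uri, conf), and each mapper
would read its local copy (via DistributedCache.getLocalCacheFiles(conf))
once at task startup and pass those lines to something like the
constructor below. The normalization (lowercasing, stripping punctuation)
is my own assumption, not something the tutorial prescribes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: holds the read-only phrase set one mapper would use.
class PhraseCounter {
    private final Set<String> phrases = new HashSet<>();
    private int maxWords = 1; // longest phrase length, in words

    // In a mapper, phraseLines would be the lines of the locally cached
    // phrase file, loaded once per task rather than once per record.
    PhraseCounter(Collection<String> phraseLines) {
        for (String line : phraseLines) {
            String phrase = normalize(line);
            if (phrase.isEmpty()) {
                continue; // skip blank lines
            }
            phrases.add(phrase);
            maxWords = Math.max(maxWords, phrase.split(" ").length);
        }
    }

    // Lowercase, turn every run of non-letter/non-digit characters into a
    // single space, and trim. Applied to both phrases and documents.
    private static String normalize(String s) {
        return s.toLowerCase().replaceAll("[^\\p{L}\\p{Nd}]+", " ").trim();
    }

    // Stand-in for map(): counts only the phrases present in the set.
    // Every window of 1..maxWords consecutive words is looked up.
    Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        String[] words = normalize(text).split(" ");
        for (int start = 0; start < words.length; start++) {
            StringBuilder candidate = new StringBuilder();
            for (int n = 0; n < maxWords && start + n < words.length; n++) {
                if (n > 0) {
                    candidate.append(' ');
                }
                candidate.append(words[start + n]);
                String key = candidate.toString();
                if (phrases.contains(key)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        PhraseCounter counter =
            new PhraseCounter(Arrays.asList("Ada Lovelace", "Alan Turing"));
        String story =
            "Ada Lovelace met Alan Turing? No, but Ada Lovelace wrote notes.";
        // prints {ada lovelace=2, alan turing=1}
        System.out.println(counter.count(story));
    }
}
```

Because only windows of up to maxWords words are checked against the
set, the work per document stays proportional to its length times the
longest phrase, no matter how many phrases are in the list. In the real
job the mapper would emit each (phrase, 1) pair and leave the summing to
the reducer, as in the standard wordcount.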


DNMILNE wrote:
> 
> Hi,
> 
> I am very new to the MapReduce paradigm so this could be a dumb question. 
> 
> What do you do if your mapper functions need to know more than just the
> data being processed in order to do their job? The simplest example I can
> think of is implementing a selective, phrase-based version of wordcount. 
> 
> Imagine you want to count the occurrences of all notable names (from the
> notable names database) in a large collection of news stories. You can't
> just count phrases - the number of potential word combinations is
> ridiculously large, and the vast majority are irrelevant. 
> 
> You have a limited (large, but bounded) vocabulary of phrases you are
> interested in--this list of names. You want each mapper to be aware of it,
> and only count the relevant phrases. You basically want to give each
> mapper read-only access to a HashSet of phrases as well as the documents
> it should be counting over. How would you do that?
> 
> Cheers, 
> Dave

-- 
View this message in context: 
http://old.nabble.com/Context-needed-by-mapper-tp28532164p28542926.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
