I have a problem that seems to call for "local" reducing:


I have a large input in which each line is a mapping of the form
"name : attribute". The attributes for the same name are usually close together
in the file, so they are very likely to be processed by the same mapper. I need
to persist the "name : attributes" pairs somewhere else (think DB). Ideally, I
would combine the attributes of the same name and persist them only once per
mapper. Attributes for the same name that come from different mappers can
safely be persisted separately.
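
To make the behaviour I am after concrete, here is a rough sketch in plain Java
(no Hadoop classes involved). The names LocalCombiner, accept, flush and the
BiConsumer "sink" that stands in for the DB write are all made up for
illustration. I imagine accept() being called for each input line in a mapper
and flush() being called once when the mapper finishes, but that wiring is an
assumption, not something I have working.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper: buffers attributes per name inside one mapper task
// and hands each name to the sink once when flush() is called.
final class LocalCombiner {
    // Insertion-ordered so names are persisted in the order first seen.
    private final Map<String, List<String>> buffer = new LinkedHashMap<>();
    // Stand-in for the real persistence call (e.g. a DB write).
    private final BiConsumer<String, List<String>> sink;

    LocalCombiner(BiConsumer<String, List<String>> sink) {
        this.sink = sink;
    }

    // Parse one "name : attribute" line and buffer the attribute under its name.
    void accept(String line) {
        int sep = line.indexOf(':');
        if (sep < 0) {
            return; // skip lines without a separator
        }
        String name = line.substring(0, sep).trim();
        String attribute = line.substring(sep + 1).trim();
        buffer.computeIfAbsent(name, k -> new ArrayList<>()).add(attribute);
    }

    // Pass every buffered name to the sink once, then clear the buffer.
    void flush() {
        buffer.forEach(sink);
        buffer.clear();
    }
}
```

In this sketch the buffer grows with the number of distinct names seen between
flushes, so a real version would probably need to flush periodically (for
example when the buffer reaches some size) rather than only at the end.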

I don't want to use reducers because of the network traffic. What I need is
exactly what a combiner does, but as far as I can tell, combiners are not
guaranteed to run at all, nor to run exactly once (correct me if I'm wrong
here), so I guess I am not supposed to implement the persistence in the
combiner.

Has anybody run into a similar problem before? What was your solution?


I appreciate your help.


Thanks,
James
