Hi all, I am a day one newbie investigating distributed work for the first time...
I have run through the tutorials with ease (thanks for the nice documentation) and have now written my first MapReduce job. Is it accurate to say that reduce is called repeatedly by the Hadoop framework until the number of inputs equals the number of outputs?

I am only running in single-server mode at the moment, but I have these map outputs:

  Football            UK
  Football            UK
  Rugby               UK
  American Football   USA
  Rugby               FR
  Football            FR

And these reduce outputs:

  Football            UK, FR
  Rugby               UK, FR
  American Football   USA

This worked fine. But when I tried to include the counts in the output, I got some strange results:

  Football            UK(2), FR(1)(1)
  Rugby               UK(1), FR(1)(1)
  American Football   USA(1)(1)

I think it is because I was just doing String manipulation in the reducer to produce the counts. I presume, then, that I should not use the Text type and should instead define a type for the Country+Count?

Thanks,
Tim
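
One possible reading of the strange counts, offered only as a sketch: the output looks like what you would get if the same string-building reduce logic ran twice over the data, with the second pass treating the already formatted string as a single new value. That could happen if the reducer class were also registered as the combiner (the WordCount tutorial does this with setCombinerClass), but that is an assumption, since the job setup is not shown. The plain-Java simulation below uses no Hadoop classes; the class name ReduceTwice and the method reduce are hypothetical stand-ins for the reducer body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReduceTwice {
    // Hypothetical stand-in for the reducer body (not the original code):
    // counts each distinct value and joins the results as "value(count)",
    // keeping the order in which values were first seen.
    public static String reduce(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            parts.add(e.getKey() + "(" + e.getValue() + ")");
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        // Map output values for the key "Football".
        List<String> football = Arrays.asList("UK", "UK", "FR");

        // First pass: the logic sees the raw country values.
        String once = reduce(football);

        // Second pass: the logic sees the formatted string as ONE value
        // and counts it again (assumed combiner-then-reducer sequence).
        String twice = reduce(Collections.singletonList(once));

        System.out.println("Football " + once);   // prints "Football UK(2), FR(1)"
        System.out.println("Football " + twice);  // prints "Football UK(2), FR(1)(1)"
    }
}
```

If this is indeed the cause, the second line reproduces the observed "UK(2), FR(1)(1)". Under that assumption, either the reduce logic has to accept its own output format as input, or the job should not reuse it as a combiner; a dedicated Country+Count type is one way to make the intermediate values safe to merge.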