Hello,

I was wondering whether Hadoop provides thread-safe shared variables that
can be accessed from individual mappers/reducers, along with a proper
locking mechanism. To clarify, let's say that in the word count example I
want to know which word has the highest frequency and how many times it
occurred. I believe the latter can be done using the counters that come
with the Hadoop framework, but I don't know how to get the word itself as a
String. Of course, the problem can be more complicated, such as finding the
top 100 words.

I thought of writing a serial program that goes over the final output of
the word count, but this wouldn't be a good idea if the output file gets too
large. However, if there is a way to define and use shared variables, this
would be really easy to do on the fly during the word count's reduce phase.
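
For reference, the serial pass I have in mind would look roughly like the
sketch below. It assumes the word count output is one "word<TAB>count" pair
per line; the class name TopWords and the method topN are just names I made
up for illustration, and reading the actual part files from HDFS is left out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Hadoop: a single-threaded pass over
// the word count output that keeps only the n most frequent words.
public class TopWords {

    // Assumes each line looks like "word<TAB>count"; lines without a tab
    // are skipped. A min-heap of size n keeps memory at O(n) regardless
    // of how many lines are scanned.
    static List<Map.Entry<String, Long>> topN(Iterable<String> lines, int n) {
        Comparator<Map.Entry<String, Long>> byCount =
            (a, b) -> Long.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byCount);

        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // not a "word<TAB>count" line
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            heap.add(new SimpleEntry<>(word, count));
            if (heap.size() > n) {
                heap.poll(); // evict the smallest count seen so far
            }
        }

        // Return the survivors ordered from highest to lowest count.
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

Calling topN(lines, 1) would give the single most frequent word together
with its count, and topN(lines, 100) the top 100. It still has to read every
output line once, which is the part I am worried about for large outputs.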

Thanks,
Jim
