Hi,

I want to run some experiments with the WordCount example in a different way.

Suppose we have a very large dataset. Instead of splitting all the data at
once, we want to feed a few splits into the MapReduce job at a time. I want
to model the Hadoop job like this:

Suppose a batch of InputSplits arrives at the beginning and is processed by
the maps, and the reduce emits the (word, frequency) pairs for this batch of
InputSplits. After this, another batch of InputSplits arrives, and the
results from its reduce are aggregated with the previous results (if the
word "that" had frequency 2 in the previous run and occurs 1 time in this
run, then the frequency of "that" is now maintained as 3). If "that" occurs
4 times in the next MapReduce run, its frequency is maintained as 7.

And this process goes on like this, batch after batch.
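
To show what I mean by feeding a few splits at a time, here is the kind of
outer driver loop I am picturing. This is only a sketch: the /batches and
/counts directory layout is something I made up, and the mapper/reducer are
just the standard WordCount ones. Each pass hands the job only one batch's
directory, so the job only ever sees that batch's InputSplits.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BatchedWordCount {

    // The usual WordCount mapper: one (word, 1) pair per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // The usual WordCount reducer: sums the 1s for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Outer loop: one ordinary WordCount job per batch of splits.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        int numBatches = Integer.parseInt(args[0]);
        for (int b = 0; b < numBatches; b++) {
            Job job = Job.getInstance(conf, "wordcount-batch-" + b);
            job.setJarByClass(BatchedWordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Only this batch's files become the job's InputSplits.
            FileInputFormat.addInputPath(job, new Path("/batches/batch-" + b));
            FileOutputFormat.setOutputPath(job, new Path("/counts/batch-" + b));
            if (!job.waitForCompletion(true)) {
                System.exit(1);
            }
            // ... then fold /counts/batch-<b> into the running totals
            // (see the merge sketch below).
        }
    }
}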
Now, how should I model the InputSplits for this, and how can these
continuous MapReduce jobs be kept running? And in what form should I store
the results of each MapReduce run so that I can aggregate them with the
output of the next one?
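
To make that last question concrete, the rough idea I had for the
aggregation step is to keep each run's totals as ordinary word<TAB>count
text output (which is what TextOutputFormat writes anyway), and after every
batch run a small merge job that reads both the previous totals and the new
batch's counts and re-sums them. A sketch of what I mean (the class names
and path arguments are placeholders of mine, and I am not sure this is the
right approach):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeTotals {

    // Both inputs are "word<TAB>count" lines, so the old totals can be
    // re-read as key/value text and simply re-summed with the new counts.
    public static class SumTextReducer
            extends Reducer<Text, Text, Text, LongWritable> {
        @Override
        public void reduce(Text word, Iterable<Text> counts, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (Text c : counts) {
                sum += Long.parseLong(c.toString().trim());
            }
            context.write(word, new LongWritable(sum));
        }
    }

    // args[0] = previous totals dir, args[1] = this batch's counts dir,
    // args[2] = new totals dir (made-up layout, just to show the idea).
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "merge-totals");
        job.setJarByClass(MergeTotals.class);

        job.setInputFormatClass(KeyValueTextInputFormat.class); // splits on the tab
        job.setMapperClass(Mapper.class); // identity map passes (word, count) through
        job.setReducerClass(SumTextReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The output directory of this merge would then become the "previous totals"
input for the next batch. Is a loop like this a sensible way to do it, or is
there a more standard pattern for this kind of incremental aggregation?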

Thanks,
Aayush
