On 10/26/2010 01:07 PM, Paweł Łoziński wrote:
> 2010/10/26 Johannes.Lichtenberger <[email protected]>:
>> On 10/26/2010 07:39 AM, Paweł Łoziński wrote:
>>> Hi,
>>>
>>> the framework doesn't give you the first/last information about reduce
>>> job you perform in your reducer. Just as the mapper doesn't give you
>>> information whether the (key, value) pair passed to map function is
>>> first/last for a given key. However you can workaround this by adding
>>> special values to your data, e.g. <page><id>0</id>... and
>>> <page><id>Long.MAX_VALUE</id>.... When you encounter those in your
>>> reducer, you know you are at the beginning/end of your data and you
>>> can emit <root> and </root>.
>>
>> This wouldn't work, since it might as well be possible that the last
>> value isn't Long.MAX_VALUE.
>>
>
> The idea is to choose such a special-value, that the last value in
> your data will be definitely smaller. In case of 64bit numerical
> values this would be Long.MAX_VALUE, generally speaking - the last
> value in the possible range of values (or better: the last value +1).
> Then you can be sure the reducer will process it as the last value,
> and emit </root> to your output. Of course, if you have multiple
> reducers, the closing tag will appear only in the output of one of
> them.
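For reference, the sentinel idea quoted above can be sketched in plain Java, without Hadoop. This is only an illustration under assumptions: the class name SentinelReduceSketch and the methods reduce and run are hypothetical, a TreeMap stands in for the sorted key order in which a single reducer would see its keys, and real page IDs are assumed to lie strictly between 0 and Long.MAX_VALUE.

```java
import java.util.Map;
import java.util.TreeMap;

public class SentinelReduceSketch {
    // Sentinel IDs taken from the quoted suggestion; assumed never to occur as real page IDs.
    static final long START_ID = 0L;
    static final long END_ID = Long.MAX_VALUE;

    // Stand-in for a reduce call (not the Hadoop API): invoked once per key, in ascending key order.
    static void reduce(long id, String page, StringBuilder out) {
        if (id == START_ID) {
            out.append("<root>\n");      // first key seen: open the document
        } else if (id == END_ID) {
            out.append("</root>\n");     // last key seen: close the document
        } else {
            out.append(page).append('\n');
        }
    }

    // Adds the two sentinel records, then feeds all records to reduce() in sorted order.
    static String run(Map<Long, String> pages) {
        TreeMap<Long, String> sorted = new TreeMap<>(pages);
        sorted.put(START_ID, "");
        sorted.put(END_ID, "");
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Long, String> e : sorted.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Long, String> pages = new TreeMap<>();
        pages.put(7L, "<page><id>7</id></page>");
        pages.put(3L, "<page><id>3</id></page>");
        System.out.print(run(pages));
    }
}
```

The sketch models one reducer only. With several reducers, each sees just part of the key range, so the tags would not all end up in the same output file.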
Hm, that's valuable, but I think I'll leave the IDs unchanged and add the
tags afterwards. That needs two more I/O operations (a read and a write),
but the extra cost is negligible.

regards,
johannes
