Thank you, Kai and Joey, for the explanation. That's what I thought about them, but I did not want to miss some "magical" replacement for a central service hidden in the counters. No, there is no magic, just reality.
Mark

On Fri, May 20, 2011 at 12:39 PM, Kai Voigt <k...@123.org> wrote:
> Also, with speculative execution enabled, you might see a higher count than
> you expect while the same task is running multiple times in parallel. When a
> task gets killed because another instance was quicker, its counters will be
> removed from the global count, though.
>
> Kai
>
> On 20.05.2011 at 19:34, Joey Echeverria wrote:
>
> > Counters are a way to get status from your running job. They don't
> > increment a global state. They locally save increments and
> > periodically report those increments to the central counter. That
> > means that the final count will be correct, but you can't use them to
> > coordinate counts while your job is running.
> >
> > -Joey
> >
> > On Fri, May 20, 2011 at 10:17 AM, Mark Kerzner <markkerz...@gmail.com> wrote:
> >> Joey,
> >>
> >> You understood me perfectly well. I see your first suggestion, but I am not
> >> allowed to have gaps. A central service is something I may consider if
> >> the single reducer becomes a worse bottleneck than the service would be.
> >>
> >> But what are counters for? They seem to be exactly that.
> >>
> >> Mark
> >>
> >> On Fri, May 20, 2011 at 12:01 PM, Joey Echeverria <j...@cloudera.com> wrote:
> >>
> >>> To make sure I understand you correctly, you need a globally unique
> >>> one-up counter for each output record?
> >>>
> >>> If you had an upper bound on the number of records a single reducer
> >>> could output and you can afford to have gaps, you could just take the
> >>> task id, multiply it by the max number of records, and then count up
> >>> from there.
> >>>
> >>> If that doesn't work for you, then you'll need to use some kind of
> >>> central service for allocating numbers, which could become a
> >>> bottleneck.
> >>>
> >>> -Joey
> >>>
> >>> On Fri, May 20, 2011 at 9:55 AM, Mark Kerzner <markkerz...@gmail.com> wrote:
> >>>> Hi, can I use a Counter to give each record in all reducers a consecutive
> >>>> number? Currently I am using a single Reducer, but it is an anti-pattern.
> >>>> But I need to assign consecutive numbers to all output records in all
> >>>> reducers, and it does not matter how, as long as each gets its own number.
> >>>>
> >>>> If it IS possible, then how do multiple processes access those counters
> >>>> without creating race conditions?
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Mark
> >>>
> >>> --
> >>> Joseph Echeverria
> >>> Cloudera, Inc.
> >>> 443.305.9434
>
> --
> Kai Voigt
> k...@123.org
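Joey's first suggestion (give each reduce task its own block of IDs: task id times the maximum number of records, then count up) can be sketched roughly as below. This is a minimal, Hadoop-free sketch under stated assumptions: the class name `BlockIdAllocator` and the `maxRecordsPerTask` bound are hypothetical, and in a real reducer the task number would have to come from the framework, for example via `context.getTaskAttemptID().getTaskID().getId()`. Each task draws only from its own disjoint range, so no coordination (and no race) is needed, but any task that emits fewer than `maxRecordsPerTask` records leaves a gap, which is why it did not fit Mark's no-gaps requirement.

```java
// Hypothetical sketch of the "task id * max records + local count" ID scheme.
// Assumption: no single task ever emits more than maxRecordsPerTask records.
public final class BlockIdAllocator {
    private final long limit; // one past the last ID reserved for this task
    private long next;        // next ID this task will hand out

    public BlockIdAllocator(int taskId, long maxRecordsPerTask) {
        if (taskId < 0 || maxRecordsPerTask <= 0) {
            throw new IllegalArgumentException("need taskId >= 0 and maxRecordsPerTask > 0");
        }
        // First ID of this task's block; multiplyExact/addExact fail loudly on overflow.
        long base = Math.multiplyExact((long) taskId, maxRecordsPerTask);
        this.limit = Math.addExact(base, maxRecordsPerTask);
        this.next = base;
    }

    /** Returns the next ID in this task's block; purely local, no shared state. */
    public long nextId() {
        if (next >= limit) {
            throw new IllegalStateException("task exceeded its reserved ID range");
        }
        return next++;
    }

    public static void main(String[] args) {
        BlockIdAllocator task0 = new BlockIdAllocator(0, 1000);
        BlockIdAllocator task2 = new BlockIdAllocator(2, 1000);
        System.out.println(task0.nextId()); // 0
        System.out.println(task0.nextId()); // 1
        System.out.println(task2.nextId()); // 2000
    }
}
```

Keying the block on the task ID rather than the task attempt ID should mean that a speculative duplicate of the same task (the case Kai describes) reuses the same range instead of consuming a new one.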