Hi,
I'm trying to figure out how to implement per-job counters. Google's
paper on map-reduce mentions that their API allows individual tasks to
update global counters, defined for each job, and then easily retrieve
them when the job is completed.
Example: process some records in a map-reduce job (with many map and
reduce tasks), and at the end of the job emit the total count of
processed records for the whole job (or any other programmer-defined
count aggregated during processing).
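
To make the example concrete, the behaviour I have in mind looks roughly
like the plain-Java sketch below. It uses no Hadoop classes: JobCounters,
increment, get and runJob are made-up names, and threads stand in for
tasks that would really run on separate machines, so this only
illustrates the desired semantics, not how it would be implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-job counter registry; NOT part of any Hadoop API.
public class JobCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Called by any task to add to a named, job-wide counter.
    public void increment(String name, long amount) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(amount);
    }

    // Read after the job completes; counters never touched read as zero.
    public long get(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0L : adder.sum();
    }

    // Simulates one job: each "task" is a thread that processes
    // recordsPerTask records and bumps the shared counter per record.
    public static JobCounters runJob(int numTasks, int recordsPerTask)
            throws InterruptedException {
        JobCounters jobCounters = new JobCounters();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < numTasks; t++) {
            pool.submit(() -> {
                for (int r = 0; r < recordsPerTask; r++) {
                    jobCounters.increment("records.processed", 1);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return jobCounters;
    }

    public static void main(String[] args) throws InterruptedException {
        JobCounters jobCounters = runJob(8, 1000);
        System.out.println("records.processed = "
                + jobCounters.get("records.processed"));
    }
}
```

The point is that a task only ever calls increment(), and whoever
submitted the job reads the aggregated value once at the end.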
I was looking at the metrics API, but it's not obvious to me whether
it's useful in this case ... if so, how should I go about it?
I could probably implement extended OutputFormats that write these
counters for each task to a separate output file, and then read them at
the end of the job, but this seems awfully intrusive and complex for
such simple functionality...
I'd appreciate any suggestions.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com