Hi,

I'm trying to figure out how to implement per-job counters. Google's paper on map-reduce mentions that their API allows individual tasks to update global counters, defined for each job, and then easily retrieve them when the job is completed.

Example: process some records in a map-reduce job (with many map and reduce tasks), and at the end of the job emit the total count of processed records for the whole job (or any other programmer-defined count aggregated during processing).

I was looking at the metrics API, but it's not obvious to me whether it's useful in this case ... if so, how should I go about it?

I could probably implement extended OutputFormat-s that write these counters for each task to a separate output file, and then read them back at the end of the job, but this seems awfully intrusive and complex for such simple functionality...
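
To make the workaround concrete, here is a rough sketch of the idea in plain Java, using local files only. It does not use the real OutputFormat API; the class name, method names, file naming scheme and counter names are all hypothetical, chosen only for illustration. Each task writes its own counter file, and the totals are summed once the job has finished.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: per-task counter files, aggregated after the job.
public class TaskCounterFiles {

    // Called by each task when it finishes: writes one "name<TAB>value" line per counter.
    public static void writeTaskCounters(Path dir, String taskId, Map<String, Long> counters)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        Path file = dir.resolve("counters-" + taskId + ".txt");
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Called once after the job completes: sums the counters from all task files.
    public static Map<String, Long> sumJobCounters(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "counters-*.txt")) {
            for (Path f : files) {
                for (String line : Files.readAllLines(f, StandardCharsets.UTF_8)) {
                    if (line.isEmpty()) {
                        continue;
                    }
                    String[] kv = line.split("\t", 2);
                    totals.merge(kv[0], Long.parseLong(kv[1]), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-counters");
        // Two simulated tasks, each reporting its own local counts.
        writeTaskCounters(dir, "map-0", Map.of("records", 120L, "skipped", 3L));
        writeTaskCounters(dir, "map-1", Map.of("records", 80L, "skipped", 1L));
        System.out.println(sumJobCounters(dir)); // prints {records=200, skipped=4}
    }
}
```

Even in this toy form, every task has to know about the counter directory and the job driver needs an extra pass over the files, which is the intrusiveness I'd like to avoid.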

I'd appreciate any suggestions.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

