I think my previous reply wasn't very accurate. So you need a count per file? One way I can think of doing that, from within the job itself, is to use a Counter whose name combines the output name with the task's ID. But that would not be a good solution if there are several hundred tasks, since you'd end up with one counter per output name per task.
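If you do go that route, the per-task counters still need to be summed back into per-file totals after the job. A minimal sketch of that post-job rollup, assuming the (hypothetical) naming scheme `<outputName>-<taskId>` for the counters:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterRollup {

    // Sum counters named "<outputName>-<taskId>" into one total per output name.
    static Map<String, Long> rollUp(Map<String, Long> taskCounters) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : taskCounters.entrySet()) {
            // Strip the trailing "-<taskId>" suffix to recover the output name.
            String name = e.getKey().substring(0, e.getKey().lastIndexOf('-'));
            totals.merge(name, e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Values as they might come back from job.getCounters() (made up here).
        Map<String, Long> fromJob = new LinkedHashMap<>();
        fromJob.put("A-000000", 10L);
        fromJob.put("A-000001", 5L);
        fromJob.put("B-000000", 7L);
        System.out.println(rollUp(fromJob)); // {A=15, B=7}
    }
}
```

The rollup itself is cheap; the concern above is only the number of distinct counter names the framework has to track during the job.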
A distributed count can be performed on a single file, however, by running an identity mapper with null output and then reading the map-input-records counter after completion.

On Tue, Mar 8, 2011 at 3:54 PM, Harsh J <qwertyman...@gmail.com> wrote:
> Count them as you sink, using the Counters functionality of Hadoop
> Map/Reduce (if you're using MultipleOutputs, it has a way to enable
> counters for each name used). You can then aggregate related counters
> post-job, if needed.
>
> On Tue, Mar 8, 2011 at 3:11 PM, Jun Young Kim <juneng...@gmail.com> wrote:
>> Hi.
>>
>> My Hadoop application generates several output files from a single job
>> (for example, A, B, and C are generated as the result).
>>
>> After the job finishes, I want to count each file's rows.
>>
>> Is there any way to count each file?
>>
>> Thanks.
>>
>> --
>> Junyoung Kim (juneng...@gmail.com)
>>
>
> --
> Harsh J
> www.harshj.com

--
Harsh J
www.harshj.com
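The identity-mapper trick might look roughly like this with the `org.apache.hadoop.mapreduce` API. This is a sketch, not a tested driver: the input path is hypothetical, and on older releases the setup differs (`new Job(conf)` instead of `Job.getInstance`, and the counter lived on the old `Task.Counter` enum rather than `TaskCounter`):

```java
// Sketch: map-only job whose only purpose is counting input records.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "count-rows");
job.setMapperClass(Mapper.class);                  // base Mapper is identity
job.setNumReduceTasks(0);                          // map-only
job.setOutputFormatClass(NullOutputFormat.class);  // discard all output
FileInputFormat.addInputPath(job, new Path("/output/A")); // hypothetical path
job.waitForCompletion(true);

long rows = job.getCounters()
               .findCounter(TaskCounter.MAP_INPUT_RECORDS)
               .getValue();
System.out.println("rows in /output/A: " + rows);
```

You'd run one such job (or one input path) per file you want counted, which is why it only makes sense for a handful of outputs.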