I think my previous reply wasn't very accurate. So you need a count
per file? One way I can think of doing that, via the job itself, is to
increment a Counter named after the output name plus the task's ID,
which together identify a single output file. But it would not be a
good solution with several hundred tasks, since every task would then
contribute its own set of counters.
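Roughly like this, inside the task that writes the rows (just a
sketch, assuming the org.apache.hadoop.mapreduce API with
MultipleOutputs, and that the key doubles as the output name; the
"FileRows" group is a name I made up for illustration):

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class CountingReducer extends Reducer<Text, Text, Text, Text> {
      private MultipleOutputs<Text, Text> mos;

      @Override
      protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
      }

      @Override
      protected void reduce(Text key, Iterable<Text> values,
          Context context) throws IOException, InterruptedException {
        // Assumption: the key is also the named output ("A", "B", "C"),
        // declared in the driver via MultipleOutputs.addNamedOutput().
        String name = key.toString();
        String taskId = context.getTaskAttemptID().getTaskID().toString();
        for (Text value : values) {
          mos.write(name, key, value);
          // One counter per (output name, task) pair, i.e. per file part.
          context.getCounter("FileRows", name + "-" + taskId).increment(1);
        }
      }

      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        mos.close();
      }
    }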

A distributed count can still be performed on a single file, however:
run a job with an identity mapper and a null output format over that
file, then read the map-input-records counter after completion.
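For example (a sketch only; the path argument and class name are mine,
and the counter group string below is the 0.20-era one; newer releases
expose it as TaskCounter.MAP_INPUT_RECORDS):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class FileRowCount {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "row-count");
        job.setJarByClass(FileRowCount.class);
        job.setMapperClass(Mapper.class);        // identity mapper
        job.setNumReduceTasks(0);                // map-only job
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(NullOutputFormat.class); // writes nothing
        FileInputFormat.addInputPath(job, new Path(args[0]));
        job.waitForCompletion(true);
        // The framework counter holds the number of rows the maps read.
        long rows = job.getCounters().findCounter(
            "org.apache.hadoop.mapred.Task$Counter",
            "MAP_INPUT_RECORDS").getValue();
        System.out.println(args[0] + ": " + rows + " rows");
      }
    }

You'd run it once per output file (A, B, C), or point it at a glob if
you want a total.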

On Tue, Mar 8, 2011 at 3:54 PM, Harsh J <qwertyman...@gmail.com> wrote:
> Count them as you write ("sink") them, using the Counters
> functionality of Hadoop Map/Reduce (if you're using MultipleOutputs,
> it has a way to enable counters for each name used). You can then
> aggregate related counters post-job, if needed.
>
> On Tue, Mar 8, 2011 at 3:11 PM, Jun Young Kim <juneng...@gmail.com> wrote:
>> Hi.
>>
>> my Hadoop application generates several output files from a single
>> job (for example, A, B, and C are generated as results).
>>
>> After the job finishes, I want to count each file's rows.
>>
>> Is there any way to count each file?
>>
>> thanks.
>>
>> --
>> Junyoung Kim (juneng...@gmail.com)
>>
>>
>
>
>
> --
> Harsh J
> www.harshj.com
>



-- 
Harsh J
www.harshj.com
