Mike,

I think we may have just been misreading the output of the job history page.
It has columns for map, reduce, and total. The map and reduce columns were
empty, so it appeared that the data was not being read from a local tablet
server in the mappers, but the total number seems to line up more with what
we expect.

Thanks for looking into this though.

-Vincent

On Thu, May 5, 2022 at 11:17 AM Mike Miller <mmil...@apache.org> wrote:

> Are you configuring MultipleOutputs to have Accumulo Key/Value objects? How
> are you ingesting the data into Accumulo?
>
> public class MultipleOutputs<KEYOUT,VALUEOUT>
> https://hadoop.apache.org/docs/r2.6.3/api/src-html/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#line.175
>
>
> On Thu, May 5, 2022 at 9:46 AM Vincent Russell <vincent.russ...@gmail.com>
> wrote:
>
> > Thank you for the reply, Mike.
> >
> > These are the counters that show up in the job history server. For
> > example, in the Accumulo docs:
> >
> > $ bin/tool.sh lib/accumulo-examples-simple.jar
> > org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z
> > zookeepers  --input /user/username/wc -t wordCount -u username -p
> > password
> >
> > 11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
> > 11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
> > 11/02/07 18:20:13 INFO mapred.JobClient:  map 0% reduce 0%
> > 11/02/07 18:20:20 INFO mapred.JobClient:  map 100% reduce 0%
> > 11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
> > 11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
> > 11/02/07 18:20:22 INFO mapred.JobClient:   Job Counters
> > 11/02/07 18:20:22 INFO mapred.JobClient:     Launched map tasks=1
> > 11/02/07 18:20:22 INFO mapred.JobClient:     Data-local map tasks=1
> > 11/02/07 18:20:22 INFO mapred.JobClient:   FileSystemCounters
> > 11/02/07 18:20:22 INFO mapred.JobClient:     HDFS_BYTES_READ=10487
> > 11/02/07 18:20:22 INFO mapred.JobClient:   Map-Reduce Framework
> > 11/02/07 18:20:22 INFO mapred.JobClient:     Map input records=255
> > 11/02/07 18:20:22 INFO mapred.JobClient:     Spilled Records=0
> > 11/02/07 18:20:22 INFO mapred.JobClient:     Map output records=1452
> >
> >
> > The output format is a MultipleOutputs that writes to disk.
> >
> > I guess my question is: should we care about these counters?  Do they mean
> > anything with Accumulo?  They seem to suggest that the mappers are not
> > running local to the tservers.
> >
> > Thanks,
> > Vincent
> >
> > On Thu, May 5, 2022 at 9:03 AM Mike Miller <mmil...@apache.org> wrote:
> >
> > > What do you mean by data-local or rack-local map task? You don't have any
> > > mappers running? What is your output format? What configuration are you
> > > setting in the Job?
> > >
> > > On Wed, May 4, 2022 at 8:44 PM Vincent Russell <vincent.russ...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I am using MapReduce with Accumulo 2.0.1 and Hadoop 3.3.1.  After our
> > > > MapReduce jobs complete, we take a look at the counters, and the
> > > > data-local map tasks and rack-local map tasks are both equal to 0.  Do
> > > > we probably have something misconfigured, or is this expected?  Or can
> > > > this count not be configured properly with the AccumuloInputFormat?
> > > >
> > > > Thank you,
> > > > Vincent
> > > >
> > >
> >
>
