Mike, I think we may have just been misreading the output of the job history page. It has columns for map, reduce, and total. The map and reduce columns were empty, so it appeared that the data was not being read from a local tablet server in the mappers, but the total number seems to line up more with what we expect.
Thanks for looking into this though.

-Vincent

On Thu, May 5, 2022 at 11:17 AM Mike Miller <mmil...@apache.org> wrote:

> Are you configuring MultipleOutputs to have Accumulo Key Value objects? How
> are you ingesting the data into Accumulo?
>
> public class MultipleOutputs<KEYOUT,VALUEOUT>
> <https://hadoop.apache.org/docs/r2.6.3/api/src-html/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#line.175>
>
> On Thu, May 5, 2022 at 9:46 AM Vincent Russell <vincent.russ...@gmail.com>
> wrote:
>
> > Thank you for the reply, Mike.
> >
> > These are the counters that show up in the job history server, for example.
> > For instance, in the Accumulo docs:
> >
> > $ bin/tool.sh lib/accumulo-examples-simple.jar
> > org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z
> > zookeepers --input /user/username/wc -t wordCount -u username -p
> > password
> >
> > 11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
> > 11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
> > 11/02/07 18:20:13 INFO mapred.JobClient: map 0% reduce 0%
> > 11/02/07 18:20:20 INFO mapred.JobClient: map 100% reduce 0%
> > 11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
> > 11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
> > 11/02/07 18:20:22 INFO mapred.JobClient: Job Counters
> > 11/02/07 18:20:22 INFO mapred.JobClient: Launched map tasks=1
> > 11/02/07 18:20:22 INFO mapred.JobClient: Data-local map tasks=1
> > 11/02/07 18:20:22 INFO mapred.JobClient: FileSystemCounters
> > 11/02/07 18:20:22 INFO mapred.JobClient: HDFS_BYTES_READ=10487
> > 11/02/07 18:20:22 INFO mapred.JobClient: Map-Reduce Framework
> > 11/02/07 18:20:22 INFO mapred.JobClient: Map input records=255
> > 11/02/07 18:20:22 INFO mapred.JobClient: Spilled Records=0
> > 11/02/07 18:20:22 INFO mapred.JobClient: Map output records=1452
> >
> > The output format is a MultipleOutputs that writes to disk.
> >
> > I guess my question is: should we care about these counters? Do they mean
> > anything with Accumulo? They seem to suggest that mappers are not running
> > local with the tservers.
> >
> > Thanks,
> > Vincent
> >
> > On Thu, May 5, 2022 at 9:03 AM Mike Miller <mmil...@apache.org> wrote:
> >
> > > What do you mean by data local or rack-local map task? You don't have any
> > > mappers running? What is your output format? What configuration are you
> > > setting in the Job?
> > >
> > > On Wed, May 4, 2022 at 8:44 PM Vincent Russell <vincent.russ...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I am using MapReduce with Accumulo 2.0.1 and Hadoop 3.3.1. After our
> > > > MapReduce jobs complete we take a look at the counters, and the
> > > > data-local map tasks and rack-local map tasks are both equal to 0. Do we
> > > > have something misconfigured, or is this expected? Or can this count not
> > > > be configured properly with the AccumuloInputFormat?
> > > >
> > > > Thank you,
> > > > Vincent