[ https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566278#comment-15566278 ]
Kuhu Shukla commented on TEZ-3459:
----------------------------------

I ran the jar and think I know what the issue is. The framework, fs, and other counters do show up in the task-level and DAG-level counters when in yarn-tez mode, as shown below (values have been overwritten).
{code}
[INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Final Counters for attempt_123_0001_1_00_000000_0: Counters: 28 [[File System Counters FILE_BYTES_READ=123, FILE_BYTES_WRITTEN=123, FILE_READ_OPS=0, FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=1234, HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=2, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter SPLIT_RAW_BYTES=123, SPILLED_RECORDS=123, GC_TIME_MILLIS=123, CPU_MILLISECONDS=123, PHYSICAL_MEMORY_BYTES=123456, VIRTUAL_MEMORY_BYTES=123456, COMMITTED_HEAP_BYTES=12345678, INPUT_RECORDS_PROCESSED=0, INPUT_SPLIT_LENGTH_BYTES=1234, OUTPUT_RECORDS=123, OUTPUT_BYTES=123, OUTPUT_BYTES_WITH_OVERHEAD=123, OUTPUT_BYTES_PHYSICAL=12, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, SHUFFLE_CHUNK_COUNT=1][example.MapredColorCount$ColorCounter INPUT_RECORDS=100]]
{code}
{code}
[INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Final Counters for attempt_123_0001_1_01_000000_0: Counters: 44 [[File System Counters FILE_BYTES_READ=123, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=0, HDFS_BYTES_WRITTEN=123, HDFS_READ_OPS=4, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=2][org.apache.tez.common.counters.TaskCounter REDUCE_INPUT_GROUPS=12, REDUCE_INPUT_RECORDS=1234, COMBINE_INPUT_RECORDS=0, SPILLED_RECORDS=123, NUM_SHUFFLED_INPUTS=5, NUM_SKIPPED_INPUTS=0, NUM_FAILED_SHUFFLE_INPUTS=0, MERGED_MAP_OUTPUTS=5, GC_TIME_MILLIS=12, CPU_MILLISECONDS=1234, PHYSICAL_MEMORY_BYTES=12345678, VIRTUAL_MEMORY_BYTES=12345678, COMMITTED_HEAP_BYTES=12345678, OUTPUT_RECORDS=7, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=123, SHUFFLE_BYTES=123, SHUFFLE_BYTES_DECOMPRESSED=1234, SHUFFLE_BYTES_TO_MEM=0, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_DISK_DIRECT=123, NUM_MEM_TO_DISK_MERGES=0, NUM_DISK_TO_DISK_MERGES=0, SHUFFLE_PHASE_TIME=12, MERGE_PHASE_TIME=123, FIRST_EVENT_RECEIVED=12, LAST_EVENT_RECEIVED=12][Shuffle Errors BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0][example.MapredColorCount$ColorCounter OUTPUT_RECORDS=7]]
{code}

The issue is that when we try to retrieve those counters, the Tez {{YarnRunner}} returns an empty counter object:
{code}
public Counters getJobCounters(JobID jobId) throws IOException,
    InterruptedException {
  // FIXME needs counters support from DAG
  // with a translation layer on client side
  Counters empty = new Counters();
  return empty;
}
{code}

Hence, when we try to get the custom counter, it is initialized to zero and treated like a new counter, as per {{AbstractCounterGroup}}:
{code}
private synchronized T findCounterImpl(String counterName, boolean create) {
  T counter = counters.get(counterName);
  if (counter == null && create) {
    String localized =
        ResourceBundles.getCounterName(getName(), counterName, counterName);
    return addCounterImpl(counterName, localized, 0);
  }
  return counter;
}
{code}

This is true even for framework counters, as the counters map above is empty in the Tez case.
Asking [~hitesh] whether this triage makes sense and for comments on a possible fix. The Mapred YarnRunner equivalent uses getCountersProto to get the job counters.

> Issues running M/R jobs with Tez
> --------------------------------
>
>                 Key: TEZ-3459
>                 URL: https://issues.apache.org/jira/browse/TEZ-3459
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Manuel Godbert
>         Attachments: colorCount.sh, mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the
> MapredColorCount example to reproduce some of the other issues I encountered
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file
> detailing the observed results in comments.
> It addresses 3 issues:
> - The embedded jars in /lib are ignored by Tez, but YARN uses them without
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0
> I am using HDP2.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
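
For readers following the triage in the comment above, here is a minimal sketch of why a lookup against an empty counters object reports 0 even though the task incremented the counter. It uses only plain Java collections. {{CounterGroupSketch}}, {{CounterLookupSketch}}, and {{emptyJobCounters}} are hypothetical names invented for this illustration, not Hadoop or Tez classes, and the find-or-create logic is a simplified stand-in for the {{findCounterImpl}} snippet quoted in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a counter group (NOT the Hadoop class).
class CounterGroupSketch {
    private final Map<String, Long> counters = new LinkedHashMap<>();

    // Find-or-create lookup: a missing counter is created at 0 when create is true,
    // otherwise null is returned for a missing counter.
    synchronized Long findCounter(String name, boolean create) {
        Long value = counters.get(name);
        if (value == null && create) {
            counters.put(name, 0L);
            return 0L;
        }
        return value;
    }

    // Adds delta to the named counter, creating it if needed.
    synchronized void increment(String name, long delta) {
        counters.merge(name, delta, Long::sum);
    }
}

class CounterLookupSketch {
    // Assumption for illustration: models a client call that always
    // hands back a fresh, empty counters object.
    static CounterGroupSketch emptyJobCounters() {
        return new CounterGroupSketch();
    }

    public static void main(String[] args) {
        // Task side: the counter really is incremented.
        CounterGroupSketch taskSide = new CounterGroupSketch();
        taskSide.increment("INPUT_RECORDS", 100);

        // Client side: the lookup runs against an empty object,
        // so the counter is created on the spot with value 0.
        CounterGroupSketch clientSide = emptyJobCounters();

        System.out.println("task side:   " + taskSide.findCounter("INPUT_RECORDS", false));  // 100
        System.out.println("client side: " + clientSide.findCounter("INPUT_RECORDS", true)); // 0
    }
}
```

In this model the client-side value can only become non-zero if the object returned to the client is populated from the values recorded on the task side. How that should be done in Tez is the open question raised in the comment.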