Re: tracking remote reads in datanode logs

2015-02-25 Thread Igor Bogomolov
Thanks a lot!

Igor

On Tue, Feb 24, 2015 at 11:46 PM, Drake민영근 drake@nexr.com wrote:

 Hi, Igor

 The AM logs are in HDFS if you set the log aggregation property.
 Otherwise, they are in the container log directory on the node. See this:
 http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
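
A minimal sketch of the check described above, assuming stock Hadoop 2.x property names and defaults (yarn.log-aggregation-enable, yarn.nodemanager.remote-app-log-dir, yarn.nodemanager.log-dirs). The thread does not name these properties, and a CDH install may override the defaults, so treat the values as assumptions. The function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed stock Hadoop 2.x defaults; a vendor distribution may differ.
DEFAULTS = {
    "yarn.log-aggregation-enable": "false",
    "yarn.nodemanager.remote-app-log-dir": "/tmp/logs",
    "yarn.nodemanager.log-dirs": "${yarn.log.dir}/userlogs",
}


def read_hadoop_conf(xml_text):
    """Parse a Hadoop-style <configuration> document into a dict."""
    conf = dict(DEFAULTS)
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def container_log_location(conf):
    """Return ("hdfs", dir) if aggregation is on, else ("local", dirs)."""
    if conf["yarn.log-aggregation-enable"].lower() == "true":
        return ("hdfs", conf["yarn.nodemanager.remote-app-log-dir"])
    return ("local", conf["yarn.nodemanager.log-dirs"])


# Hypothetical yarn-site.xml content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(container_log_location(read_hadoop_conf(SAMPLE)))
```

With aggregation enabled, the logs of a finished application can usually be fetched with "yarn logs -applicationId <application ID>" rather than by browsing HDFS directly.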

 Thanks

Re: tracking remote reads in datanode logs

2015-02-24 Thread Igor Bogomolov
Hi Drake,

Thanks for the pointer. The AM log does indeed have information about remote
map tasks, but I'd like more low-level details, such as which node each map
task was scheduled on and how many bytes were read. That is exactly what
should be in the datanode log, and I saw it there for another job. But after
I reinstalled the cluster, it's not there anymore :(

Could you please tell me the path where the AM log is located (the one you
copied the lines from)? I found it in the web interface but not as a file on
disk, and there is nothing in /var/log/hadoop-*

Thanks,
Igor

On Tue, Feb 24, 2015 at 1:51 AM, Drake민영근 drake@nexr.com wrote:

 I found this in the MapReduce AM log.

 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator]
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
 Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0
 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0
 HostLocal:0 RackLocal:0
 ..
 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator]
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After
 Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5
 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0
 HostLocal:3 RackLocal:2
 ..

 The first line says there are 5 map tasks, and the second says HostLocal 3
 and RackLocal 2. I think the 2 RackLocal tasks are the remote map tasks you
 mentioned before.
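
To read these counters without eyeballing the log, a short sketch like the following may help. It uses the "After Scheduling" record quoted above, joined back into one line (the email wrapped it). The function name and the regular expression are illustrative assumptions, not part of Hadoop.

```python
import re

# A counter is a letters-only name directly followed by ":" and digits,
# e.g. "HostLocal:3". The letters-only name skips the "11:22:46" timestamp.
COUNTER = re.compile(r"\b([A-Za-z]+):(\d+)\b")


def allocator_counters(line):
    """Extract the Name:number counters from one RMContainerAllocator line."""
    return {name: int(value) for name, value in COUNTER.findall(line)}


# The "After Scheduling" record from the message above, unwrapped.
AFTER = (
    "2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After "
    "Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 "
    "AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 "
    "HostLocal:3 RackLocal:2"
)

if __name__ == "__main__":
    counters = allocator_counters(AFTER)
    for key in ("AssignedMaps", "HostLocal", "RackLocal"):
        print(key, counters[key])
```

These counters only give totals per job. They do not say which map task ran on which node.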


 Drake 민영근 Ph.D
 kt NexR

 On Tue, Feb 24, 2015 at 9:45 AM, Drake민영근 drake@nexr.com wrote:

 Hi, Igor

 Did you look at the MapReduce application master log? I think the local
 and rack-local map tasks are logged in the MapReduce AM log.

 Good luck.

 Drake 민영근 Ph.D
 kt NexR

 On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov igor.bogomo...@gmail.com wrote:

 Hi all,

 In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0), I want to
 know how many remote map tasks (ones that read input data from remote
 nodes) there are in a MapReduce job. For this purpose I took the logs of
 each datanode and looked for lines with op: HDFS_READ and a cliID field
 that contains a map task id.

 Surprisingly, 4 datanode logs do not contain any lines with op: HDFS_READ.
 The remaining one has many lines with op: HDFS_READ, but every cliID looks
 like DFSClient_NONMAPREDUCE_* and does not contain a map task id.

 I concluded there are no remote map tasks, but that does not look correct.
 Also, even local reads are not logged (there is no line where the cliID
 field contains a map task id). Could anyone please explain what's wrong?
 Why is logging not working? (I use the default settings.)
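
For reference, the search described above could be scripted roughly as follows. This is a sketch under assumptions. The thread only names the op and cliID fields, so the "bytes" field, the "key: value" layout, and the attempt_..._m_... pattern for map task ids are assumptions about the datanode trace format. The sample lines are made up and were not captured from a real datanode.

```python
import re
from collections import Counter

# Field patterns; "op" and "cliID" come from the thread, "bytes" is assumed.
OP = re.compile(r"\bop: (\w+)")
CLI_ID = re.compile(r"\bcliID: ([^,\s]+)")
BYTES = re.compile(r"\bbytes: (\d+)")
# Assumed shape of a map task attempt id embedded in the client id.
MAP_ATTEMPT = re.compile(r"attempt_\d+_\d+_m_\d+_\d+")


def bytes_read_per_map_attempt(lines):
    """Sum bytes of HDFS_READ lines whose cliID contains a map attempt id."""
    totals = Counter()
    for line in lines:
        op = OP.search(line)
        cli = CLI_ID.search(line)
        if not op or op.group(1) != "HDFS_READ" or not cli:
            continue
        attempt = MAP_ATTEMPT.search(cli.group(1))
        if not attempt:
            continue  # e.g. DFSClient_NONMAPREDUCE_* clients
        size = BYTES.search(line)
        totals[attempt.group(0)] += int(size.group(1)) if size else 0
    return dict(totals)


# Made-up lines that only mimic the fields discussed in the thread.
SAMPLE = [
    "bytes: 1024, op: HDFS_READ, cliID: DFSClient_attempt_1424686965000_0001_m_000003_0_12345_1",
    "bytes: 2048, op: HDFS_READ, cliID: DFSClient_attempt_1424686965000_0001_m_000003_0_12345_1",
    "bytes: 4096, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-98765_1",
    "bytes: 512, op: HDFS_WRITE, cliID: DFSClient_attempt_1424686965000_0001_m_000004_0_222_1",
]

if __name__ == "__main__":
    print(bytes_read_per_map_attempt(SAMPLE))
```

An empty result matches the situation described above: either no HDFS_READ lines exist, or none of their client ids carry a map task id.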

 Chris,

 I found HADOOP-3062 (https://issues.apache.org/jira/browse/HADOOP-3062),
 which you implemented, and thought you might have an explanation.

 Best,
 Igor