[ 
http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12372589 ] 

Yoram Arnon commented on HADOOP-96:
-----------------------------------

The plan is to add a log line for each change in the name space and each change 
in block placement or replication. What we get is effectively a trace of 
program execution for DFS changes.
The log will go to a new log object, so that this (extensive) logging can be 
switched on or off.
Name space changes will be logged at level fine, block commit changes at finer, 
and block pending changes at finest.
To facilitate tracing of multiple concurrent operations, each line 
will include the id of the name server thread that wrote it. For that we derive 
a logging class that places the thread id right after the date/time.
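
As a rough illustration of the plan above, here is a minimal sketch using 
java.util.logging, whose FINE/FINER/FINEST levels match the ones named. The 
class names ThreadIdFormatter and StateChangeLogSetup, the logger name 
"dfs.StateChange", the date pattern, and the console handler are all 
assumptions made for this example; they are not taken from the actual patch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical formatter: date/time first, then the thread id, then level and message. */
class ThreadIdFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        // SimpleDateFormat is not thread-safe, so build one per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyMMdd HHmmss");
        return fmt.format(new Date(record.getMillis()))
            + " " + record.getThreadID()   // id java.util.logging recorded for the calling thread
            + " " + record.getLevel()
            + " " + formatMessage(record)
            + System.lineSeparator();
    }
}

/** Hypothetical setup for a dedicated logger that can be switched on or off on its own. */
class StateChangeLogSetup {
    // Assumed logger name, kept separate from the regular name server log.
    static final String LOGGER_NAME = "dfs.StateChange";

    static Logger configure(Level level) {
        Logger log = Logger.getLogger(LOGGER_NAME);
        log.setUseParentHandlers(false);          // keep these lines out of the parent log
        for (Handler old : log.getHandlers()) {   // avoid stacking handlers on reconfiguration
            log.removeHandler(old);
        }
        ConsoleHandler handler = new ConsoleHandler();
        handler.setFormatter(new ThreadIdFormatter());
        handler.setLevel(Level.ALL);              // let the logger's level decide what passes
        log.addHandler(handler);
        log.setLevel(level);                      // Level.OFF switches the trace off entirely
        return log;
    }
}
```

With this sketch, StateChangeLogSetup.configure(Level.FINER) would pass name 
space and block commit lines but drop block pending lines, and 
configure(Level.OFF) would silence the trace.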

We log in the following methods of class NameNode, and in the methods of class 
FSNamesystem that they call (shown in parentheses):
create (startFile)
abandonFileInProgress (abandonFileInProgress)
AbandonBlock (AbandonBlock)
reportWrittenBlock (blockReceived)
addBlock (getAdditionalBlock)
Complete (completeFile)
rename (renameTo)
delete (delete)
Mkdirs (Mkdirs)
sendHeartbeat (getHeartbeat)
blockReport (processReport)
blockReceived (blockReceived)
errorReport
getBlockWork (pendingTransfer, blocksToInvalidate)
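
The pairing in the list above (an entry point on the name node, and the name 
system method it calls) could look roughly like the sketch below. 
SketchNameNode and SketchNamesystem are hypothetical stand-ins, not the real 
Hadoop classes, and the choice of which call counts as a "block pending" 
versus a "block commit" change is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical stand-in for the name system; not Hadoop's actual class. */
class SketchNamesystem {
    // Assumed dedicated logger shared by both sketch classes.
    static final Logger LOG = Logger.getLogger("sketch.dfs.StateChange");

    private final List<String> files = new ArrayList<>();
    private long nextBlockId = 1;

    boolean startFile(String src) {
        files.add(src);
        LOG.fine("startFile: " + src + " added to the name space");      // name space change
        return true;
    }

    long getAdditionalBlock(String src) {
        long id = nextBlockId++;
        LOG.finest("getAdditionalBlock: block " + id + " pending for " + src); // assumed pending change
        return id;
    }

    void blockReceived(long blockId, String datanode) {
        LOG.finer("blockReceived: block " + blockId + " stored on " + datanode); // assumed commit change
    }
}

/** Hypothetical stand-in for the name node entry points. */
class SketchNameNode {
    private final SketchNamesystem namesystem = new SketchNamesystem();

    boolean create(String src) {
        SketchNamesystem.LOG.fine("create: " + src);
        return namesystem.startFile(src);
    }

    long addBlock(String src) {
        SketchNamesystem.LOG.finest("addBlock: " + src);
        return namesystem.getAdditionalBlock(src);
    }

    void blockReceived(long blockId, String datanode) {
        SketchNamesystem.LOG.finer("blockReceived: block " + blockId + " from " + datanode);
        namesystem.blockReceived(blockId, datanode);
    }
}
```

Each call produces one line at the entry point and one in the name system, so 
a single operation can be followed across both classes in the trace.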


> name server should log decisions that affect data: block creation, removal, 
> replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1
>     Reporter: Yoram Arnon
>     Priority: Critical

>
> Currently, there's no way to analyze and debug DFS errors where blocks 
> disappear.
> The name server should log its decisions that affect data, including block 
> creation, removal, replication:
> - block <b> created, assigned to datanodes A, B, ...
> - datanode A dead, block <b> underreplicated(1), replicating to datanode C
> - datanode B dead, block <b> underreplicated(2), replicating to datanode D
> - datanode A alive, block <b> overreplicated, removing from datanode D
> - block <b> removed from datanodes C, D, ...
> That will enable me to track down, two weeks later, a block that's missing 
> from a file, and to debug the name server.
> Extra credit:
> - rotate log file, as it might grow large
> - make this behaviour optional/configurable

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
