[jira] [Commented] (HDFS-3590) Print a WARN if the edit log sync period takes more than X time units

Todd Lipcon (JIRA) Mon, 02 Jul 2012 10:57:25 -0700

    [ https://issues.apache.org/jira/browse/HDFS-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405170#comment-13405170 ]


Todd Lipcon commented on HDFS-3590:
-----------------------------------

Nice idea. Per-directory metrics would also be a nice improvement, but the log 
makes good sense. I'd warn at a much earlier threshold, like 5 seconds.
                
> Print a WARN if the edit log sync period takes more than X time units
> ---------------------------------------------------------------------
>
>                 Key: HDFS-3590
>                 URL: https://issues.apache.org/jira/browse/HDFS-3590
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Harsh J
>            Priority: Minor
>
> If a logSync operation, which happens for calls such as FS#create() after 
> the edit has been applied to the NN metadata, takes longer than X seconds 
> (I'd say that if it takes more than a minute, something is really wrong with 
> the volume it probably got stuck on), we should log a WARN naming the volume 
> that most likely caused it. When an NN runs with multiple NFS volumes, this 
> helps track down which particular volume was responsible, as there are no 
> per-NN-dir metrics of any kind.
> I ran into a situation today where a hard-mounted NFS point hung for over X 
> minutes, but after it recovered there was no indication in the NN's logs 
> that such an event had happened (recovering so late caused its own slew of 
> issues, for which I'll file other improvement JIRAs), aside from the Sync 
> (Journal Sync) metric spiking as the elapsed sync time rose. A log message 
> would have saved time investigating this, and possibly would also have 
> pinpointed the bad location more accurately.
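
For illustration only, the proposed check could look roughly like the sketch
below. It is not the NameNode's actual code: the class SlowSyncWarner, its
method names, and the message format are all hypothetical, and the 5000 ms
threshold in the usage note is just the value suggested in the comment above.
The sketch times the sync of one edit log directory and records a WARN line
naming that directory when the elapsed time exceeds the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; a sketch of the idea, not Hadoop's FSEditLog. */
final class SlowSyncWarner {
    private final long warnThresholdMs;
    // Warnings are kept in memory so callers (and tests) can inspect them.
    private final List<String> warnings = new ArrayList<>();

    SlowSyncWarner(long warnThresholdMs) {
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Runs the sync for one edit log directory and checks how long it took. */
    public long timedSync(String journalDir, Runnable sync) {
        long start = System.nanoTime();
        sync.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        check(journalDir, elapsedMs);
        return elapsedMs;
    }

    /** Records and prints a WARN line if elapsedMs exceeds the threshold. */
    public void check(String journalDir, long elapsedMs) {
        if (elapsedMs > warnThresholdMs) {
            String msg = String.format(
                "WARN: sync of edit log directory %s took %d ms (threshold %d ms)",
                journalDir, elapsedMs, warnThresholdMs);
            warnings.add(msg);
            System.err.println(msg);
        }
    }

    public List<String> getWarnings() {
        return warnings;
    }
}
```

A caller would wrap each directory's sync separately, for example
new SlowSyncWarner(5000).timedSync(dir, syncAction), so that the warning
identifies the slow volume rather than only the overall sync time.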

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
