[ https://issues.apache.org/jira/browse/HDFS-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913468#comment-16913468 ]
Erik Krogen commented on HDFS-13977:
------------------------------------

Good catch on the "less than" [~vagarychen]! Thanks. The check is in a method {{setOutputBufferSize()}}, but I checked and you're right that this is only ever modified from the hard-coded 512K in tests. I changed the wording slightly to make the message clearer in v003.

> NameNode can kill itself if it tries to send too many txns to a QJM simultaneously
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-13977
>                 URL: https://issues.apache.org/jira/browse/HDFS-13977
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, qjm
>    Affects Versions: 2.7.7
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-13977.000.patch, HDFS-13977.001.patch, HDFS-13977.002.patch, HDFS-13977.003.patch
>
>
> h3. Problem & Logs
> We recently encountered an issue on a large cluster (running 2.7.4) in which the NameNode killed itself because it was unable to communicate with the JNs via QJM. We discovered that it was the result of the NameNode trying to send a huge batch of over 1 million transactions to the JNs in a single RPC:
> {code:title=NameNode Logs}
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal X.X.X.X:XXXX failed to write txns 10000000-11153636. Will try to write to this JN again after the next log roll.
> ...
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1098ms to send a batch of 1153637 edits (335886611 bytes) to remote journal X.X.X.X:XXXX
> {code}
> {code:title=JournalNode Logs}
> INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client X.X.X.X threw exception [java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X]
> java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X
>         at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:897)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:753)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:724)
> {code}
> The JournalNodes rejected the RPC because its size was well over the 64 MB default {{ipc.maximum.data.length}}.
> This was triggered by a huge number of files all hitting a hard lease timeout simultaneously, causing the NN to force-close them all at once. This can be a particularly nasty bug because the NN will attempt to re-send the same huge RPC on restart, since it loads an fsimage which still contains all of these open files that need to be force-closed.
> h3. Proposed Solution
> To solve this, we propose modifying {{EditsDoubleBuffer}} to add a "hard limit" based on the value of {{ipc.maximum.data.length}}. When {{writeOp()}} or {{writeRaw()}} is called, first check the size of {{bufCurrent}}. If it exceeds the hard limit, block the writer until the buffer is flipped and {{bufCurrent}} becomes {{bufReady}}. This gives some self-throttling to prevent the NameNode from killing itself in this way. A minimal sketch of the idea follows.
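> To make the blocking behavior concrete, here is a minimal, self-contained sketch of the double-buffer throttling described above. It is illustrative only, not the attached patch: the constructor parameter {{hardLimit}} (assumed to be derived from {{ipc.maximum.data.length}}), the use of plain {{ByteArrayOutputStream}} buffers, and the {{setReadyToFlush()}} name are assumptions for the example; only {{writeRaw()}}, {{bufCurrent}}, and {{bufReady}} come from the real {{EditsDoubleBuffer}}.
> {code:title=Illustrative sketch (not the actual patch)}
> import java.io.ByteArrayOutputStream;
>
> public class EditsDoubleBufferSketch {
>   private final int hardLimit; // assumed: derived from ipc.maximum.data.length
>   private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
>   private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();
>
>   public EditsDoubleBufferSketch(int hardLimit) {
>     this.hardLimit = hardLimit;
>   }
>
>   // Called on the writer thread before appending an edit op's bytes.
>   public synchronized void writeRaw(byte[] bytes, int off, int len)
>       throws InterruptedException {
>     // Block the writer until a buffer flip makes room, rather than letting
>     // bufCurrent grow past what a single JournalNode RPC will accept.
>     while (bufCurrent.size() + len > hardLimit) {
>       wait(); // woken by setReadyToFlush() below
>     }
>     bufCurrent.write(bytes, off, len);
>   }
>
>   // Called when the buffers are flipped so bufReady can be flushed to the JNs.
>   public synchronized void setReadyToFlush() {
>     ByteArrayOutputStream tmp = bufReady;
>     bufReady = bufCurrent;
>     bufCurrent = tmp;
>     bufCurrent.reset();
>     notifyAll(); // unblock any writer waiting on the hard limit
>   }
> }
> {code}
> The key design point is that the writer stalls instead of failing: a burst of lease-recovery transactions fills {{bufCurrent}} up to the hard limit, then waits for the next flush cycle, so no single batch can exceed the RPC size the JNs are configured to accept.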