[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844105#comment-17844105 ]
WenjingLiu commented on HDFS-14646:
-----------------------------------

Hi [~hemanthboyina],

In our test cluster we ran into the same issue and found that the call to "nnImage.purgeOldStorage", present in 001.patch, was removed in 002.patch. As a result, the ANN does not remove old fsimage files, which may not be suitable in some situations. Is there a specific reason why the "nnImage.purgeOldStorage" call had to be removed?

> Standby NameNode should not upload fsimage to an inappropriate NameNode.
> ------------------------------------------------------------------------
>
>                 Key: HDFS-14646
>                 URL: https://issues.apache.org/jira/browse/HDFS-14646
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.2
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Major
>              Labels: multi-sbnn
>         Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch,
> HDFS-14646.002.patch
>
>
> *Problem Description:*
> In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts
> the image to all other NNs (whether the peer NN is an ANN or not). Even
> if the peer NN immediately replies with an error (such as
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE,
> TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not
> terminate the put immediately. Instead it sends the complete FsImage to
> the peer NN and does not read the peer NN's reply until the put has
> finished.
> Depending on the version of Jetty, this behavior can lead to different
> consequences:
> *1. Under Hadoop 2.7.2 (with Jetty 6.1.26)*
> After the peer NN calls HttpServletResponse.sendError(), the underlying TCP
> connection stays established, and the data the SNN sends is read by the
> Jetty framework itself on the peer NN side. The SNN therefore keeps
> sending the FsImage to the peer NN to no purpose, wasting time and
> bandwidth.
> In a relatively large HDFS cluster, the FsImage can often reach about
> 30GB, so this is a significant waste.
> *2. Under the newest release-3.2.0-RC1 (with Jetty 9.3.24) and trunk (with
> Jetty 9.3.27)*
> After the peer NN calls HttpServletResponse.sendError(), the underlying TCP
> connection is closed automatically, and the SNN then directly gets an
> "Error writing request body to server" exception, as shown below. Note that
> this test needs a relatively big FsImage (e.g. at the 10MB level):
> {code:java}
> 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_0000000000003364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes.
> java.io.IOException: Error writing request body to server
>     at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>     at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249)
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_0000000000003364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes.
> java.io.IOException: Error writing request body to server
>     at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>     at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
> {code}
>
> *Solution:*
> A standby NameNode should not upload an fsimage to an inappropriate
> NameNode. When it plans to put an FsImage to a peer NN, it needs to check
> whether the put is really needed at that time.
> In detail, the local SNN should establish an HTTP connection with the peer
> NN, send the put request, and then immediately read the response (this is
> the key point). If the peer NN does not reply with HTTP_OK, the local SNN
> should not put the image at this time.
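
For readers following the thread, here is a minimal sketch of the "read the response before sending the image" idea from the Solution section. It is not the code from any of the attached patches. The class name ImageUploadPrecheck and its three methods are hypothetical, and splitting the exchange into a body-less probe PUT followed by a second PUT that carries the file is an assumption made only to keep the example short with plain java.net.HttpURLConnection. It also assumes the peer answers a body-less PUT with a meaningful status code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop: illustrates checking the peer's
// reply before any image bytes are sent.
final class ImageUploadPrecheck {

    // Only an HTTP_OK (200) reply means the peer is willing to take the image.
    static boolean peerAcceptsImage(int responseCode) {
        return responseCode == HttpURLConnection.HTTP_OK;
    }

    // Sends a PUT without a body and returns the peer's status code at once.
    // Assumption: the peer decides from the URL and headers alone.
    static int probePeer(URL peerUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) peerUrl.openConnection();
        try {
            conn.setRequestMethod("PUT");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Probes first; streams the file only if the probe returned HTTP_OK.
    // Returns true only when both the probe and the upload were accepted.
    static boolean uploadIfAccepted(URL peerUrl, Path image) throws IOException {
        if (!peerAcceptsImage(probePeer(peerUrl))) {
            return false; // peer refused, so no image bytes are sent
        }
        HttpURLConnection conn = (HttpURLConnection) peerUrl.openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            // Stream the file instead of buffering it in memory.
            conn.setFixedLengthStreamingMode(Files.size(image));
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(image, out);
            }
            return peerAcceptsImage(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

With this shape, a peer that replies with an error to the probe costs one small round trip rather than a transfer of the whole image. A real implementation would also have to handle authentication, timeouts, and the peer changing state between the probe and the upload.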