[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035884#comment-13035884 ]
sam rash commented on HDFS-1057:
--------------------------------

I assume this is a similar problem to the one before. The problem then was that RPC proxies opened to DNs were not closed in a finally block. The test failure output indicates a socket/fd leak ("Too many open files").

https://issues.apache.org/jira/browse/HDFS-1310

The test was succeeding 8 months ago, on 2010-09-10, so I'd look at commits that came after that.

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Blocker
>             Fix For: 0.20-append, 0.21.0, 0.22.0
>
>         Attachments: HDFS-1057-0.20-append.patch,
> conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt,
> conurrent-reader-patch-3.txt, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt,
> hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt,
> hdfs-1057-trunk-6.txt
>
>
> BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before
> calling flush(). Therefore, if there is a concurrent reader, it's possible to
> race here: the reader will see the new length while those bytes are still in
> the buffers of BlockReceiver. Thus the client will potentially see checksum
> errors or EOFs. Additionally, the last checksum chunk of the file is made
> accessible to readers even though it is not stable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
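
For illustration, the ordering race from the issue description (the visible length is updated before flush()) can be sketched in a few lines of Java. This is not the HDFS code and not the attached patch: the class FlushBeforePublishSketch, its methods, and the in-memory "disk" are all hypothetical stand-ins, and the Runnable argument simply plays the part of a reader that happens to run between the two steps. The instability of the last checksum chunk is not modeled.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of a writer that buffers packets; not an HDFS class. */
final class FlushBeforePublishSketch {
    // Bytes received but not yet flushed (stands in for the writer's buffers).
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    // Bytes a reader can actually fetch (stands in for the block file on disk).
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();
    // Length advertised to concurrent readers.
    private final AtomicLong bytesOnDisk = new AtomicLong();
    private long bytesReceived;

    private void flush() {
        byte[] data = pending.toByteArray();
        disk.write(data, 0, data.length);
        pending.reset();
    }

    /** True if every byte a reader is told about is really readable. */
    boolean readerViewIsConsistent() {
        return bytesOnDisk.get() <= disk.size();
    }

    /** Order described in the issue: advertise the length, then flush. */
    void receiveRacy(byte[] packet, Runnable reader) {
        pending.write(packet, 0, packet.length);
        bytesReceived += packet.length;
        bytesOnDisk.set(bytesReceived); // published while bytes are still buffered
        reader.run();                   // a reader arriving here sees too much
        flush();
    }

    /** Alternative order: flush first, then advertise the length. */
    void receiveSafe(byte[] packet, Runnable reader) {
        pending.write(packet, 0, packet.length);
        bytesReceived += packet.length;
        flush();                        // bytes are readable before being advertised
        reader.run();                   // a reader arriving here sees the old length
        bytesOnDisk.set(bytesReceived);
    }

    public static void main(String[] args) {
        byte[] packet = new byte[512];

        FlushBeforePublishSketch racy = new FlushBeforePublishSketch();
        boolean[] racyView = new boolean[1];
        racy.receiveRacy(packet, () -> racyView[0] = racy.readerViewIsConsistent());

        FlushBeforePublishSketch safe = new FlushBeforePublishSketch();
        boolean[] safeView = new boolean[1];
        safe.receiveSafe(packet, () -> safeView[0] = safe.readerViewIsConsistent());

        System.out.println("racy order consistent: " + racyView[0]); // false
        System.out.println("safe order consistent: " + safeView[0]); // true
    }
}
```

In the racy order the reader is told about 512 bytes while none have reached the backing store, which is the window in which a real client could hit a checksum error or EOF. In the other order the advertised length lags behind the flushed data, so a reader may see slightly stale data but never unreadable bytes.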