[ https://issues.apache.org/jira/browse/RATIS-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878848#comment-17878848 ]
yuuka commented on RATIS-2147:
------------------------------

[~szetszwo], I have submitted a pull request.

> MD5 mismatch when accept snapshot
> ---------------------------------
>
>                 Key: RATIS-2147
>                 URL: https://issues.apache.org/jira/browse/RATIS-2147
>             Project: Ratis
>          Issue Type: Bug
>          Components: snapshot
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: yuuka
>            Priority: Major
>         Attachments: image-2024-09-03-10-35-08-315.png,
> image-2024-09-03-10-35-28-617.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We encountered an MD5 mismatch issue in IoTDB. After several rounds of
> investigation, we found that the digester had been contaminated.
>
> We have verified that it is not a network or disk problem.
>
> In the implementation, the received snapshot is first written to a
> temporary file. When an MD5 mismatch occurs, we read the data back from
> this temporary file and compute the MD5 with a new digest. The result of
> that calculation is the same as the MD5 hash value that was sent.
> !image-2024-09-03-10-35-28-617.png!
>
> !image-2024-09-03-10-35-08-315.png!
>
>
> Use the saved corrupted file name to locate the relevant log, taking
> tlog.txt.snapshot.snapshot.corrupt20240831-094107_735 as an example:
> !https://timechor.feishu.cn/space/api/box/stream/download/asynccode/?code=YjM4MWY1MTA2Y2EyYWU4MmZlNDE0Mzg3MDRjYTBjMjRfU0dPbEpVbWFNalV1V1lSUVllOGFISUdWbUhqanRFdFdfVG9rZW46RHJlbmJHQlRkb2daakp4RHZMVWNEOVFPbmhiXzE3MjUzMzE2MDk6MTcyNTMzNTIwOV9WNA!
> Before the corruption occurred, the sender sent several consecutive
> snapshot installation requests to the receiver.
>
> The receiver received some of these requests successfully, then hit a
> request that was reported as corrupt and printed "recompute again" as it
> started recalculating.
>
> After that, the ERROR log for the rename is printed, and the data is read
> back from the file and compared with the received chunk data.
>
> If any byte did not match, the corresponding information would be printed.
> No such log was printed, which means that the content written to the disk
> is the same as the content sent.
> !https://timechor.feishu.cn/space/api/box/stream/download/asynccode/?code=YmZlYjk1YjAwOWE4MDJlYTEzZjkxMjljODU1MzQxMTZfMkU0NmlPRWpidDBweGNzWXY4cHNJZG14b1o3Z1BZMzhfVG9rZW46TUxFeGJxTjBqbzIxNUx4eUZrUGNHMk55bjhkXzE3MjUzMzE2MDk6MTcyNTMzNTIwOV9WNA!
> This makes the problem very clear: the fault lies in the MD5 calculation
> class. The reasoning is as follows:
>
> If a byte in the middle of the data part were incorrect due to network
> problems, the recalculated result and the hash that was sent would have
> to differ.
>
> If there were a problem with the part that stores the hash value, the
> final calculation result would also differ.
>
> Neither is the case here, because recomputing over the temporary file
> with a new digest reproduces the hash that was sent.
>
> I suggest creating a new digest every time a follower receives a snapshot,
> so as to avoid contamination. Then, under normal network and disk
> conditions, a corrupt snapshot will not be reported.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
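
The suggestion in the quoted description (a new digest for every received snapshot) can be illustrated with a small sketch. It is not the Ratis code and not the submitted pull request. The class name `DigestReuseSketch` and its helper methods are hypothetical, and the "leftover" bytes stand in for one assumed way a long-lived digest might be contaminated, namely an earlier transfer that updated it without ever calling `digest()` or `reset()`. Only the standard `java.security.MessageDigest` API is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration only; names are not taken from Ratis.
final class DigestReuseSketch {

    /** Creates a brand-new MD5 digest with no prior state. */
    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Suggested approach: one fresh digest per received snapshot. */
    static byte[] md5WithFreshDigest(byte[][] chunks) {
        MessageDigest digest = newMd5();
        for (byte[] chunk : chunks) {
            digest.update(chunk);
        }
        return digest.digest();
    }

    /** Risky approach: feed the chunks into a digest that outlives one transfer. */
    static byte[] md5WithSharedDigest(MessageDigest shared, byte[][] chunks) {
        for (byte[] chunk : chunks) {
            shared.update(chunk);
        }
        return shared.digest(); // digest() also resets the object
    }

    public static void main(String[] args) {
        byte[][] chunks = {
            "chunk-0".getBytes(StandardCharsets.UTF_8),
            "chunk-1".getBytes(StandardCharsets.UTF_8)
        };
        // What the sender would compute over the same bytes.
        byte[] sent = md5WithFreshDigest(chunks);

        // Assumed contamination: an earlier, unfinished transfer left bytes behind.
        MessageDigest shared = newMd5();
        shared.update("leftover".getBytes(StandardCharsets.UTF_8));

        byte[] contaminated = md5WithSharedDigest(shared, chunks);
        byte[] fresh = md5WithFreshDigest(chunks);

        System.out.println("shared digest matches: " + Arrays.equals(sent, contaminated));
        System.out.println("fresh digest matches:  " + Arrays.equals(sent, fresh));
    }
}
```

In this sketch the shared digest reports a mismatch even though every chunk is byte-for-byte identical to what was sent, while the fresh digest agrees with the sender. That is the same pattern as in the report, where the data on disk was correct but the first MD5 comparison failed.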