Antonio Diaz Diaz wrote:
Daniel Rudolf wrote:
in my experience a broken HDD sometimes doesn't even report a read error,
but just returns random garbage.
Interesting. Yours is the third request for a feature intended to detect
or recover from such hardware misbehavior. Are HDD manufacturers
cutting too many corners lately?
I'd say that HDD manufacturers have always cut corners 😄
I can only speak from my own experience, but the latest defective HDD
with this issue was quite old: it was a 2.5" Samsung drive manufactured
back in 2015 that hadn't accumulated many hours (below 8,000). It died
out of nowhere, without a single sign of imminent failure (apart from
its old age, of course). Nothing critical was on that HDD, but I always
try to restore data.
To add some more context about the issue: it's not that the HDD starts
returning garbage data at some point, but rather that it returns a
"wrong" sector now and then. The only reason I noticed is that I store
hashes of some of my data and thus was able to verify some of it after
running ddrescue. I saw that some hashes didn't match, so I tracked the
sectors down and was surprised to see that ddrescue didn't report these
sectors as bad. So I rescued these sectors again and then the hashes matched.
However, I can't verify all data this way, simply because I don't have
hashes of everything. I thus ran ddrescue-verify and was surprised to
see that it reported a difference of about 400 MiB (of 1.5 TB total).
ddrescue-verify excluded the four 512 byte sectors ddrescue deemed bad.
Most (but not all) differences were in areas ddrescue also struggled
with, but ultimately rescued "successfully" (with wrong data, though
ddrescue had no way of knowing that). However, I don't know how many
"wrong" sectors we're really talking about - it could be as few as one
"wrong" sector per 1 MiB ddrescue-verify block.
I experienced this before (too many non-matching hashes for the number
of bad sectors reported by ddrescue), but I didn't investigate further
back then.
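The spot check described above can be sketched roughly as follows. This is
a minimal illustration, not part of ddrescue or ddrescue-verify; the paths
and the hash-list layout are assumptions:

```python
# Sketch: verify previously stored SHA-256 hashes of known files against
# their copies in the rescued image, to locate files that ddrescue rescued
# "successfully" but with wrong data. Paths and the known_hashes mapping
# are hypothetical.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def find_mismatches(known_hashes: dict[str, str]) -> list[str]:
    """Return the paths whose current hash differs from the stored one."""
    return [path for path, expected in known_hashes.items()
            if sha256_of(path) != expected]
```

Any path reported by find_mismatches() marks a region worth rescuing again.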
>
> If the hardware does not fulfil its part of the job (report a read
> error), it is not possible for the software to be sure that it has got
> the correct data.
I agree: it's virtually impossible for ddrescue to decide what the
correct data is. One could indeed use a "two out of three" algorithm,
but that isn't bulletproof either: the issue could have developed after
the initial rescue. So, in the end, ddrescue should leave the decision up
to the user (ideally supported by information about how often each
version of the data was read).
Thus I like running ddrescue at least twice: Once to do the actual data
recovery, and at least once more to verify the data that was recovered.
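A minimal sketch of the "two out of three" idea mentioned above, assuming
a hypothetical read_sector callback (not a ddrescue interface) and that
three reads per sector are affordable:

```python
# Sketch: read a sector several times and accept a value only if a
# strict majority of the reads agree. read_sector is a hypothetical
# callback that returns the raw bytes of the sector at the given LBA.
from collections import Counter
from typing import Callable, Optional

def majority_read(read_sector: Callable[[int], bytes],
                  lba: int, tries: int = 3) -> Optional[bytes]:
    """Return the sector contents if a majority of reads agree, else None."""
    counts = Counter(read_sector(lba) for _ in range(tries))
    data, votes = counts.most_common(1)[0]
    return data if votes * 2 > tries else None
```

As noted above, even agreeing reads prove nothing if the sector had already
decayed before the first read; this only filters out transient garbage.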
This seems like the best one can do with such faulty HDDs. But I find
the hash approach both complicated and inefficient. Comparing the data
read during the verification run with the image (outfile) should be
better. It would also make it easier to use the domain options to verify
only a subset of the whole image.
The biggest problems I see with verification are where to efficiently
store the different reads of variable sectors, and how to decide which
of the reads holds the real data of each given sector. Maybe the drive
returns garbage most of the time for damaged sectors, and only manages
to return the correct data from time to time.
You're absolutely right: ddrescue doesn't need the hashes to compare
data, because it has access to the outfile. According to the README,
the reason ddrescue-verify uses hashes is that its developer was
required to send all the data over the wire. But I presume that this
isn't a very common use case...
However, not having to store the diverging data is the reason I kept
the idea of using hashes: is it really worth storing that data? If
ddrescue reports the data to differ, the user will investigate and thus
read the data a third time. If the data of the third try differs from
both the first and the second try, I'd consider the data of the second
try worthless anyway.
This got me thinking: creating hashes proactively indeed doesn't make
much sense, but they could be used when ddrescue detects a "vary"
sector: ddrescue could then calculate the sector's hash from the
outfile, and from the newly read data, and store both in the mapfile for
the user to decide what to do. This way we shouldn't have any issues
with domains, or with blocks consisting of sectors read in different
phases - and the mapfile remains manageable.
All the best
Daniel Rudolf