Allow me to begin with words of gratitude. A few weeks ago I accidentally dropped my laptop on the floor and the old 2 TB hard disk finally began to fail. Thanks to GNU ddrescue I've successfully recovered at least 70% of the data on that drive. Thank you!
I'm using the --data-preview option to monitor the data being read. Throughout the process it looked fine: I recognized numerous text files as ddrescue recovered them. Just today, though, something weird started happening. At some point the data preview started showing the same bytes over and over again. The input and output positions kept advancing, as did the block offsets in the data preview, but the data shown was always the same. This happened while copying both forwards and backwards.
I'm not entirely sure what's happening with the hard disk, or whether it's even possible to recover more data from it. It's as if the drive got stuck reading the same sectors over and over again and, for some reason, isn't reporting that as an I/O error to the operating system. When I noticed this, I concluded that the data was potentially invalid or corrupt. I interrupted the rescue, noted the input position and block offsets being read for later reference, and powered off the computer.
This is not ddrescue's fault. The drive appears to be so badly degraded that it can't even notice when it's reading garbage, let alone reliably report errors back up the stack.
Still, I'd like to suggest a feature based on this experience: detection of repeatedly equal non-null reads. Unless the drive has been filled with some zero or non-zero pattern, it seems astronomically unlikely to me that reading two different regions of the storage device would return the same data. I assume it's a sign that the drive is returning invalid data even in the absence of I/O errors in the kernel log. If ddrescue could detect this, then perhaps it could do something smart about it. I'm not sure what can be done; folks much smarter than I would have to analyze the options. Even simply halting the rescue until the user intervenes would be reasonable, though.
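
To make the suggestion above a bit more concrete, here is a rough Python sketch of the kind of check I have in mind. It is not ddrescue code and does not reflect how ddrescue works internally: the class name RepeatedReadDetector, the threshold of 3 and the 512-byte blocks are all assumptions made up for illustration. It treats blocks made of a single repeated byte (zeros or any other fill value) as legitimate and only flags consecutive reads that return identical, non-uniform data.

```python
import hashlib


class RepeatedReadDetector:
    """Hypothetical helper: flags runs of identical, non-uniform reads."""

    def __init__(self, threshold=3):
        # Number of consecutive identical reads considered suspicious.
        # The default of 3 is an arbitrary illustrative value.
        self.threshold = threshold
        self.last_digest = None
        self.run_length = 0

    @staticmethod
    def _is_uniform(block):
        # True for empty blocks and for blocks made of one repeated byte,
        # e.g. zero-filled or 0xFF-filled regions.
        return len(block) == 0 or block == block[:1] * len(block)

    def feed(self, block):
        """Record one read; return True once the run looks suspicious."""
        if self._is_uniform(block):
            # Uniform fills are plausible on a real disk, so reset the run.
            self.last_digest = None
            self.run_length = 0
            return False
        digest = hashlib.sha256(block).digest()
        if digest == self.last_digest:
            self.run_length += 1
        else:
            self.last_digest = digest
            self.run_length = 1
        return self.run_length >= self.threshold


# Simulated reads from consecutive positions on the drive.
detector = RepeatedReadDetector(threshold=3)
reads = [b"abcd" * 128, b"wxyz" * 128, b"wxyz" * 128, b"wxyz" * 128, bytes(512)]
flags = [detector.feed(r) for r in reads]
print(flags)  # [False, False, False, True, False]
```

One known weakness of this sketch: a region legitimately filled with a multi-byte repeating pattern would also be flagged, so a real implementation would probably want to pause and ask the user rather than discard the data outright.
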
In my case, I ended up reading what is likely invalid data from the disk and writing it to the ddrescue image. This went on for an unknown but hopefully short amount of time. The rescued data count increased falsely, and the sectors were marked as rescued in ddrescue's mapfile even though the data probably couldn't be read correctly. They won't be retried in later passes, nor when trimming and scraping the disk. Data was also written needlessly to the new storage device, which is undesirable for SSDs.
After the rescue, I planned to analyze the data loss by correlating ext4 file system block locations with the data in ddrescue's mapfile. Such analysis becomes complicated, if not impossible, if one cannot assume that blocks rescued by ddrescue contain valid data. Detecting this condition would increase confidence that blocks marked as rescued were truly rescued.
This is just a report of my extremely positive experience with GNU ddrescue and a constructive suggestion on how to make it even better than it already is. A gigantic thank you to everyone who has ever worked on it.
-- Matheus
