On Thu, Dec 26, 2019 at 09:51:59AM -0500, rhkra...@gmail.com wrote:
> Just to confirm, I assume that is true ("no way to skip ahead to byte 31337")
> even if the underlying media is a (somewhat random access) disk instead of
> (serial access) tape?
Correct. There's no central index inside the tar archive that says "file xyz begins at byte 12345". This is by design, so that you can append new content to an existing tar archive. When you append a new file to an existing archive, you simply drop a new metadata header record, and then the new content. So, the entire archive is a long string of header file header file header file ...

The only way to find a file is to start at the beginning and walk the archive header by header until you reach the file you want. (On a seekable disk, a reader can skip over each file's contents using the size stored in its header, but it still has to visit every header in order.)

> Again, I assume (I know what assume does) that "USB mass-storage device that
> acts like a hard drive" is (or might be) a pen drive type of device.

Yes.

> I've had
> a lot of bad luck (well, more bad luck than I'd like) with that kind of
> device, and I suspect that the problem is more likely to occur when parts of
> the device are erased to allow something new to be written to it.
>
> In other words, I suspect it would be more reliable if it functioned a little
> bit more like a WORM (Write Once, Read Many) type device

"Write Once, Read Many" is an entirely different data storage paradigm. Think of a large dusty vault full of optical media. Once you've backed up your full database (or whatever) to one of these media, it goes into the vault. You can't reuse the medium, nor do you WANT to, for legal reasons. You've chosen this technology specifically because it CANNOT be altered once written, and therefore gives you some sort of debatably reliable legal trail of evidence. "On May 7th, this is what we had." Very expensive, and very niche.

> -- not that the whole
> device necessarily has to be written in one go, but more that, for highest
> reliablity, data is appended by writing in previously unused locations
> rather than deleting some data, and then writing new data in previously used
> and erased locations.

I am not an expert in solid state storage, so I won't even try to address the questions about long-term reliability of various USB mass storage devices.
For most people, it comes down to "when you can't write to the device any more, you throw it away and get another".

> I don't know whether rsync, in the normal course of events will delete
> (erase)
> and write data in previously used locations, but it would be helpful to have
> comments, with respect to:
>
> * whether rsync will rewrite to previously used locations, [...]

Rsync does not operate at the disk sector level. It operates at the file level. If you've modified a file since the last backup, then rsync knows it needs to update the backed-up copy of the file. It uses various algorithms to decide whether to copy the entire file from the source, or to reuse pieces of the file that are already on the destination. The main goal there is to reduce the number of bytes sent from a source host to a destination host, because one of rsync's main use cases is backing up files across a network. Since you're focusing on the case where there's no network involved, a lot of that work is just not relevant. (In fact, when both source and destination are local paths, rsync defaults to --whole-file and skips the delta-transfer algorithm.)

In the end, as far as I understand it, rsync will create a new file on the destination, which contains the new content (however it gets the new content). Then the older copy of the file will be deleted. (The --inplace option changes this and makes rsync rewrite the existing destination file instead.) How the storage device's controller works (how it decides which parts of the device get the new file, how the part where the old file used to be gets recycled, etc.) is outside of rsync's purview, and definitely outside of *my* personal knowledge.
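Here is a minimal Python sketch of the "write a new file, then the old copy goes away" pattern described above. It is not rsync's actual code. The helper name `replace_file` and the `.tmp-` prefix are made up for illustration. The sketch only controls which *file* holds the data. Which flash cells end up holding it is still entirely up to the device's controller.

```python
import os
import tempfile


def replace_file(dest_path: str, new_content: bytes) -> None:
    """Hypothetical helper: write new_content to a brand-new file next to
    dest_path, then rename it over the old copy."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the new file in the same directory so the rename stays on one
    # filesystem. (mkstemp creates it with mode 0600; a real backup tool
    # would also copy permissions and timestamps.)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_content)
            f.flush()
            os.fsync(f.fileno())  # push the new data out before the swap
        # Swap the new file into place. The old file is unlinked as part of
        # the rename; the filesystem and device decide what happens to its blocks.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Don't leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "backup.txt")
        replace_file(target, b"version 1\n")
        replace_file(target, b"version 2\n")
        with open(target, "rb") as f:
            print(f.read())
```

While the new copy is being written, the old one stays untouched. If something goes wrong halfway, you still have the previous version.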
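Going back to the tar question: the Python sketch below walks an uncompressed tar archive one 512-byte header at a time. It skips each file's data using the size recorded in that file's header, which is the only way to "get to" a later file. It uses the standard `tarfile` module only to build and append to a test archive. `build_archive`, `append_member` and `find_member` are hypothetical helpers. The header parsing is deliberately simplified: it reads only the ustar name and size fields, ignores the long-name prefix field, and does not verify checksums.

```python
import io
import tarfile
from typing import Dict, Optional

BLOCK = 512  # tar archives are built from 512-byte blocks


def build_archive(files: Dict[str, bytes]) -> bytes:
    """Hypothetical helper: build an uncompressed ustar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def append_member(archive: bytes, name: str, data: bytes) -> bytes:
    """Hypothetical helper: append one file. tarfile first reads its way to the
    end of the existing members, then writes a new header plus the data there."""
    buf = io.BytesIO(archive)
    with tarfile.open(fileobj=buf, mode="a", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_member(archive: bytes, wanted: str) -> Optional[int]:
    """Walk headers from the start; return the offset of wanted's data, or None."""
    offset = 0
    while offset + BLOCK <= len(archive):
        header = archive[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end of the archive
            return None
        # Simplified parsing: name field (bytes 0-99), octal size field (124-135).
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
        data_start = offset + BLOCK
        if name == wanted:
            return data_start
        # No index to consult: skip this file's data, padded to whole blocks.
        offset = data_start + ((size + BLOCK - 1) // BLOCK) * BLOCK
    return None


if __name__ == "__main__":
    archive = build_archive({"a.txt": b"alpha", "b.txt": b"bravo" * 200, "c.txt": b"charlie"})
    archive = append_member(archive, "d.txt", b"delta")
    off = find_member(archive, "d.txt")
    print(off, archive[off:off + 5])
```

Appending `d.txt` writes one more header and its data where the end-of-archive marker used to be. Finding it again means stepping past every earlier header first, whether the archive lives on tape or on disk.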