On Wed, Dec 23, 2015 at 1:21 PM, Neuer User <auslands...@gmx.de> wrote:
> On 23.12.2015 at 20:49, Chris Murphy wrote:
>> Seems to me that if the LVs on the two HDDs are exposed, lvmcache has
>> to keep track of those LVs separately. So as long as everything is
>> working correctly, it should be fine. That includes either transient
>> or persistent, but consistent, errors on either HDD or the SSD, and
>> Btrfs can fix up those bad reads with data from the other. If the SSD
>> were to decide to go nutty, chances are reads through lvmcache would
>> be corrupt no matter which LV is being read by Btrfs, and it'll be
>> aware of that and discard those reads. Any corrupt writes in this
>> case won't be immediately known by Btrfs, because it (like any file
>> system) assumes writes are OK unless the device reports a write
>> failure, but those too would be found on read.
>
> What corrupt write do you mean? The "nuts" SSD is not going to write to
> the HDDs; that will be done by lvmcache. So the HDDs should get the
> correct data, and only the SSD will be bad, right?
Btrfs always writes to the 'cache LV', and then it's up to lvmcache to
determine how and when things are written to the 'cache pool LV' vs the
'origin LV'. I don't know whether there's a case in writeback mode where
things are written to the SSD and only later copied from SSD to HDD, in
which case a wildly misbehaving SSD might corrupt data on the origin. If
you use writethrough, the default, then the data on the HDDs should be
fine even if the single SSD goes crazy for some reason. Even if all reads
go bad, the worst case is that Btrfs should stop and go read-only. If the
SSD read errors are more transient, then Btrfs tries to fix them with COW
writes, so even if those fixes aren't needed on the HDDs, they should
still arrive safely on both of them, and hence there's still no
corruption.

I mean, *really*, if data integrity is paramount you would probably do
this with proven methods. For anything with high IOPS, like a mail
server, write that stuff only to the SSD, and then periodically rsync it
to conventionally raided (md or lvm) HDDs with XFS. You could even use
lvm snapshots and do this often, and now you not only have something fast
and safe, you also have an integrated backup that's mirrored; in a sense
you have three copies. Whereas what you're attempting is rather
complicated, and while it ought to work and it does get testing, you're
really acting as a test candidate, not only for Btrfs but also for
lvmcache, and you're combining both tests. I'd just say make sure you
have regular backups: snapshot the rw subvolume regularly and sync it to
another filesystem, as often as the workflow can tolerate.

> And that would become obvious with the next reads, in which case btrfs
> probably would throw an error as it gets crazy data from apparently both
> LVs (but only coming from the SSD). So, that could be fixed by removing
> the SSD without any data loss from the HDDs, right?

Only if you're using writethrough mode, but yes.
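For reference, a minimal sketch of setting up lvmcache with the cache
mode stated explicitly rather than left to the default. The VG name
(vg0), LV names, sizes, and device paths here are assumptions for
illustration, not taken from the thread; adjust them to your layout.

```shell
# Origin LV on an HDD PV (the VG already contains the HDD and SSD PVs):
lvcreate -n origin0 -L 500G vg0 /dev/sda1

# Cache data and metadata LVs placed on the SSD PV:
lvcreate -n cache0     -L 50G vg0 /dev/sdc1
lvcreate -n cache0meta -L 1G  vg0 /dev/sdc1

# Combine them into a cache pool, stating writethrough explicitly:
lvconvert --type cache-pool --cachemode writethrough \
          --poolmetadata vg0/cache0meta vg0/cache0

# Attach the cache pool to the origin LV; the resulting 'cache LV'
# (still named vg0/origin0) is what you'd format with Btrfs:
lvconvert --type cache --cachepool vg0/cache0 vg0/origin0
```

With writethrough, every write has to hit the origin HDD before it is
acknowledged, which is what makes the "remove the bad SSD without data
loss" scenario plausible; writeback trades that safety for speed.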
>> The question I have, that I don't know the answer to, is: if the stack
>> arrives at a point where all writes are corrupt but the hardware isn't
>> reporting write errors, and it continues to happen for a while, once
>> you've resolved that problem and try to mount the file system again,
>> how well does Btrfs disregard all those bad writes? How well would any
>> filesystem?
>
> Hmm, again the writes to the HDDs should be ok. Only the SSD would have
> pretty corrupt data, right? In such a case it might depend on how much
> bad data is read back from the SSD and what the filesystem does in
> reaction to it?
>
> P.S.: Of course, one other possibility would be to use a second SSD, so
> that each LV has a separate caching SSD. In this case, there would
> always be a valid source (given that both SSDs don't go nuts at the
> same time...).

Simplistically, SSDs seem to fail in two ways: a series of transient
errors, which Btrfs can pretty much always account for; and totally
faceplanting. The way they faceplant can be that all writes fail but
reads work, or that the whole device just vanishes off the bus. I don't
know how that affects lvmcache writethrough if the entire cache pool
vanishes. It should still write to the HDDs, but I don't know for
certain that it does.

> But I would need another slot for this. If the pros are very high,
> that's ok. If it works nicely with just one SSD, then even better.

Yeah, if it's a decent name-brand SSD and not one of the ones with known
crap firmware, then I think it's fine to have just one. Either way, each
origin LV gets a separate cache pool LV, if I understand lvmcache
correctly. Offhand, I don't know whether you need separate VGs to make
sure the 'cache LVs' you format with Btrfs actually use different PVs as
origins. That's important.
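A sketch of the single-SSD, two-origin layout described above: lvmcache
pairs one cache pool with one origin LV, so a single SSD caching two
origins needs two pools carved out of the same SSD PV. Again, all
VG/LV/device names are assumptions for illustration.

```shell
# Two cache pools on the same SSD PV, one per origin LV:
lvcreate --type cache-pool --cachemode writethrough -n pool0 -L 40G vg0 /dev/sdc1
lvcreate --type cache-pool --cachemode writethrough -n pool1 -L 40G vg0 /dev/sdc1

# Attach one pool to each origin (origin0 on one HDD, origin1 on the other):
lvconvert --type cache --cachepool vg0/pool0 vg0/origin0
lvconvert --type cache --cachepool vg0/pool1 vg0/origin1

# Check which PVs back each LV, to confirm the two origins really sit
# on different HDDs before handing them to Btrfs as a raid1 pair:
lvs -a -o +devices vg0
```

The `lvs -a -o +devices` check addresses the concern about the cache LVs
actually using different PVs as origins: if both origins show extents on
the same HDD PV, Btrfs raid1 on top of them buys you nothing.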
The usual lvcreate command has a way to specify one or more PVs to use,
rather than having it just grab a pile of extents from the VG (which
could come from either PV), but I don't know whether that's the way it
works in conjunction with lvmcache. You're probably best off configuring
this, and while doing writes, pulling a device. Do that three times, once
for each HDD and once for the SSD, and see if you can recover. If it has
to be bulletproof, you need to spray it with bullets.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html