On Mon, Apr 13, 2020 at 11:41 AM David Haller <gen...@dhaller.de> wrote:
>
> First of all: "physical write blocks" in the physical flash are 128kB
> or something in that size range, not 4kB or even 512B

Yup, though I never claimed otherwise. I just made the generic
statement that the erase blocks are much larger than the write blocks,
even more so than on a 4k hard drive. (The only time I mentioned 4k was
in the context of hard drives, not SSDs.)

> Anyway, a write to a single (used) logical 512B block
> involves:
>
> 1. read existing data of the phy-block-group (e.g. 128KB)
> 2. write data of logical block to the right spot of in-mem block-group
> 3. write in-mem block-group to (a different, unused) phy-block-group
> 4. update all logical block pointers to new phy-block-group as needed
> 5. mark old phy-block-group as unused

Yup. Hence my statement that my description was a simplification and
that a real implementation would probably use extents to save memory.
You're describing 128kB extents. However, there is no reason that the
drive has to keep all the blocks in an erase group together, other than
to save memory in the mapping layer. If it doesn't, then it can modify a
logical block without having to re-read adjacent logical blocks.

> And what takes time when doing a "large" TRIM is transmitting a
> _large_ list of blocks to the SSD via the TRIM command. That's why
> e.g. those ~6-7GiB trims I did just before (see my other mail) took a
> couple of seconds for 13GiB ~ 25M LBAs ~ a whole effin bunch of TRIM
> commands (no idea... wait, 1-4kB per TRIM and 4B/LBA is max. 1k
> LBAs/TRIM and for 25M LBAs you'll need minimum 25-100k TRIM
> commands... go figure ;) no wonder it takes a second or few ;)

There is no reason that 100k TRIM commands need to take much time. The
commands are transmitted at SATA speeds at least. I'm not sure what the
length of the data in a TRIM command is, but even if it were 10-20
bytes you could send 100k of those in 1-2MB, which takes roughly 10ms
or less to transfer depending on the SATA generation.

Now, the problem is the implementation on the drive.
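To put rough numbers on the transfer time, here is a quick sketch of
the arithmetic. The 10-20 byte command size is only my guess from
above, not a figure from the ATA spec, and the throughput values are
the usual approximate per-generation numbers (150/300/600 MB/s), so
treat the output as an order-of-magnitude estimate, not a measurement.
The function name is made up for illustration:

```python
# Back-of-the-envelope: time to push a batch of TRIM commands over SATA.
# ASSUMPTION: per-command sizes are the guesses from this thread.

# Approximate usable throughput per SATA generation, in bytes/second
# (line rate after 8b/10b encoding; real-world rates are somewhat lower).
SATA_BYTES_PER_SEC = {
    "SATA I": 150e6,
    "SATA II": 300e6,
    "SATA III": 600e6,
}


def transfer_ms(n_commands, bytes_per_command, bytes_per_sec):
    """Milliseconds to move n_commands of the given size over the link."""
    return n_commands * bytes_per_command / bytes_per_sec * 1000.0


if __name__ == "__main__":
    n = 100_000
    for size in (10, 20):
        for gen, rate in SATA_BYTES_PER_SEC.items():
            ms = transfer_ms(n, size, rate)
            print(f"{n} cmds x {size:>2} B over {gen:<8}: {ms:6.2f} ms")
```

Under these assumptions the link itself accounts for milliseconds, not
seconds.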
If the drive takes a long time to retire each command then that is what
backs up the queue, and hence that is why the behavior depends a lot on
firmware/etc. The drive mapping is like a filesystem and as we all know
some filesystems are faster than others for various operations. Also as
we know hardware designers often aren't optimizing for performance in
these matters.

> Oh, and yes, on rotating rust, all that does not matter. You'd just
> let the data rot and write at 512B (or now 4kB) granularity. Well,
> those 4k-but-512Bemulated drives (which is about all new ones by now I
> think) have to do something like SSDs. But only on the 4kB level. Plus
> the SMR shingling stuff of course. When will those implement TRIM?

And that would be why I used 4k hard drives and SMR drives as an
analogy. 4k hard drives do not support TRIM but as you (and I) pointed
out, they're only dealing with 4k at a time. SMR drives sometimes do
support TRIM.

--
Rich