On Mon, Apr 13, 2020 at 11:41 AM David Haller <gen...@dhaller.de> wrote:
>
> First of all: "physical write blocks" in the physical flash are 128kB
> or something in that size range, not 4kB or even 512B

Yup, though I never claimed otherwise.  I just made the generic
statement that the erase blocks are much larger than the write blocks,
even more so than on a 4k hard drive.  (The only time I mentioned 4k
was in the context of hard drives, not SSDs.)

> Anyway, a write to a single (used) logical 512B block
> involves:
>
> 1. read existing data of the phy-block-group (e.g. 128KB)
> 2. write data of logical block to the right spot of in-mem block-group
> 3. write in-mem block-group to (a different, unused) phy-block-group
> 4. update all logical block pointers to new phy-block-group as needed
> 5. mark old phy-block-group as unused
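The five quoted steps can be sketched as a toy flash translation layer that maps whole block-groups. This is a hypothetical model for illustration only: the 512 B logical block and 256-block (128 kB) group sizes are assumptions taken from the figures in this thread, and `GroupMappedFTL` is an invented name, not any real firmware interface.

```python
BLOCK_SIZE = 512      # logical block size in bytes (assumed)
GROUP_BLOCKS = 256    # 256 x 512 B = 128 kB per physical block-group (assumed)

class GroupMappedFTL:
    """Hypothetical toy FTL that maps whole 128 kB block-groups.

    Real SSD firmware is far more involved (wear levelling, garbage
    collection, caching); this only mirrors the five quoted steps.
    """

    def __init__(self, num_groups, spare_groups=1):
        total = num_groups + spare_groups
        self.flash = [bytearray(BLOCK_SIZE * GROUP_BLOCKS) for _ in range(total)]
        self.group_map = list(range(num_groups))    # logical group -> physical group
        self.free = list(range(num_groups, total))  # erased, unused physical groups
        self.bytes_written = 0

    def write_block(self, lba, data):
        assert len(data) == BLOCK_SIZE
        lgroup, offset = divmod(lba, GROUP_BLOCKS)
        old = self.group_map[lgroup]
        # 1. read existing data of the physical block-group
        buf = bytearray(self.flash[old])
        # 2. patch the logical block into the in-memory copy
        start = offset * BLOCK_SIZE
        buf[start:start + BLOCK_SIZE] = data
        # 3. write the whole group to a different, unused physical group
        new = self.free.pop()
        self.flash[new][:] = buf
        self.bytes_written += len(buf)
        # 4. update the logical -> physical pointer
        self.group_map[lgroup] = new
        # 5. mark the old group unused (erase modelled as zero-fill here;
        #    real NAND erases to all ones)
        self.flash[old][:] = bytes(len(self.flash[old]))
        self.free.append(old)

    def read_block(self, lba):
        lgroup, offset = divmod(lba, GROUP_BLOCKS)
        start = offset * BLOCK_SIZE
        return bytes(self.flash[self.group_map[lgroup]][start:start + BLOCK_SIZE])
```

In this model a single 512 B write costs a full group write: after `ftl = GroupMappedFTL(num_groups=2)` and `ftl.write_block(3, b"x" * BLOCK_SIZE)`, `ftl.bytes_written` is 131072.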

Yup.  Hence my statement that my description was a simplification and
that a real implementation would probably use extents to save memory.
You're describing 128kB extents.  However, there is no reason that the
drive has to keep all the blocks in an erase group together, other
than to save memory in the mapping layer.  If it doesn't, then it can
modify a logical block without having to re-read adjacent logical
blocks.
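For contrast, here is a hypothetical per-block mapping layer, sketched under the same assumed 512 B block size. Each logical block has its own pointer, so an update writes only that block and marks the old copy stale; the trade-off is one map entry per block instead of one per group. `BlockMappedFTL` and its `trim` method are invented for illustration and do not model garbage collection.

```python
BLOCK_SIZE = 512  # logical block size in bytes (assumed)

class BlockMappedFTL:
    """Hypothetical toy per-block mapping layer (illustration only)."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # physical write slots (None = erased)
        self.block_map = {}              # logical block -> physical slot
        self.stale = set()               # slots holding superseded data
        self.next_free = 0               # slots are filled in order, log-style
        self.bytes_written = 0

    def write_block(self, lba, data):
        assert len(data) == BLOCK_SIZE
        if self.next_free >= len(self.slots):
            raise RuntimeError("out of free slots; garbage collection not modelled")
        slot = self.next_free
        self.next_free += 1
        self.slots[slot] = bytes(data)   # write only this block; no read of neighbours
        self.bytes_written += BLOCK_SIZE
        old = self.block_map.get(lba)
        if old is not None:
            self.stale.add(old)          # old copy becomes garbage, erased later
        self.block_map[lba] = slot

    def trim(self, lba):
        # Dropping the mapping is all a TRIM needs to do in this model.
        slot = self.block_map.pop(lba, None)
        if slot is not None:
            self.stale.add(slot)

    def read_block(self, lba):
        slot = self.block_map.get(lba)
        return bytes(BLOCK_SIZE) if slot is None else self.slots[slot]
```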

> And what takes time when doing a "large" TRIM is transmitting a
> _large_ list of blocks to the SSD via the TRIM command. That's why
> e.g. those ~6-7GiB trims I did just before (see my other mail) took a
> couple of seconds for 13GiB ~ 25M LBAs ~ a whole effin bunch of TRIM
> commands (no idea... wait, 1-4kB per TRIM and 4B/LBA is max. 1k
> LBAs/TRIM and for 25M LBAs you'll need minimum 25-100k TRIM
> commands... go figure ;) no wonder it takes a second or few ;)
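The quoted estimate can be reproduced under its own assumption that every LBA is listed individually at 4 bytes each, in a 1-4 kB payload. That per-LBA encoding is the mail's assumption, not the real wire format: ATA's DATA SET MANAGEMENT (TRIM) command carries 8-byte range entries (a 48-bit start LBA plus a 16-bit length), so contiguous extents need far fewer entries. `trim_commands` is a hypothetical helper.

```python
GiB = 1024 ** 3
SECTOR = 512  # bytes per LBA

def trim_commands(trim_bytes, payload_bytes, bytes_per_lba=4):
    """Commands needed if each LBA is listed individually (the mail's assumption)."""
    lbas = trim_bytes // SECTOR
    lbas_per_cmd = payload_bytes // bytes_per_lba
    return lbas, -(-lbas // lbas_per_cmd)  # ceiling division

lbas, hi = trim_commands(13 * GiB, 1024)  # 1 kB payload -> 256 LBAs/command
_, lo = trim_commands(13 * GiB, 4096)     # 4 kB payload -> 1024 LBAs/command
print(lbas, lo, hi)  # 27262976 26624 106496
```

That is about 27M LBAs and roughly 27k-106k commands, in line with the "25M LBAs" and "25-100k" figures.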

There is no reason that 100k TRIM commands need to take much time.
Transmitting the commands happens at SATA speeds, at least.  I'm
not sure what the length of the data in a trim instruction is, but
even if it were 10-20 bytes you could send 100k of those in 1-2MB,
which takes roughly 2-13ms to transfer depending on the SATA generation.
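The transfer-time arithmetic, as a quick sketch. It counts only payload bytes over the link and ignores per-command protocol overhead; the 10-20 byte command size is the hypothetical figure from the paragraph, and the rates are the usual usable throughputs of 1.5/3/6 Gb/s SATA after 8b/10b encoding.

```python
# Usable payload rate per SATA generation, in MB/s (after 8b/10b encoding)
SATA_MB_PER_S = {"SATA I": 150, "SATA II": 300, "SATA III": 600}

def transfer_ms(n_commands, bytes_per_command, mb_per_s):
    """Milliseconds to move the command payloads; protocol overhead ignored."""
    return n_commands * bytes_per_command / (mb_per_s * 1e6) * 1e3

for gen, rate in SATA_MB_PER_S.items():
    low = transfer_ms(100_000, 10, rate)
    high = transfer_ms(100_000, 20, rate)
    print(f"{gen}: {low:.1f}-{high:.1f} ms")
```

This prints about 6.7-13.3 ms for SATA I, 3.3-6.7 ms for SATA II and 1.7-3.3 ms for SATA III.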

Now, the problem is the implementation on the drive.  If the drive
takes a long time to retire each command, that is what backs up
the queue, and that is why the behavior depends so much on
firmware/etc.  The drive mapping is like a filesystem, and as we all
know some filesystems are faster than others for various operations.
Also, as we know, hardware designers often aren't optimizing for
performance in these matters.
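A rough model of why the drive side dominates: if the firmware retires commands one at a time, the queue drains no faster than one command per service time. The per-command service times below are purely hypothetical values chosen to show the scaling, and `drive_time_s` is an invented helper; NCQ or background deallocation would change the picture.

```python
def drive_time_s(n_commands, service_time_us):
    """Seconds to retire n commands serially at a fixed per-command cost."""
    return n_commands * service_time_us / 1e6

LINK_MS = 13.3  # worst case for 100k x 20 B commands over SATA I, for comparison

for us in (1, 10, 100, 1000):  # hypothetical firmware service times per command
    print(f"{us:>5} us/cmd -> {drive_time_s(100_000, us):7.1f} s "
          f"(link transfer: {LINK_MS / 1e3:.4f} s)")
```

Even at a hypothetical 10 us per command, 100k commands take a full second on the drive, far more than the link needs.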

> Oh, and yes, on rotating rust, all that does not matter. You'd just
> let the data rot and write at 512B (or now 4kB) granularity. Well,
> those 4k-but-512B-emulated drives (which is about all new ones by now I
> think) have to do something like SSDs. But only on the 4kB level. Plus
> the SMR shingling stuff of course. When will those implement TRIM?
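What a 512e drive does "only on the 4kB level" can be sketched as a read-modify-write for any physical sector that a write covers only partially. This is a hypothetical toy model (`Emulated512Disk` is an invented name) assuming 512 B logical and 4096 B physical sectors; real drives add caching on top.

```python
LOGICAL = 512
PHYSICAL = 4096
PER_PHYS = PHYSICAL // LOGICAL  # 8 logical sectors per physical sector

class Emulated512Disk:
    """Hypothetical toy 512e drive (illustration only)."""

    def __init__(self, num_physical):
        self.media = [bytes(PHYSICAL) for _ in range(num_physical)]
        self.physical_reads = 0
        self.physical_writes = 0

    def write(self, lba, data):
        """Write whole logical sectors starting at logical sector lba."""
        assert len(data) % LOGICAL == 0
        count = len(data) // LOGICAL
        for p in range(lba // PER_PHYS, (lba + count - 1) // PER_PHYS + 1):
            # range of logical sectors inside physical sector p that this write covers
            lo = max(lba, p * PER_PHYS)
            hi = min(lba + count, (p + 1) * PER_PHYS)
            if hi - lo == PER_PHYS:
                buf = bytearray(PHYSICAL)        # fully covered: no read needed
            else:
                buf = bytearray(self.media[p])   # partially covered: read-modify-write
                self.physical_reads += 1
            src = (lo - lba) * LOGICAL
            dst = (lo - p * PER_PHYS) * LOGICAL
            n = (hi - lo) * LOGICAL
            buf[dst:dst + n] = data[src:src + n]
            self.media[p] = bytes(buf)
            self.physical_writes += 1
```

A lone 512 B write costs one physical read and one physical write; an aligned 4 kB write needs no read; a 4 kB write starting at logical sector 4 straddles two physical sectors and costs two reads and two writes.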

And that would be why I used 4k hard drives and SMR drives as an
analogy.  4k hard drives do not support TRIM, but as you (and I)
pointed out, they're only dealing with 4k at a time.  SMR drives
sometimes do support TRIM.

-- 
Rich
