Hi,
> On 9. Aug 2024, at 03:13, Yu Kuai <[email protected]> wrote:
>
>
> Yes, for sure IO are stuck in md127 and never get dispatched to nvme,
> for now I'll say this is a raid5 problem.
Note that this is raid6, not raid5! Sorry, I never mentioned that explicitly
and it was buried in the mdstat output.
Not sure whether that code is internally the same anyway…
> Can you describe in steps how do you reproduce this problem? We must
> figure out where are those IO and why they're not dispatched. It'll be
> easier to debug if I can reproduce it. Otherwise I'll have to give you
> a debug patch.
My workload is pretty simple IMHO:
1. This is on an XFS volume:
meta-data=/dev/mapper/backy      isize=512    agcount=112, agsize=268435328 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1
data     =                       bsize=4096   blocks=30001587200, imaxpct=1
         =                       sunit=128    swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
2. I’m rsyncing a large number of files using:
rsync -avz "<remote-machine>:/<remote-folder>" .
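For reference, here is a small sketch of what the sunit/swidth values from the
XFS output in step 1 work out to. It assumes both values are given in
filesystem blocks, as the "blks" suffix suggests. The function name
stripe_geometry is made up for illustration, and the derived data-disk count
only reflects what XFS was told at mkfs time; it has not been cross-checked
against the md layout.

```python
# Values copied from the xfs_info output in step 1.
BSIZE = 4096               # filesystem block size in bytes (bsize)
SUNIT_BLKS = 128           # stripe unit, assumed to be in filesystem blocks
SWIDTH_BLKS = 1024         # stripe width, assumed to be in filesystem blocks
DATA_BLOCKS = 30001587200  # blocks= from the "data" section


def stripe_geometry(bsize, sunit_blks, swidth_blks):
    """Convert XFS stripe hints from filesystem blocks to bytes.

    Hypothetical helper for illustration; not part of any XFS or md tooling.
    """
    return {
        "stripe_unit_bytes": sunit_blks * bsize,
        "stripe_width_bytes": swidth_blks * bsize,
        # Number of stripe units per full stripe, i.e. data-bearing disks
        # as seen by XFS (parity disks are not included).
        "data_disks": swidth_blks // sunit_blks,
    }


geo = stripe_geometry(BSIZE, SUNIT_BLKS, SWIDTH_BLKS)
fs_size_tib = DATA_BLOCKS * BSIZE / 2**40

print(geo)
print(f"filesystem size: {fs_size_tib:.2f} TiB")
```

If the assumption about units holds, this gives a 512 KiB stripe unit and a
4 MiB full stripe across 8 data-bearing disks.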
Specific aspects of the data that might come into play:
After 5-6 attempts at syncing, each ending in a lockup, I now have a directory
on that volume that contains 154,820 files directly and 1,828,933 files
recursively.
The recursive structure uses a /year/<hash prefix>/<hash>-filename structure to
spread out the 1.8 million files more evenly.
The total amount of data is 1.0 TiB at the moment. The smallest files are empty
or just a few kilobytes. The mode of the file sizes is 180,984 bytes.
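In case it helps with reproducing this without access to my data, here is a
rough sketch that generates a tree with a similar
/year/<hash prefix>/<hash>-filename layout. Everything specific in it is an
assumption and not taken from the real data set: the use of SHA-256, the
two-character prefix, the year values, the file names and the uniform random
size distribution. The names hashed_path and populate are made up for
illustration.

```python
import hashlib
import os
import random
from pathlib import Path


def hashed_path(root: Path, year: int, name: str, prefix_len: int = 2) -> Path:
    """Return root/<year>/<hash prefix>/<hash>-<name>.

    SHA-256 and a 2-character prefix are assumptions; the real tree may
    use a different hash and prefix length.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return root / str(year) / digest[:prefix_len] / f"{digest}-{name}"


def populate(root: Path, count: int, years=(2023, 2024),
             max_size: int = 400_000, seed: int = 0) -> int:
    """Create `count` files of random size (0..max_size bytes) under root.

    Returns the total number of bytes written. Sizes and year choices are
    seeded so the layout is repeatable; file contents come from os.urandom
    so they do not compress well with rsync -z.
    """
    rng = random.Random(seed)
    written = 0
    for i in range(count):
        path = hashed_path(root, rng.choice(years), f"file{i}.bin")
        path.parent.mkdir(parents=True, exist_ok=True)
        size = rng.randint(0, max_size)
        path.write_bytes(os.urandom(size))
        written += size
    return written
```

Something like populate(Path("/some/source/dir"), 100_000) on the remote side,
followed by the rsync from step 2, should give a roughly comparable workload.
I have not verified that a synthetic tree like this triggers the lockup.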
There is a second workload on the machine that has been running smoothly for a
few weeks. It backs up virtual machine images and creates a somewhat similar
(but more consistent) hashed structure. Instead of rsync it uses a more complex
diff + content hashing + compression approach. However, that other tool does
emit write barriers here and there, which rsync doesn’t AFAIK.
I double-checked: the other workload was not active during my last lockup.
Happy to apply a debug patch to help diagnose this.
Cheers,
Christian
--
Christian Theune · [email protected] · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick