Hi,
I've done a few experiments and here are my findings.
First, I should probably describe the filesystem: it is a snapshot
archive, containing many snapshots of 4 subvolumes, 2487
subvolumes/snapshots in total. There are also a few files (inside the
snapshots) that are probably very fragmented, which is probably what
triggers the bug.
Observations:
- If I delete all snapshots, the bug disappears (device delete succeeds).
- If I delete all but any single subvolume's snapshots, the bug disappears.
- If I delete the snapshots of either of two particular subvolumes, the
bug disappears, but it stays if I delete the snapshots of either of the
other two subvolumes.
It looks like the data in those two subvolumes' snapshots together
participates in causing the bug.
In theory, I suppose it would be possible to reduce the filesystem to a
minimal one that still triggers the bug by iteratively deleting
snapshots / files and checking whether the bug still manifests, but that
would be extremely time-consuming, probably taking weeks.
Anything else I can do to help diagnose / fix it? Or should I just order
more HDDs and clone the RAID10 the right way?
On 06/07/2019 05.51, Qu Wenruo wrote:
> On 2019/7/6 1:13 PM, Vladimir Panteleev wrote:
>> [...]
> I'm not sure if it's the degraded mount that causes the problem, as the
> enospc_debug output looks like reserved/pinned/over-reserved space has
> taken up all the space, while no new chunks get allocated.
The problem happens after replace-ing the missing device (which succeeds
in full) and then attempting to remove it, i.e. without a degraded mount.
> Would you please try to balance metadata to see if the ENOSPC still
> happens?
The problem also manifests when attempting to rebalance the metadata.
> Have you tried to balance just one or two metadata block groups?
> E.g. using -mdevid or -mvrange?
> And did the problem always happen at the same block group?
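For reference (mostly to check that I understand the suggestion
correctly), I take it such a targeted balance would look roughly like
this, with the mount point and the filter values being placeholders:

  btrfs balance start -mdevid=<id> /mnt
  btrfs balance start -mvrange=<start>..<end> /mnt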
> Thanks,
> Qu
Thanks!
--
Best regards,
Vladimir