On 2018/8/12 5:10 AM, Dan Merillat wrote:
> 19 hours later, still going extremely slowly and taking longer and
> longer for progress made.  Main symptom is the mount process is
> spinning at 100% CPU, interspersed with btrfs-transaction spinning at
> 100% CPU.
> So far it's racked up 14h45m of CPU time on mount and an additional
> 3h40m on btrfs-transaction.
> 
> The current drop key changes every 10-15 minutes when I check it via
> inspect-internal, so some progress is slowly being made.
> 
> I built the kernel with ftrace to see what's going on internally, this
> is the pattern I'm seeing:
> 
[snip]

It looks very much like qgroup, but there is too much noise in the trace.
The trace event that would pinpoint it is btrfs_find_all_roots().
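
To cut the noise, the function tracer can be limited to
btrfs_find_all_roots() alone. A minimal sketch, assuming tracefs is
mounted at /sys/kernel/debug/tracing (on some systems it is
/sys/kernel/tracing) and that the function is listed in
available_filter_functions on your kernel. The helper name is just
for illustration:

```shell
#!/bin/sh
# Sketch: restrict the ftrace function tracer to btrfs_find_all_roots().
# TRACEFS is an assumption; adjust it to where tracefs is mounted.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Hypothetical helper name, not part of any tool.
limit_trace_to_find_all_roots() {
    # Trace only this one function instead of every kernel function.
    echo btrfs_find_all_roots > "$TRACEFS/set_ftrace_filter" || return 1
    # Select the plain function tracer.
    echo function > "$TRACEFS/current_tracer" || return 1
    # Make sure tracing is switched on.
    echo 1 > "$TRACEFS/tracing_on" || return 1
}
```

Run limit_trace_to_find_all_roots as root, then read
"$TRACEFS/trace_pipe". If the output is dominated by this function
while mount spins, qgroup accounting is the likely culprit.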

> 
> Repeats indefinitely.  btrace shows basically zero activity on the
> array while it spins, with the occasional burst when mount &
> btrfs-transaction swap off.
> 
> To recap the chain of events leading up to this:
> 11TB Array got completely full and started fragmenting badly.
> Ran bedup and it found 600gb of duplicate files that it offline-shared.
> Reboot for unrelated reasons

11TB with heavily deduplicated data is really the worst-case scenario for qgroup.
Qgroup is not good at handling heavily reflinked files, nor at balance.
When the two combine, it gets even worse.

> Enabled quota on all subvolumes to try to track where the new data is
> coming from
> Tried to balance metadata due to transaction CPU spikes
> Force-rebooted after the array was completely lagged out.
> 
> Now attempting to mount it RW.  Readonly works, but RW has taken well
> over 24 hours at this point.

I'll add a new rescue subcommand, 'btrfs rescue disable-quota', so that
you can disable quota offline.

Thanks,
Qu

