On 2014-12-13 10:39, Robert White wrote:

> Might I ask why you are running balance? After a persistent error I'd
> understand going straight to scrub, but balance is usually for
> transformation or to redistribute things after atypical use.

There were several reasons for running balance on this system:

1) I was getting "no space left" even though there were hundreds of GB free. I'm not sure whether this still applies to current kernels (3.18 and later), but it was certainly a problem in the past.

2) The system was freezing regularly; about once a week was the norm. Sometimes I was getting btrfs traces logged in syslog. After a few freezes the fs was getting corrupted to varying degrees. At some point it was so bad that it could only be used read-only. So I had to get the data off, reformat, copy back... and it would start crashing again after a few weeks of use.

My use case is quite simple:

- skinny extents, extended inode refs
- mount compress-force=zlib
- rsync many remote data sources (-a -H --inplace --partial) + snapshot
- around 500 snapshots in total, from 20 or so subvolumes
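
A minimal sketch of one rsync-then-snapshot pass, using the rsync flags listed above. The host name, paths, snapshot naming scheme, and the read-only "-r" snapshot flag are assumptions made for illustration, not details from this setup. The code only builds the command lines; it does not run them.

```python
from __future__ import annotations

import shlex

# rsync flags as listed in the message above.
RSYNC_FLAGS = ["-a", "-H", "--inplace", "--partial"]


def rsync_cmd(source: str, subvolume: str) -> list[str]:
    """Build the rsync command that pulls one remote source into a subvolume."""
    return ["rsync", *RSYNC_FLAGS, source, subvolume + "/"]


def snapshot_cmd(subvolume: str, snapshot_dir: str, stamp: str) -> list[str]:
    """Build a snapshot command for the subvolume that was just synced.

    The read-only flag (-r) and the <name>-<stamp> naming are assumptions.
    """
    name = subvolume.rstrip("/").rsplit("/", 1)[-1]
    target = f"{snapshot_dir}/{name}-{stamp}"
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, target]


def backup_plan(sources: dict[str, str], snapshot_dir: str, stamp: str) -> list[list[str]]:
    """Return the commands for one pass: rsync each source, then snapshot it."""
    plan = []
    for source, subvolume in sources.items():
        plan.append(rsync_cmd(source, subvolume))
        plan.append(snapshot_cmd(subvolume, snapshot_dir, stamp))
    return plan


if __name__ == "__main__":
    # Hypothetical source and destination, for illustration only.
    sources = {"backup@host1.example:/srv/data/": "/mnt/backup/host1"}
    for cmd in backup_plan(sources, "/mnt/backup/.snapshots", "20141213"):
        print(shlex.join(cmd))
```

With 20 or so subvolumes, each pass adds one snapshot per subvolume, so the snapshot count grows quickly unless old snapshots are pruned.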

In particular, rsync's --inplace option combined with many snapshots and heavy fragmentation was deadly for btrfs: I was seeing system freezes right when rsyncing a large, highly fragmented file.

Then, running balance on the "corrupted" filesystem was more of an exercise (if scrub passes fine, I would expect balance to pass as well). Some of the BUGs it triggered were fixed in newer kernels, some were not (btrfsck was not really usable a few months back).

3) I had mixed luck recovering btrfs after a failed drive (in RAID-1). Sometimes it worked as expected; sometimes the fs got so broken that I had to rsync the data off it and format from scratch (mdraid would kick the drive out after getting write errors; btrfs does not, and weird things can happen). Sometimes, running "btrfs device delete missing" (which is a balance in principle, I think) would take weeks, during which a second drive could easily die. Again, running balance there was more of an exercise, to see whether a newer kernel still crashes.


> An entire generation of folks have grown used to defraging windows
> boxes and all, but if you've already got an array that is going to
> take "many days" to balance what benefit do you actually expect to
> receive?

For me, it's a good test to see whether btrfs is finally getting stable (some cases are explained above).


> Defrag -- used for "I think I'm getting a lot of unnecessary head seek
> in this application, these files need to be brought into closer
> order".

Fragmentation was an issue for btrfs, at least a few kernels back (as explained above, with rsync's --inplace). However, I'm not running autodefrag anywhere, as I'm not sure how it affects snapshots.


> Scrub -- used for defensive checking a-la checkdisk. "I suspect that
> after that unexpected power outage something may be a little off", or
> alternately "I think my disks are giving me bitrot, I better check".

For me, scrub was passing fine, whereas balance was crashing the kernel.


Again, my main rationale for running balance is to see whether btrfs is stable. While I have systems with btrfs which have been running fine for months, I also have ones which crash after 1-2 weeks (once the system grows in size / complexity).

So hopefully btrfsck has fixed that fs. Once it has been running stable for a week or two, I might be brave enough to re-enable btrfs quotas (they were another cause of system freezes, at least a few kernels back).


--
Tomasz Chmielewski
http://www.sslrack.com
