On 12/13/2014 12:16 AM, Tomasz Chmielewski wrote:
On 2014-12-12 23:58, Robert White wrote:
I don't have the history to answer this definitively, but I don't
think you have a choice. Nothing else is going to touch that error.
I have not seen any "oh my god, btrfsck just ate my filesystem" reports
since I joined the list -- but I am a relative newcomer.
I know that you, of course, as a conscientious and well-traveled system
administrator, already have a current backup since you are doing
storage maintenance... right? 8-)
Who needs backups with btrfs, right? :)
So apparently btrfsck --repair fixed some issues; the fs is still
mountable and looks fine.
Running balance again, but that will take many days there.
Might I ask why you are running balance? After a persistent error I'd
understand going straight to scrub, but balance is usually for
transformation or to redistribute things after atypical use.
An entire generation of folks have grown used to defragging Windows boxes
and all, but if you've already got an array that is going to take "many
days" to balance, what benefit do you actually expect to receive?
Defrag -- used for "I think I'm getting a lot of unnecessary head seek
in this application, these files need to be brought into closer order".
Scrub -- used for defensive checking à la checkdisk. "I suspect that
after that unexpected power outage something may be a little off", or
alternately "I think my disks are giving me bitrot, I better check".
Btrfsck -- used for "I suspect structural problems caused by real world
events like power hits or that one time when the cat knocked over my
tower case while I was vacuuming all my SQL tables." (often reserved for
"hey, I'm getting weird messages from the kernel about things in my
filesystem".)
Balance -- primary -- used for "Well I used to use this filesystem for a
small number of large files, but now I am processing a large number of
small files and I'm running out of metadata even though I've got a lot
of space." (or vice versa)
Balance -- other -- used for "I just changed the geometry of my
filesystem by adding or removing a disk and I want to spread it out."
Balance -- (conversion/restructuring) -- used for "single is okay, but
I'd rather have raid-0 to spread out my load across these many disks" or
"gee, I'd like some redundancy now that I have the room."
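For concreteness, those situations map roughly onto the following btrfs-progs invocations. This is a hypothetical Python helper: the name maintenance_command, the task labels and the default 50% usage threshold are my own illustrative choices, not anything btrfs-progs defines. It only builds the argv lists and runs nothing. Note that "btrfs check" is the btrfs-progs subcommand form of btrfsck and expects an unmounted device.

```python
from typing import List


def maintenance_command(task: str, target: str, usage_pct: int = 50) -> List[str]:
    """Return the btrfs-progs argv for one maintenance task (hypothetical helper).

    Nothing is executed; the caller decides whether to run the command.
    """
    commands = {
        # Defrag: rewrite a file tree so its extents end up closer together.
        "defrag": ["btrfs", "filesystem", "defragment", "-r", target],
        # Scrub: read all data/metadata on a mounted fs and verify checksums.
        "scrub": ["btrfs", "scrub", "start", target],
        # Check (btrfsck): structural check of an UNMOUNTED device; read-only
        # unless --repair is added.
        "check": ["btrfs", "check", target],
        # Balance, primary: only rewrite chunks under usage_pct percent full,
        # returning the emptied chunks to unallocated space.
        "balance-reclaim": ["btrfs", "balance", "start",
                            f"-dusage={usage_pct}", f"-musage={usage_pct}", target],
        # Balance, other: full rebalance after adding or removing a disk.
        "balance-spread": ["btrfs", "balance", "start", target],
        # Balance, conversion: e.g. move data and metadata to raid1.
        "balance-raid1": ["btrfs", "balance", "start",
                          "-dconvert=raid1", "-mconvert=raid1", target],
    }
    if task not in commands:
        raise ValueError(f"unknown maintenance task: {task!r}")
    return commands[task]


if __name__ == "__main__":
    for task in ("scrub", "balance-reclaim"):
        print(" ".join(maintenance_command(task, "/mnt/data")))
```

Handing the list to subprocess.run(..., check=True) would actually execute it; keeping construction separate from execution makes it harder to fire off a multi-day balance by accident.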
Frequent balancing of a Copy On Write filesystem will tend to make
things somewhat anti-optimal. You are burping the natural working space
out of the natural layout.
Since COW implies mandatory movement of data, every time you burp out
all the slack and pack all the data together you are taking your
regularly modified files and moving them far, far away from the places
where frequently modified files are most happy (e.g. the
only-partly-full data region they were just living in).
Similarly two files that usually get modified at the same time (say a
database file and its rollback log) will tend to end up in the same
active data extent as time goes on, and if balance decides it can "clean
up" that extent it will likely give those two files a data-extent
divorce and force them to the opposite ends of dataland.
COW systems are inherently somewhat chaotic. If you fight that too
aggressively you will, at best, be wasting the maintenance time.
It may be a decrease in performance measured in very small quanta, but
so is the expected benefit of most maintenance.
From the wiki:
https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F
btrfs filesystem balance is an operation which simply takes all of the
data and metadata on the filesystem, and re-writes it in a different
place on the disks, passing it through the allocator algorithm on the
way. It was originally designed for multi-device filesystems, to spread
data more evenly across the devices (i.e. to "balance" their usage).
This is particularly useful when adding new devices to a nearly-full
filesystem.
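A toy model of that "spread it evenly" effect. It assumes single-profile data, equal-size chunks, and an allocator that simply puts each rewritten chunk on the device with the most unallocated space; that is a simplification of the real chunk allocator, and the rebalance function and device names are hypothetical. It also ignores that a real balance needs free space to relocate into before the old chunks are released.

```python
from typing import Dict


def rebalance(chunk_count: int, capacity: Dict[str, int]) -> Dict[str, int]:
    """Toy model: rewrite chunk_count equal-size chunks through a simple allocator.

    capacity maps a (hypothetical) device name to its size in chunks. Each chunk
    goes to the device with the most unallocated space; ties go to the device
    listed first. Returns the number of chunks placed on each device.
    """
    used = {dev: 0 for dev in capacity}
    for _ in range(chunk_count):
        dev = max(capacity, key=lambda d: capacity[d] - used[d])
        if capacity[dev] - used[dev] <= 0:
            raise RuntimeError("no unallocated space left on any device")
        used[dev] += 1
    return used


if __name__ == "__main__":
    # Before: sda holds all 90 chunks, the newly added sdb holds none.
    print(rebalance(90, {"sda": 100, "sdb": 100}))
```

With 90 chunks that used to live on one 100-chunk disk and a freshly added empty 100-chunk disk, the model ends with 45 chunks on each.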
Due to the way that balance works, it also has some useful side-effects:
If there are a lot of allocated but unused data or metadata chunks, a
balance may reclaim some of that allocated space. This is the main
reason for running a balance on a single-device filesystem.
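The reclaim side effect is easiest to see with a usage filter, e.g. "btrfs balance start -dusage=50 /mnt", which only rewrites data chunks under 50% full. Below is a toy model of that. The usage_balance function, the percentages and the "pack into fresh, full chunks" rule are simplifying assumptions; the real allocator may also fill existing partly-used chunks.

```python
from typing import List, Tuple


def usage_balance(chunk_used_pct: List[int], max_usage_pct: int) -> Tuple[int, int]:
    """Toy model of a usage-filtered balance (e.g. -dusage=50).

    chunk_used_pct: how full each allocated data chunk is, in percent (0-100).
    Chunks below max_usage_pct are rewritten and their live data packed into as
    few fresh, full chunks as possible; the rest are left alone.
    Returns (allocated chunks before, allocated chunks after).
    """
    selected = [pct for pct in chunk_used_pct if pct < max_usage_pct]
    untouched = len(chunk_used_pct) - len(selected)
    # Ceiling division: chunks needed to hold the selected chunks' live data.
    repacked = -(-sum(selected) // 100)
    return len(chunk_used_pct), untouched + repacked


if __name__ == "__main__":
    print(usage_balance([95, 90, 10, 20, 5, 30], 50))
```

For six chunks at 95, 90, 10, 20, 5 and 30 percent full with a 50% threshold, the four nearly empty chunks hold 65% of one chunk's worth of data, so the model returns (6, 3): three chunks go back to unallocated space while the two busy chunks are never touched.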
On a filesystem with damaged replication (e.g. a RAID-1 FS with a dead
and removed disk), it will force the FS to rebuild the missing copy of
the data on one of the currently active devices, restoring the RAID-1
capability of the filesystem.
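A toy model of that rebuild, assuming equal-size chunks and that any chunk left with a single copy gets a new mirror on the surviving device with the most free space that doesn't already hold it. restore_raid1 and the device names are hypothetical, and a real balance rewrites every chunk, not just the degraded ones.

```python
from typing import Dict, List


def restore_raid1(chunks: List[List[str]], free: Dict[str, int]) -> List[List[str]]:
    """Toy model of balance re-mirroring RAID-1 chunks after a dead disk is removed.

    chunks: surviving devices holding each chunk's copies (dead disk already dropped).
    free: free chunk slots on each surviving device.
    Returns the new copy placement for every chunk.
    """
    free = dict(free)  # don't mutate the caller's dict
    result = []
    for copies in chunks:
        if not copies:
            raise RuntimeError("chunk lost both copies; nothing to rebuild from")
        copies = list(copies)
        while len(copies) < 2:
            # Surviving devices that have room and don't already hold this chunk.
            candidates = [d for d in free if d not in copies and free[d] > 0]
            if not candidates:
                raise RuntimeError("not enough devices or space for RAID-1")
            target = max(candidates, key=lambda d: free[d])
            free[target] -= 1
            copies.append(target)
        result.append(copies)
    return result


if __name__ == "__main__":
    # sdc died: two chunks are down to one copy; sdd is a freshly added disk.
    print(restore_raid1([["sda", "sdb"], ["sda"], ["sdb"]],
                        {"sda": 5, "sdb": 5, "sdd": 10}))
```

In the example both degraded chunks get their second copy on sdd, since it has the most free space and holds neither chunk yet; the intact chunk is left as it was.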