Re: btrfs won't mount any more
On Thu, Apr 13, 2017 at 10:45:09AM +0200, Marc Haber wrote:
> I do have a dd copy of the device now.
>
> $ sudo losetup --find --show ./dropbtr0.btrfs
> $ sudo mount -o skip_balance -t btrfs /dev/loop0 /mnt/tempdisk
>
> does immediately result in:
>
> Apr 12 22:37:48 fan kernel: [ 124.742104] loop: module loaded
> Apr 12 22:37:48 fan kernel: [ 124.784727] BTRFS: device label dropbtr0 devid 1 transid 1530529 /dev/loop0
> Apr 12 22:38:07 fan kernel: [ 143.120268] BTRFS info (device loop0): disk space caching is enabled
> Apr 12 22:38:07 fan kernel: [ 143.207872] BUG: unable to handle kernel NULL pointer dereference at 01f0

This is now https://bugzilla.kernel.org/show_bug.cgi?id=195631

Greetings
Marc

--
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs won't mount any more
On Thu, Apr 13, 2017 at 10:45:09AM +0200, Marc Haber wrote:
> On Tue, Apr 11, 2017 at 06:15:02PM +0200, Adam Borowski wrote:
> > Ouch, this is generally harmless unless your disk lies about barriers.
> > Btrfs absolutely depends on them, and tends to suffer catastrophic
> > corruption if writes were reordered when they shouldn't.
>
> So if the disk would actually lie, I would have had much trouble even
> earlier. It's an SSD from 2013 or 2014, I think from Kingston. The box
> is offline and remote at the moment, so I cannot give the exact type.

I have rebuilt the btrfs and restored from backup, so I can access the
disk again. It's a Crucial/Micron RealSSD m4/C400/P400 M4-CT256M4SSD1
with 256 GB capacity.

I do have an image of the bad btrfs that makes the kernel oops on mount,
reproducibly. Does this help in debugging?

Greetings
Marc
Re: btrfs won't mount any more
On Tue, Apr 11, 2017 at 06:15:02PM +0200, Adam Borowski wrote:
> On Tue, Apr 11, 2017 at 09:15:31AM +0200, Marc Haber wrote:
> > I have wrecked another btrfs file system, probably for good this time.
> >
> > It's a 80 GB filesystem from 2015, in my secondary notebook, on an
> > encrypted SSD. The btrfs holds the root filesystem and the rest of the
> > system as well.
> >
> > I have a cronjob that makes snapshots of the system directories daily,
> > and of /home every ten minutes. A second cronjob cleans up old snapshots
> > so that the number of snapshots present is about between 400 and 600.
> > This is the key feature that made me decide for btrfs in the first
> > place.
> >
> > Last week (I was on kernel 4.10.8 with Debian unstable), I was forced to
> > promote the secondary laptop to the primary one which resulted in
> > serious work being done on the first time. Over time, the filesystem
> > filled up without me noticing and was finally 100% full.
>
> CoW and log-structured filesystems in general tend to take 100% full
> conditions far worse than traditional filesystems, but it still should
> result only in performance degradation and/or metadata-vs-data issues rather
> than a fatal error. So if this is the cause, you obviously hit a bug.

Given that btrfs has a reputation of not gracefully handling out-of-space
situations, the trouble was expected.

> > I then cleaned up about four gigs by deleting a couple of redundant ISO
> > images and some snapshots that were not due for regular deletion yet. I
> > then started a btrfs balance / -d50, unfortunately without stopping the
> > snapshot-making cronjob. This resulted in the notebook becoming
> > unuseable for extended periods of time, without even being able to log
> > in. After running for some 30 hours, the notebook ran out of battery
> > (don't ask, stupid me).
>
> Ouch, this is generally harmless unless your disk lies about barriers.
> Btrfs absolutely depends on them, and tends to suffer catastrophic
> corruption if writes were reordered when they shouldn't.

So if the disk would actually lie, I would have had much trouble even
earlier. It's an SSD from 2013 or 2014, I think from Kingston. The box
is offline and remote at the moment, so I cannot give the exact type.

Between the btrfs and the actual disk there was a dm-crypt/LUKS layer
and LVM, but I can reproduce the crash from an image on a different host
now.

> Even in such a case, using an older root would help, although that
> possibility is almost certainly gone now.

How would I try that? A pointer to the docs is fine.

> > After rebooting, the btrfs balance proceeded immediately after mounting
> > the root fs. System unuseable again. After a day, I finally had a root
> > shell and was able to issue a btrfs cancel /. Unfortunately, the system
> > didn't care about that command and happily continued to balance. After
> > some more 30 hours, I lost patience and resetted the system.
>
> Mounting with -o skip_balance may help.

No, same issue.

> Two years old is not much, the format nor its use hasn't changed noticeably
> since then. You run the very latest upstream stable kernel, with its almost
> freshest version (4.10.9 was tagged Saturday). 400-600 snapshots is nothing
> remarkable, it's the usual range. The only thing differing from the most
> typical usage is your snapshot frequency, and even that is nothing
> frightening.

Doesn't SuSE's snapper do snapshots every ten minutes?

I can reproduce the crash on 4.10.10 now.

> Thus, a failure like yours in mainstream use is certainly interesting.
>
> However, I have a piece of advice for now: could you make a copy of the
> filesystem? 80GB is _nothing_: it's way below the accuracy of du -h on a
> modern HDD, and not a burden for a typical SSD. Being able to investigate
> it from a bigger machine would be convenient, and having a copy would let
> you use dangerous rescue methods without any risk. And debugging oopses on
> a laptop with no working serial or netconsole sucks; if you have no other
> machine at hand then running the victim kernel in qemu-kvm might offer a
> poor-man's console.

qemu-kvm is also a good idea; up to now I have logs from a physical box
mounting the btrfs image via loopback.

> For advice for your specific case, we can't do much without seeing the
> actual error messages.

I do have a dd copy of the device now.

$ sudo losetup --find --show ./dropbtr0.btrfs
$ sudo mount -o skip_balance -t btrfs /dev/loop0 /mnt/tempdisk

does immediately result in:

Apr 12 22:37:48 fan kernel: [ 124.742104] loop: module loaded
Apr 12 22:37:48 fan kernel: [ 124.784727] BTRFS: device label dropbtr0 devid 1 transid 1530529 /dev/loop0
Apr 12 22:38:07 fan kernel:
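On the "older root" question in the message above: one commonly cited starting point is the `usebackuproot` mount option (called `recovery` on kernels before 4.6), tried read-only against a copy of the image. The following is only a sketch under those assumptions. The image and mount point names are the ones from the message; `run`, `DRY_RUN` and the `/dev/loopN` placeholder are illustrative and not part of any btrfs tooling. The commands need root, and nothing here guarantees that the mount avoids the oops.

```shell
#!/bin/sh
# Sketch only: attempt a read-only mount from a backup tree root.
# run() and DRY_RUN are hypothetical helpers for previewing the commands.

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

try_backup_root() {
    image=${1:-./dropbtr0.btrfs}
    mnt=${2:-/mnt/tempdisk}
    if [ -n "${DRY_RUN:-}" ]; then
        echo "losetup --find --show --read-only $image"
        dev=/dev/loopN   # placeholder; the real name is printed by losetup
    else
        # Attach the image read-only so the copy cannot be modified.
        dev=$(losetup --find --show --read-only "$image") || return 1
    fi
    # ro: do not write; skip_balance: do not resume the interrupted balance;
    # usebackuproot: fall back to an older tree root if the current one fails.
    run mount -t btrfs -o ro,skip_balance,usebackuproot "$dev" "$mnt"
}
```

Calling `DRY_RUN=1 try_backup_root` prints the two commands without touching anything, which is a cheap way to check them before running them for real.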
btrfs won't mount any more
Hi,

I have wrecked another btrfs file system, probably for good this time.

It's an 80 GB filesystem from 2015, in my secondary notebook, on an
encrypted SSD. The btrfs holds the root filesystem and the rest of the
system as well.

I have a cronjob that makes snapshots of the system directories daily,
and of /home every ten minutes. A second cronjob cleans up old snapshots
so that the number of snapshots present is between about 400 and 600.
This is the key feature that made me decide on btrfs in the first
place.

Last week (I was on kernel 4.10.8 with Debian unstable), I was forced to
promote the secondary laptop to the primary one, which resulted in
serious work being done on it for the first time. Over time, the
filesystem filled up without me noticing and was finally 100% full.

I then cleaned up about four gigs by deleting a couple of redundant ISO
images and some snapshots that were not due for regular deletion yet. I
then started a btrfs balance / -d50, unfortunately without stopping the
snapshot-making cronjob. This resulted in the notebook becoming
unusable for extended periods of time, without even being able to log
in. After running for some 30 hours, the notebook ran out of battery
(don't ask, stupid me).

After rebooting, the btrfs balance proceeded immediately after mounting
the root fs. System unusable again. After a day, I finally had a root
shell and was able to issue a btrfs cancel /. Unfortunately, the system
didn't care about that command and happily continued to balance. After
another 30 hours, I lost patience and reset the system.

To be able to keep control of the system and to monitor operations
remotely, I installed a fresh copy of Debian unstable with the same
4.10.8 kernel on a USB stick and booted the notebook from the stick. I
brought up the system and tried to mount the btrfs. The mount process
quickly went up to 100 % CPU usage and stayed that way until I went to
bed last night.

This morning, the machine had dropped off the network (couldn't ping the
default gateway any more although the network looked fine), and spewed
kernel oopses of about 80 lines (too long to scroll back even) every few
seconds.

I will try to tweak kernel.printk tonight so that I get my console back
and see whether the oopses are also in journal, dmesg or syslog so that
I can copy and paste them.

I also have a reasonably current backup of the filesystem, so nuking it
from orbit is an option; I would however hate losing my snapshots.

Is it worthwhile to save information about the borked filesystem, or
does the btrfs community just not care about a heavily snapshotted
two-year-old filesystem?

I would like to hear comments and opinions about what has happened here
and how to avoid things like that in the future. Do more recently
created btrfs filesystems have safeguards against damage that may occur
when a filesystem fills up?

Greetings
Marc
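For reference, the balance operations mentioned in the post, written out in the long form of the `btrfs balance` subcommands. This is a sketch, and it assumes that the `-d50` in the post refers to the data usage filter `-dusage=50`; `run` and `DRY_RUN` are hypothetical helpers for previewing the commands and not part of btrfs-progs. All of these operate on the mount point of a mounted filesystem and need root.

```shell
#!/bin/sh
# Sketch only. run() and DRY_RUN are illustrative helpers.

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# balance_data_usage MOUNTPOINT PCT
# Relocate only data block groups whose usage is below PCT percent.
balance_data_usage() {
    run btrfs balance start "-dusage=$2" "$1"
}

# Show whether a balance is running on MOUNTPOINT and how far it has got.
balance_status() {
    run btrfs balance status "$1"
}

# Ask a running balance on MOUNTPOINT to stop.
balance_cancel() {
    run btrfs balance cancel "$1"
}
```

For example, `DRY_RUN=1 balance_data_usage / 50` prints `btrfs balance start -dusage=50 /`. Starting with a low percentage and raising it step by step keeps each run short, which makes it easier to pause the snapshot cronjob for just that period.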
Re: "not a btrfs filesystem"
On Sat, Jan 21, 2017 at 03:52:19PM +0100, Hans van Kranenburg wrote:
> You have to point balance to the mount point of the banana, not to the
> block device. (balance does its work while the file system is mounted)

Idiot me. I always forget that.

Greetings
Marc
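The point made above (balance takes a mount point, not a block device) can be sketched as a small lookup. The helper names `mount_point_of` and `balance_by_device` are made up for this example, and the sketch assumes that the device appears in the mounts table under the same path that is passed in. Mount points containing spaces are octal-escaped in that table and are not handled here.

```shell
#!/bin/sh
# Sketch only: resolve a block device to its mount point before balancing.

# mount_point_of DEVICE [MOUNTS_FILE]
# Print the first mount point listed for DEVICE, or nothing if not mounted.
mount_point_of() {
    awk -v dev="$1" '$1 == dev { print $2; exit }' "${2:-/proc/self/mounts}"
}

# balance_by_device DEVICE
# Run a balance on the filesystem that DEVICE is mounted as (needs root).
balance_by_device() {
    mnt=$(mount_point_of "$1")
    if [ -z "$mnt" ]; then
        echo "$1 is not mounted; mount it first" >&2
        return 1
    fi
    btrfs balance start "$mnt"
}
```

With the filesystem from this thread mounted somewhere, `balance_by_device /dev/mapper/banana-root` would run the balance against that mount point; if it is not mounted, the function says so instead of producing the "not a btrfs filesystem" error.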
"not a btrfs filesystem"
I have a file system that I can show, check, but not rebalance:

1 [4/3420]mh@fan:~ $ sudo btrfs fi show /dev/mapper/banana-root
Label: none  uuid: b2906231-70a9-46d9-9830-38a13cb73171
        Total devices 1 FS bytes used 1.82GiB
        devid    1 size 6.00GiB used 3.69GiB path /dev/mapper/banana-root

1 [10/3426]mh@fan:~ $ sudo btrfs check /dev/mapper/banana-root
Checking filesystem on /dev/mapper/banana-root
UUID: b2906231-70a9-46d9-9830-38a13cb73171
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 1954635785 bytes used err is 0
total csum bytes: 1833964
total tree bytes: 75907072
total fs tree bytes: 64241664
total extent tree bytes: 7843840
btree space waste bytes: 21224578
file data blocks allocated: 2168119296
 referenced 2088648704
[11/3427]mh@fan:~ $ sudo btrfs balance /dev/mapper/banana-root
ERROR: not a btrfs filesystem: /dev/mapper/banana-root
1 [12/3428]mh@fan:~ $

The filesystem is on a logical volume which is on a PV which is in a
loop device that is on a disk file:

[12/3428]mh@fan:~ $ sudo lvs
  LV   VG     Attr       LSize
  root banana -wi-a---p- 6,00g
[13/3429]mh@fan:~ $ sudo vgs
  VG     #PV #LV #SN Attr   VSize VFree
  banana   1   2   0 wz-pn- 7,16g 164,00m
[14/3430]mh@fan:~ $ sudo pvs
  PV        VG     Fmt  Attr PSize PFree
  [unknown] banana lvm2 a-m  7,16g 164,00m

(no idea why this says "unknown" here, it is

[15/3431]mh@fan:~ $ ls -al /dev/mapper/loop0p2
lrwxrwxrwx 1 root root 8 Jan 21 14:40 /dev/mapper/loop0p2 -> ../dm-39
[16/3432]mh@fan:~ $ ls -al /dev/dm-39
brw-rw---- 1 root disk 254, 39 Jan 21 14:40 /dev/dm-39
[24/3440]mh@fan:~ $ sudo kpartx -l banana.sdcard
loop0p1 : 0 497664 /dev/loop0 2048
loop0p2 : 0 15024128 /dev/loop0 499712
[25/3441]mh@fan:~ $ sudo losetup --list
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE              DIO
/dev/loop0         0      0         0  0 /home/mh/banana.sdcard   0
[26/3442]mh@fan:~ $

Can a btrfs be so broken that btrfs balance doesn't recognize it any
more? What is going on with this file system?

Greetings
Marc
Re: Is stability a joke? (wiki updated)
On Mon, Sep 12, 2016 at 02:44:35PM -0600, Chris Murphy wrote:
> Just to cut yourself some slack, you could skip 3.14 because it's EOL
> now, and just go from 4.4.

Don't the btrfs-tools used to create the filesystem also play a huge
role in this game?

Greetings
Marc
Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
Hi,

On Thu, Aug 25, 2016 at 05:56:18PM -0600, Chris Murphy wrote:
> Anyway it's a known problem, I don't think it's fixed still. There's a
> lot of enospc work in 4.8 so eventually it'll make sense to give it a
> shot with that kernel.

Assuming that I'm willing to try that, will a successful rebalance with
4.8 fix a filesystem, or is the recommended way still "backup, format,
restore, lose all snapshots"?

Greetings
Marc
Re: btrfs filesystem keeps allocating new chunks for no apparent reason
On Thu, Jun 09, 2016 at 01:10:46AM +0200, Hans van Kranenburg wrote:
> So, instead of being the cause, apt-get update causing a new chunk to be
> allocated might as well be the result of existing ones already filled up
> with too many fragments.
>
> The next question is what files these extents belong to. To find out, I need
> to open up the extent items I get back and follow a backreference to an
> inode object. Might do that tomorrow, fun.

Does your apt use pdiffs to update the package lists? If yes, I'd try
turning it off just for the fun of it and to see whether this changes
btrfs' allocation behavior. I have never looked at apt's pdiff stuff in
detail, but I guess that it creates many tiny temporary files.

Greetings
Marc
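Turning pdiffs off, as suggested above, comes down to setting apt's `Acquire::PDiffs` option to `false`. A minimal sketch follows; the helper name `write_no_pdiffs_conf` and the file name used in the note below are made up for this example.

```shell
#!/bin/sh
# Sketch only: write an apt configuration fragment that disables pdiffs.

# write_no_pdiffs_conf FILE
# Create FILE containing the single setting that turns pdiffs off.
write_no_pdiffs_conf() {
    printf 'Acquire::PDiffs "false";\n' > "$1"
}
```

Run as root, `write_no_pdiffs_conf /etc/apt/apt.conf.d/99no-pdiffs` (hypothetical file name) makes the setting permanent. For a one-off comparison, `apt-get -o Acquire::PDiffs=false update` should have the same effect for a single run.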
Re: Recommended why to use btrfs for production?
On Fri, Jun 03, 2016 at 11:49:09AM +0200, Martin wrote:
> We would like to use urBackup to make laptop backups, and they mention
> btrfs as an option.
>
> https://www.urbackup.org/administration_manual.html#x1-8400010.6
>
> So if we go with btrfs and we need 100TB usable space in raid6, and to
> have it replicated each night to another btrfs server for "backup" of
> the backup, how should we then install btrfs?

Do you plan to use snapshots? How many of them?

Greetings
Marc
Re: "No space left on device" and balance doesn't work
On Fri, Jun 03, 2016 at 12:45:51AM +0200, Henk Slager wrote:
> The setup looks all pretty normal and btrfs should be able to handle
> it, but unfortunately your fs is a typical example that one currently
> needs to monitor/tune a btrfs fs for its 'health' in order to keep it
> running longterm.

What kind of work is being done to address this major usability issue?
What is the timeframe for a fix?

Greetings
Marc
Re: bad metadata crossing stripe boundary
On Sat, Apr 02, 2016 at 01:41:53PM -0600, Chris Murphy wrote:
> On Thu, Mar 31, 2016 at 11:57 PM, Marc Haber
> <mh+linux-bt...@zugschlus.de> wrote:
> > On Thu, Mar 31, 2016 at 11:16:30PM +0200, Kai Krakow wrote:
> > > On Thu, 31 Mar 2016 23:00:04 +0200,
> > > Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > > > I find it somewhere between funny and disturbing that the first call
> > > > of btrfs check made my kernel log the following:
> > > > Mar 31 22:45:36 fan kernel: [ 6253.178264] EXT4-fs (dm-31): mounted filesystem with ordered data mode. Opts: (null)
> > > > Mar 31 22:45:38 fan kernel: [ 6255.361328] BTRFS: device label fanbtr devid 1 transid 67526 /dev/dm-31
> > > >
> > > > No, the filesystem was not converted, it was directly created as
> > > > btrfs, and no, I didn't try mounting it.
> > >
> > > I suggest that your partition contained ext4 before, and you didn't run
> > > wipefs before running mkfs.btrfs.
> >
> > I cryptsetup luksFormat'ted the partition before I mkfs.btrfs'ed it.
> > That should do a much better job than wipefsing it, shouldnt it?
>
> Not really. The first btrfs super is at 64K. The second at 64M. The
> third at 256G. While wipefs will remove the magic only on the first,
> mkfs.btrfs will take care of all three. And luksFormat only overwrites
> the first 132K of a block device. There's a scant chance of bugs
> related to previous filesystems not being erased, I think this is more
> likely when mixing and matching filesystems just because the
> superblocks for each filesystem aren't in the same location.

If I do:

umount /dev/mapper/foo
cryptsetup close /dev/mapper/foo
cryptsetup luksFormat /dev/mapper/pv-c_foo
cryptsetup open /dev/mapper/pv-c_foo foo

and the contents of /dev/mapper/foo would randomly resemble its previous
contents afterwards, I would be _very_ disturbed. During the luksFormat
process, a new random symmetric key is created, and overwrites the old
random symmetric key in the LUKS header. Therefore, the following crypto
operations are _very_ unlikely to produce something that resembles an
ext4 filesystem.

Even if I did:

umount /dev/mapper/foo
cryptsetup close /dev/mapper/foo
mkfs.btrfs /dev/mapper/pv-c_foo

(assuming I previously did cryptsetup open /dev/mapper/pv-c_foo foo)
I would be _very_ surprised if the kernel would find something
resembling an ext4 file system on /dev/mapper/pv-c_foo.

> If you're concerned about traces of previous file systems, then use
> the dmcrypt device itself, rather than merely using the original block
> device where merely 132K at the beginning has been overwritten.
> Everytime you format a device, the resulting dmcrypt logical device is
> in effect full of completely random data. A new random key is
> generated each time you use luksFormat, even if you're using the same
> passphrase.

That's what I am saying. I must be missing something.

Greetings
Marc
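The superblock locations quoted above (64K, 64M, 256G) can be turned into a small calculation showing which copies exist on a device of a given size. This is a sketch under two assumptions that are not stated in the thread: the units are binary (KiB, MiB, GiB), and each copy occupies 4096 bytes and is only present if it fits on the device. The function name is made up for this example. Offsets are relative to whichever block device carries the filesystem, so for the setup discussed here that is the opened dm-crypt device.

```shell
#!/bin/sh
# Sketch only: list the byte offsets of the btrfs superblock copies that
# fit on a device of SIZE_BYTES.

# btrfs_super_offsets SIZE_BYTES
btrfs_super_offsets() {
    size=$1
    # Primary copy at 64 KiB, mirrors at 64 MiB and 256 GiB.
    for off in 65536 67108864 274877906944; do
        # Assumed copy size: 4096 bytes; skip copies that would not fit.
        if [ $((off + 4096)) -le "$size" ]; then
            echo "$off"
        fi
    done
}
```

For a 6 GiB device (6442450944 bytes) this lists the first two offsets only, since the third copy would lie beyond the end of the device.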
Re: bad metadata crossing stripe boundary
On Sat, Apr 02, 2016 at 08:31:17PM +0200, Kai Krakow wrote:
> On Sat, 2 Apr 2016 11:44:32 +0200,
> Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
>
> > On Sat, Apr 02, 2016 at 11:03:53AM +0200, Kai Krakow wrote:
> > > On Fri, 1 Apr 2016 07:57:25 +0200,
> > > Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > > > On Thu, Mar 31, 2016 at 11:16:30PM +0200, Kai Krakow wrote:
> > [...]
> > [...]
> > [...]
> > > >
> > > > I cryptsetup luksFormat'ted the partition before I mkfs.btrfs'ed
> > > > it. That should do a much better job than wipefsing it, shouldnt
> > > > it?
> > >
> > > Not sure how luksFormat works. If it encrypts what is already on the
> > > device, it would also encrypt orphan superblocks.
> >
> > It overwrites the LUKS metadata including the symmetric key that was
> > used to encrypt the existing data. Short of Shor's Algorithm and
> > Quantum Computers, after that operation it is no longer possible to
> > even guess what was on the disk before.
>
> If it was encrypted before... ;-)

First, it was. Second, cleartext found on the block device is quite
unlikely to be readable from the unlocked crypto device. I would be very
worried if that were the case.

I must be missing something here.

Greetings
Marc
Re: bad metadata crossing stripe boundary
On Sat, Apr 02, 2016 at 11:03:53AM +0200, Kai Krakow wrote:
> On Fri, 1 Apr 2016 07:57:25 +0200,
> Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > On Thu, Mar 31, 2016 at 11:16:30PM +0200, Kai Krakow wrote:
> > > On Thu, 31 Mar 2016 23:00:04 +0200,
> > > Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > > > I find it somewhere between funny and disturbing that the first
> > > > call of btrfs check made my kernel log the following:
> > > > Mar 31 22:45:36 fan kernel: [ 6253.178264] EXT4-fs (dm-31): mounted filesystem with ordered data mode. Opts: (null)
> > > > Mar 31 22:45:38 fan kernel: [ 6255.361328] BTRFS: device label fanbtr devid 1 transid 67526 /dev/dm-31
> > > >
> > > > No, the filesystem was not converted, it was directly created as
> > > > btrfs, and no, I didn't try mounting it.
> > >
> > > I suggest that your partition contained ext4 before, and you didn't
> > > run wipefs before running mkfs.btrfs.
> >
> > I cryptsetup luksFormat'ted the partition before I mkfs.btrfs'ed it.
> > That should do a much better job than wipefsing it, shouldnt it?
>
> Not sure how luksFormat works. If it encrypts what is already on the
> device, it would also encrypt orphan superblocks.

It overwrites the LUKS metadata including the symmetric key that was
used to encrypt the existing data. Short of Shor's Algorithm and
Quantum Computers, after that operation it is no longer possible to
even guess what was on the disk before.

Greetings
Marc
Re: Another ENOSPC situation
On Fri, Apr 01, 2016 at 09:20:52PM +0200, Henk Slager wrote:
> On Fri, Apr 1, 2016 at 6:50 PM, Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
> > > On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> > > > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > > > > btrfs balance -mprofiles seems to do something. one kworked and one
> > > > > btrfs-transaction process hog one CPU core each for hours, while
> > > > > blocking the filesystem for minutes apiece, which leads to the host
> > > > > being nearly unuseable up to the point of "clock and mouse pointer
> > > > > frozen for nearly ten minutes".
> > > >
> > > > I assume you still have your every 10 minutes snapshotting running
> > > > while balancing?
> > >
> > > No, I disabled the cronjob before trying the balance. I might be
> > > crazy, but not stup^wunexperienced.
> >
> > That being said, I would still expect the code not to allow _this_
> > kind of effect on the entire system when two alledgely incompatible
> > operations run simultaneously. I mean, Linux is a multi-user,
> > multi-tasking operating system where one simply cannot expect all
> > processes to be cooperative to each other. We have the operating
> > systems to prevent this kind of issues, not to cause them.
>
> Maybe look at it differently: Does user mh have trouble using this
> laptop w.r.t. storing files?

No. I would have cried murder otherwise.

> In openSUSE Tumbleweed (the snapshot from end of march), root access
> is needed to change the default snapshotting config, otherwise you
> will have a 10 year history. After that change has been done according
> to needs of the user, there is no need to run manual balance.

So you are saying that balancing a filesystem should never be necessary?
Or what are you trying to say?

Greetings
Marc
Re: Another ENOSPC situation
On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > > btrfs balance -mprofiles seems to do something. one kworked and one
> > > btrfs-transaction process hog one CPU core each for hours, while
> > > blocking the filesystem for minutes apiece, which leads to the host
> > > being nearly unuseable up to the point of "clock and mouse pointer
> > > frozen for nearly ten minutes".
> >
> > I assume you still have your every 10 minutes snapshotting running
> > while balancing?
>
> No, I disabled the cronjob before trying the balance. I might be
> crazy, but not stup^wunexperienced.

That being said, I would still expect the code not to allow _this_
kind of effect on the entire system when two allegedly incompatible
operations run simultaneously. I mean, Linux is a multi-user,
multi-tasking operating system where one simply cannot expect all
processes to cooperate with each other. We have operating systems to
prevent this kind of issue, not to cause it.

Greetings
Marc
Re: Another ENOSPC situation
On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > btrfs balance -mprofiles seems to do something. one kworked and one
> > btrfs-transaction process hog one CPU core each for hours, while
> > blocking the filesystem for minutes apiece, which leads to the host
> > being nearly unuseable up to the point of "clock and mouse pointer
> > frozen for nearly ten minutes".
>
> I assume you still have your every 10 minutes snapshotting running
> while balancing?

No, I disabled the cronjob before trying the balance. I might be
crazy, but not stup^Winexperienced.

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sat, Feb 27, 2016 at 10:14:50PM +0100, Marc Haber wrote:
> I have again the issue of no space left on device while rebalancing
> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):

Just for the record: the host started acting up in more and more
interesting ways, and after a call of rm during a kernel build resulted
in SIGSEGV, I did the backup-format-restore routine for this system back
to ext4 just to find out whether I have bad hardware or a bad
filesystem.

And, since going back to ext4, the system is just fine again. So it's
not bad hardware.

This system's root drive is going to stay on ext4 for a long time. If
the btrfs phenomena I experience on other hosts get solved at some time
in the future, I might migrate /home back to btrfs, but that's not going
to happen in the next six months.

This is a really bad experience which has made me lose a lot of faith
in the new filesystem. I really feel sad about that.

Greetings
Marc
Another ENOSPC situation
9491] [] ? smpboot_thread_fn+0xf7/0x13a
Apr 1 11:16:39 swivel kernel: [249949.509495] [] ? sort_range+0x17/0x17
Apr 1 11:16:39 swivel kernel: [249949.509500] [] ? kthread+0x95/0x9d
Apr 1 11:16:39 swivel kernel: [249949.509505] [] ? kthread_parkme+0x16/0x16
Apr 1 11:16:39 swivel kernel: [249949.509510] [] ? ret_from_fork+0x3f/0x70
Apr 1 11:16:39 swivel kernel: [249949.509515] [] ? kthread_parkme+0x16/0x16
Apr 1 11:16:39 swivel kernel: [249949.509519] Mem-Info:
Apr 1 11:16:39 swivel kernel: [249949.509529] active_anon:1107088 inactive_anon:326101 isolated_anon:0
Apr 1 11:16:39 swivel kernel: [249949.509529] active_file:1104846 inactive_file:1367650 isolated_file:0
Apr 1 11:16:39 swivel kernel: [249949.509529] unevictable:2526 dirty:14757 writeback:0 unstable:0
Apr 1 11:16:39 swivel kernel: [249949.509529] slab_reclaimable:56106 slab_unreclaimable:33051
Apr 1 11:16:39 swivel kernel: [249949.509529] mapped:67336 shmem:87440 pagetables:12012 bounce:0
Apr 1 11:16:39 swivel kernel: [249949.509529] free:30592 free_pcp:170 free_cma:0
Apr 1 11:16:39 swivel kernel: [249949.509538] Node 0 DMA free:15360kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 1 11:16:39 swivel kernel: [249949.509553] lowmem_reserve[]: 0 3403 15919 15919
Apr 1 11:16:39 swivel kernel: [249949.509559] Node 0 DMA32 free:64968kB min:3436kB low:4292kB high:5152kB active_anon:475148kB inactive_anon:357880kB active_file:1173604kB inactive_file:1314960kB unevictable:3416kB isolated(anon):0kB isolated(file):0kB present:3561088kB managed:3487816kB mlocked:3416kB dirty:13592kB writeback:0kB mapped:55924kB shmem:70004kB slab_reclaimable:47096kB slab_unreclaimable:17888kB kernel_stack:2000kB pagetables:8308kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
Apr 1 11:16:39 swivel kernel: [249949.509575] lowmem_reserve[]: 0 0 12516 12516
Apr 1 11:16:39 swivel kernel: [249949.509580] Node 0 Normal free:42040kB min:12648kB low:15808kB high:18972kB active_anon:3953204kB inactive_anon:946524kB active_file:3245780kB inactive_file:4155640kB unevictable:6688kB isolated(anon):0kB isolated(file):0kB present:13080576kB managed:12816596kB mlocked:6688kB dirty:45436kB writeback:0kB mapped:213420kB shmem:279756kB slab_reclaimable:177328kB slab_unreclaimable:114316kB kernel_stack:8688kB pagetables:39740kB unstable:0kB bounce:0kB free_pcp:764kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 1 11:16:39 swivel kernel: [249949.509596] lowmem_reserve[]: 0 0 0 0
Apr 1 11:16:39 swivel kernel: [249949.509601] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
Apr 1 11:16:39 swivel kernel: [249949.509619] Node 0 DMA32: 11548*4kB (UME) 2282*8kB (UME) 55*16kB (UM) 2*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 65392kB
Apr 1 11:16:39 swivel kernel: [249949.509638] Node 0 Normal: 3736*4kB (UME) 3206*8kB (UE) 131*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 42688kB
Apr 1 11:16:39 swivel kernel: [249949.509657] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 1 11:16:39 swivel kernel: [249949.509661] 2561271 total pagecache pages
Apr 1 11:16:39 swivel kernel: [249949.509664] 616 pages in swap cache
Apr 1 11:16:39 swivel kernel: [249949.509667] Swap cache stats: add 28221, delete 27605, find 294750/295285
Apr 1 11:16:39 swivel kernel: [249949.509670] Free swap = 8277324kB
Apr 1 11:16:39 swivel kernel: [249949.509672] Total swap = 8386556kB
Apr 1 11:16:39 swivel kernel: [249949.509674] 4164412 pages RAM
Apr 1 11:16:39 swivel kernel: [249949.509676] 0 pages HighMem/MovableOnly
Apr 1 11:16:39 swivel kernel: [249949.509678] 84469 pages reserved
Apr 1 11:16:39 swivel kernel: [249949.509681] 0 pages hwpoisoned
Apr 1 11:16:39 swivel kernel: [249949.509717] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Apr 1 11:16:39 swivel kernel: [249949.537265] EXT4-fs (dm-16): re-mounted. Opts: data=ordered,commit=0
Apr 1 11:16:39 swivel systemd[1]: Reloading Laptop Mode Tools.
Apr 1 11:16:39 swivel kernel: [249949.664133] thinkpad_acpi: EC reports that Thermal Table has changed
Apr 1 11:16:39 swivel kernel: [249949.723795] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

This btrfs is ripe for the backup-format-restore procedure, right?

Greetings
Marc
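The per-zone free-list lines in the Mem-Info dump above can be cross-checked by hand: each `N*SkB` term is a count of free blocks of one size, and the figure after `=` is their sum. The following is a minimal Python sketch, not a kernel tool; the function name `parse_buddy_line` is made up for illustration. It recomputes the total and reports the largest block size that still has free entries.

```python
import re

def parse_buddy_line(line):
    """Parse a 'Node 0 <zone>: N*SkB ... = TOTALkB' line from a Mem-Info dump.

    Returns (computed_total_kb, reported_total_kb, largest_free_block_kb).
    """
    # Everything before the last '=' holds the free lists, the rest the total.
    free_lists, _, reported = line.rpartition("=")
    terms = [(int(n), int(s)) for n, s in re.findall(r"(\d+)\*(\d+)kB", free_lists)]
    computed = sum(n * s for n, s in terms)
    # Largest block size with at least one free block (0 if none).
    largest = max((s for n, s in terms if n > 0), default=0)
    reported_kb = int(re.search(r"(\d+)kB", reported).group(1))
    return computed, reported_kb, largest

# Line copied from the log above.
dma32 = ("Node 0 DMA32: 11548*4kB (UME) 2282*8kB (UME) 55*16kB (UM) 2*32kB (UM) "
         "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 65392kB")
print(parse_buddy_line(dma32))  # (65392, 65392, 32)
```

For the DMA32 line the computed and reported totals agree, and the largest free block is 32 kB. Whether that fragmentation is what triggered the dump cannot be told from this excerpt, because the beginning of the trace is cut off.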
Re: "bad metadata" not fixed by btrfs repair
On Thu, Mar 31, 2016 at 08:42:46PM +0200, Henk Slager wrote:
> So also false alerts.

btrfs-tools 4.5.1 with Qu's patch from patchwork doesn't show those warnings any more.

Greetings
Marc
Re: bad metadata crossing stripe boundary
On Thu, Mar 31, 2016 at 11:16:30PM +0200, Kai Krakow wrote:
> On Thu, 31 Mar 2016 23:00:04 +0200,
> Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > I find it somewhere between funny and disturbing that the first call
> > of btrfs check made my kernel log the following:
> > Mar 31 22:45:36 fan kernel: [ 6253.178264] EXT4-fs (dm-31): mounted
> > filesystem with ordered data mode. Opts: (null) Mar 31 22:45:38 fan
> > kernel: [ 6255.361328] BTRFS: device label fanbtr devid 1 transid
> > 67526 /dev/dm-31
> >
> > No, the filesystem was not converted, it was directly created as
> > btrfs, and no, I didn't try mounting it.
>
> I suggest that your partition contained ext4 before, and you didn't run
> wipefs before running mkfs.btrfs.

I cryptsetup luksFormat'ted the partition before I mkfs.btrfs'ed it. That should do a much better job than wipefs'ing it, shouldn't it?

Greetings
Marc
Re: bad metadata crossing stripe boundary
On Thu, Mar 31, 2016 at 10:31:49AM +0800, Qu Wenruo wrote:
> Would you please try the following patch based on v4.5 btrfs-progs?
> https://patchwork.kernel.org/patch/8706891/

This also fixes the "bad metadata crossing stripe boundary" on my pet patient.

I find it somewhere between funny and disturbing that the first call of btrfs check made my kernel log the following:

Mar 31 22:45:36 fan kernel: [ 6253.178264] EXT4-fs (dm-31): mounted filesystem with ordered data mode. Opts: (null)
Mar 31 22:45:38 fan kernel: [ 6255.361328] BTRFS: device label fanbtr devid 1 transid 67526 /dev/dm-31

No, the filesystem was not converted, it was directly created as btrfs, and no, I didn't try mounting it.

Greetings
Marc
Re: How to cancel btrfs balance on unmounted filesystem
On Thu, Mar 31, 2016 at 01:01:37PM +0500, Roman Mamedov wrote:
> On Thu, 31 Mar 2016 08:21:12 +0200
> Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > the balance restarts immediately after mounting
>
> You can use the skip_balance mount option to prevent that.

Thanks. I now have this in all fstabs. On the system in question, I was able to sneak in a btrfs balance cancel before the system hung itself.

Mar 31 08:17:42 fan kernel: [ 240.595465] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
Mar 31 08:17:42 fan kernel: [ 240.595604] Tainted: GW 4.4.6-zgws1 #2
Mar 31 08:17:42 fan kernel: [ 240.595705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 31 08:17:42 fan kernel: [ 240.595845] kworker/u16:0 D 88062fc956c0 0 6 2 0x
Mar 31 08:17:42 fan kernel: [ 240.595913] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
Mar 31 08:17:42 fan kernel: [ 240.595919] 88017ca680c0 0002 88017ca78000 88017ca77ca0
Mar 31 08:17:42 fan kernel: [ 240.595927] 8800c9388960 0002 81409e1c 88017ca680c0
Mar 31 08:17:42 fan kernel: [ 240.595934] 81408329 7fff 81409e5a 00c0a044e7d3
Mar 31 08:17:42 fan kernel: [ 240.595941] Call Trace:
Mar 31 08:17:42 fan kernel: [ 240.595955] [] ? usleep_range+0x35/0x35
Mar 31 08:17:42 fan kernel: [ 240.595964] [] ? schedule+0x6f/0x7c
Mar 31 08:17:42 fan kernel: [ 240.595973] [] ? schedule_timeout+0x3e/0x128
Mar 31 08:17:42 fan kernel: [ 240.595981] [] ? cache_alloc+0x1bd/0x277
Mar 31 08:17:42 fan kernel: [ 240.595990] [] ? __wait_for_common+0x121/0x16d
Mar 31 08:17:42 fan kernel: [ 240.595997] [] ? __wait_for_common+0x121/0x16d
Mar 31 08:17:42 fan kernel: [ 240.596006] [] ? wake_up_q+0x3b/0x3b
Mar 31 08:17:42 fan kernel: [ 240.596047] [] ? btrfs_async_run_delayed_refs+0xbf/0xd5 [btrfs]
Mar 31 08:17:42 fan kernel: [ 240.596093] [] ? __btrfs_end_transaction+0x291/0x2d5 [btrfs]
Mar 31 08:17:42 fan kernel: [ 240.596140] [] ?
btrfs_finish_ordered_io+0x418/0x4d7 [btrfs] Mar 31 08:17:42 fan kernel: [ 240.596187] [] ? btrfs_scrubparity_helper+0xf4/0x233 [btrfs] Mar 31 08:17:42 fan kernel: [ 240.596198] [] ? process_one_work+0x178/0x27b Mar 31 08:17:42 fan kernel: [ 240.596206] [] ? worker_thread+0x1da/0x280 Mar 31 08:17:42 fan kernel: [ 240.596213] [] ? rescuer_thread+0x284/0x284 Mar 31 08:17:42 fan kernel: [ 240.596220] [] ? kthread+0x95/0x9d Mar 31 08:17:42 fan kernel: [ 240.596227] [] ? kthread_parkme+0x16/0x16 Mar 31 08:17:42 fan kernel: [ 240.596234] [] ? ret_from_fork+0x3f/0x70 Mar 31 08:17:42 fan kernel: [ 240.596240] [] ? kthread_parkme+0x16/0x16 Mar 31 08:17:42 fan kernel: [ 240.596272] INFO: task kworker/u16:2:134 blocked for more than 120 seconds. Mar 31 08:17:42 fan kernel: [ 240.596399] Tainted: GW 4.4.6-zgws1 #2 Mar 31 08:17:42 fan kernel: [ 240.596499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 31 08:17:42 fan kernel: [ 240.596637] kworker/u16:2 D 88062fcd56c0 0 134 2 0x Mar 31 08:17:42 fan kernel: [ 240.596688] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] Mar 31 08:17:42 fan kernel: [ 240.596692] 8806130e4780 0003 880613108000 880613107ca0 Mar 31 08:17:42 fan kernel: [ 240.596699] 8805caa1d960 0002 81409e1c 8806130e4780 Mar 31 08:17:42 fan kernel: [ 240.596706] 81408329 7fff 81409e5a 88062fd556c0 Mar 31 08:17:42 fan kernel: [ 240.596712] Call Trace: Mar 31 08:17:42 fan kernel: [ 240.596721] [] ? usleep_range+0x35/0x35 Mar 31 08:17:42 fan kernel: [ 240.596728] [] ? schedule+0x6f/0x7c Mar 31 08:17:42 fan kernel: [ 240.596735] [] ? schedule_timeout+0x3e/0x128 Mar 31 08:17:42 fan kernel: [ 240.596742] [] ? check_preempt_curr+0x41/0x63 Mar 31 08:17:42 fan kernel: [ 240.596750] [] ? ttwu_do_wakeup+0xf/0xd0 Mar 31 08:17:42 fan kernel: [ 240.596757] [] ? __wait_for_common+0x121/0x16d Mar 31 08:17:42 fan kernel: [ 240.596764] [] ? __wait_for_common+0x121/0x16d Mar 31 08:17:42 fan kernel: [ 240.596771] [] ? 
wake_up_q+0x3b/0x3b Mar 31 08:17:42 fan kernel: [ 240.596812] [] ? btrfs_async_run_delayed_refs+0xbf/0xd5 [btrfs] Mar 31 08:17:42 fan kernel: [ 240.596858] [] ? __btrfs_end_transaction+0x291/0x2d5 [btrfs] Mar 31 08:17:42 fan kernel: [ 240.596904] [] ? btrfs_finish_ordered_io+0x418/0x4d7 [btrfs] Mar 31 08:17:42 fan kernel: [ 240.596952] [] ? btrfs_scrubparity_helper+0xf4/0x233 [btrfs] Mar 31 08:17:42 fan kernel: [ 240.596960] [] ? process_one_work+0x178/0x27b Mar 31 08:17:42 fan kernel: [ 240.596968] [] ? worker_thread+0x1da/0x280 Mar 31 08:17:42 fan kernel: [ 240.596976] [] ? rescuer_thread+0x284/0x284 Mar 31 08:17:42 fan kernel: [ 240.59
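The advice in this message, putting skip_balance into every fstab so that an interrupted balance does not resume on mount, can be sketched as a small text transformation. This is an illustrative Python helper only: the name `add_skip_balance` and the mount point `/mnt/fan` are hypothetical, and the function works on fstab text passed in as a string rather than touching `/etc/fstab`.

```python
def add_skip_balance(fstab_text):
    """Return fstab_text with skip_balance appended to the options of every btrfs entry."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave comments, blank lines and non-btrfs entries untouched.
        if len(fields) < 4 or fields[0].startswith("#") or fields[2] != "btrfs":
            out.append(line)
            continue
        options = fields[3].split(",")
        if "skip_balance" not in options:
            options.append("skip_balance")
        fields[3] = ",".join(options)
        # Note: rewritten lines are re-joined with single spaces.
        out.append(" ".join(fields))
    return "\n".join(out)

# Hypothetical fstab content for illustration.
example = ("/dev/mapper/fanbtr /mnt/fan btrfs defaults 0 0\n"
           "/dev/sda1 /boot ext4 defaults 0 2")
print(add_skip_balance(example))
```

With the option in place, a paused balance stays paused after mounting, which leaves time to run btrfs balance cancel as described above.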
How to cancel btrfs balance on unmounted filesystem
Hi,

one of my problem btrfs instances went into a hung process state while balancing metadata. This process is recorded in the file system somehow, and the balance restarts immediately after mounting the filesystem, with no chance to issue a btrfs balance cancel command before the system hangs again.

Is there any possibility to cancel the pending balance without mounting the fs first?

I have also filed https://bugzilla.kernel.org/show_bug.cgi?id=115581 to address this in a more elegant way.

Greetings
Marc
Re: "bad metadata" not fixed by btrfs repair
On Wed, Mar 30, 2016 at 04:03:17PM +0800, Qu Wenruo wrote:
> Did your btrfs have enough *unallocated* space?

87 Gig out of a total device size of 200 Gig. I guess that should be enough for a rebalance of 2.8 Gig of metadata.

Greetings
Ma "please excuse my cynicism" rc
Re: "bad metadata" not fixed by btrfs repair
On Wed, Mar 30, 2016 at 03:00:19PM +0800, Qu Wenruo wrote:
> Marc Haber wrote on 2016/03/29 08:43 +0200:
> >On Mon, Mar 28, 2016 at 03:35:32PM -0400, Austin S. Hemmelgarn wrote:
> >>Did you convert this filesystem from ext4 (or ext3)?
> >
> >No.
> >
> >>You hadn't mentioned what version of btrfs-progs you're using, and that is
> >>somewhat important for recovery. I'm not sure if current versions of btrfs
> >>check can fix this issue, but I know for a fact that older versions (prior
> >>to at least 4.1) can not fix it.
> >
> >4.1 for creation and btrfs check.
>
> I assume that you have run older kernel on it, like v4.1 or v4.2.

No, the productive system was always on a reasonably recent kernel. I guess that this instance of btrfs has never been mounted on anything older than 4.4.4.

The rescue system I used for btrfs check (4.4-1 from Debian unstable; I updated btrfs-tools on the rescue system before running btrfs check) had kernel 3.16, but I have never actually mounted the btrfs there.

> >Then btrfs check is a userspace-only matter, as it wants the fs
> >unmounted, and it is irrelevant that I did btrfs check from a rescue
> >system with an older kernel, 3.16 if I recall correctly.
>
> Not recommended to use older kernel to RW mount or use older fsck to do
> repair.

The oldest kernel that has mounted this btrfs is 4.4.4; the fsck that touched the fs is 4.4. I'm trying to get hold of btrfs-tools 4.5.

> >My "productive" desktops (fan is one of them) run Debian unstable with
> >a current vanilla kernel. At the moment, I can't use 4.5 because it
> >acts up with KVM. When I need a rescue system, I use grml, which
> >unfortunately hasn't released since November 2014 and is still with
> >kernel 3.16
>
> To fix your problem(make these error message just disappear, even they are
> harmless on recent kernels), the most easy one, is to balance your metadata.

This does not work on kernel 4.4.6 with tools 4.4. It produces truckloads of kernel traces, "WARNING: CPU: 5 PID: 31021 at fs/btrfs/extent-tree.c:7897 btrfs_alloc_tree_block+0xeb/0x3d6 [btrfs]()" and "BTRFS: block rsv returned -28"; the full trace is in this thread.

Greetings
Marc
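The "-28" in "block rsv returned -28" is a negated errno value, which is how kernel code conventionally returns errors. A minimal Python sketch for decoding such a value with the standard library follows; the helper name `describe_kernel_error` is made up for illustration.

```python
import errno
import os

def describe_kernel_error(ret):
    """Map a negative kernel return value such as -28 to its errno name and message."""
    code = -ret
    return errno.errorcode.get(code, "unknown"), os.strerror(code)

name, message = describe_kernel_error(-28)
print(name, "-", message)
```

On Linux, errno 28 is ENOSPC, so the warning is another form of the "no space left on device" condition discussed in this thread.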
Re: "bad metadata" not fixed by btrfs repair
On Mon, Mar 28, 2016 at 02:46:54PM -0600, Chris Murphy wrote:
> http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/cmds-check.c
> line 7722 discusses this error message and it looks like there's no
> repair function for it yet; uncertain what problems can result from
> this.

This basically means that I am ripe for a new mkfs.btrfs, right?

Greetings
Marc
Re: "bad metadata" not fixed by btrfs repair
On Tue, Mar 29, 2016 at 08:43:51AM +0200, Marc Haber wrote:
> On Mon, Mar 28, 2016 at 03:35:32PM -0400, Austin S. Hemmelgarn wrote:
> > As far as what the kernel is involved with, the easy way to check is if it's
> > operating on a mounted filesystem or not. If it only operates on mounted
> > filesystems, it almost certainly goes through the kernel, if it only
> > operates on unmounted filesystems, it's almost certainly done in userspace
> > (except dev scan and technically fi show).
>
> Then btrfs check is a userspace-only matter, as it wants the fs
> unmounted, and it is irrelevant that I did btrfs check from a rescue
> system with an older kernel, 3.16 if I recall correctly.

And it also means that I should not try btrfs balance from grml, because btrfs balance goes through the kernel code.

Greetings
Marc
Re: "bad metadata" not fixed by btrfs repair
On Mon, Mar 28, 2016 at 03:35:32PM -0400, Austin S. Hemmelgarn wrote:
> Did you convert this filesystem from ext4 (or ext3)?

No.

> You hadn't mentioned what version of btrfs-progs you're using, and that is
> somewhat important for recovery. I'm not sure if current versions of btrfs
> check can fix this issue, but I know for a fact that older versions (prior
> to at least 4.1) can not fix it.

4.1 for creation and btrfs check.

> As far as what the kernel is involved with, the easy way to check is if it's
> operating on a mounted filesystem or not. If it only operates on mounted
> filesystems, it almost certainly goes through the kernel, if it only
> operates on unmounted filesystems, it's almost certainly done in userspace
> (except dev scan and technically fi show).

Then btrfs check is a userspace-only matter, as it wants the fs unmounted, and it is irrelevant that I did btrfs check from a rescue system with an older kernel, 3.16 if I recall correctly.

> 2. Regarding general support: If you're using an enterprise distribution
> (RHEL, SLES, CentOS, OEL, or something similar), you are almost certainly
> going to get better support from your vendor than from the mailing list or
> IRC.

My "productive" desktops (fan is one of them) run Debian unstable with a current vanilla kernel. At the moment, I can't use 4.5 because it acts up with KVM. When I need a rescue system, I use grml, which unfortunately hasn't released since November 2014 and is still on kernel 3.16.

Greetings
Marc
Re: "bad metadata" not fixed by btrfs repair
On Mon, Mar 28, 2016 at 06:51:02PM +, Hugo Mills wrote:
> "Could not find root 8" is harmless (and will be going away as a
> message soon). It just means that systemd is probing the FS for
> quotas, and you don't have quotas enabled.

*phew* That message was not what I wanted to read on this filesystem.

Greetings
Marc
Re: "bad metadata" not fixed by btrfs repair
On Mon, Mar 28, 2016 at 04:37:14PM +0200, Marc Haber wrote:
> I have a btrfs which btrfs check --repair doesn't fix:
>
> # btrfs check --repair /dev/mapper/fanbtr
> bad metadata [4425377054720, 4425377071104) crossing stripe boundary
> bad metadata [4425380134912, 4425380151296) crossing stripe boundary
> bad metadata [4427532795904, 4427532812288) crossing stripe boundary
> bad metadata [4568321753088, 4568321769472) crossing stripe boundary
> bad metadata [4568489656320, 4568489672704) crossing stripe boundary
> bad metadata [4571474493440, 4571474509824) crossing stripe boundary
> bad metadata [4571946811392, 4571946827776) crossing stripe boundary
> bad metadata [4572782919680, 4572782936064) crossing stripe boundary
> bad metadata [4573086351360, 4573086367744) crossing stripe boundary
> bad metadata [4574221041664, 4574221058048) crossing stripe boundary
> bad metadata [4574373412864, 4574373429248) crossing stripe boundary
> bad metadata [4574958649344, 4574958665728) crossing stripe boundary
> bad metadata [4575996018688, 4575996035072) crossing stripe boundary
> bad metadata [4580376772608, 4580376788992) crossing stripe boundary
> repaired damaged extent references
> Fixed 0 roots.
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> enabling repair mode
> Checking filesystem on /dev/mapper/fanbtr
> UUID: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> cache and super generation don't match, space cache will be invalidated
> found 97171628230 bytes used err is 0
> total csum bytes: 91734220
> total tree bytes: 3021848576
> total fs tree bytes: 2762784768
> total extent tree bytes: 148570112
> btree space waste bytes: 545440822
> file data blocks allocated: 308328280064
> referenced 177314340864

Mounting this filesystem gives:

Mar 28 20:25:18 fan kernel: [ 20.979673] BTRFS error (device dm-16): could not find root 8
Mar 28 20:25:18 fan kernel: [ 20.979739] BTRFS error (device dm-16): could not find root 8
Mar 28 20:25:18 fan kernel: [ 20.980900] BTRFS error (device dm-16): could not find root 8
Mar 28 20:25:18 fan kernel: [ 20.980948] BTRFS error (device dm-16): could not find root 8
Mar 28 20:25:18 fan kernel: [ 20.981428] BTRFS error (device dm-16): could not find root 8
Mar 28 20:25:18 fan kernel: [ 20.981472] BTRFS error (device dm-16): could not find root 8

which is not detected by btrfs check. What is going on here?

Greetings
Marc
"bad metadata" not fixed by btrfs repair
Hi,

I have a btrfs which btrfs check --repair doesn't fix:

# btrfs check --repair /dev/mapper/fanbtr
bad metadata [4425377054720, 4425377071104) crossing stripe boundary
bad metadata [4425380134912, 4425380151296) crossing stripe boundary
bad metadata [4427532795904, 4427532812288) crossing stripe boundary
bad metadata [4568321753088, 4568321769472) crossing stripe boundary
bad metadata [4568489656320, 4568489672704) crossing stripe boundary
bad metadata [4571474493440, 4571474509824) crossing stripe boundary
bad metadata [4571946811392, 4571946827776) crossing stripe boundary
bad metadata [4572782919680, 4572782936064) crossing stripe boundary
bad metadata [4573086351360, 4573086367744) crossing stripe boundary
bad metadata [4574221041664, 4574221058048) crossing stripe boundary
bad metadata [4574373412864, 4574373429248) crossing stripe boundary
bad metadata [4574958649344, 4574958665728) crossing stripe boundary
bad metadata [4575996018688, 4575996035072) crossing stripe boundary
bad metadata [4580376772608, 4580376788992) crossing stripe boundary
repaired damaged extent references
Fixed 0 roots.
checking free space cache
checking fs roots
checking csums
checking root refs
enabling repair mode
Checking filesystem on /dev/mapper/fanbtr
UUID: 90f8d728-6bae-4fca-8cda-b368ba2c008e
cache and super generation don't match, space cache will be invalidated
found 97171628230 bytes used err is 0
total csum bytes: 91734220
total tree bytes: 3021848576
total fs tree bytes: 2762784768
total extent tree bytes: 148570112
btree space waste bytes: 545440822
file data blocks allocated: 308328280064
 referenced 177314340864
# btrfs check --repair /dev/mapper/fanbtr
checking extents
bad metadata [4425377054720, 4425377071104) crossing stripe boundary
bad metadata [4425380134912, 4425380151296) crossing stripe boundary
bad metadata [4427532795904, 4427532812288) crossing stripe boundary
bad metadata [4568321753088, 4568321769472) crossing stripe boundary
bad metadata [4568489656320, 4568489672704) crossing stripe boundary
bad metadata [4571474493440, 4571474509824) crossing stripe boundary
bad metadata [4571946811392, 4571946827776) crossing stripe boundary
bad metadata [4572782919680, 4572782936064) crossing stripe boundary
bad metadata [4573086351360, 4573086367744) crossing stripe boundary
bad metadata [4574221041664, 4574221058048) crossing stripe boundary
bad metadata [4574373412864, 4574373429248) crossing stripe boundary
bad metadata [4574958649344, 4574958665728) crossing stripe boundary
bad metadata [4575996018688, 4575996035072) crossing stripe boundary
bad metadata [4580376772608, 4580376788992) crossing stripe boundary
repaired damaged extent references
Fixed 0 roots.
checking free space cache
checking fs roots
checking csums
checking root refs
enabling repair mode
Checking filesystem on /dev/mapper/fanbtr
UUID: 90f8d728-6bae-4fca-8cda-b368ba2c008e
cache and super generation don't match, space cache will be invalidated
found 97171628230 bytes used err is 0
total csum bytes: 91734220
total tree bytes: 3021848576
total fs tree bytes: 2762784768
total extent tree bytes: 148570112
btree space waste bytes: 545440822
file data blocks allocated: 308328280064
 referenced 177314340864

How do I fix this? Does the kernel play a role in btrfs check --repair, or is this all a userspace matter?

Greetings
Marc
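The byte ranges in the "bad metadata" lines can be checked arithmetically. The sketch below is a rough Python illustration, not the logic btrfs check itself uses: it assumes a stripe length of 64 KiB (an assumption, the thread does not state it) and tests whether the first and last byte of each half-open range fall into the same stripe. The names `crosses_stripe` and `check_report` are made up for this example.

```python
import re

STRIPE_LEN = 64 * 1024  # assumed stripe length; not stated in the thread

def crosses_stripe(start, end, stripe_len=STRIPE_LEN):
    """True if the half-open byte range [start, end) spans more than one stripe."""
    return start // stripe_len != (end - 1) // stripe_len

def check_report(text):
    """Yield (start, end, length, crosses) for each 'bad metadata [a, b)' entry."""
    for a, b in re.findall(r"bad metadata \[(\d+), (\d+)\)", text):
        start, end = int(a), int(b)
        yield start, end, end - start, crosses_stripe(start, end)

# First line of the report above.
report = "bad metadata [4425377054720, 4425377071104) crossing stripe boundary"
for entry in check_report(report):
    print(entry)  # (4425377054720, 4425377071104, 16384, False)
```

Under that assumption, the first reported range is 16 KiB long and starts exactly on a 64 KiB multiple, so it sits inside a single stripe. That would fit the later remark elsewhere in this thread that these warnings were false alerts, but the sketch proves nothing about the filesystem itself.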
Re: New file system with same issue
On Tue, Mar 15, 2016 at 09:54:06AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-03-15 09:46, Marc Haber wrote:
> >On Tue, Mar 15, 2016 at 11:52:30AM +0100, Holger Hoffstätte wrote:
> >>On 03/14/16 21:13, Marc Haber wrote:
> >>>Do I need to wait for clear_cache to finish, like until I see disk
> >>>usage dropping?
> >>
> >>The cache isn't that big, so you won't see a huge drop. Just use the
> >>disk normally for a few minutes, after some time the cache will be
> >>written out again.
> >
> >Is it necessary to actually cause activity on the file system or is it
> >ok to just let it sit there for an hour or so?
> It should be OK to just let it sit there for ten or fifteen minutes. I'm
> pretty certain that the free space cache gets rebuilt relatively quickly,
> and I'm almost 100% certain that the old one gets dropped within seconds of
> the FS being mounted with -o clear_cache. I've rebuilt the cache on the 64G
> root filesystem on my laptop a couple of times before, and it consistently
> appears to take about 2-3 minutes to do so at most (based on disk usage from
> the kernel itself).

In my case, atop has not seen any notable disk activity after mounting with -o clear_cache.

Greetings
Marc
Re: New file system with same issue
On Tue, Mar 15, 2016 at 11:52:30AM +0100, Holger Hoffstätte wrote:
> On 03/14/16 21:13, Marc Haber wrote:
> > Do I need to wait for clear_cache to finish, like until I see disk
> > usage dropping?
>
> The cache isn't that big, so you won't see a huge drop. Just use the
> disk normally for a few minutes, after some time the cache will be
> written out again.

Is it necessary to actually cause activity on the file system, or is it ok to just let it sit there for an hour or so?

Greetings
Marc
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Tue, Mar 15, 2016 at 02:29:32PM +0100, Marc Haber wrote:
> After umounting and btrfs check the block device, things seem to be
> fine now

But umounting the btrfs seemed to trigger the following kernel traces:

Mar 15 14:21:30 fan kernel: [92308.377104] [ cut here ]
Mar 15 14:21:30 fan kernel: [92308.377135] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:5380 btrfs_free_block_groups+0x1bc/0x36f [btrfs]()
Mar 15 14:21:30 fan kernel: [92308.377137] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
Mar 15 14:21:30 fan kernel: [92308.377203] CPU: 5 PID: 28243 Comm: umount Not tainted 4.4.5-zgws1 #2
Mar 15 14:21:30 fan kernel: [92308.377205] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603 10/12/2012
Mar 15 14:21:30 fan kernel: [92308.377207] 005b 811dd418 0009
Mar 15 14:21:30 fan kernel: [92308.377210] 81051e21 a047a147
880600a28000 Mar 15 14:21:30 fan kernel: [92308.377212] 880600a28080 8805af7eea00 a047a147 880600a28000 Mar 15 14:21:30 fan kernel: [92308.377215] Call Trace: Mar 15 14:21:30 fan kernel: [92308.377221] [] ? dump_stack+0x5a/0x6f Mar 15 14:21:30 fan kernel: [92308.377224] [] ? warn_slowpath_common+0x8e/0xa3 Mar 15 14:21:30 fan kernel: [92308.377239] [] ? btrfs_free_block_groups+0x1bc/0x36f[btrfs] Mar 15 14:21:30 fan kernel: [92308.377252] [] ? btrfs_free_block_groups+0x1bc/0x36f[btrfs] Mar 15 14:21:30 fan kernel: [92308.377267] [] ? close_ctree+0x1e6/0x2f2 [btrfs] Mar 15 14:21:30 fan kernel: [92308.377271] [] ? generic_shutdown_super+0x64/0xdf Mar 15 14:21:30 fan kernel: [92308.377273] [] ? kill_anon_super+0x9/0xe Mar 15 14:21:30 fan kernel: [92308.377285] [] ? btrfs_kill_super+0xd/0x16 [btrfs] Mar 15 14:21:30 fan kernel: [92308.377288] [] ? deactivate_locked_super+0x2f/0x56 Mar 15 14:21:30 fan kernel: [92308.377291] [] ? cleanup_mnt+0x4f/0x6b Mar 15 14:21:30 fan kernel: [92308.377293] [] ? task_work_run+0x5d/0x71 Mar 15 14:21:30 fan kernel: [92308.377296] [] ? prepare_exit_to_usermode+0x70/0x99 Mar 15 14:21:30 fan kernel: [92308.377300] [] ? 
int_ret_from_sys_call+0x25/0x8f Mar 15 14:21:30 fan kernel: [92308.377302] ---[ end trace 18c6bb90b0c6c689 ]--- Mar 15 14:21:30 fan kernel: [92308.377303] [ cut here ] Mar 15 14:21:30 fan kernel: [92308.377318] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:5381 btrfs_free_block_groups+0x1d7/0x36f [btrfs]() Mar 15 14:21:30 fan kernel: [92308.377319] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button Mar 15 14:21:30 fan kernel: [92308.377362] CPU: 5 PID: 28243 Comm:
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Mon, Mar 14, 2016 at 09:05:46PM +0100, Marc Haber wrote:
> [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
> Superblock bytenr is larger than device size
> Couldn't open file system
> [11/509]mh@fan:~$

After umounting and running btrfs check on the block device, things seem to
be fine now:

[34/532]mh@fan:~$ sudo btrfs check /dev/mapper/ofanbtr
Checking filesystem on /dev/mapper/ofanbtr
UUID: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 86554574954 bytes used err is 0
total csum bytes: 81815012
total tree bytes: 2476670976
total fs tree bytes: 2246311936
total extent tree bytes: 133201920
btree space waste bytes: 452859567
file data blocks allocated: 292994375680
 referenced 132664688640
[35/533]mh@fan:~$ sudo btrfs check /dev/mapper/ofanbtr
Checking filesystem on /dev/mapper/ofanbtr
UUID: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 86554574954 bytes used err is 0
total csum bytes: 81815012
total tree bytes: 2476670976
total fs tree bytes: 2246311936
total extent tree bytes: 133201920
btree space waste bytes: 452859567
file data blocks allocated: 292994375680
 referenced 132664688640
[36/533]mh@fan:~$

This does not indicate an error, does it?

Greetings
Marc, who would like the tools to be a bit more explicit and consistent
about whether they want the fs mounted or umounted, and the mountpoint or
the device on their command line
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Tue, Mar 15, 2016 at 01:15:33PM +0100, Henk Slager wrote:
> On Tue, Mar 15, 2016 at 8:16 AM, Marc Haber <mh+linux-bt...@zugschlus.de>
> wrote:
> > On Tue, Mar 15, 2016 at 12:22:00AM +0100, Henk Slager wrote:
> >> The other question is: What is mounted on /media/tempdisk/ ?
> >
> > The "old" btrfs filesystem "ofanbtr", formerly 417 GB in size, now
> > resized to 300 GB. Does it need to be umounted to be checked?
>
> Yes, that's the whole point
>
> >> At least I think a check of the current 200GiB fs is needed. As it is
> >> a rootfs and encrypted, some work is needed to make that happen.
> >
> > You suggested a btrfs check after looking at the image of "ofanbtr".
> > Do you want me to check the new "fanbtr" also?
>
> I was not sure if 'ofanbtr' is an image created by btrfs-image or a
> extra dd created image you might have locally. Both 'ofanbtr' and
> 'fanbtr' have the same balance issue, but 'fanbtr' is created with
> newer and known kernel+tools version I assume, so that's why the
> suggestion.

ofanbtr is the old btrfs, on /dev/mapper/ofanbtr:

Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
        Total devices 1 FS bytes used 80.63GiB
        devid    1 size 300.00GiB used 122.06GiB path /dev/mapper/ofanbtr

It was created as 'fanbtr' in September, 300 GiB in size, then, in
February I think, resized to 417 GiB to make room for more data and for
balancing, used until March 7, and then renamed to ofanbtr with lvrename
and btrfs fi label. It was then imaged, and then resized back to 300 GiB
in the hope that this would fix the size issue.

fanbtr is the new btrfs, on /dev/mapper/fanbtr:

Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
        Total devices 1 FS bytes used 82.45GiB
        devid    1 size 200.00GiB used 113.03GiB path /dev/mapper/fanbtr

It was created on March 7, had the data from ofanbtr cp'ed over, and has
been used as the active filesystem since then. It is smaller because I
don't have much more room on the SSD.

Both do have the same balance issue, yes.

Greetings
Marc
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Mon, Mar 14, 2016 at 01:00:13AM +0100, Henk Slager wrote:
> On Sun, Mar 13, 2016 at 9:56 PM, Marc Haber <mh+linux-bt...@zugschlus.de>
> wrote:
> > Yes, I want to keep the possibility to remove huge files from
> > snapshots that shouldnt have been on a snapshotted volume in the first
> > place without having to ditch the entire snapshot.
>
> You could do ro snapshotting and in case you want to modify something
> inside a snapshot/subvolume:
> # btrfs property set ro false
> # rm /
> # btrfs property set ro true

I was not aware that it is possible to fiddle with the ro property of an
already existing snapshot. I am not yet sure whether I love or hate this.

> >> Also, If some part of the OS or tools scans through the snapshot dirs
> >> every now and then with atime creation on, metadata grows without a
> >> real need.
> >
> > I mount with noatime and nodiratime anyway, and the directory the
> > snapshots are mounted to (/mnt/snapshots) are excluded in
> > updatedb.conf. Any other idea which tool might scan filesystems and
> > that might not be noticed when it's running about a five digit number
> > of snapshots?
>
> Maybe baloo or so if you use KDE.

I usually do those tests via ssh without even being logged in to a local
desktop.

Greetings
Marc
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Tue, Mar 15, 2016 at 12:22:00AM +0100, Henk Slager wrote:
> The other question is: What is mounted on /media/tempdisk/ ?

The "old" btrfs filesystem "ofanbtr", formerly 417 GB in size, now
resized to 300 GB. Does it need to be umounted to be checked?

> At least I think a check of the current 200GiB fs is needed. As it is
> a rootfs and encrypted, some work is needed to make that happen.

You suggested a btrfs check after looking at the image of "ofanbtr".
Do you want me to check the new "fanbtr" also?

Too bad that we went back to looking at "ofanbtr" after I changed the
subject to avoid mixing up both instances.

Greetings
Marc
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Mon, Mar 14, 2016 at 09:39:51PM +0100, Henk Slager wrote:
> >> BTW, I restored and mounted your 20160307-fanbtr-image:
> >>
> >> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732
> >> /dev/loop0
> >> [266203.734804] BTRFS info (device loop0): disk space caching is enabled
> >> [266203.734806] BTRFS: has skinny extents
> >> [266204.022175] BTRFS: checking UUID tree
> >> [266239.407249] attempt to access beyond end of device
> >> [266239.407252] loop0: rw=1073, want=715202688, limit=70576
> >> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr
> >> 1, rd 0, flush 0, corrupt 0, gen 0
> >> [266239.407272] attempt to access beyond end of device
> >> .. and 16 more
> >>
> >> As a quick fix/workaround, I truncated the image to 1T
> >
> > The original fs was 417 GiB in size. What size does the image claim?
>
> ls -alFh of the restored image showed 337G I remember.
> btrfs fi us showed also a number over 400G, I don't have the
> files/loopdev anymore.

Sounds legit.

> It could some side effect of btrfs-image, I only have used it for
> multi-device, where dev id's are ignore, but total image size did not
> lead to problems.

The original "ofanbtr" seems to have a problem, since btrfs check
/media/tempdisk says:

> > [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
> > Superblock bytenr is larger than device size
> > Couldn't open file system
> > [11/509]mh@fan:~$
> >
> > Can this be fixed?
>
> What I would do in order to fix it, is resize the fs to let's say
> 190GiB. That should write correct values to the superblocks I /hope/.
> And then resize back to max.

It doesn't:

[20/518]mh@fan:~$ sudo btrfs filesystem resize 300G /media/tempdisk/
Resize '/media/tempdisk/' of '300G'
[22/520]mh@fan:~$ sudo btrfs check /media/tempdisk/
Superblock bytenr is larger than device size
Couldn't open file system
[23/521]mh@fan:~$ df -h

> Maybe btrfs check --repair can also fix it, but before doing --repair
> or other actions, I would see what else besides btrfs could be wrong,
> see also suggestion of Holger.

Like putting the filesystem on an unencrypted medium? Sorry, no, private
data, paranoia.

Greetings
Marc
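The "attempt to access beyond end of device" line quoted above reports want= and limit= as sector counts. A small sketch of the conversion, assuming these are 512-byte sectors as the block layer normally reports them (the helper names are made up for illustration):

```python
# Assumption: want= and limit= in the kernel message are 512-byte sectors.
SECTOR_BYTES = 512


def sectors_to_gib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to GiB."""
    return sectors * SECTOR_BYTES / 2**30


def sectors_to_mib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to MiB."""
    return sectors * SECTOR_BYTES / 2**20


if __name__ == "__main__":
    want, limit = 715202688, 70576  # values from the quoted log line
    print(f"want  ends at about {sectors_to_gib(want):.2f} GiB")
    print(f"limit is about {sectors_to_mib(limit):.2f} MiB")
```

Under that assumption the write was aimed at an offset of roughly 341 GiB, while the loop device was only a few tens of MiB long at that point.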
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
Hi Henk,

On Mon, Mar 14, 2016 at 02:46:54PM +0100, Henk Slager wrote:
> On Mon, Mar 14, 2016 at 1:07 PM, Marc Haber <mh+linux-bt...@zugschlus.de>
> wrote:
> > Mar 14 10:23:49 fan mh: BEGIN btrfs-balance script
> > Mar 14 10:23:49 fan mh: btrfs fi df /
> > Mar 14 10:23:49 fan root: Data, single: total=79.00GiB, used=78.42GiB
> > Mar 14 10:23:49 fan root: System, single: total=32.00MiB, used=16.00KiB
> > Mar 14 10:23:49 fan root: Metadata, single: total=10.00GiB, used=2.46GiB
> > Mar 14 10:23:49 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> > Mar 14 10:23:49 fan mh: btrfs fi show /
> > Mar 14 10:23:49 fan root: Label: 'fanbtr' uuid:
> > 90f8d728-6bae-4fca-8cda-b368ba2c008e
> > Mar 14 10:23:49 fan root: #011Total devices 1 FS bytes used 80.89GiB
> > Mar 14 10:23:49 fan root: #011devid    1 size 200.00GiB used 89.03GiB path
> > /dev/mapper/fanbtr
> > Mar 14 10:23:49 fan root:
> > Mar 14 10:23:49 fan mh: btrfs fi usage /
> > Mar 14 10:23:49 fan root: Overall:
> > Mar 14 10:23:49 fan root: Device size:#011#011 200.00GiB
> > Mar 14 10:23:49 fan root: Device allocated:#011#011 89.03GiB
> > Mar 14 10:23:49 fan root: Device unallocated:#011#011 110.97GiB
> > Mar 14 10:23:49 fan root: Device missing:#011#011 0.00B
> > Mar 14 10:23:49 fan root: Used:#011#011#011 80.89GiB
> > Mar 14 10:23:49 fan root: Free (estimated):#011#011 111.54GiB#011(min:
> > 111.54GiB)
> > Mar 14 10:23:49 fan root: Data ratio:#011#011#011 1.00
> > Mar 14 10:23:49 fan root: Metadata ratio:#011#011 1.00
> > Mar 14 10:23:49 fan root: Global reserve:#011#011 512.00MiB#011(used:
> > 0.00B)
> It it looks a bit strange to me that this is already 512MiB for and fs
> of 200GiB. Just after creation (4.4 tools) it should be something like
> 16MiB. And grows when fs is used, but 512MiB... An fs created with
> older tools had 512MiB from start AFAIK

Confirmed, a new btrfs of 200 GB made on a rotating disk has 16 MiB of
global reserve. Unfortunately, I do not have history about how this grew
over time. The first btrfs fi usage I have on file was about half a day
into this fs' existence on Mar 7, after copying data on to it, and Global
reserve was already at 512 MiB.

> > Mar 14 10:51:06 fan root: BEGIN btrfs balance start -mprofiles=dup /
>
> This probably should have been -mprofiles=single
> So that its gets more clear where and when the enospc errors occur

Good catch. So I'd need to parse btrfs fi df's output to call the right
balance option. I blindly copied that over from the script I wrote for
the older btrfs which still has DUP metadata and system.

> BTW, I restored and mounted your 20160307-fanbtr-image:
>
> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732 /dev/loop0
> [266203.734804] BTRFS info (device loop0): disk space caching is enabled
> [266203.734806] BTRFS: has skinny extents
> [266204.022175] BTRFS: checking UUID tree
> [266239.407249] attempt to access beyond end of device
> [266239.407252] loop0: rw=1073, want=715202688, limit=70576
> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr
> 1, rd 0, flush 0, corrupt 0, gen 0
> [266239.407272] attempt to access beyond end of device
> .. and 16 more
>
> As a quick fix/workaround, I truncated the image to 1T

The original fs was 417 GiB in size. What size does the image claim?

> After re-loop and mount and while doing a balance of the metadata I got this:
> [27.431704] BTRFS error (device loop0): bad tree block start 0
> 5827368812544
>
> So something is/was wrong with the fs. Did you do a btrfs check before
> imaging?

No, I didn't. And there is indeed something wrong:

[10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
Superblock bytenr is larger than device size
Couldn't open file system
[11/509]mh@fan:~$

Can this be fixed?

Greetings
Marc
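The message above mentions parsing the output of btrfs fi df to pick the matching -mprofiles= value for the balance call. A minimal sketch of that idea, assuming the output format shown in the syslog excerpts of this thread; the function names are invented for illustration and this is not the script actually used on the box:

```python
import re
from typing import List, Optional


def metadata_profile(fi_df_output: str) -> Optional[str]:
    """Return the metadata profile (e.g. 'single', 'dup') found in
    `btrfs fi df` output, or None if no Metadata line is present."""
    for line in fi_df_output.splitlines():
        match = re.match(r"\s*Metadata,\s*(\w+):", line)
        if match:
            return match.group(1).lower()
    return None


def balance_argv(mountpoint: str, fi_df_output: str) -> List[str]:
    """Build the argv for a metadata balance restricted to the profile
    that is actually in use on the filesystem."""
    profile = metadata_profile(fi_df_output)
    if profile is None:
        raise ValueError("no Metadata line found in btrfs fi df output")
    return ["btrfs", "balance", "start", f"-mprofiles={profile}", mountpoint]


if __name__ == "__main__":
    sample = (
        "Data, single: total=79.00GiB, used=78.42GiB\n"
        "System, single: total=32.00MiB, used=16.00KiB\n"
        "Metadata, single: total=10.00GiB, used=2.46GiB\n"
        "GlobalReserve, single: total=512.00MiB, used=0.00B\n"
    )
    print(" ".join(balance_argv("/", sample)))
```

The sketch only builds the command line; running it, and deciding what to do on a filesystem with mixed profiles, is left to the calling script.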
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Sun, Mar 13, 2016 at 12:58:09PM +0100, Marc Haber wrote:
> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> > The alternative if this can't be fixed, is to recreate the filesystem
> > because there's no practical way yet to migrate so many snapshots to a
> > new file system.
>
> I recreated the file system on March 7, with 200 GiB in size, using
> btrfs-tools 4.4. The snapshot-taking process has been running since
> then, but I also regularly cleaned up. The number of snapshots on the
> new filesystem has never exceeded 1000, with the current count being
> at 148.
>
> And btrfs balance runs into the same ENOSPC issues as the old one:

... with Qu's patch, I now get a reproducible kernel trace:

Mar 14 10:23:49 fan mh: BEGIN btrfs-balance script
Mar 14 10:23:49 fan mh: btrfs fi df /
Mar 14 10:23:49 fan root: Data, single: total=79.00GiB, used=78.42GiB
Mar 14 10:23:49 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 10:23:49 fan root: Metadata, single: total=10.00GiB, used=2.46GiB
Mar 14 10:23:49 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 10:23:49 fan mh: btrfs fi show /
Mar 14 10:23:49 fan root: Label: 'fanbtr' uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 10:23:49 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 10:23:49 fan root: #011devid    1 size 200.00GiB used 89.03GiB path /dev/mapper/fanbtr
Mar 14 10:23:49 fan root:
Mar 14 10:23:49 fan mh: btrfs fi usage /
Mar 14 10:23:49 fan root: Overall:
Mar 14 10:23:49 fan root: Device size:#011#011 200.00GiB
Mar 14 10:23:49 fan root: Device allocated:#011#011 89.03GiB
Mar 14 10:23:49 fan root: Device unallocated:#011#011 110.97GiB
Mar 14 10:23:49 fan root: Device missing:#011#011 0.00B
Mar 14 10:23:49 fan root: Used:#011#011#011 80.89GiB
Mar 14 10:23:49 fan root: Free (estimated):#011#011 111.54GiB#011(min: 111.54GiB)
Mar 14 10:23:49 fan root: Data ratio:#011#011#011 1.00
Mar 14 10:23:49 fan root: Metadata ratio:#011#011 1.00
Mar 14 10:23:49 fan root: Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 10:23:49 fan root:
Mar 14 10:23:49 fan root: Data,single: Size:79.00GiB, Used:78.42GiB
Mar 14 10:23:49 fan root:/dev/mapper/fanbtr#011 79.00GiB
Mar 14 10:23:49 fan root:
Mar 14 10:23:49 fan root: Metadata,single: Size:10.00GiB, Used:2.46GiB
Mar 14 10:23:49 fan root:/dev/mapper/fanbtr#011 10.00GiB
Mar 14 10:23:49 fan root:
Mar 14 10:23:49 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 10:23:49 fan root:/dev/mapper/fanbtr#011 32.00MiB
Mar 14 10:23:49 fan root:
Mar 14 10:23:49 fan root: Unallocated:
Mar 14 10:23:49 fan root:/dev/mapper/fanbtr#011 110.97GiB
Mar 14 10:23:49 fan mh: BEGIN btrfs balance start /
Mar 14 10:36:46 fan kernel: [ 890.995815] BTRFS info (device dm-15): 6 enospc errors during balance
Mar 14 10:36:46 fan root: ERROR: error during balancing '/': No space left on device
Mar 14 10:36:46 fan root: There may be more info in syslog - try dmesg | tail
Mar 14 10:36:46 fan root: btrfs fi df /
Mar 14 10:36:46 fan root: Data, single: total=79.00GiB, used=78.42GiB
Mar 14 10:36:46 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 10:36:46 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
Mar 14 10:36:46 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 10:36:46 fan root: btrfs fi show /
Mar 14 10:36:46 fan root: Label: 'fanbtr' uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 10:36:46 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 10:36:46 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
Mar 14 10:36:46 fan root:
Mar 14 10:36:46 fan root: btrfs fi usage /
Mar 14 10:36:46 fan root: Overall:
Mar 14 10:36:46 fan root: Device size:#011#011 200.00GiB
Mar 14 10:36:46 fan root: Device allocated:#011#011 91.03GiB
Mar 14 10:36:46 fan root: Device unallocated:#011#011 108.97GiB
Mar 14 10:36:46 fan root: Device missing:#011#011 0.00B
Mar 14 10:36:46 fan root: Used:#011#011#011 80.89GiB
Mar 14 10:36:46 fan root: Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
Mar 14 10:36:46 fan root: Data ratio:#011#011#011 1.00
Mar 14 10:36:46 fan root: Metadata ratio:#011#011 1.00
Mar 14 10:36:46 fan root: Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 10:36:46 fan root:
Mar 14 10:36:46 fan root: Data,single: Size:79.00GiB, Used:78.42GiB
Mar 14 10:36:46 fan root:/dev/mapper/fanbtr#011 79.00GiB
Mar 14 10:36:46 fan root:
Mar 14 10:36:46 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
Mar 14 10:36:46 fan root:/dev/mapper/fanbtr#011 12.00GiB
Mar 14 10:36:46 fan root:
Mar 14 10:36:46 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 10:36:46 fan root:/dev/mapper/fanbtr#011 32.00MiB
Mar 14 10:36:46 fan root:
Mar 14 10:36:46 fan root: Unallocated:
Mar 14 10:36:46 fan root:/de
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Mon, Mar 14, 2016 at 01:05:39AM +0000, Duncan wrote:
> But according to the mkfs.btrfs manpage, the detection is based on
> /sys/block/DEV/queue/rotational (with DEV substituted appropriately), and
> various layers got support for correctly passing that thru at various
> times, some before btrfs, some after. So that's very likely why btrfs
> didn't detect it originally, if it was on top of crypto and/or some other
> layer that might not have been passing that thru.

That explains it, thanks.

Greetings
Marc
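The sysfs flag named in the quoted text can be read directly to see what a given layer reports. A small sketch, assuming the usual convention that the file contains 1 for rotating media and 0 otherwise; the helper name and the sysfs_root parameter are invented here so the function can be pointed at a test directory:

```python
from pathlib import Path
from typing import Optional


def is_rotational(dev: str, sysfs_root: str = "/sys") -> Optional[bool]:
    """Return True/False according to <sysfs_root>/block/<dev>/queue/rotational,
    or None if the flag cannot be read (device missing, no sysfs)."""
    flag = Path(sysfs_root) / "block" / dev / "queue" / "rotational"
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    # Device names below are examples only; substitute the real stack,
    # e.g. the disk itself and the dm-N node of the crypto mapping on top.
    for name in ("sda", "dm-0"):
        print(name, is_rotational(name))
```

Comparing the value for the underlying disk with the value for the mapped device on top of it shows whether an intermediate layer passes the flag through.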
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Sun, Mar 13, 2016 at 05:12:35PM +0000, Duncan wrote:
> Marc Haber posted on Sun, 13 Mar 2016 12:58:10 +0100 as excerpted:
> > I see the same metadata spread as with the old filesystem in btrfs fi
> > df,
> > totl at 23 and used at 2.38 GiB. What I find strange is that this
> > filesystem has Data, System and Metadata in "single" profile, is this
> > the new default for a 200 GiB file system?
>
> Single is default for data. Metadata (and system) will normally default
> to dup on a single device, raid1 on multi-device, EXCEPT on detected
> SSDs, where it defaults to single as well, because the firmware on some
> ssds will dedup it in any case. If you know your ssd isn't one of the
> deduping ones (as I do, here), you can of course overrule that by
> specifying modes at mkfs.btrfs time.

It was both times the same Samsung 840 EVO. Has this SSD detection been
added recently, or did older versions of mkfs.btrfs not detect an SSD
through a crypto layer, maybe?

Greetings
Marc
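The default-profile rule described in the quoted text can be written down as a small decision function. This is only a sketch of the rule as stated in that message; it is not taken from the mkfs.btrfs source, the function name is invented, and treating the SSD exception as applying to the single-device case only is an assumption of this sketch:

```python
from typing import Dict


def default_profiles(num_devices: int, ssd_detected: bool) -> Dict[str, str]:
    """Profiles mkfs.btrfs would pick by default, per the quoted description."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    if num_devices > 1:
        metadata = "raid1"
    elif ssd_detected:
        # Assumption: the SSD exception concerns single-device filesystems.
        metadata = "single"
    else:
        metadata = "dup"
    # The quoted text names single as the data default and groups
    # system with metadata.
    return {"data": "single", "metadata": metadata, "system": metadata}


if __name__ == "__main__":
    print(default_profiles(1, ssd_detected=True))
    print(default_profiles(1, ssd_detected=False))
```

Read this way, a filesystem with all three block group types in "single" is what a single-device mkfs with SSD detection would produce, and one with DUP metadata is what the same mkfs would produce if the SSD was not detected.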
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Sun, Mar 13, 2016 at 08:14:45PM +0100, Henk Slager wrote:
> On Sun, Mar 13, 2016 at 12:58 PM, Marc Haber
> <mh+linux-bt...@zugschlus.de> wrote:
> > Hi,
> >
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> The alternative if this can't be fixed, is to recreate the filesystem
> >> because there's no practical way yet to migrate so many snapshots to a
> >> new file system.
> >
> > I recreated the file system on March 7, with 200 GiB in size, using
> > btrfs-tools 4.4. The snapshot-taking process has been running since
> > then, but I also regularly cleaned up. The number of snapshots on the
> > new filesystem has never exceeded 1000, with the current count being
> > at 148.
>
> Is the snapshotting still read-write?

Yes, I want to keep the possibility to remove huge files from snapshots
that shouldn't have been on a snapshotted volume in the first place
without having to ditch the entire snapshot.

> Also, If some part of the OS or tools scans through the snapshot dirs
> every now and then with atime creation on, metadata grows without a
> real need.

I mount with noatime and nodiratime anyway, and the directory the
snapshots are mounted to (/mnt/snapshots) is excluded in updatedb.conf.
Any other idea which tool might scan filesystems and might go unnoticed
while it is running over a five-digit number of snapshots?

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sun, Mar 13, 2016 at 01:43:50PM -0600, Chris Murphy wrote:
> On Sat, Mar 12, 2016 at 12:57 PM, Marc Haber
> <mh+linux-bt...@zugschlus.de> wrote:
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> Something is happening with the usage of this file system that's out
> >> of the ordinary. This is the first time I've seen such a large amount
> >> of unused metadata allocation. And then for it not only fail to
> >> balance, but for the allocation amount to increase is a first. So
> >> understanding the usage is important to figuring out what's happening.
> >> I'd file a bug and include as much information on how the fs got into
> >> this state as possible. And also if possible make a btrfs-image using
> >> the proper flags to blot out the filenames for privacy. And what
> >> btrfs-progs tools were used to create this file system. Etc.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=114451
> >
> > Please advise if there is something missing.
>
> No enospc_debug mount option used for kernel messages.

I apologize for not having mentioned this, but why do you think that it
wasn't active?

|[28/527]mh@fan:~$ grep enospc /proc/mounts
|/dev/mapper/fanbtr / btrfs rw,noatime,nodiratime,ssd,space_cache,enospc_debug,subvolid=257,subvol=/fan-root 0 0
|/dev/mapper/fanbtr /mnt/snapshots/fanbtr btrfs rw,noatime,nodiratime,ssd,space_cache,enospc_debug,subvolid=266,subvol=/snapshots 0 0
|[29/528]mh@fan:~$

> And no indication you applied Qu's patch mentioned on March 1 to get
> more info with enospc_debug mount:
>
> >Oh, I'm sorry that the output is not necessary, it's better to use the newer
> >patch:
> >https://patchwork.kernel.org/patch/8462881/
> >With the newer patch, you will need to use enospc_debug mount option to get
> >the debug information.

That one didn't make it into 4.4.5 yet?

Greetings
Marc
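The grep on /proc/mounts shown above can also be done as an exact match on the comma-separated option field, which avoids false hits on substrings. A short sketch under the assumption of the standard /proc/mounts field layout (device, mountpoint, fstype, options, dump, pass); the function name is made up for illustration:

```python
from typing import List


def mounts_with_option(mounts_text: str, option: str, fstype: str = "btrfs") -> List[str]:
    """Return the mountpoints of all `fstype` mounts whose option list
    contains exactly `option`."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != fstype:
            continue
        if option in fields[3].split(","):
            hits.append(fields[1])
    return hits


if __name__ == "__main__":
    # Falls back to an empty result on systems without /proc/mounts.
    try:
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        text = ""
    print(mounts_with_option(text, "enospc_debug"))
```

With the two lines from the transcript above as input, the function returns both mountpoints, / and /mnt/snapshots/fanbtr.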
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Mon, Mar 14, 2016 at 12:17:24AM +1100, Andrew Vaughan wrote:
> On 13 March 2016 at 22:58, Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > Hi,
> >
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> The alternative if this can't be fixed, is to recreate the filesystem
> >> because there's no practical way yet to migrate so many snapshots to a
> >> new file system.
> >
> > I recreated the file system on March 7, with 200 GiB in size, using
> > btrfs-tools 4.4. The snapshot-taking process has been running since
> > then, but I also regularly cleaned up. The number of snapshots on the
> > new filesystem has never exceeded 1000, with the current count being
> > at 148.
> >
> >
>
> I'm not a dev, so I'll just thouw out a random, and possibly naive idea.
>
> How much i/o load is this filesystem under?
> What type of access pattern(s), how frequent and large are the changes?

Nearly none. It's a workstation which I have avoided using in the last
few days due to the filesystem trouble and to avoid any impact of local
work on the filesystem behavior. I even log out after working on the box
for a few minutes.

There is a Debian apt-cacher running on the box and writing its cache to
this btrfs, but /var is on its own subvolume that is only snapshotted
once a day. I'll move /var/cache to its own subvolume and set this
subvolume on a "no snapshots" schedule.

The box itself is running a couple of KVM VMs, but the virtual disks of
the VMs are on dedicated LVs.

> Are you still making snapshots every 10m?

I am snapshotting the subvolume /home/mh, with the obvious contents,
every ten minutes, yes. Most of the other subvolumes are snapshotted once
daily, with some of them not getting snapshotted at all.

> How often do you delete old snapshots? Also every 10m, or do you
> delete them in batchs every hour or so?

I delete them in batches about every other day.

> How long does "btrfs subvolume delete -c " take?
> What does "time btrfs subvolume delete -C ;

[4/504]mh@fan:~$ time sudo btrfs subvolume delete -c /mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/5001/-home-mh
Delete subvolume (commit): '/mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/5001/-home-mh'

real    0m0.100s
user    0m0.000s
sys     0m0.016s
[5/505]mh@fan:~$ time sudo btrfs subvolume delete -C /mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/4001/-home-mh
Delete subvolume (commit): '/mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/4001/-home-mh'

real    0m0.079s
user    0m0.012s
sys     0m0.000s
[6/506]mh@fan:~$

The difference between -c and -C only shows when there is more than one
snapshot to be deleted.

> time btrfs subvolume sync " print ?

[8/508]mh@fan:~$ time sudo btrfs subvolume sync /

real    0m0.030s
user    0m0.004s
sys     0m0.008s
[9/509]mh@fan:~$

> The reason for asking is that even on a lightly loaded filesystem I
> have seen btrfs subvolume delete take more than 30 seconds. On a more
> heavily load filesystem I have seen 5+ minutes before btrfs subvolume
> delete had finished.

In my experience, deleting snapshots in huge batches slows things down
quite a bit, but this btrfs does not suffer from this disease.

> If you have a high enough i/o load, plus large enough changes per
> snapshot, it might be possible to get btrfs into a situation were it
> never actually finishes cleaning up deleted snapshots. (I'm also not
> sure what happens if you shutdown or unmount whilst btrfs is still
> cleaning up, but I expect the devs thought of that).

It is a COW filesystem, I'd expect it to be consistent no matter what.
But that's the theory.

Greetings
Marc
New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
Device unallocated:           106.97GiB
Mar 13 11:36:17 fan root: Device missing:                 0.00B
Mar 13 11:36:17 fan root: Used:                        80.09GiB
Mar 13 11:36:17 fan root: Free (estimated):           107.26GiB  (min: 107.26GiB)
Mar 13 11:36:17 fan root: Data ratio:                      1.00
Mar 13 11:36:17 fan root: Metadata ratio:                  1.00
Mar 13 11:36:17 fan root: Global reserve:             512.00MiB  (used: 0.00B)
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Data,single: Size:78.00GiB, Used:77.71GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr   78.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Metadata,single: Size:15.00GiB, Used:2.38GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr   15.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr   32.00MiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Unallocated:
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr  106.97GiB
Mar 13 11:36:17 fan root: BEGIN btrfs balance start /
Mar 13 11:51:23 fan root: ERROR: error during balancing '/': No space left on device
Mar 13 11:51:23 fan root: There may be more info in syslog - try dmesg | tail
Mar 13 11:51:23 fan root: btrfs fi df /
Mar 13 11:51:23 fan root: Data, single: total=78.00GiB, used=77.70GiB
Mar 13 11:51:23 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 13 11:51:23 fan root: Metadata, single: total=23.00GiB, used=2.38GiB
Mar 13 11:51:23 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 13 11:51:23 fan root: btrfs fi show /
Mar 13 11:51:23 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 13 11:51:23 fan root:     Total devices 1 FS bytes used 80.08GiB
Mar 13 11:51:23 fan root:     devid    1 size 200.00GiB used 101.03GiB path /dev/mapper/fanbtr
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: btrfs fi usage /
Mar 13 11:51:23 fan root: Overall:
Mar 13 11:51:23 fan root: Device size:                200.00GiB
Mar 13 11:51:23 fan root: Device allocated:           101.03GiB
Mar 13 11:51:23 fan root: Device unallocated:          98.97GiB
Mar 13 11:51:23 fan root: Device missing:                 0.00B
Mar 13 11:51:23 fan root: Used:                        80.08GiB
Mar 13 11:51:23 fan root: Free (estimated):            99.26GiB  (min: 99.26GiB)
Mar 13 11:51:23 fan root: Data ratio:                      1.00
Mar 13 11:51:23 fan root: Metadata ratio:                  1.00
Mar 13 11:51:23 fan root: Global reserve:             512.00MiB  (used: 0.00B)
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: Data,single: Size:78.00GiB, Used:77.70GiB
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr   78.00GiB
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: Metadata,single: Size:23.00GiB, Used:2.38GiB
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr   23.00GiB
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr   32.00MiB
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: Unallocated:
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr   98.97GiB
Mar 13 11:51:23 fan root: END btrfs-balance script
[10/509]mh@fan:~$

I see the same metadata spread as with the old filesystem in btrfs fi df:
total at 23 GiB and used at 2.38 GiB.

What I find strange is that this filesystem has Data, System and Metadata
in the "single" profile. Is this the new default for a 200 GiB file system?

Full log is at http://q.bofh.de/~mh/stuff/20160313-fanbtr-btrfs-syslog

The log was taken with enospc_debug active on the file system, and all
file system, block device and storage relevant log lines were left in.
Is there anything missing?

Is this the same issue? Would the log help as an addition to
https://bugzilla.kernel.org/show_bug.cgi?id=114451?

Greetings
Marc

--
-----
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Mon, Mar 07, 2016 at 11:39:11AM -0700, Chris Murphy wrote:
> Since there's no hardware issue suspect, you could filter for just btrfs.
>
> journalctl -o short-iso | grep -i btrfs

Which is exactly what I did. Why did you suspect that my logs were
"trimmed"? That's what got me kind of furious. I took great care to not
trim relevant information.

> When there's hardware stuff suspect it's better to include all the
> SCSI and libata (and USB if it's a USB drive) messages also.

None there.

> If you have any logs that include the filesystem mounted with
> enospc_debug, that might be useful for a developer?

The later logs I posted were actually taken with enospc_debug, the 4.4.3
ones even with Duncan's patch. I think I didn't apply it before building
4.4.4.

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sun, Mar 06, 2016 at 01:27:10PM -0700, Chris Murphy wrote:
> So if it were me, I'd gather all possible data, including complete,
> not trimmed, logs. And as for the btrfs-image, it could be huge.

[5/504]mh@q:~/.www/public_html/stuff$ unxz --list 20160307-fanbtr-image.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1      19    248.0 MiB   2385.2 MiB  0.104  CRC32   20160307-fanbtr-image.xz

> It might not be
> a bad idea to capture a complete btrfs-debug-tree also, and compress
> that, add as attachment.

How do I do that?

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
time sudo btrfs balance start -musage=74 /media/tempdisk
Done, had to relocate 50 out of 134 chunks

real    0m4.546s
user    0m0.000s
sys     0m0.620s
[32/531]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

So one does not see a decrease in total Metadata size until -musage has
gone up to 70; then it decreases by half a gig. -musage=75 is the first
value that leads to the ENOSPC condition, with total Metadata size going
up to 27 GiB again. -musage=74 is the biggest value that finishes without
ENOSPC, but it brings no visible decrease of total Metadata size.

Greetings
Marc
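The gap between allocated and used metadata is the number this sub-thread keeps coming back to. A minimal sketch for computing it from raw `btrfs fi df` output follows. The function name `metadata_slack` is made up for illustration, and the sketch assumes that both figures on the Metadata line are reported in GiB, as they are in the output above.

```shell
# metadata_slack: read raw `btrfs fi df` output on stdin and print the
# allocated-but-unused Metadata space (total minus used).
# Assumption: both values on the Metadata line are given in GiB.
metadata_slack() {
    # Split on '=' and ',' so that $3 is "27.00GiB" and $5 is "2.30GiB";
    # adding 0 makes awk keep only the leading number.
    awk -F'[=,]' '/^Metadata/ {
        total = $3 + 0
        used  = $5 + 0
        printf "%.2f\n", total - used
    }'
}
```

For example, `sudo btrfs fi df /media/tempdisk | metadata_slack` would print the slack in GiB. Lines that went through logger carry a syslog prefix and would have to be stripped of it first.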
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> Something is happening with the usage of this file system that's out
> of the ordinary. This is the first time I've seen such a large amount
> of unused metadata allocation. And then for it not only fail to
> balance, but for the allocation amount to increase is a first. So
> understanding the usage is important to figuring out what's happening.
> I'd file a bug and include as much information on how the fs got into
> this state as possible. And also if possible make a btrfs-image using
> the proper flags to blot out the filenames for privacy. And what
> btrfs-progs tools were used to create this file system. Etc.

https://bugzilla.kernel.org/show_bug.cgi?id=114451

Please advise if there is something missing.

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Mon, Mar 07, 2016 at 01:56:54PM -0500, Austin S. Hemmelgarn wrote:
> Yeah, in general, if you want to get good upstream support for BTRFS (such
> as from the mailing lists), you still want to steer clear of 'Enterprise'
> branded distros (RHEL (and by extension CentOS) is particularly bad about
> kernel versioning

Just to get back to this thread's subject, I am using Debian unstable,
with a vanilla kernel, 4.4.3 at the beginning of this thread, and 4.4.4
today.

Greetings
Marc
Re: btrfs-image run time
On Mon, Mar 07, 2016 at 07:15:24PM +0100, Garmine 42 wrote:
> According to the manpage duplicate -s is valid and the high CPU usage is
> intended. Although a warning could be valid in case of -ss.

Or use a different letter.

Anyway, that was my stupidity and no developer time should be wasted on
that. btrfs-image behaves as documented; everything else was a problem
existing between chair and keyboard.

Sorry for the noise.

Greetings
Marc
Re: btrfs-image run time
On Mon, Mar 07, 2016 at 06:27:17PM +0100, Marc Haber wrote:
> how long is btrfs-image taking to run on a 400 GiB filesystem?
>
> I have /bin/btrfs-image -s -t 8 -s /dev/mapper/mydevice - | pixz -9 >
> file.on.other.fs running for four hours now

Strike my question please, I didn't see that I had the -s doubled. With
one -s I now see actual progress.

Greetings
Marc
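As the thread establishes, btrfs-image accepts a repeated -s and treats it as the much more CPU-intensive sanitizing mode, so a doubled flag is easy to miss. The following is only a sketch of a guard against that typo. The helper names `count_flag` and `safe_btrfs_image` are hypothetical and not part of btrfs-progs, and the guard only recognizes -s given as a separate argument, not the combined spelling -ss.

```shell
# count_flag FLAG ARGS...: print how often FLAG occurs among ARGS.
count_flag() {
    local flag=$1 n=0 arg
    shift
    for arg in "$@"; do
        if [ "$arg" = "$flag" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# safe_btrfs_image ARGS...: hypothetical wrapper that refuses to start
# btrfs-image when -s was given more than once, then runs it unchanged.
safe_btrfs_image() {
    if [ "$(count_flag -s "$@")" -gt 1 ]; then
        echo "refusing to run: -s given more than once" >&2
        return 1
    fi
    btrfs-image "$@"
}
```

With this in place, `safe_btrfs_image -s -t 8 /dev/mapper/mydevice - | pixz -9 > file.on.other.fs` runs as before, while the doubled form from the original question is rejected before any work starts.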
btrfs-image run time
Hi,

how long is btrfs-image taking to run on a 400 GiB filesystem?

I have

/bin/btrfs-image -s -t 8 -s /dev/mapper/mydevice - | pixz -9 > file.on.other.fs

running for four hours now, and it's constantly taking a single core, but
is neither reading from the disk nor writing to its output. Is that
expected behavior?

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sat, Mar 05, 2016 at 09:09:09PM +0100, Marc Haber wrote:
> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> > So understanding the usage is important to figuring out what's
> > happening. I'd file a bug and include as much information on how the
> > fs got into this state as possible. And also if possible make a
> > btrfs-image using the proper flags to blot out the filenames for
> > privacy.
>
> That would btrfs-image -s?

btrfs-image -s -t 8 -s /dev/mapper/fanbtr complains about a mounted
filesystem. Will an image made from the running system with the
filesystem mounted help, or do I need to take down the machine while the
image is being made?

Also, threading does not seem to work: despite the -t 8, CPU usage never
exceeds 100 % in atop.

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sun, Mar 06, 2016 at 01:37:31PM -0700, Chris Murphy wrote:
> On Sun, Mar 6, 2016 at 1:27 PM, Chris Murphy <li...@colorremedies.com> wrote:
> > So if it were me, I'd gather all possible data, including complete,
> > not trimmed, logs.
>
> Also include in the bug, the balance script being used. It might be a
> contributing factor.

The balance script was only written after Duncan asked me to do filtered
balances instead of a full balance. The issue showed itself while the
filesystem was still managed using the procedures from "the book" ;-)

> I wonder if the ENOSPC is happening just prior to the point where
> balance would free up the unused portion of allocated metadata chunks
> and that's why this just keeps getting worse? The balance function is
> COW, so I wonder if there are a bunch of failed chunk migrations that
> are just accumulating due to the ENOSPC stopping the balance?

How do we find out?

> Anyway, after collecting all data and btrfs-image, I would blow away
> this fs using current kernel and tools. And then go back to the
> original workload. I would not pare down the number or frequency of
> snapshots. If anything increase it. The idea is to reproduce the bug.

... losing another pile of snapshots in the process? This is a
production machine[1].

Greetings
Marc

[1] yes, with off-line backups being made
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sun, Mar 06, 2016 at 01:27:10PM -0700, Chris Murphy wrote:
> Marc said it was created maybe 2 years ago and doesn't remember what
> version of the tools were used. Between it being two years ago and
> also being Debian, for all we know it could've been 0.19. *shrug*

You are mixing up Debian unstable and Debian stable *snort*. You're lucky
that I'm not on RHEL 6[1].

> On the one hand, the practical advice is to just blow it away and use
> everything current, go back to the same workload including thousands
> of snapshots, and see if this balance problem is reproducible. That's
> pretty clearly a bug.

To have the same thing happen in half a year again? That's not why I
converted to a snapshottable file system.

> On the other hand, we're approaching the state with Btrfs where the
> problems we're seeing are at least as much about aging file systems,
> because the stability is permitting file systems to get older.

And this is really something to be proud of? I mean, this is a file
system that is part of the vanilla Linux kernel, not marked as
experimental or something, and you're still concerned about file systems
that were made a year ago? This is a new experience for me.

> As they get older though, the issues get more non-deterministic. So
> it's an interesting bug from that perspective, the current kernel
> code ought to be able to contend with this (as in, the user is right
> to expect the code to deal with this scenario, and if it doesn't it's
> a bug; not that I expect today's code to actually do this).

Kernel 4.4.4 as of the day before yesterday, thanks for considering.

> So if it were me, I'd gather all possible data, including complete,
> not trimmed, logs.

So you seriously want all messages like

Mar 7 09:25:23 fan systemd[1]: Started http per-connection Server, forwarding to 3142 ([2a01:238:4071:328d:5054:ff:fea9:6807]:41060).
Mar 7 09:25:23 fan named[3000]: client 2a01:238:4071:328d:5054:ff:fea9:6807#59920 (debian.debian.zugschlus.de): query: debian.debian.zugschlus.de IN + (fec0:0:0:::1)
Mar 7 09:21:34 fan dhcpd[2468]: DHCPREQUEST for 192.168.182.29 from 54:04:a6:82:21:00 via eth0: unknown lease 192.168.182.29.
Mar 7 09:17:01 fan CRON[19474]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 7 09:18:06 fan systemd[1]: Started Session c101 of user mh.
Mar 7 08:21:40 fan smartd[1956]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 30

I _can_ literally swamp the bug report with gigabytes of logs, but is
that really what you want? If it is not, please state what you mean by
"not trimmed", as I only removed those clutter messages from the logs I
sent.

Greetings
Marc

[1] Does RHEL 6 have btrfs in the first place?
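The disagreement about "trimmed" logs comes down to which syslog lines count as relevant. The sketch below shows one possible filter, assuming that relevant means kernel messages, anything mentioning btrfs, and the `root:` lines written by the logger-based balance script. That definition and the function name `storage_log_lines` are assumptions for illustration, not something the thread agreed on.

```shell
# storage_log_lines: read syslog text on stdin and keep only
#   - kernel messages (libata, SCSI and USB storage errors arrive here),
#   - any line mentioning btrfs,
#   - lines tagged "root:", as written by logger from a root shell.
# The choice of patterns is an assumption, not an agreed definition.
storage_log_lines() {
    grep -E -i ' (kernel|root)(\[[0-9]+\])?: |btrfs'
}
```

Run over the examples above, this drops the systemd, named, dhcpd, CRON and smartd lines and keeps lines such as `fan kernel: [ 143.120268] BTRFS info (device loop0): ...`.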
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sun, Mar 06, 2016 at 06:43:46AM +, Duncan wrote:
> Marc Haber posted on Sat, 05 Mar 2016 21:09:09 +0100 as excerpted:
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> Something is happening with the usage of this file system that's out of
> >> the ordinary. This is the first time I've seen such a large amount of
> >> unused metadata allocation. And then for it not only fail to balance,
> >> but for the allocation amount to increase is a first.
> >
> > It is just a root filesystem of a workstation running Debian Linux, in
> > daily use, with daily snapshots of the system, and ten-minute-increment
> > snapshots of /home, with no cleanup happening for a few months.
> >
> >> So understanding the usage is important to figuring out what's
> >> happening. I'd file a bug and include as much information on how the
> >> fs got into this state as possible. And also if possible make a
> >> btrfs-image using the proper flags to blot out the filenames for
> >> privacy.
>
> Now you're homing in on what I picked up on. There's something very
> funky about that metadata, 100+ GiB of metadata total, only just over 2
> GiB metadata used, and attempts to balance it don't help with the spread
> between the two at all, only increasing the total metadata, if anything,
> but still seem to complete without error. There's gotta be some sort of
> bug going on there, and I'd /bet/ it's the same one that's keeping full
> balances from working, as well.

I don't understand a single word of this, but you seem to understand it.
Good.

>
> OK, this question's out of left field, but it's the only thing (well,
> /almost/ only, see below) I've seen do anything /remotely/ like that:
>
> Was the filesystem originally created as a convert from ext*, using btrfs-
> convert? If so, was the ext2_saved or whatever subvolume removed, and a
> successful defrag and balance completed at that time?

I have dug around in my auth.logs, and thanks to my not working in a root
shell but using sudo for every single command, I can say that the
filesystem was created on September 1, 2015, so it is not _this_ old.
snapshot.debian.net tells me that Debian unstable had btrfs-tools 4.1.2
uploaded on August 31, so I guess that the filesystem was created either
by the 4.0 version we had since May 2015 or by the brand new 4.1.2.

And it was a mkfs.btrfs with no special options. I suspected this, since I
would probably not have made an ext4 filesystem of 300 GB in size. Back in
the ext4 days, I usually made /, /usr, /var, /home and /boot their own
filesystems.

> Tho AFAIK there was in addition a very narrow timeframe in which a bug in
> mkfs.btrfs would create invalid btrfs'. That was with btrfs-progs 4.1.1,
> released in July 2015, with an urgent bugfix release 4.1.2 in the same
> month to fix the problem, so the timeframe was days or weeks.

Debian is chastised for its allegedly quirky release schedules even in
this thread. I usually ignore that, but this time a smile comes to my face
when I say that btrfs-progs 4.1.1 was never packaged in Debian, hence
we're clear of this bug here. We went from 4.0 straight to 4.1.2.

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> I can't tell what this btrfs-balance script is doing because not every
> btrfs balance command is in the log.

It is. I wrote it to produce reproducible logs.

[1/499]mh@fan:~$ cat btrfs-balance
#!/bin/bash

FS="/mnt/fanbtr"

showdf() {
    logger -- btrfs fi df $FS
    btrfs fi df $FS 2>&1 | logger
    logger -- btrfs fi show /
    btrfs fi show / | logger
    logger -- btrfs fi usage /
    btrfs fi usage / | logger
}

logger -- BEGIN btrfs-balance script
showdf
btrfs balance start $FS 2>&1 | logger
showdf
logger -- BEGIN btrfs balance start -dprofiles=single $FS
btrfs balance start -dprofiles=single $FS 2>&1 | logger
showdf
logger -- BEGIN btrfs balance start -mprofiles=dup $FS
btrfs balance start -mprofiles=dup $FS 2>&1 | logger
showdf
logger -- BEGIN btrfs balance start --force -sprofiles=dup $FS
btrfs balance start --force -sprofiles=dup $FS 2>&1 | logger
showdf
logger -- BEGIN btrfs balance start $FS
btrfs balance start $FS 2>&1 | logger
showdf
logger -- END btrfs-balance script
[2/500]mh@fan:~$

I see. The logger -- BEGIN is missing for the very first command. My bad.

> Something is happening with the usage of this file system that's out
> of the ordinary. This is the first time I've seen such a large amount
> of unused metadata allocation. And then for it not only fail to
> balance, but for the allocation amount to increase is a first.

It is just a root filesystem of a workstation running Debian Linux, in
daily use, with daily snapshots of the system, and ten-minute-increment
snapshots of /home, with no cleanup happening for a few months.

> So understanding the usage is important to figuring out what's
> happening. I'd file a bug and include as much information on how the
> fs got into this state as possible. And also if possible make a
> btrfs-image using the proper flags to blot out the filenames for
> privacy.

That would be btrfs-image -s?

> And what btrfs-progs tools were used to create this file system. Etc.

The file system is at least two years old, and I do not remember which
version of btrfs-tools was in Debian unstable back then. Is this
information somewhere in the filesystem label? How do I obtain this one?

> The alternative if this can't be fixed, is to recreate the filesystem
> because there's no practical way yet to migrate so many snapshots to a
> new file system.

I am now back to a mid three-digit number of snapshots.

Greetings
Marc
Re: balance hangs and starts again on reboot
On Sat, Mar 05, 2016 at 04:38:57PM +0100, Holger Hoffstätte wrote:
> On 03/05/16 15:17, Marc Haber wrote:
> >> Then try to balance in small increments.
> >
> > -dusage=5 and incrementing? Or what do you mean with "in small
> > increments"?
>
> Exactly, yes. Sorry for not being more clear.

So you would recommend something along the lines of

for nr in $(seq 5 5 100); do
    btrfs balance start -dusage=$nr $FS
done

right? Won't this take ages longer than a straight unfiltered balance?

> FWIW I've been balancing a lot recently (both for stress testing and
> cleaning up a few filesystems) and have never run into this particular
> stall, but only ever do filtered balances. Also I wouldn't be surprised
> at all if this is yet another problem where md does something in a way
> that btrfs doesn't expect, and things go wrong.

md as in the Linux Software RAID? That's not in the game here, it's a
single SATA hard disk.

Greetings
Marc
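The stepwise balance discussed here can be wrapped so that it stops at the first failing step instead of running on after an ENOSPC. The following is a sketch only. The function name `balance_in_steps` and the `RUN` variable are made up for illustration, and by default the function merely prints the commands it would run.

```shell
# balance_in_steps FS [FIRST] [STEP] [LAST]
# Run filtered balances with a rising -dusage threshold and stop at the
# first step that fails. RUN is a hypothetical switch: left unset, the
# commands are only printed (dry run); set it to an empty string or to
# "sudo" to execute them.
balance_in_steps() {
    local fs=$1 first=${2:-5} step=${3:-5} last=${4:-100} nr
    for nr in $(seq "$first" "$step" "$last"); do
        ${RUN-echo} btrfs balance start "-dusage=$nr" "$fs" || return 1
    done
}
```

`balance_in_steps /mnt/fanbtr` prints the twenty commands of the loop quoted above, and `RUN=sudo balance_in_steps /mnt/fanbtr 5 5 50` would execute the first ten. Because each step only touches chunks below the given usage percentage, the early steps have little data to relocate.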
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Thu, Mar 03, 2016 at 02:28:36AM +0200, Dāvis Mosāns wrote:
> I've same issue, 4.4.3 kernel on Arch Linux
>
> $ sudo btrfs fi show /mnt/fs/
> Label: 'fs'  uuid: a3c66d25-2c25-40e5-a827-5f7e5208e235
>     Total devices 1 FS bytes used 396.94GiB
>     devid    1 size 435.76GiB used 435.76GiB path /dev/sdi2
>
> $ sudo btrfs fi df /mnt/fs/
> Data, single: total=416.70GiB, used=390.62GiB
> System, DUP: total=32.00MiB, used=96.00KiB
> Metadata, DUP: total=9.50GiB, used=6.32GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> $ sudo btrfs fi usage /mnt/fs/
> Overall:
>     Device size:          435.76GiB
>     Device allocated:     435.76GiB
>     Device unallocated:     1.00MiB
>     Device missing:           0.00B
>     Used:                 403.26GiB
>     Free (estimated):      26.07GiB  (min: 26.07GiB)
>     Data ratio:                1.00
>     Metadata ratio:            2.00
>     Global reserve:       512.00MiB  (used: 0.00B)
>
> Data,single: Size:416.70GiB, Used:390.62GiB
>    /dev/sdi2  416.70GiB
>
> Metadata,DUP: Size:9.50GiB, Used:6.32GiB
>    /dev/sdi2   19.00GiB
>
> System,DUP: Size:32.00MiB, Used:96.00KiB
>    /dev/sdi2   64.00MiB
>
> Unallocated:
>    /dev/sdi2    1.00MiB

http://paste.ubuntu.com/15292589/ has another log of mine with btrfs fi
usage calls as well, just in case this helps.

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
Hi,

I have not seen this message coming back to the mailing list. Was it
again too long?

I have pastebinned the log at http://paste.debian.net/412118/

On Tue, Mar 01, 2016 at 08:51:32PM +, Duncan wrote:
> There has been something bothering me about this thread that I wasn't
> quite pinning down, but here it is.
>
> If you look at the btrfs fi df/usage numbers, data chunk total vs. used
> are very close to one another (113 GiB total, 112.77 GiB used, single
> profile, assuming GiB data chunks, that's only a fraction of a single
> data chunk unused), so balance would seem to be getting thru them just
> fine.

Where would you see those numbers? I have those, pre-balance:

Mar 2 20:28:01 fan root: Data, single: total=77.00GiB, used=76.35GiB
Mar 2 20:28:01 fan root: System, DUP: total=32.00MiB, used=48.00KiB
Mar 2 20:28:01 fan root: Metadata, DUP: total=86.50GiB, used=2.11GiB
Mar 2 20:28:01 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B

> But there's a /huge/ spread between total vs. used metadata (32 GiB
> total, under 4 GiB used, clearly _many_ empty or nearly empty chunks),
> implying that has not been successfully balanced in quite some time, if
> ever.

This is possible, yes.

> So I'd surmise the problem is in metadata, not in data.
>
> Which would explain why balancing data works fine, but a whole-filesystem
> balance doesn't, because it's getting stuck on the metadata, not the data.
>
> Now the balance metadata filters include system as well, by default, and
> the -mprofiles=dup and -sprofiles=dup balances finished, apparently
> without error, which throws a wrench into my theory.

Also finishes without changing things, post-balance:

Mar 2 21:55:37 fan root: Data, single: total=77.00GiB, used=76.36GiB
Mar 2 21:55:37 fan root: System, DUP: total=32.00MiB, used=80.00KiB
Mar 2 21:55:37 fan root: Metadata, DUP: total=99.00GiB, used=2.11GiB
Mar 2 21:55:37 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B

Wait, Metadata total actually _grew_???

> But while we have the btrfs fi df from before the attempt with the
> profiles filters, we don't have the same output from after.

We now have everything. New log attached.

> > I'd like to remove unused snapshots and keep the number of them to 4
> > digits, as a workaround.
>
> I'll strongly second that recommendation. Btrfs is known to have
> snapshot scaling issues at 10K snapshots and above. My strong
> recommendation is to limit snapshots per filesystem to 3000 or less, with
> a target of 2000 per filesystem or less if possible, and an ideal of 1000
> per filesystem or less if it's practical to keep it to that, which it
> should be with thinning, if you're only snapshotting 1-2 subvolumes, but
> may not be if you're snapshotting more.

I'm snapshotting /home every 10 minutes. The filesystem that I have been
posting logs from has about 400 snapshots, and snapshot cleanup works
fine. The slow snapshot removal is on a different filesystem on the same
host, which is on a rotating rust HDD and is much bigger.

> By 3000 snapshots per filesystem, you'll be beginning to notice slowdowns
> in some btrfs maintenance commands if you're sensitive to it, tho it's
> still at least practical to work with, and by 10K, it's generally
> noticeable by all, at least once they thin down to 2K or so, as it's
> suddenly faster again! Above 100K, some btrfs maintenance commands slow
> to a crawl and doing that sort of maintenance really becomes impractical
> enough that it's generally easier to backup what you need to and blow
> away the filesystem to start again with a new one, than it is to try to
> recover the existing filesystem to a workable state, given that
> maintenance can at that point take days to weeks.

Ouch. This should not be the case, or btrfs subvolume snapshot should at
least emit a warning. It is not good that it is so easy to get a
filesystem into a state this bad.

> So 5-digits of snapshots on a filesystem is definitely well outside of
> the recommended range, to the point that in some cases, particularly
> approaching 6-digits of snapshots, it'll be more practical to simply
> ditch the filesystem and start over, than to try to work with it any
> longer. Just don't do it; setup your thinning schedule so your peak is
> 3000 snapshots per filesystem or under, and you won't have that problem
> to worry about. =:^)

That needs to be documented prominently. The ZFS fanbois will love that.

> Oh, and btrfs quota management exacerbates the scaling issues
> dramatically. If you're using btrfs quotas

Am not, thankfully.

Greetings
Marc
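The thinning schedule recommended above amounts to capping the number of snapshots per filesystem. A minimal sketch of the selection step follows, assuming snapshot names sort chronologically (timestamped names, for instance) and are fed in oldest first. The function name `snapshots_to_prune` is made up, and the sketch only prints candidates; it deletes nothing.

```shell
# snapshots_to_prune KEEP
# Read snapshot names on stdin, one per line and oldest first, and print
# the oldest ones that would have to go so that at most KEEP remain.
# Assumption: the input order is chronological.
snapshots_to_prune() {
    local keep=$1
    awk -v keep="$keep" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}
```

With 3000 as the cap suggested above, `ls /path/to/snapshots | sort | snapshots_to_prune 3000` lists the snapshots beyond that limit (the directory is a placeholder). Each printed name could then be passed to `btrfs subvolume delete` after review.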
Re: balance hangs and starts again on reboot
On Fri, Mar 04, 2016 at 07:09:39PM +0100, Holger Hoffstätte wrote:
> On 03/04/16 18:31, Marc Haber wrote:
> > I have another btrfs on the same host that has no the no space left on
> > device balance issue, but on another disk. On this btrfs, it seems
> > like a balance process is stuck, with a lot of hanging kernel
> > threads. After a reboot, when I mount the filesystem, the balance
> > immediately starts again. btrfs balance cancel just hangs around with
> > no visible reaction for hours.
> >
> > Log appended. Is there rescue?
>
> Can't offer much help other than to recommend to *always* mount with
> -o skip_balance, which IMHO should have been the default behaviour
> from the beginning.

That's an important hint. The btrfs balance cancel has worked overnight
though.

> Then try to balance in small increments.

-dusage=5 and incrementing? Or what do you mean with "in small
increments"?

Greetings
Marc
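The advice in this message boils down to three commands: mount without resuming the interrupted balance, look at its state, then cancel it. They are sketched below as a dry run. The function name `recover_from_stuck_balance`, the `RUN` switch and the device and mount point in the usage note are placeholders, not taken from the thread.

```shell
# recover_from_stuck_balance DEVICE MOUNTPOINT
# Mount with skip_balance so that the interrupted balance stays paused,
# show its state, then cancel it. RUN is a hypothetical switch: left
# unset, the commands are only printed; set it to an empty string or to
# "sudo" to execute them.
recover_from_stuck_balance() {
    local dev=$1 mnt=$2
    ${RUN-echo} mount -o skip_balance -t btrfs "$dev" "$mnt" || return 1
    # The exit status of "balance status" is deliberately not checked.
    ${RUN-echo} btrfs balance status "$mnt"
    ${RUN-echo} btrfs balance cancel "$mnt"
}
```

For example, `recover_from_stuck_balance /dev/sdX1 /mnt/rescue` prints the three commands for review before anything touches the disk.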
Re: balance hangs and starts again on reboot
Hi Chris,

I apologize for not being able to deliver logs in a way you might find
more helpful.

On Fri, Mar 04, 2016 at 12:08:10PM -0700, Chris Murphy wrote:
> On Fri, Mar 4, 2016 at 10:31 AM, Marc Haber <mh+linux-bt...@zugschlus.de> wrote:
> > I have another btrfs on the same host that does not have the "no space
> > left on device" balance issue, but on another disk. On this btrfs, it
> > seems like a balance process is stuck, with a lot of hanging kernel
> > threads. After a reboot, when I mount the filesystem, the balance
> > immediately starts again. btrfs balance cancel just hangs around with
> > no visible reaction for hours.
> >
> > Log appended. Is there rescue?
>
> The log is made much more useful if you can sysrq+w while the blocked
> task is happening; and then dmesg or journalctl -k to get the results
> into a file for attachment to avoid the annoying MUA wrapping.

This list has repeatedly eaten log attachments without giving any
indication why. I had assumed that attachments are disallowed here, and
am taking care that inserted logs are not wrapped on my side. The list
archives (http://www.spinics.net/lists/linux-btrfs/msg52663.html) show
that my efforts not to cause wrapping on my side were actually
successful.

What is the most helpful way to include logs? Pastebinning them would
probably reduce the list archives' usefulness due to pastebin expiring,
attaching doesn't work (see above), and including them causes "annoying
MUA wrapping". I do only have 24 years of e-mail experience, so I'm a
clueless newbie; maybe someone can give advice on how to do that
properly.

I'm going to try the sysrq+w thing next time things happen.

Greetings
Marc
balance hangs and starts again on reboot
Hi,

I have another btrfs on the same host that does not have the "no space
left on device" balance issue, but on another disk. On this btrfs, it
seems like a balance process is stuck, with a lot of hanging kernel
threads. After a reboot, when I mount the filesystem, the balance
immediately starts again. btrfs balance cancel just hangs around with
no visible reaction for hours.

Log appended. Is there rescue?

Greetings
Marc

Mar 4 17:27:36 fan mh: mount /mnt/snapshots/fanbtr_r
Mar 4 17:27:41 fan kernel: [ 453.124792] BTRFS info (device dm-17): disk space caching is enabled
Mar 4 17:27:41 fan kernel: [ 453.124797] BTRFS: has skinny extents
Mar 4 17:27:46 fan kernel: [ 458.308485] BTRFS: checking UUID tree
Mar 4 17:27:46 fan kernel: [ 458.308493] BTRFS info (device dm-17): continuing balance
Mar 4 17:27:50 fan kernel: [ 462.297618] BTRFS info (device dm-17): relocating block group 150434162 flags 36
Mar 4 17:32:08 fan kernel: [ 720.473141] INFO: task btrfs-balance:3753 blocked for more than 120 seconds.
Mar 4 17:32:08 fan kernel: [ 720.473154] Not tainted 4.4.4-zgws1 #2
Mar 4 17:32:08 fan kernel: [ 720.473159] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 4 17:32:08 fan kernel: [ 720.473165] btrfs-balance D 88062fc556c0 0 3753 2 0x
Mar 4 17:32:08 fan kernel: [ 720.473176] 88060c3f0c00 0001 880036da4000 880036da3bd0
Mar 4 17:32:08 fan kernel: [ 720.473184] 8806113b6c60 0002 8140d24c 88060c3f0c00
Mar 4 17:32:08 fan kernel: [ 720.473192] 8140b759 7fff 8140d28a 88062fc156c0
Mar 4 17:32:08 fan kernel: [ 720.473199] Call Trace:
Mar 4 17:32:08 fan kernel: [ 720.473214] [] ? usleep_range+0x35/0x35
Mar 4 17:32:08 fan kernel: [ 720.473225] [] ? schedule+0x6f/0x7c
Mar 4 17:32:08 fan kernel: [ 720.473231] [] ? schedule_timeout+0x3e/0x128
Mar 4 17:32:08 fan kernel: [ 720.473241] [] ? check_preempt_curr+0x25/0x63
Mar 4 17:32:08 fan kernel: [ 720.473248] [] ? ttwu_do_wakeup+0xf/0xd0
Mar 4 17:32:08 fan kernel: [ 720.473255] [] ? _raw_spin_unlock_irqrestore+0xd/0xe
Mar 4 17:32:08 fan kernel: [ 720.473263] [] ? try_to_wake_up+0x1cb/0x1dc
Mar 4 17:32:08 fan kernel: [ 720.473271] [] ? __wait_for_common+0x121/0x16d
Mar 4 17:32:08 fan kernel: [ 720.473278] [] ? __wait_for_common+0x121/0x16d
Mar 4 17:32:08 fan kernel: [ 720.473286] [] ? wake_up_q+0x3b/0x3b
Mar 4 17:32:08 fan kernel: [ 720.473339] [] ? btrfs_async_run_delayed_refs+0xbf/0xd5 [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473390] [] ? __btrfs_end_transaction+0x291/0x2d5 [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473438] [] ? relocate_block_group+0x2b8/0x4ab [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473488] [] ? btrfs_wait_ordered_roots+0x175/0x191 [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473536] [] ? btrfs_relocate_block_group+0x132/0x25a [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473585] [] ? btrfs_relocate_chunk.isra.35+0x3c/0xad [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473633] [] ? btrfs_balance+0xd23/0xd8f [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473684] [] ? balance_kthread+0x4f/0x6d [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473732] [] ? btrfs_balance+0xd8f/0xd8f [btrfs]
Mar 4 17:32:08 fan kernel: [ 720.473740] [] ? kthread+0x95/0x9d
Mar 4 17:32:08 fan kernel: [ 720.473747] [] ? kthread_parkme+0x16/0x16
Mar 4 17:32:08 fan kernel: [ 720.473754] [] ? ret_from_fork+0x3f/0x70
Mar 4 17:32:08 fan kernel: [ 720.473761] [] ? kthread_parkme+0x16/0x16
Mar 4 17:34:08 fan kernel: [ 840.465597] INFO: task btrfs-balance:3753 blocked for more than 120 seconds.
Mar 4 17:34:08 fan kernel: [ 840.465610] Not tainted 4.4.4-zgws1 #2
Mar 4 17:34:08 fan kernel: [ 840.465615] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 4 17:34:08 fan kernel: [ 840.465621] btrfs-balance D 88062fc556c0 0 3753 2 0x
Mar 4 17:34:08 fan kernel: [ 840.465632] 88060c3f0c00 0001 880036da4000 880036da3bd0
Mar 4 17:34:08 fan kernel: [ 840.465641] 8806113b6c60 0002 8140d24c 88060c3f0c00
Mar 4 17:34:08 fan kernel: [ 840.465648] 8140b759 7fff 8140d28a 88062fc156c0
Mar 4 17:34:08 fan kernel: [ 840.465655] Call Trace:
Mar 4 17:34:08 fan kernel: [ 840.465669] [] ? usleep_range+0x35/0x35
Mar 4 17:34:08 fan kernel: [ 840.465680] [] ? schedule+0x6f/0x7c
Mar 4 17:34:08 fan kernel: [ 840.465687] [] ? schedule_timeout+0x3e/0x128
Mar
Re: Again, no space left on device while rebalancing and recipe doesnt work
Hi,

On Mon, Feb 29, 2016 at 09:56:58AM +0800, Qu Wenruo wrote:
> Marc Haber wrote on 2016/02/27 22:14 +0100:
> >I have again the issue of no space left on device while rebalancing
> >(with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
> >
> >mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> >ERROR: error during balancing '/mnt/fanbtr': No space left on device
>
> It seems that, only when balancing all chunks, ENOSPC error happens.
>
> And did you run any other heavy IO at background?

Not when running those last commands for the mailing list post.

> BTW, is there any kernel log when the ENOSPC happens?
> Would you please try the following commands to see which one caused the
> problem?
> And would you please provide the dmesg of them?
>
> # btrfs balance start -dprofiles=single /mnt/fanbtr
> # btrfs balance start -mprofile=dup /mnt/fanbtr
> # btrfs balance start -sprofile=dup /mnt/fanbtr

I have attached the logs. I used logger(1) to record in syslog which
command I executed, and I piped the userspace output to logger so that
the syslog entries match the userspace output.

-mprofile gave an error message, so I tried -mprofiles instead, and
-sprofiles wanted me to use --force, so I did that as well.

All three balance commands above finished without running into ENOSPC,
while a plain balance (which is also part of the log) errors out every
time.

Also, the -dprofiles=single run caused a number of INFO messages about
btrfs-cleaner and btrfs-balance processes being stuck for more than 120
seconds.

I now have a kworker and a btrfs-transact kernel process taking most of
one CPU core each, even after the userspace programs have terminated. Is
there a way to find out what these threads are actually doing?

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sun, Feb 28, 2016 at 12:22:45AM +, Hugo Mills wrote:
> On Sun, Feb 28, 2016 at 01:08:29AM +0100, Marc Haber wrote:
> > Why wouldn't btrfs allocate more data chunks from the ample free space?
>
>    It's a bug. It's been around for years (literally), but nobody's
> tracked it down and fixed it yet.

Is there a fix/workaround?

Greetings
Marc
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Sun, Feb 28, 2016 at 12:15:21AM +0100, Martin Steigerwald wrote:
> On Saturday, 27 February 2016 22:14:50 CET Marc Haber wrote:
> > I have again the issue of no space left on device while rebalancing
> > (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
> >
> > mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> > ERROR: error during balancing '/mnt/fanbtr': No space left on device
> > mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
> > mh@fan:~$ sudo btrfs fi show -m
> > Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
> >         Total devices 1 FS bytes used 116.49GiB
> >         devid    1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr
>
> Hmmm, thats still a ton of space to allocate chunks from.
>
> > mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
> > Data, single: total=113.00GiB, used=112.77GiB
> > System, DUP: total=32.00MiB, used=48.00KiB
> > Metadata, DUP: total=32.00GiB, used=3.72GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> > mh@fan:~$
> >
> > The filesystem was recently resized from 300 GB to 420 GB.
> >
> > Why does btrfs fi show /mnt/fanbtr not give any output? Why does btrfs
> > fi df /mnt/fanbtr say that my data space is only 113 GiB large?
>
> Cause it is.
>
> The "used" in the "devid 1" line of btrfs fi sh is "data + 2x system +
> 2x metadata" = 113 GiB + 2 * 32 GiB + 2 * 32 MiB, i.e. what amount of
> the size of the device is allocated for chunks.
>
> The value one line above is what is allocated inside the chunks.
>
> I.e. the line in "devid 1" is "total" of btrfs fi df summed up, and the
> line above is "used" in btrfs fi df summed up. And… with more devices
> you have more fun.

Why wouldn't btrfs allocate more data chunks from the ample free space?
> I suggest:
>
> merkaba:~> btrfs fi usage -T /daten

[2/498]mh@fan:~$ sudo btrfs fi usage /mnt/fanbtr
Overall:
    Device size:                 417.19GiB
    Device allocated:            177.06GiB
    Device unallocated:          240.12GiB
    Device missing:                  0.00B
    Used:                        120.23GiB
    Free (estimated):            240.33GiB      (min: 120.27GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:113.00GiB, Used:112.79GiB
   /dev/mapper/fanbtr    113.00GiB

Metadata,DUP: Size:32.00GiB, Used:3.72GiB
   /dev/mapper/fanbtr     64.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/mapper/fanbtr     64.00MiB

[3/498]mh@fan:~$ sudo btrfs fi usage -T /mnt/fanbtr
Overall:
    Device size:                 417.19GiB
    Device allocated:            177.06GiB
    Device unallocated:          240.12GiB
    Device missing:                  0.00B
    Used:                        120.23GiB
    Free (estimated):            240.33GiB      (min: 120.27GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

                      Data      Metadata System
Id Path               single    DUP      DUP      Unallocated
-- ------------------ --------- -------- -------- -----------
 1 /dev/mapper/fanbtr 113.00GiB 64.00GiB 64.00MiB   240.12GiB
-- ------------------ --------- -------- -------- -----------
   Total              113.00GiB 32.00GiB 32.00MiB   240.12GiB
   Used               112.79GiB  3.72GiB 48.00KiB
[4/499]mh@fan:~$

> (this is actually the situation asking for hung task trouble with kworker
> threads seeking for free space inside chunks, as no new chunks can be
> allocated, lets hope kernel 4.4 finally really has fixes for this)

I am running a 4.4.2 kernel on the system in question.

> Adding a new device temporarily, doing the balance and then removing it.

I currently refuse to do this on a 400 GiB device that has more than
half of its capacity free. I do expect a modern filesystem to get out of
that situation without a manual intervention this invasive.

> Before that I'd try to balance the metadata chunks, cause
>
> > Metadata, DUP: total=32.00GiB, used=3.72GiB
>
> 32 GiB chunks allocated, only 3,72 GiB used.

Why would I rebalance metadata if there is less than 20 % used?

[21/504]mh@fan:~$ sudo btrfs balance start -musage=5 /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
There may be more info in syslog - try dmesg | tail
[22/505]mh@fan:~$ sudo btrfs balance start -musage=1 /mnt/fanbtr
Done, had to relocate 56 out of 179 chunks
[23/506]mh@fan:~$ sudo btrfs balance start -musage=1 /mnt/fanbtr
Done, had to relocate 56 out of 179 chunks
[24/506]mh@fan:~$ sudo btrfs balance start -musage=1 /mnt/fanbtr
Done, had to relocate 56 out of 1
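
The accounting Martin describes for the "used" figure in the "devid 1" line can be checked with a little arithmetic. The following is only a sketch: the helper name `allocated_gib` is hypothetical, and the input figures are the `total=` values from the `btrfs fi df` output quoted in this thread (data single counts once, metadata DUP and system DUP count twice).

```shell
#!/bin/sh
# allocated_gib DATA_GIB METADATA_GIB SYSTEM_MIB
# Raw device space taken by chunks, in GiB, assuming a single data
# profile and DUP for both metadata and system, as on this filesystem.
allocated_gib() {
    awk -v d="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", d + 2 * m + 2 * s / 1024 }'
}

# 113 GiB data + 2 * 32 GiB metadata + 2 * 32 MiB system
allocated_gib 113 32 32    # prints 177.06
```

The result agrees with both "used 177.06GiB" in `btrfs fi show` and "Device allocated: 177.06GiB" in `btrfs fi usage`, so the remaining 240.12GiB is unallocated rather than missing.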
Again, no space left on device while rebalancing and recipe doesnt work
Hi,

I have again the issue of no space left on device while rebalancing
(with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):

mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
mh@fan:~$ sudo btrfs fi show -m
Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
        Total devices 1 FS bytes used 116.49GiB
        devid    1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr

mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
Data, single: total=113.00GiB, used=112.77GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=32.00GiB, used=3.72GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
mh@fan:~$

The filesystem was recently resized from 300 GB to 420 GB.

Why does btrfs fi show /mnt/fanbtr not give any output? Why does btrfs
fi df /mnt/fanbtr say that my data space is only 113 GiB large?

btrfs balance start -dusage=5 works up to -dusage=100:

mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 111 out of 179 chunks
mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 111 out of 179 chunks
mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 110 out of 179 chunks
mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 109 out of 179 chunks
mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
mh@fan:~$

What is going on here? How do I get away from here?

Greetings
Marc
Transaction aborted (error -17) during balance
Hi,

during a balance on my main notebook, I received the following call
trace:

[ 1545.229672] ------------[ cut here ]------------
[ 1545.229688] WARNING: CPU: 4 PID: 5545 at /build/linux-eGTGmU/linux-4.3/fs/btrfs/extent-tree.c:2093 __btrfs_inc_extent_ref.isra.52+0x20e/0x280 [btrfs]()
[ 1545.229689] BTRFS: Transaction aborted (error -17)
[ 1545.229690] Modules linked in: ctr ccm tun rfcomm cpufreq_userspace binfmt_misc cpufreq_stats cpufreq_powersave cpufreq_conservative nf_conntrack_netlink nfnetlink bnep ip6table_filter ip6_tables xt_TCPMSS xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables bridge stp llc joydev arc4 iTCO_wdt iwldvm iTCO_vendor_support mac80211 snd_hda_codec_conexant intel_rapl snd_hda_codec_generic iosf_mbi x86_pkg_temp_thermal btusb intel_powerclamp btrtl snd_hda_intel iwlwifi btbcm kvm_intel snd_hda_codec btintel kvm snd_hda_core psmouse bluetooth snd_hwdep snd_pcm_oss pcspkr serio_raw i2c_i801 sg cfg80211 snd_mixer_oss lpc_ich snd_pcm mfd_core snd_timer mei_me shpchp mei thinkpad_acpi nvram
[ 1545.229718] tpm_tis snd tpm soundcore rfkill evdev battery ac processor coretemp loop drbd lru_cache libcrc32c parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq ext4 crc16 mbcache jbd2 algif_skcipher af_alg dm_crypt dm_mod md_mod hid_generic hid_logitech_hidpp hid_logitech_dj usbhid hid sd_mod uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_ssse3 sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper i915 ahci ablk_helper cryptd libahci sdhci_pci i2c_algo_bit libata ehci_pci drm_kms_helper sdhci ehci_hcd scsi_mod mmc_core e1000e usbcore ptp usb_common drm pps_core thermal wmi video button
[ 1545.229747] CPU: 4 PID: 5545 Comm: kworker/u16:1 Not tainted 4.3.0-trunk-amd64 #1 Debian 4.3-1~exp2
[ 1545.229747] Hardware name: LENOVO 4240CTO/4240CTO, BIOS 8AET63WW (1.43 ) 05/08/2013
[ 1545.229758] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[ 1545.229760] a0627250 812c5319 88020dc23ba0 8106ebcd
[ 1545.229761] 880406146000 88020dc23bf0 8803c90b9410
[ 1545.229762] 0106 8106ec4c a0627420 0020
[ 1545.229764] Call Trace:
[ 1545.229768] [] ? dump_stack+0x40/0x57
[ 1545.229771] [] ? warn_slowpath_common+0x7d/0xb0
[ 1545.229772] [] ? warn_slowpath_fmt+0x4c/0x50
[ 1545.229778] [] ? insert_tree_block_ref+0x49/0x60 [btrfs]
[ 1545.229783] [] ? __btrfs_inc_extent_ref.isra.52+0x20e/0x280 [btrfs]
[ 1545.229789] [] ? __btrfs_run_delayed_refs+0xc47/0x1050 [btrfs]
[ 1545.229792] [] ? sched_clock+0x5/0x10
[ 1545.229795] [] ? check_preempt_curr+0x50/0x90
[ 1545.229797] [] ? ttwu_do_wakeup+0x14/0xc0
[ 1545.229803] [] ? btrfs_run_delayed_refs+0x78/0x2a0 [btrfs]
[ 1545.229808] [] ? delayed_ref_async_start+0x32/0x80 [btrfs]
[ 1545.229816] [] ? btrfs_scrubparity_helper+0xc8/0x260 [btrfs]
[ 1545.229818] [] ? process_one_work+0x19f/0x3d0
[ 1545.229819] [] ? worker_thread+0x4d/0x450
[ 1545.229821] [] ? process_one_work+0x3d0/0x3d0
[ 1545.229822] [] ? kthread+0xbd/0xe0
[ 1545.229824] [] ? kthread_create_on_node+0x170/0x170
[ 1545.229827] [] ? ret_from_fork+0x3f/0x70
[ 1545.229829] [] ? kthread_create_on_node+0x170/0x170
[ 1545.229830] ---[ end trace 6671e30ac2882b40 ]---
[ 1545.229832] BTRFS: error (device dm-11) in __btrfs_inc_extent_ref:2093: errno=-17 Object already exists
[ 1545.229834] BTRFS info (device dm-11): forced readonly
[ 1545.229836] BTRFS: error (device dm-11) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists

I have been trying to balance this filesystem for the better part of the
afternoon, with numerous freezes of my notebook. I was able to finish
the balance by not doing anything on the notebook while the balance was
running. I then proceeded to initiate a second rebalance of the same
filesystem "just to be sure", which led to a read-only btrfs and me at
least being able to obtain this trace.

This is a distribution kernel; I installed the debug symbols after this
log extract was obtained. Is there a tool which can help to make this
trace usable?

Greetings
Marc
Re: "disk full" on a 5 GB btrfs filesystem, FAQ outdated?
On Mon, Nov 30, 2015 at 05:44:23AM +, Duncan wrote:
> Yes, you can get dup metadata back, but because data and metadata
> are now combined in the same blockgroups (aka chunks), they must
> both be the same replication type.

Thanks for this explanation, it's perfectly clear to me now.

Greetings
Marc
Re: "disk full" on a 5 GB btrfs filesystem, FAQ outdated?
Hi Hugo,

On Sun, Nov 29, 2015 at 02:18:06PM +, Hugo Mills wrote:
> On Sun, Nov 29, 2015 at 02:07:54PM +0100, Marc Haber wrote:
> > However, the FAQ
> > https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_like_I_should_have_lots_left.21
> > suggests that for small filesystems (<16 GB), the best solution would
> > be to upgrade to at least 2.6.37 and recreate the filesystem. 2.6.37
> > is ancient, from 2011, so I am pretty sure that the filesystem _was_
> > created at least with a kernel more recent than that.
>
>    You missed the most important thing from that paragraph: Use mixed
> block groups. That's "mkfs.btrfs --mixed ..." (which I realise is
> missing from the text, and I'll be adding it after I send this email).

Yes, that was the important bit of missing information. My filesystem
now reads:

[26/512]mh@fan:/mnt/tempdisk$ df -h .
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/banana-root  6,0G  836M  5,2G  14% /mnt/tempdisk
[27/513]mh@fan:/mnt/tempdisk$ sudo btrfs fi show .
Label: none  uuid: b2906231-70a9-46d9-9830-38a13cb73171
        Total devices 1 FS bytes used 861.29MiB
        devid    1 size 6.00GiB used 6.00GiB path /dev/mapper/banana-root

btrfs-progs v4.3
[28/514]mh@fan:/mnt/tempdisk$ sudo btrfs fi df .
System, single: total=4.00MiB, used=4.00KiB
Data+Metadata, single: total=6.00GiB, used=861.29MiB
GlobalReserve, single: total=20.00MiB, used=0.00B
[29/515]mh@fan:/mnt/tempdisk$

Can I somehow get duplicate metadata back? Or is that unnecessary?

> > My normal way to recover from this situation is to btrfs add a new
> > device, btrfs balance, btrfs --convert=single --force balance, btrfs
> > device remove, btrfs balance start -mconvert=dup --force and finally
> > balance start again.
> >
> > Is there any solution to solve this more elegantly?
>
>    Recreate the FS with --mixed, and that should deal with it.

Done. Thanks!

Greetings
Marc
"disk full" on a 5 GB btrfs filesystem, FAQ outdated?
Hi,

I have a banana pi with a btrfs filesystem of 5 GB in size, which
frequently runs out of space (lots of snapshots). This is currently
again the case:

[27/524]mh@banana:~$ sudo btrfs balance start /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
[28/525]mh@banana:~$ sudo btrfs balance start / -dlimit=3
[sudo] password for mh on banana:
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
[29/526]mh@banana:~$ sudo btrfs balance start / -dlimit=3
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
[30/526]mh@banana:~$ sudo btrfs balance start / -dusage=0
Done, had to relocate 0 out of 8 chunks
[31/527]mh@banana:~$ sudo btrfs balance start / -dlimit=3
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
[32/528]mh@banana:~$ sudo btrfs fi show /
Label: none  uuid: ada6b7f5-98d6-4fee-a3a3-b73bd152ff6c
        Total devices 1 FS bytes used 3.37GiB
        devid    1 size 6.89GiB used 4.22GiB path /dev/mapper/banana-root

btrfs-progs v4.3
[33/529]mh@banana:~$ sudo btrfs fi df /
Data, single: total=3.41GiB, used=3.25GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=384.00MiB, used=121.75MiB
GlobalReserve, single: total=48.00MiB, used=0.00B
[34/530]mh@banana:~$ uname -a
Linux banana 4.3.0-zgbpi-armmp-lpae+ #2 SMP Sat Nov 7 13:07:34 UTC 2015 armv7l GNU/Linux
[36/532]mh@banana:~$ df -h /
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/banana-root  6.9G  3.6G  2.9G  56% /
[37/533]mh@banana:~$

The first kernel that was ever booted on the device was 4.1, so I am
reasonably sure that the filesystem was also created with a recent
kernel. Is there any way to find out which kernel version a filesystem
was created with?

However, the FAQ
https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_like_I_should_have_lots_left.21
suggests that for small filesystems (<16 GB), the best solution would
be to upgrade to at least 2.6.37 and recreate the filesystem. 2.6.37
is ancient, from 2011, so I am pretty sure that the filesystem _was_
created at least with a kernel more recent than that.

My normal way to recover from this situation is to btrfs add a new
device, btrfs balance, btrfs --convert=single --force balance, btrfs
device remove, btrfs balance start -mconvert=dup --force and finally
balance start again.

Is there any solution to solve this more elegantly?

Greetings
Marc