Re: ENOSPC errors during raid1 rebalance
Hello,

I want to report that I have the same problem as Michael Russo, except that in my case there is definitely *a lot* of free space.

I had ext4 on a 1TB LVM mirror. The filesystem was 96% full, with many multi-GB files. I successfully converted it to btrfs and removed the ext2_saved subvolume, but did *not* defragment or balance. Then I added two fresh 4TB disks to the filesystem and tried to convert it to raid1. My plan was to then delete the original LVM disk and have all my data migrated to the new 4TB disks under a btrfs mirror.

But the balance cannot complete, and shows the same symptoms:

    [...]
    [12746.391828] block group has cluster?: no
    [12746.391830] 0 blocks of free space at or bigger than bytes is
    [12747.420098] btrfs: 35 enospc errors during balance

    root@pccross:~# btrfs fi sh
    Label: 'export' uuid: 02f39e9d-9115-4a79-9015-a3a9decb87cf
      Total devices 3 FS bytes used 798.15GB
      devid 3 size 3.64TB used 855.03GB path /dev/sdd1
      devid 2 size 3.64TB used 855.00GB path /dev/sdc1
      devid 1 size 891.51GB used 175.48GB path /dev/md5
    Btrfs v0.20-rc1

    root@pccross:~# btrfs fi df /export
    Data, RAID1: total=849.00GB, used=650.25GB
    Data: total=175.48GB, used=145.64GB
    System: total=32.00MB, used=136.00KB
    Metadata, RAID1: total=6.00GB, used=2.21GB

    root@pccross:~# uname -a
    Linux pccross 3.13.0-17-generic #37-Ubuntu SMP Mon Mar 10 21:44:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

An attempt to run "btrfs device delete" fails with the same "no space" diagnostic.

I am now running defragmentation on all files bigger than 1GB and will see what happens. If that does not help, is there any other advice? I can collect debugging data if needed.

Thanks,
Eugene
RE: ENOSPC errors during raid1 rebalance
Alright! After doing:

    cd /mymedia; find . -type f | while read file; do mv -v "$file" /dev/shm; f2=`basename "$file"`; mv -v "/dev/shm/$f2" "$file"; done

I finally moved the remaining files out of the "single" allocation and back onto the new RAID1 profile:

    root@ossy:~# /usr/src/btrfs-progs/btrfs fi df /mymedia
    Data, RAID1: total=1.23TiB, used=1000.43GiB
    Data, single: total=10.00GiB, used=0.00
    System, RAID1: total=32.00MiB, used=184.00KiB
    Metadata, RAID1: total=3.00GiB, used=1.34GiB

And then the rebalance was finally able to move those two block groups, and I got no errors:

    root@ossy:~# btrfs balance start -dconvert=raid1,soft /mymedia
    Done, had to relocate 2 out of 1264 chunks

And now I'm totally RAID1:

    root@ossy:~# /usr/src/btrfs-progs/btrfs fi df /mymedia
    Data, RAID1: total=1.23TiB, used=1000.43GiB
    System, RAID1: total=32.00MiB, used=184.00KiB
    Metadata, RAID1: total=3.00GiB, used=1.34GiB

Yay!! If I want to get the space back I can do another defragment, but I don't really care right now.

Thanks everyone for all your help on this! Hopefully this thread will assist anyone else in the future if this occurs again.

Sincerely,
Michael Russo, Systems Engineer
PaperSolve, Inc.
268 Watchogue Road
Staten Island, NY 10314

Randomly generated quote of the last 5 minutes:
A good plan today is better than a perfect plan tomorrow. -- Patton

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
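For anyone repeating the move-out-and-back trick, here is a more cautious sketch of the same idea. It is not the loop used above: the function name `rewrite_file` and the staging-directory argument are made up for illustration, and it copies rather than moves, so the original stays in place until a verified copy is ready to be renamed over it. It assumes nothing else writes to the file while it runs, and it does not try to preserve hard links or extended attributes.

```shell
#!/usr/bin/env bash
# rewrite_file FILE STAGING_DIR   (hypothetical helper, not a btrfs tool)
# Copy FILE to STAGING_DIR on another filesystem, copy it back under a
# temporary name next to the original, compare, then rename over FILE.
# The original is only replaced after the new copy compares equal.
rewrite_file() {
    local file=$1 staging=$2 out back
    out=$(mktemp "$staging/rewrite.XXXXXX") || return 1
    back=$(mktemp "$(dirname -- "$file")/.rewrite.XXXXXX") || {
        rm -f -- "$out"
        return 1
    }
    if cp -p -- "$file" "$out" && cp -p -- "$out" "$back" &&
       cmp -s -- "$file" "$back" && mv -f -- "$back" "$file"; then
        rm -f -- "$out"          # staging copy no longer needed
    else
        rm -f -- "$out" "$back"  # leave the original untouched on failure
        return 1
    fi
}

# Possible use over a whole tree (NUL-separated so odd file names survive):
#   find /mymedia -type f -print0 |
#       while IFS= read -r -d '' f; do rewrite_file "$f" /dev/shm; done
```

Note that a staging directory on tmpfs such as /dev/shm has to be large enough for the biggest file, and whether the rewritten data lands in the RAID1 profile is up to the btrfs allocator; in this thread it did.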
RE: ENOSPC errors during raid1 rebalance
Thanks Hugo, that makes sense, and maybe leads to a possible way to fix the issue in future versions of btrfs-convert, or a way to handle it in the balance code.

What I did to find files with more than one extent:

    cd /mymedia
    find . -type f -print0 | xargs -0 filefrag | grep -v 1\ extent | grep -v 0\ extent | awk -F: '{print $1}' > /tmp/extent
    cat /tmp/extent | while read file; do mv -v "$file" /dev/shm; f2=`basename "$file"`; mv -v "/dev/shm/$f2" "$file"; done

I no longer care that even after doing this I still have a file with multiple extents; I just don't want them inside those block groups with >1GB extents. (BTW my terminology may be all off here. I just know that in my syslog btrfs tries to relocate two particular "block groups" and gets 2 enospc errors for them.)

But since I'm still getting the errors, the files I care about probably aren't in this list, so I'll do it for all the files on the system (since I can't find out which files are in those block groups).

-----Original Message-----
From: Hugo Mills [mailto:h...@carfax.org.uk]
Sent: Friday, March 07, 2014 3:02 AM
To: Mike Russo
Cc: linux-btrfs@vger.kernel.org
Subject: Re: ENOSPC errors during raid1 rebalance

The defrag operation, by its nature, _doesn't_ preserve extents, and thus can act to break up the large extents, making it possible to balance the chunks that the offending extents live on.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- If the first-ever performance is the première, is the ---
--- last-ever performance the derrière? ---
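One caveat about the filter in this message: `grep -v 1\ extent` also throws away lines such as "11 extents found" or "21 extents found", because they contain the text "1 extent", and `awk -F:` cuts file names that contain a colon. A possible alternative is sketched below. The function name `multi_extent_files` is invented here, and the sketch assumes filefrag prints one line per file in the "path: N extents found" form shown elsewhere in this thread; file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# multi_extent_files (hypothetical helper): read filefrag output on stdin
# and print only the paths whose extent count is greater than one.
multi_extent_files() {
    awk '
        match($0, /: [0-9]+ extents? found$/) {
            path  = substr($0, 1, RSTART - 1)
            count = substr($0, RSTART + 2) + 0   # leading number of "N extents found"
            if (count > 1) print path
        }'
}

# Possible use:
#   find . -type f -print0 | xargs -0 filefrag | multi_extent_files > /tmp/extent
```

The count is compared numerically, so 0 and 1 are dropped while 2, 10, 11 and so on are all kept.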
Re: ENOSPC errors during raid1 rebalance
Hugo Mills posted on Fri, 07 Mar 2014 08:02:13 + as excerpted:

> On Fri, Mar 07, 2014 at 01:13:53AM +, Michael Russo wrote:
>> Duncan <1i5t5.duncan cox.net> writes:
>>
>> > But if you're not using compression, /that/ can't explain it...
>>
>> Ha! Well while that was an interesting discussion of fragmentation, I
>> am only using the default mount options here and so no compression.
>
> I _think_ the problem here is that there may have been some extents
> created during the conversion which were over 1 GiB in size (or at least
> which run across two or more chunks). This causes problems, because
> there's nowhere that they can be written to by the balance --
> which preserves extents -- because none of the allocation units (chunks)
> are big enough.
>
> The defrag operation, by its nature, _doesn't_ preserve extents,
> and thus can act to break up the large extents, making it possible to
> balance the chunks that the offending extents live on.

Now /that/ would explain the issue, or at least all I've read of it from here. Nice job! =:^)

Obviously >1 GB files break/fragment on 1-gig data-chunk lines when (re)written on btrfs, since that's the biggest allocation unit btrfs has, and at least once the filesystem has been (near) full once, btrfs is extremely unlikely to be able to place those gig-chunks contiguously, thus triggering the issue.

Meanwhile, the move-to-tmpfs-and-back should indeed break up those >1-gig blocks too, since that would force a rewrite and thus a reallocation, which would then follow btrfs 1-gig-chunk rules.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: ENOSPC errors during raid1 rebalance
On Fri, Mar 07, 2014 at 01:13:53AM +, Michael Russo wrote:
> Duncan <1i5t5.duncan cox.net> writes:
>
> > But if you're not using compression, /that/ can't explain it...
>
> Ha! Well while that was an interesting discussion of fragmentation,
> I am only using the default mount options here and so no compression.
> The only reason I'm really even looking at the fragmentation
> issue is because running the defragment (with varying sizes of
> "-t" which I'm not sure why that's even necessary) seemed
> to force btrfs to move segments around and let me rebalance
> more block groups. I don't even care if the files are defragged,
> I just want them all in the RAID1 profile. Hopefully if I move
> each file out to some other FS like /dev/shm and then back
> it will work, I just gotta hack a script together to do so.

I _think_ the problem here is that there may have been some extents created during the conversion which were over 1 GiB in size (or at least which run across two or more chunks). This causes problems, because there's nowhere that they can be written to by the balance -- which preserves extents -- because none of the allocation units (chunks) are big enough.

The defrag operation, by its nature, _doesn't_ preserve extents, and thus can act to break up the large extents, making it possible to balance the chunks that the offending extents live on.

Hugo.
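A quick arithmetic check fits this explanation. The kernel messages quoted elsewhere in this thread report "allocation failed ... wanted 1368420352" and "wanted 1219825664". The sketch below compares those byte counts with a 1 GiB chunk; the 1 GiB figure is taken from the discussion here as an assumption rather than read from the filesystem, and `exceeds_chunk` is just an illustrative name.

```shell
#!/usr/bin/env bash
# Assumed data chunk size: 1 GiB, as discussed in this thread.
CHUNK_BYTES=$(( 1024 * 1024 * 1024 ))

# exceeds_chunk BYTES: succeeds when BYTES is larger than one chunk.
exceeds_chunk() {
    (( $1 > CHUNK_BYTES ))
}

# The two "wanted" values from the enospc_debug output in this thread.
for wanted in 1368420352 1219825664; do
    if exceeds_chunk "$wanted"; then
        echo "$wanted bytes: larger than one chunk"
    else
        echo "$wanted bytes: fits in one chunk"
    fi
done
```

Both requests come out larger than one chunk (roughly 1.27 GiB and 1.14 GiB), which is consistent with the idea that balance is trying to place a single extent that no chunk can hold.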
Re: ENOSPC errors during raid1 rebalance
Duncan <1i5t5.duncan cox.net> writes:

> But if you're not using compression, /that/ can't explain it...

Ha! Well while that was an interesting discussion of fragmentation, I am only using the default mount options here and so no compression.

The only reason I'm really even looking at the fragmentation issue is because running the defragment (with varying sizes of "-t" which I'm not sure why that's even necessary) seemed to force btrfs to move segments around and let me rebalance more block groups. I don't even care if the files are defragged, I just want them all in the RAID1 profile. Hopefully if I move each file out to some other FS like /dev/shm and then back it will work, I just gotta hack a script together to do so.
Re: ENOSPC errors during raid1 rebalance
On Mar 5, 2014, at 3:13 PM, Michael Russo wrote:

> Chris Murphy (colorremedies.com) writes:
>
>> Did you do a defrag and balance after ext4>btrfs conversion,
>> but before data/metadata profile conversion?
>
> No I didn't, as I thought it was only optional and didn't realize
> it might later affect my ability to change profiles.

It's possible it's more necessary in certain situations than others.

Chris Murphy
Re: ENOSPC errors during raid1 rebalance
Michael Russo posted on Wed, 05 Mar 2014 22:13:10 + as excerpted:

> Chris Murphy (colorremedies.com) writes:
>
>> You could also try a full defragment by specifying -r on the mount
>> point with a small -t value to effectively cause everything to be
>> subject to defragmenting. If this still doesn't permit soft rebalance,
>> then maybe filefrag can find files that have more than 1 extent and
>> just copy them (make duplicates, delete the original). Any copy will be
>> allocated into chunks with the new profile.
>
> I would think so too. But it doesn't seem to be happening.
> Here is an example with one file:
>
> root@ossy:/mymedia# filefrag output.wav
> output.wav: 2 extents found
> root@ossy:/mymedia# /usr/src/btrfs-progs/btrfs fi de -t 1 /mymedia/output.wav
> root@ossy:/mymedia# filefrag output.wav
> output.wav: 2 extents found
>
> btrfs does not defrag the file. And copying the file usually doesn't
> defrag it either:
>
> root@ossy:/mymedia# cp output.wav output.wav.bak
> root@ossy:/mymedia# filefrag output.wav.bak
> output.wav.bak: 2 extents found
>
> I even tried copying a large file to another filesystem (/dev/shm),
> removing the original, and copying it back, and more often than not
> it still had more than 1 extent.

This was covered in one thread recently, but looking back in this one I didn't find it covered here, so...

What are your mount options? Do they include compress and/or compress-force? Because filefrag doesn't understand btrfs compression yet and counts each 128 KiB (I believe) compression block as a separate extent. I'm not sure whether that's 128 KiB pre-compression or post-compression size, and I'm not even positive it's 128 KiB, but certainly, if the file is large enough and btrfs is compressing it, filefrag will false-positive report multiple extents. That's a known issue with it ATM.

Meanwhile, there's ongoing work to teach filefrag about btrfs compression so it can report fragmentation accurately, but from what I've read they're working on a general kernel-VFS-level API for that so the same general API can be used by other filesystems, and getting proper agreement on that API, and having both the kernel and filefrag implement it, isn't a simple single-kernel-cycle project. There's a lot of filesystems other than btrfs that could potentially use this sort of thing, and getting a solution that will work well for all of them is hard work, both technically and politically. But once it's implemented, /correctly/, the entire Linux kernel filesystem space will benefit, just as btrfs is getting the benefit of the filefrag tool that ships with e2fsprogs, and the filesystem testing that ships as xfstests. =:^)

But if you're not using compression, /that/ can't explain it...

--
Duncan
Re: ENOSPC errors during raid1 rebalance
Chris Murphy (colorremedies.com) writes:

> Did you do a defrag and balance after ext4>btrfs conversion,
> but before data/metadata profile conversion?

No I didn't, as I thought it was only optional and didn't realize it might later affect my ability to change profiles.
Re: ENOSPC errors during raid1 rebalance
Chris Murphy (colorremedies.com) writes:

> You could also try a full defragment by specifying -r on the mount point
> with a small -t value to effectively cause everything to be subject
> to defragmenting. If this still doesn't permit soft rebalance, then maybe
> filefrag can find files that have more than 1 extent and just copy
> them (make duplicates, delete the original). Any copy will be
> allocated into chunks with the new profile.

I would think so too. But it doesn't seem to be happening. Here is an example with one file:

    root@ossy:/mymedia# filefrag output.wav
    output.wav: 2 extents found
    root@ossy:/mymedia# /usr/src/btrfs-progs/btrfs fi de -t 1 /mymedia/output.wav
    root@ossy:/mymedia# filefrag output.wav
    output.wav: 2 extents found

btrfs does not defrag the file. And copying the file usually doesn't defrag it either:

    root@ossy:/mymedia# cp output.wav output.wav.bak
    root@ossy:/mymedia# filefrag output.wav.bak
    output.wav.bak: 2 extents found

I even tried copying a large file to another filesystem (/dev/shm), removing the original, and copying it back, and more often than not it still had more than 1 extent.

If I copy each file out to another filesystem and then back, will btrfs not use any of the space on the "single" and just re-allocate space on the RAID1 like I want it to?
Re: ENOSPC errors during raid1 rebalance
On Mar 4, 2014, at 5:27 PM, Mike Russo wrote:

> I'm sure this is due to the ext4 conversion, but that means the utility is
> making a btrfs filesystem that later can't be converted to another profile
> for some reason.

https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3

"But still, the new filesystem inherits the block placement and file data fragmentation. It is highly recommended to do full defragmentation and full rebalance before first use. It's not required, but will have impact on performance."

Did you do a defrag and balance after ext4>btrfs conversion, but before data/metadata profile conversion?

Chris Murphy
Re: ENOSPC errors during raid1 rebalance
On Mar 4, 2014, at 5:27 PM, Mike Russo wrote:

> Chris Murphy (colorremedies.com) writes:
>
>>> How can I find out what file is on the block group that
>>> it's having a problem with?
>>
>> I think that's btrfs-debug-tree -b ? But don't hold me to that.
>> I haven't done enough debugging to find files
>> from block numbers. Another that might be relevant is
>> btrfs inspect-internal logical-resolve.
>
> Nah, I don't think they're talking about the same thing. The messages I get
> in syslog for the 2 blocks are:
>
> Mar 4 07:59:23 ossy kernel: [124731.175116] BTRFS info (device sdd1): relocating block group 894460493824 flags 1
> Mar 4 07:59:32 ossy kernel: [124739.613467] BTRFS error (device sdd1): allocation failed flags 17, wanted 1368420352
> Mar 4 07:59:32 ossy kernel: [124739.613473] BTRFS: space_info 1 has 105551204352 free, is not full
> Mar 4 07:59:32 ossy kernel: [124739.613476] BTRFS: space_info total=1312112508928, used=1201166741504, pinned=0, reserved=565596160, may_use=1368420352, readonly=4828966912
>
> Mar 4 10:49:33 ossy kernel: [134932.248146] BTRFS info (device sdd1): relocating block group 846142111744 flags 1
> Mar 4 10:49:39 ossy kernel: [134938.440347] BTRFS error (device sdd1): allocation failed flags 17, wanted 1219825664
> Mar 4 10:49:39 ossy kernel: [134938.440353] BTRFS: space_info 1 has 116232978432 free, is not full
> Mar 4 10:49:39 ossy kernel: [134938.440356] BTRFS: space_info total=1322849927168, used=1201311391744, pinned=0, reserved=476590080, may_use=1224540160, readonly=4828966912
>
> But if I try to examine either of those block groups I get an error message
> that makes me think the 2 numbers are not really the same thing.

You could also try a full defragment by specifying -r on the mount point with a small -t value to effectively cause everything to be subject to defragmenting. If this still doesn't permit soft rebalance, then maybe filefrag can find files that have more than 1 extent and just copy them (make duplicates, delete the original). Any copy will be allocated into chunks with the new profile.

> If anyone knows why that allocation should fail (what is "flags 17"?), or how
> to force something more to happen, please reply. I'm sure this is due to the
> ext4 conversion, but that means the utility is making a btrfs filesystem that
> later can't be converted to another profile for some reason.

The conversion description sounds like the main thing that's occurring is adding Btrfs metadata. The ext4 data and metadata are left intact. I'm not sure what the ext4 allocation looks like compared to data allocated into chunks by Btrfs directly. That could be a contributing factor, but it's also possible that older Btrfs filesystems get fragmented in a similar way, so that they could have profile conversion problems too. Answering that would help determine if btrfs-convert or the rebalancing code needs to account for this possibility. A reproducer would be useful.

Chris Murphy
RE: ENOSPC errors during raid1 rebalance
Chris Murphy (colorremedies.com) writes:

> > How can I find out what file is on the block group that
> > it's having a problem with?
>
> I think that's btrfs-debug-tree -b ? But don't hold me to that.
> I haven't done enough debugging to find files
> from block numbers. Another that might be relevant is
> btrfs inspect-internal logical-resolve.

Nah, I don't think they're talking about the same thing. The messages I get in syslog for the 2 blocks are:

    Mar 4 07:59:23 ossy kernel: [124731.175116] BTRFS info (device sdd1): relocating block group 894460493824 flags 1
    Mar 4 07:59:32 ossy kernel: [124739.613467] BTRFS error (device sdd1): allocation failed flags 17, wanted 1368420352
    Mar 4 07:59:32 ossy kernel: [124739.613473] BTRFS: space_info 1 has 105551204352 free, is not full
    Mar 4 07:59:32 ossy kernel: [124739.613476] BTRFS: space_info total=1312112508928, used=1201166741504, pinned=0, reserved=565596160, may_use=1368420352, readonly=4828966912

    Mar 4 10:49:33 ossy kernel: [134932.248146] BTRFS info (device sdd1): relocating block group 846142111744 flags 1
    Mar 4 10:49:39 ossy kernel: [134938.440347] BTRFS error (device sdd1): allocation failed flags 17, wanted 1219825664
    Mar 4 10:49:39 ossy kernel: [134938.440353] BTRFS: space_info 1 has 116232978432 free, is not full
    Mar 4 10:49:39 ossy kernel: [134938.440356] BTRFS: space_info total=1322849927168, used=1201311391744, pinned=0, reserved=476590080, may_use=1224540160, readonly=4828966912

But if I try to examine either of those block groups I get an error message that makes me think the 2 numbers are not really the same thing.

    root@ossy:~# /usr/src/btrfs-progs/btrfs-debug-tree -b 894460493824 /dev/sdc1
    Check tree block failed, want=894460493824, have=7159859086769936959
    Check tree block failed, want=894460493824, have=7159859086769936959
    Check tree block failed, want=894460493824, have=7159859086769936959
    read block failed check_tree_block
    Check tree block failed, want=894460493824, have=7159859086769936959
    Check tree block failed, want=894460493824, have=7159859086769936959
    Check tree block failed, want=894460493824, have=7159859086769936959
    read block failed check_tree_block
    failed to read 894460493824

Sigh. So I'm stuck at 10GB on single, but I don't know which 10GB it is.

    root@ossy:~# /usr/src/btrfs-progs/btrfs fi df /mymedia
    Data, RAID1: total=1.21TiB, used=1.09TiB
    Data, single: total=10.00GiB, used=5.50GiB
    System, RAID1: total=32.00MiB, used=180.00KiB
    Metadata, RAID1: total=2.00GiB, used=1.45GiB

If anyone knows why that allocation should fail (what is "flags 17"?), or how to force something more to happen, please reply. I'm sure this is due to the ext4 conversion, but that means the utility is making a btrfs filesystem that later can't be converted to another profile for some reason.
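On the "flags 17" question: the number appears to be a bitmask of block-group type and profile bits. The sketch below decodes it using the bit positions of the BTRFS_BLOCK_GROUP_* constants from the kernel headers (DATA is bit 0, RAID1 is bit 4, and so on); the function name `decode_bg_flags` is made up here, and treating "no profile bit set" as the single profile is this sketch's reading of the format.

```shell
#!/usr/bin/env bash
# decode_bg_flags N (hypothetical helper): print the names of the bits
# set in a btrfs block-group flags value.
decode_bg_flags() {
    local flags=$1 out="" i
    # Index in this array == bit position in the flags value.
    local names=(DATA SYSTEM METADATA RAID0 RAID1 DUP RAID10 RAID5 RAID6)
    for i in "${!names[@]}"; do
        if (( flags & (1 << i) )); then
            out+="${out:+ }${names[i]}"
        fi
    done
    # Bits 0-2 are the type; if nothing above them is set, call it single.
    if (( (flags & ~7) == 0 )); then
        out+="${out:+ }single"
    fi
    printf '%s\n' "$out"
}

decode_bg_flags 1    # prints: DATA single
decode_bg_flags 17   # prints: DATA RAID1
```

Read that way, "relocating block group ... flags 1" is a single-profile data block group being moved, and "allocation failed flags 17" is the RAID1 data allocation that was supposed to receive it.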
Re: ENOSPC errors during raid1 rebalance
On Mar 4, 2014, at 11:54 AM, Michael Russo wrote:

> Chris Murphy (colorremedies.com) writes:
>> Based on my reading of the man page, I think it's expected.
>> You either need -s -l or -t.
>
> Ok, although the man page uses [ ] instead of < > and something
> does happen if I don't add them. But if I use "-t 1" wouldn't that
> get everything?

Yeah you're right, you do get a default behavior but I think it's really permissive. I've thrown 3 extent files at it, and it's still 3 extents afterward.

>>> Now I've gotten
>>> it down to only 2 5GB segments that won't move.
>>
>> Another way would be to just copy it (not reflink) and delete the original.
>
> How can I find out what file is on the block group that
> it's having a problem with?

I think that's btrfs-debug-tree -b ? But don't hold me to that. I haven't done enough debugging to find files from block numbers. Another that might be relevant is btrfs inspect-internal logical-resolve.

Chris Murphy
Re: ENOSPC errors during raid1 rebalance
Chris Murphy (colorremedies.com) writes:

> Based on my reading of the man page, I think it's expected.
> You either need -s -l or -t.

Ok, although the man page uses [ ] instead of < > and something does happen if I don't add them. But if I use "-t 1" wouldn't that get everything?

> > Now I've gotten
> > it down to only 2 5GB segments that won't move.
>
> Another way would be to just copy it (not reflink) and delete the original.

How can I find out what file is on the block group that it's having a problem with?
Re: ENOSPC errors during raid1 rebalance
On Mar 4, 2014, at 8:55 AM, Michael Russo wrote:

> Hugo Mills (carfax.org.uk) writes:
>
>> This is just a guess, but you might have some large (>1GB) extents
>> in there that span across multiple chunks. I'd suggest running a btrfs
>> defrag on any particularly big files and see if that helps the situation.
>
> Doing this is definitely helping, but I have to run the defrag multiple
> times with different values for "-t". If I don't include -t the defrag
> runs really quickly but doesn't seem to do anything.

Based on my reading of the man page, I think it's expected. You either need -s -l or -t.

> Now I've gotten
> it down to only 2 5GB segments that won't move.

Another way would be to just copy it (not reflink) and delete the original.

Chris Murphy
Re: ENOSPC errors during raid1 rebalance
Hugo Mills (carfax.org.uk) writes:

> This is just a guess, but you might have some large (>1GB) extents
> in there that span across multiple chunks. I'd suggest running a btrfs
> defrag on any particularly big files and see if that helps the situation.

Doing this is definitely helping, but I have to run the defrag multiple times with different values for "-t". If I don't include -t the defrag runs really quickly but doesn't seem to do anything. Now I've gotten it down to only 2 5GB segments that won't move.

I'm going to extract the very big log generated when I run "btrfs ba start -dconvert=raid1,soft /mymedia" with enospc_debug and send it as a file; hopefully it will help. The first couple of lines are:

    Mar 4 10:49:23 ossy kernel: [134922.806764] BTRFS info (device sdd1): relocating block group 894460493824 flags 1
    Mar 4 10:49:32 ossy kernel: [134931.650504] BTRFS error (device sdd1): allocation failed flags 17, wanted 1368420352
    Mar 4 10:49:32 ossy kernel: [134931.650507] BTRFS: space_info 1 has 113996488704 free, is not full
    Mar 4 10:49:32 ossy kernel: [134931.650509] BTRFS: space_info total=1320702443520, used=1201311391744, pinned=0, reserved=565596160, may_use=1368420352, readonly=4828966912
Re: ENOSPC errors during raid1 rebalance
On Mar 3, 2014, at 1:50 PM, Michael Russo wrote:

> Oh yeah, it was definitely a problem with either the drives
> or the external enclosure, which was converting USB to SATA
> and mirroring the drives internally (it was a WD MyBook
> Mirror Edition). There was a problem with one of the drives
> and I replaced it, but before I did it screwed up some
> data. I think one drive kept trying to do a retry on a read
> and somehow the logic decided to just give me a "nearby"
> sector to satisfy the read (I can't really figure out how else
> 10 seconds of a random but "nearby" in alphabetical order
> MP3 could get stuck into the middle of another MP3).

If it's a persistent problem with a particular file, one that always has the same 10 seconds from some other song, then it's actually been written to disk wrong. That's a misdirected write. 10 seconds of some other file playing translates into roughly how many blocks of data? That's probably a pretty big misdirect. So I don't know whether this is more likely a drive problem, or possibly memory or the controller. I'd burn a memtest86+ run for a few days, if you haven't done this recently, to remove memory as a factor.

> So I removed the entire enclosure and put them inside
> my case and decided I never wanted it to happen again so I
> converted to btrfs. :) Very happy it exists!

The USB to SATA chipsets are all over the map quality-wise; a lot of them suck. So it sounds like you've eliminated the controller as a factor going forward by changing back to a direct SATA connection. That's a good idea.

Chris Murphy
Re: ENOSPC errors during raid1 rebalance
Chris Murphy (colorremedies.com) writes:

> Gotcha. I think Hugo has the best next step. Defragment.

I think this is going to work. I cancelled a partial defrag and did another move attempt, and this time 5GB got moved! So I'm going to let the whole thing finish and try again and that will probably fix it. Interestingly I had to specify the -t option to get some big files to move; it wouldn't work without it (I'm doing "btrfs fi defrag -r -v /mymedia -t 5").

> That's not good. It sounds to me like misdirected writes.
> Or file system corruption. Anyone else?

Oh yeah, it was definitely a problem with either the drives or the external enclosure, which was converting USB to SATA and mirroring the drives internally (it was a WD MyBook Mirror Edition). There was a problem with one of the drives and I replaced it, but before I did it screwed up some data. I think one drive kept trying to do a retry on a read and somehow the logic decided to just give me a "nearby" sector to satisfy the read (I can't really figure out how else 10 seconds of a random but "nearby" in alphabetical order MP3 could get stuck into the middle of another MP3).

So I removed the entire enclosure and put them inside my case and decided I never wanted it to happen again so I converted to btrfs. :) Very happy it exists!
Re: ENOSPC errors during raid1 rebalance
On Mar 3, 2014, at 12:24 PM, Michael Russo wrote:

> Chris Murphy (colorremedies.com) writes:
>
>> It might be worth adding enospc_debug as a mount option
>
> These messages only appear when I mount with enospc_debug.

Gotcha. I think Hugo has the best next step. Defragment.

> As for whether btrfs-convert should be recommended, I
> mean probably not, but when you've got a big media drive filled
> with music, pictures, and movies, with no other place to store it,

Sure, I understand the use case. And I think it should work. But I'm wondering whether it's regarded as more than merely functional, or not.

> and you've experienced corruption where 5% of your MP3 files
> have 10 seconds of another MP3 stuck in the middle of them
> somehow, and suddenly realize you NEED a checksumming
> next-generation file system like btrfs, you're going to convert!

That's not good. It sounds to me like misdirected writes. Or file system corruption. Anyone else?

Btrfs can detect and (with data redundancy) correct for torn and misdirected writes, among other problems. But 5% of 1TB is 50GB. That's a lot of corrupt data. I think what you need is to replace one or more drives, or look for problems elsewhere. Using Btrfs is not bad in this case, but it's also a crutch for a bigger problem that shouldn't be happening in the first place.

Chris Murphy
Re: ENOSPC errors during raid1 rebalance
Chris Murphy writes:

> It might be worth adding enospc_debug as a mount option

These messages only appear when I mount with enospc_debug. I included them as examples but I can post the full output later if needed.

As for whether btrfs-convert should be recommended, I mean probably not, but when you've got a big media drive filled with music, pictures, and movies, with no other place to store it, and you've experienced corruption where 5% of your MP3 files have 10 seconds of another MP3 stuck in the middle of them somehow, and suddenly realize you NEED a checksumming next-generation file system like btrfs, you're going to convert!
Re: ENOSPC errors during raid1 rebalance
On Mar 3, 2014, at 11:24 AM, Michael Russo wrote:

> Duncan <1i5t5.duncan cox.net> writes:
>>
>> That allows rollback if desired, but does tie up some space with the
>> automatically created btrfs "snapshot" that contains the ext3/4 metadata
>> and untouched data.
>
> Nope, I definitely deleted the snapshots, running btrfs sub list
> gives me nothing back:
>
> root@ossy:/usr/src/btrfs-progs# /usr/src/btrfs-progs/btrfs sub list /mymedia
> root@ossy:/usr/src/btrfs-progs#
>
> Thanks for the detailed reply though. While doing this operation I
> wanted to not have any snapshots so that I needed the minimum
> amount of space to do the rebalance.
>
> But this all does seem strange right? Why the heck would it refuse
> to move these 70GB and think there are 0 blocks of free space?

It might be worth adding enospc_debug as a mount option and then retrying the balance with -dconvert=raid1,soft, which should only try to balance unconverted data chunks. The metadata all looks converted to raid1. The debug output probably only means something to a developer, but maybe it'll be enlightening.

Slightly off-topic to your question: is btrfs-convert expected to be a recommended method of migrating from ext3/4, or is it more a proof of concept that ought to work? The current version uses the ext block size for the btrfs leaf size, rather than the 16KB default leaf size used by mkfs.btrfs. David Sterba was working on this at one point. Also, I don't think btrfs-convert yet enables extref, although it can be enabled with btrfstune after conversion. In any case, the conversion isn't quite the same thing as you get with a new file system.

Chris Murphy
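The two steps suggested above (mount with enospc_debug, then retry the soft convert) might be sketched as follows. This assumes the option is accepted on a remount; if it is not, unmounting and mounting again with `-o enospc_debug` should be equivalent. The `run` and `retry_convert` helpers and the `DRY_RUN` switch are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are assumptions, not btrfs tooling.

# run CMD...
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# retry_convert MOUNTPOINT
# Turn on ENOSPC debugging, then retry converting only those data
# chunks that are not already raid1 (the "soft" filter).
retry_convert() {
    mnt=$1
    run mount -o remount,enospc_debug "$mnt" || return 1
    run btrfs balance start -dconvert=raid1,soft "$mnt"
}
```

After `retry_convert /mymedia` finishes, any additional debug messages would show up in the kernel log, for example via `dmesg`.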
Re: ENOSPC errors during raid1 rebalance
On Mon, Mar 03, 2014 at 05:23:43PM +, Mike Russo wrote:

> Hi guys -
> I'm trying to convert a disk from single (/dev/sdc1) to RAID1 (/dev/sdd1), and
> the filesystem was previously ext4 but the conversion seemed to go just fine,
> and I have no snapshots. System and metadata convert, and almost all my data
> converts, but there are 70 stubborn GB (14 blocks of 5GB each) that refuse to
> convert and I get ENOSPC errors when trying to reallocate them.

This is just a guess, but you might have some large (>1GB) extents in there that span across multiple chunks. I'd suggest running a btrfs defrag on any particularly big files and see if that helps the situation.

Hugo.

> Here's my vital stats: (on kernel 3.14-rc4):
>
> root@ossy:/usr/src/btrfs-progs# /usr/src/btrfs-progs/btrfs fi show
> Label: MyBook uuid: c99cbefb-6a56-41e2-8f99-19f8b5f67884
> Total devices 2 FS bytes used 1.09TiB
> devid 1 size 1.82TiB used 1.12TiB path /dev/sdc1
> devid 2 size 1.82TiB used 1.05TiB path /dev/sdd1
>
> Btrfs v3.12
> root@ossy:/usr/src/btrfs-progs# /usr/src/btrfs-progs/btrfs fi df /mymedia
> Data, RAID1: total=1.04TiB, used=1.03TiB
> Data, single: total=70.00GiB, used=64.98GiB
> System, RAID1: total=32.00MiB, used=160.00KiB
> Metadata, RAID1: total=2.00GiB, used=1.33GiB
>
> I've already done a -dusage=0 and -dusage=5 balance and that cleans up
> anything that gets left around when these 14 blocks fail to get moved. When
> mounting with enospc_debug I get messages like this for each block with the
> problem:
>
> Mar 3 11:58:16 ossy kernel: [52724.423024] BTRFS: block group 935262683136 has 5368709120 bytes, 5321801728 used 0 pinned 0 reserved [readonly]
> Mar 3 11:58:16 ossy kernel: [52724.423025] BTRFS info (device sdd1): block group has cluster?: no
> Mar 3 11:58:16 ossy kernel: [52724.423026] BTRFS info (device sdd1): 0 blocks of free space at or bigger than bytes is
> Mar 3 11:58:16 ossy kernel: [52724.423027] BTRFS: block group 942778875904 has 5368709120 bytes, 5309296640 used 0 pinned 0 reserved [readonly]
> Mar 3 11:58:16 ossy kernel: [52724.423028] BTRFS info (device sdd1): block group has cluster?: no
> Mar 3 11:58:16 ossy kernel: [52724.423029] BTRFS info (device sdd1): 0 blocks of free space at or bigger than bytes is
>
> I don't have the space to reformat and if this is a genuine bug I'm sure you
> guys want to fix it too. What else can I do to resolve or help?
>
> Sincerely,
>
> Michael Russo, Systems Engineer
> PaperSolve, Inc.
> 268 Watchogue Road
> Staten Island, NY 10314

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Nothing wrong with being written in Perl... Some of my best friends are written in Perl. ---
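To put numbers on the enospc_debug lines quoted above, the free space left in each reported block group is simply its total size minus its used bytes. The sketch below extracts that from kernel log lines; it assumes each "block group ... has ... bytes, ... used" message sits on a single line (as in the kernel log, not as wrapped in the mail), and `block_group_free` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: block_group_free is an illustrative helper name.

# block_group_free
# Read kernel log lines on stdin. For every line of the form
# "block group ID has TOTAL bytes, USED used ..." print the block
# group ID followed by TOTAL - USED in bytes.
block_group_free() {
    awk '
        /block group [0-9]+ has [0-9]+ bytes, [0-9]+ used/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "group") id = $(i + 1)
                if ($i == "has")   total = $(i + 1)
                if ($i == "used")  used = $(i - 1)
            }
            printf "%s %d\n", id, total - used
        }'
}
```

Fed the two block group lines from the report, for example with `dmesg | block_group_free`, this yields 46907392 and 59412480 bytes free, that is roughly 45 MiB and 57 MiB left in chunks of 5368709120 bytes (5 GiB) each. The arithmetic only shows that these read-only chunks are nearly full; it does not by itself explain the ENOSPC.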
Re: ENOSPC errors during raid1 rebalance
Duncan <1i5t5.duncan cox.net> writes:

> That allows rollback if desired, but does tie up some space with the
> automatically created btrfs "snapshot" that contains the ext3/4 metadata
> and untouched data.

Nope, I definitely deleted the snapshots; running btrfs sub list gives me nothing back:

root@ossy:/usr/src/btrfs-progs# /usr/src/btrfs-progs/btrfs sub list /mymedia
root@ossy:/usr/src/btrfs-progs#

Thanks for the detailed reply though. While doing this operation I wanted to not have any snapshots so that I needed the minimum amount of space to do the rebalance.

But this all does seem strange, right? Why the heck would it refuse to move these 70GB and think there are 0 blocks of free space?
Re: ENOSPC errors during raid1 rebalance
Mike Russo posted on Mon, 03 Mar 2014 17:23:43 + as excerpted:

> I'm trying to convert a disk from single (/dev/sdc1) to RAID1
> (dev/sdd1), and the filesystem was previously ext4 but the conversion
> seemed to go just fine, and I have no snapshots. System and metadata
> convert, and almost all my data converts, but there are 70 stubborn GB
> (14 blocks of 5GB each) that refuse to convert and I get ENOSPC errors
> when trying to reallocate them.

While I created entirely new btrfs filesystems here and copied everything over rather than converting, so I've not had personal experience with the conversion process...

The wiki[1] says[2] that while the conversion process uses the same data blocks for both ext3/4 and btrfs, it duplicates the ext3/4 metadata, creating a new btrfs copy (or two, for default metadata dup mode), leaving the original ext3/4 copy untouched. Btrfs modifications are then done using standard btrfs COW (copy-on-write) methods, so the ext3/4 data, while originally shared, remains untouched as well.

That allows rollback if desired, but does tie up some space with the automatically created btrfs "snapshot" that contains the ext3/4 metadata and untouched data.

While you say you have no snapshots, it's unclear whether you mean none that you've created /since/ the conversion, but you didn't delete that original snapshot so still have it, or whether you deleted that automatically created btrfs snapshot of the old ext3/4 filesystem and simply didn't specifically mention it.

I'm guessing that it's the former, and that btrfs is refusing to balance/restripe that old ext3/4 snapshot, albeit with a very confusing ENOSPC error message, since it'd kill the old ext3/4 filesystem and you could no longer roll back.

If that's the case, the page at [2] explains how you get rid of the old ext3/4 snapshot once you're sure you won't be rolling back and thus no longer need it.

With a bit of luck, that's all you need to do, and after deleting that, you can finish your balance/restripe. =:^)

If you've already btrfs subvol delete-ed the ext2_saved subvolume, or if you hadn't but doing so doesn't solve the problem, well, I went for the low-hanging-fruit solution but obviously that wasn't it. =:^( Hopefully someone else can help further.

---
[1] https://btrfs.wiki.kernel.org Bookmark it! =:^)
[2] https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
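Removing the rollback image that btrfs-convert leaves behind could look roughly like this. It is a sketch that assumes the image lives in a subvolume named ext2_saved directly under the mount point, as mentioned in this thread; `run`, `drop_ext_snapshot` and `DRY_RUN` are hypothetical names. Once the subvolume is deleted, rolling back to ext3/4 is no longer possible.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are assumptions, not btrfs tooling.

# run CMD...
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# drop_ext_snapshot MOUNTPOINT
# Delete the ext2_saved subvolume if it is still present; otherwise
# just report that there is nothing to remove.
drop_ext_snapshot() {
    mnt=$1
    if [ -d "$mnt/ext2_saved" ]; then
        run btrfs subvolume delete "$mnt/ext2_saved"
    else
        echo "no ext2_saved subvolume under $mnt"
    fi
}
```

Calling `DRY_RUN=1 drop_ext_snapshot /mymedia` prints the delete command without running it, which is a cheap way to confirm whether the subvolume is still there.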