Re: ENOSPC errors during balance
On Tue, 22 Jul 2014 03:26:39 +0000 (UTC), Duncan <1i5t5.dun...@cox.net> wrote:
> Marc Joliet posted on Tue, 22 Jul 2014 01:30:22 +0200 as excerpted:
>> And now that the background deletion of the old snapshots is done, the
>> file system ended up at:
>>
>> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP
>> Data, single: total=219.00GiB, used=140.13GiB
>> System, DUP: total=32.00MiB, used=36.00KiB
>> Metadata, DUP: total=4.50GiB, used=2.40GiB
>> unknown, single: total=512.00MiB, used=0.00
>>
>> I don't know how reliable du is for this, but I used it to estimate how
>> much used data I should expect, and I get 138 GiB. That means that the
>> snapshots yield about 2 GiB of overhead, which I think is very
>> reasonable. Obviously I'll be starting a full balance now.
>
> [snip total/used discussion]

No, you misunderstand: read my email three steps above yours (from the 21st at 15:22). I am wondering why the disk usage ballooned to 200 GiB in the first place.

--
Marc Joliet
--
People who think they know everything really annoy those of us who know we don't - Bjarne Stroustrup
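The 2 GiB estimate above follows directly from the two figures quoted; a trivial sketch of the arithmetic:

```shell
#!/bin/sh
# Snapshot overhead = data btrfs reports as used minus what du counts in
# the live files; both figures are quoted from the message above.
btrfs_used=140.13   # GiB, 'Data, single' used from btrfs filesystem df
du_estimate=138     # GiB, from du
overhead=$(awk -v a="$btrfs_used" -v b="$du_estimate" \
    'BEGIN { printf "%.2f", a - b }')
echo "snapshot overhead: ${overhead} GiB"
```

Note that du and btrfs disagree by design here: du walks the live file tree, while the df figure also counts extents pinned only by snapshots.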
Re: Unstable v3.15-rc tags
On Sat, Jul 05, 2014 at 03:10:17PM +0100, WorMzy Tykashi wrote:
> The v3.15-rc{2,3,4} tags seem to have disappeared from the unstable
> repo in the last day or so. Could you please re-push the tags, or were
> they removed for a reason?

The tags were meant to mark the points in time where different groups of patches were added to the to-be-v3.15 branch, but then I noticed that one of the patches had my note in the subject, and Qu sent an updated version anyway. So I rebased the branch and all the tags became stale. I did not bother to move them, as they no longer worked as expected. Patch flux is both a good and a bad habit of an integration repository.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs-progs: Add minimum device size check
On Mon, Jul 07, 2014 at 08:39:25AM +0800, Qu Wenruo wrote:
>> Oh crap, I left my note in the subject of the v3 patch in git. I'd
>> rather not let it into the final branch, so the rebase is necessary.
>> Sorry. I'll take v4 then.
>
> Sorry for the extra rebase work.

No problem, it was my mistake, and your patch gave me a more solid reason to do the rebase.
Re: [PATCH 2/2] btrfs-progs: Add mount point check for 'btrfs fi df' command
On Mon, Jul 07, 2014 at 08:44:56AM +0800, Qu Wenruo wrote:
>> The 'fi usage' command is supposed to give the user-friendly overview,
>> but the patchset is stuck because I found the numbers wrong or
>> misleading under some circumstances.
>
> Although this patchset may not be merged, I'm curious about the wrong
> numbers. Would you please provide some leads or hints?

The calculations led to bogus numbers in some cases: for the minimum free space, rounding errors in the calculated factor 'K' made a small negative number appear as 16E. I have patches that calculate the numbers a bit differently.
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
Wang Shilong <wangsl.fnst at cn.fujitsu.com> writes:
> The latest btrfs-progs includes a man page for btrfs-replace. Actually,
> you could use it something like:
>
>   btrfs replace start <srcdev>|<devid> <targetdev> <mnt>
>
> You could use 'btrfs file show' to see the missing device id, and then
> run btrfs replace.

Hi Wang,

I physically removed the drive before the rebuild; having a failing device as a source is not a good idea anyway. Without the device in place, its device name does not show up, since the missing device is not under /dev/sdXX or anywhere else. That is why I asked whether the special parameter 'missing' may be used in a replace. I can't say if it is supported, but I guess not, since I found no documentation on the matter.

So I guess replace is not aimed at fault tolerance / rebuilding. It's just a convenient way to, say, replace the disks with larger ones to extend your array: a convenience tool, not an emergency tool.

TM

> Thanks, Wang
Re: Blocked tasks on 3.15.1
On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>> Running 3.15.6 with this patch applied on top:
>>
>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>   without the patch
>
> I got the same results with 3.16-rc5 + this patch (see thread "BTRFS
> hang with 3.16-rc5"). 3.16-rc4 is still fine for me: no hang whatsoever
> so far.
>
> To recap some details (so I can have it all in one place):
>
>> - /home/ is btrfs with compress=lzo
> BTRFS RAID 1 with lzo.
>> - I have _not_ created any nodatacow files.
> Me neither.
>> - Full stack is: sata - dmcrypt - lvm - btrfs (I noticed others
>>   mentioning the use of dmcrypt)
> Same, except no dmcrypt.

Thanks for the help in tracking this down, everyone. We'll get there!

Are you all running multi-disk systems (from a btrfs POV, more than one device)? I don't care how many physical drives this maps to, just whether btrfs thinks there's more than one drive.

-chris
Re: Blocked tasks on 3.15.1
On 07/22/2014 04:53 PM, Chris Mason wrote:
> [recap of Martin Steigerwald's report snipped]
>
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device)? I don't care how many physical drives this maps to, just
> whether btrfs thinks there's more than one drive.

Hi,

In case it's interesting, from an earlier email thread with the subject "3.15-rc6 - btrfs-transacti:4157 blocked for more than 120":

TL;DR: yes, btrfs sees multiple devices.

sata - dmcrypt - btrfs raid10

The btrfs raid10 consists of multiple dmcrypt devices on multiple sata devices.

Mount:
/dev/mapper/sdu on /mnt/storage type btrfs (rw,noatime,space_cache,compress=lzo,inode_cache,subvol=storage)
(yes, I know inode_cache is not recommended for general use)

I have a nocow directory in a separate subvolume containing vm-images used by kvm. The same kvm VMs are reading/writing data from that array over NFS.

I'm still holding that system on 3.14. Anything above causes blocks.

--
Torbjørn
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
On Tue, 22 Jul 2014 14:43:45 +0000 (UTC), Tm wrote:
> [quote of Wang Shilong's replace instructions snipped]
>
> That is why I asked whether the special parameter 'missing' may be used
> in a replace. I can't say if it is supported, but I guess not, since I
> found no documentation on the matter. So I guess replace is not aimed
> at fault tolerance / rebuilding.

TM,

Just read the man page. You could have used the replace tool after physically removing the failing device. Quoting the man page:

  If the source device is not available anymore, or if the -r option is
  set, the data is built only using the RAID redundancy mechanisms.

  Options
  -r  only read from <srcdev> if no other zero-defect mirror exists
      (enable this if your drive has lots of read errors, the access
      would be very slow)

Concerning rebuild performance: the access to the disk is linear for both reading and writing. I measured above 75 MByte/s at the time with regular 7200 RPM disks, which would be less than 10 hours to replace a 3TB disk (in the worst case, if it is completely filled up). Unused/unallocated areas are skipped, which additionally improves the rebuild speed.

For missing disks, unfortunately the command invocation does not use the term 'missing'; you pass the numerical device id instead of the device name. 'missing' _is_ implemented in the kernel part of the replace code, but was simply forgotten in the user-mode part, or at least it was forgotten in the man page.
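One way to find the devid to pass to replace when the device itself is gone: compare the devids that 'btrfs filesystem show' still lists against the expected device count. A minimal sketch on mocked output (the real output format varies by btrfs-progs version, and the device names, mount point, and array size here are hypothetical):

```shell
#!/bin/sh
# Mocked 'btrfs filesystem show' lines for a 4-device array where devid 3
# has gone missing; on a live system you would parse the real command's
# output instead. The exact formatting varies by btrfs-progs version.
show='devid 1 size 2.73TiB used 2.50TiB path /dev/sdb
devid 2 size 2.73TiB used 2.50TiB path /dev/sdc
devid 4 size 2.73TiB used 2.50TiB path /dev/sde'
expected=4   # how many devices the array is supposed to have
missing=$(printf '%s\n' "$show" | awk -v n="$expected" '
    $1 == "devid" { seen[$2] = 1 }
    END { for (i = 1; i <= n; i++) if (!(i in seen)) print i }')
echo "missing devid: $missing"
# The rebuild would then be started against that id, for example:
#   btrfs replace start -r "$missing" /dev/sdnew /mnt
```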
Re: Blocked tasks on 3.15.1
On Tue, Jul 22, 2014 at 10:53:03AM -0400, Chris Mason wrote:
> Thanks for the help in tracking this down, everyone. We'll get there!
>
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device)? I don't care how many physical drives this maps to, just
> whether btrfs thinks there's more than one drive.

In the bugs I sent you, it was a mix of arrays that were mdraid / dmcrypt / btrfs.

I also have one array with:

disk1 -> dmcrypt \
                  -> btrfs (2 drives visible to btrfs)
disk2 -> dmcrypt /

The multidrive setup seemed a bit worse; I just destroyed it and went back to putting all the drives together with mdadm and showing a single dmcrypted device to btrfs. But that is still super unstable on my server with 3.15, while being somewhat usable on my laptop (it still hangs, but more rarely). The one difference is that my laptop does
  disk -> dmcrypt -> btrfs
while my server does
  disks -> mdadm -> dmcrypt -> btrfs

Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
On Jul 21, 2014, at 8:51 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> It does not matter at all what the average file size is.
>
> … and the filesize /does/ matter.

I'm not sure how. A rebuild is replicating chunks, not doing the equivalent of cp or rsync on files. Copying chunks (or strips of chunks in the case of raid10) should be a rather sequential operation. So I'm not sure where the random write behavior would come from that could drop the write performance to ~5MB/s on drives that can read/write ~100MB/s.

>> Thus it is perfectly reasonable to expect ~50MByte/second, per spindle,
>> when doing a raid rebuild.
>
> ... And perfectly reasonable, at least at this point, to expect ~5 MiB/
> sec total throughput, one spindle at a time, for btrfs.

It's been a while since I did a rebuild on HDDs, but on SSDs the rebuilds have maxed out the replacement drive. Obviously the significant difference is rotational latency. If everyone with spinning disks and many small files is getting 5MB/s rebuilds, it suggests a rotational latency penalty, if that performance is expected. I'm just not sure where that would be coming from. Random IO would incur the effect of rotational latency, but the rebuild shouldn't be random IO, rather sequential.

Chris Murphy
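A back-of-the-envelope check of the rotational-latency theory: if every extent costs a full random access, the numbers land in the observed ballpark. The ~10 ms access time and 64 KiB extent size below are assumptions for the sake of the estimate, not measurements from this thread:

```shell
#!/bin/sh
# If a rebuild degenerates into random IO, each extent costs roughly one
# full access (seek + rotational latency). Both numbers below are
# assumptions, not measurements.
access_ms=10     # assumed average random access time on a 7200 RPM disk
extent_kib=64    # assumed size of a small extent
kibps=$(( extent_kib * 1000 / access_ms ))   # KiB transferred per second
echo "approx seek-bound throughput: ${kibps} KiB/s"
# 6400 KiB/s is about 6.25 MiB/s -- the same ballpark as the ~5 MB/s
# rebuild rates reported in this thread.
```

Under these assumptions seek-bound IO explains the slow rebuilds, so the open question above remains where the random IO would come from in a chunk copy.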
Re: [PATCH RFC] btrfs: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device
On Tue, Jul 08, 2014 at 02:38:37AM +0800, Anand Jain wrote:
> (for review comments pls).
>
> btrfs_scan_one_device() needs the SB; instead of reading it from
> scratch it could use btrfs_get_bdev_and_sb().

Please see http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6f60cbd3ae442cb35861bb522f388db123d42ec1

Your patch actually reverts those changes. btrfs_get_bdev_and_sb did not work there, and it was a pretty nasty bug; we spent a long afternoon catching it. Some of the checks may seem unnecessary, but they make sure that something unexpected will not happen.
Re: [PATCH 1/2] btrfs: syslog when quota is enabled
On Tue, Jul 08, 2014 at 02:41:28AM +0800, Anand Jain wrote:
> We must syslog when the btrfs working config changes, so as to support
> offline investigation of issues.

Missing Signed-off-by line.

Reviewed-by: David Sterba <dste...@suse.cz>
Re: [PATCH 2/2] btrfs: syslog when quota is disabled
On Tue, Jul 08, 2014 at 02:41:29AM +0800, Anand Jain wrote:
> Offline investigation of issues would need to know when quota is
> disabled.
>
> Signed-off-by: Anand Jain <anand.j...@oracle.com>

Reviewed-by: David Sterba <dste...@suse.cz>
Re: [PATCH v2] btrfs-progs: Add mount point output for 'btrfs fi df' command.
On Wed, Jul 09, 2014 at 02:56:57PM +0800, Qu Wenruo wrote:
> Add mount point output for 'btrfs fi df'.
>
> Also, since the patch uses find_mount_root() to find the mount point,
> 'btrfs fi df' can now output a more meaningful error message when given
> a non-btrfs path.

If a non-btrfs path is passed, the "Mounted on" line is printed, followed by 2 "ERROR:" lines. I suggest printing it only if get_df succeeds, i.e. right before print_df.
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> +$BTRFS_UTIL_PROG fi sy $SCRATCH_MNT
> +$BTRFS_UTIL_PROG su sna $SCRATCH_MNT $SCRATCH_MNT/snap1
> +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> +$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1

Please spell out the full names of the subcommands.
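For reference, a sketch of those four calls with the subcommands spelled out (the expansions are my reading of the abbreviations; the btrfs binary is stubbed with echo here so the sketch runs without a scratch filesystem, whereas in xfstests BTRFS_UTIL_PROG is the real tool and SCRATCH_MNT the scratch mount):

```shell
#!/bin/sh
# The four abbreviated subcommands from the patch, spelled out.
# BTRFS_UTIL_PROG is stubbed with 'echo' purely for illustration.
BTRFS_UTIL_PROG="echo btrfs"
SCRATCH_MNT=/mnt/scratch
$BTRFS_UTIL_PROG filesystem sync "$SCRATCH_MNT"
$BTRFS_UTIL_PROG subvolume snapshot "$SCRATCH_MNT" "$SCRATCH_MNT/snap1"
$BTRFS_UTIL_PROG quota enable "$SCRATCH_MNT"
$BTRFS_UTIL_PROG subvolume delete "$SCRATCH_MNT/snap1"
```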
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
> On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
>> On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
>>> On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
>>>> On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
>>>>> +
>>>>> +# Enable qgroups now that we have our filesystem prepared. This
>>>>> +# will kick off a scan which we will have to wait for below.
>>>>> +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
>>>>> +sleep 30
>>>>
>>>> That seems rather arbitrary. The sleeps you are adding add well over
>>>> a minute to the runtime, and a quota scan of a filesystem with 200
>>>> files should be almost instantaneous.
>>>
>>> Yeah, I'll bring that back down to 5 seconds?
>>
>> How long does it usually take? What interfaces would be needed for this
>> to work precisely, so we don't have to play this game ever again?
>
> Well, there's also the 'sleep 45' below, because we need to be certain
> that btrfs_drop_snapshot gets run. This was all a bit of a pain during
> debugging, to be honest.

If you need to wait just for the quota scan, then use 'btrfs quota rescan -w'; that will wait until the rescan finishes.
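The 'btrfs quota rescan -w' suggestion drops into the test roughly as follows. A minimal sketch, with the btrfs binary stubbed out with echo so the fragment runs without a scratch filesystem (in xfstests, BTRFS_UTIL_PROG points at the real tool and SCRATCH_MNT at the scratch mount):

```shell
#!/bin/sh
# Sketch of the xfstests fragment with the arbitrary 'sleep 30' replaced
# by the blocking rescan wait. The stub exists only so this runs anywhere.
BTRFS_UTIL_PROG="echo btrfs"
SCRATCH_MNT=/mnt/scratch
# Instead of: quota enable followed by 'sleep 30'
$BTRFS_UTIL_PROG quota enable "$SCRATCH_MNT"
$BTRFS_UTIL_PROG quota rescan -w "$SCRATCH_MNT"  # returns when the rescan is done
```

This removes the guesswork for the quota scan; the snapshot-drop wait discussed below still has no equivalent.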
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
> On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
>> What interfaces would be needed for this to work precisely, so we don't
>> have to play this game ever again?
>
> Well, there's also the 'sleep 45' below, because we need to be certain
> that btrfs_drop_snapshot gets run. This was all a bit of a pain during
> debugging, to be honest.
>
> So in my experience, an interface to make debugging easier would involve
> running every delayed action in the file system to completion, including
> a sync of dirty blocks to disk. In theory, this would include any
> delayed actions that were kicked off as a result of the actions you are
> syncing. You'd do it all from a point in time, of course, so that we
> don't spin forever on a busy filesystem. I do not know whether this is
> feasible.
>
> Given something like that, you'd just replace the calls to sleep with
> 'btrfs fi synctheworldandwait' and know that on return, the actions you
> just queued up had completed.

Waiting until some subvolume gets completely removed needs some work. In your case the cleaner thread sleeps and is woken up at transaction commit time. As there is no other activity on the filesystem, this happens at the periodic commit time, usually 30 seconds. 'sync' will not help, because it needs a transaction in progress.

I have a patchset in the works that addresses a different problem but introduces functionality that keeps track of some global pending actions. This could easily be enhanced to trigger the commit with sync if there was a snapshot deletion since the last commit, regardless of whether a transaction is running. This still does not cover the part where we want a command that waits until a given subvolume is completely removed, but I have a draft for that as well. Unfortunately, until both parts are in place, sleep is the only reliable way. Oh well.
Re: [PATCH RESEND 1/4] btrfs-progs: Check fstype in find_mount_root()
On Thu, Jul 10, 2014 at 11:05:10AM +0800, Qu Wenruo wrote:
> When calling find_mount_root(), the caller in fact wants to find the
> mount point of a *btrfs* filesystem. So also check ent->fstype in
> find_mount_root() and output proper error messages if needed.

The utils.c functions should be mostly silent about errors, as this is common code and it's up to the callers to print the messages. The existing printf in find_mount_root had appeared before the function was moved to utils.c.

> This will suppress a lot of "Inappropriate ioctl for device" error
> messages.

Catching the error early is a good thing, of course.
Re: Blocked tasks on 3.15.1
On 07/22/2014 04:53 PM, Chris Mason wrote:
> [recap of Martin Steigerwald's report snipped]
>
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device)? I don't care how many physical drives this maps to, just
> whether btrfs thinks there's more than one drive.

3.16-rc6 with your patch on top still causes hangs here. No traces of "racing" in dmesg.

The hang is on a btrfs raid0 consisting of 3 drives. Full stack is: sata - dmcrypt - btrfs raid0.

The hang was caused by:
1. Several rsync -av --inplace --delete <source> <backup subvol>
2. btrfs subvolume snapshot -r <backup subvol> <backup snap>

The rsync jobs are done one at a time; btrfs is stuck when trying to create the read-only snapshot.

--
Torbjørn

All output via netconsole.
sysrq-w: https://gist.github.com/anonymous/d1837187e261f9a4cbd2#file-gistfile1-txt
sysrq-t: https://gist.github.com/anonymous/2bdb73f035ab9918c63d#file-gistfile1-txt

dmesg:
[ 9352.784136] INFO: task btrfs-transacti:3874 blocked for more than 120 seconds.
[ 9352.784222]       Tainted: G            E  3.16.0-rc6+ #64
[ 9352.784270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.784354] btrfs-transacti D 88042fc943c0 0 3874 2 0x
[ 9352.784413]  8803fb9dfca0 0002 8800c4214b90 8803fb9dffd8
[ 9352.784502]  000143c0 000143c0 88041977b260 8803d29f23a0
[ 9352.784592]  8803d29f23a8 7fff 8800c4214b90 880232e2c0a8
[ 9352.784682] Call Trace:
[ 9352.784726]  [8170eb59] schedule+0x29/0x70
[ 9352.784774]  [8170df99] schedule_timeout+0x209/0x280
[ 9352.784827]  [8170874b] ? __slab_free+0xfe/0x2c3
[ 9352.784879]  [810829f4] ? wake_up_worker+0x24/0x30
[ 9352.784929]  [8170f656] wait_for_completion+0xa6/0x160
[ 9352.784981]  [8109d4e0] ? wake_up_state+0x20/0x20
[ 9352.785049]  [c045b936] btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[ 9352.785141]  [c04658be] btrfs_run_ordered_operations+0x1ee/0x2c0 [btrfs]
[ 9352.785260]  [c044bbb7] btrfs_commit_transaction+0x27/0xa40 [btrfs]
[ 9352.785324]  [c0447d65] transaction_kthread+0x1b5/0x240 [btrfs]
[ 9352.785385]  [c0447bb0] ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
[ 9352.785469]  [8108cc52] kthread+0xd2/0xf0
[ 9352.785517]  [8108cb80] ? kthread_create_on_node+0x180/0x180
[ 9352.785571]  [81712dfc] ret_from_fork+0x7c/0xb0
[ 9352.785620]  [8108cb80] ? kthread_create_on_node+0x180/0x180
[ 9352.785678] INFO: task kworker/u16:3:6932 blocked for more than 120 seconds.
[ 9352.785732]       Tainted: G            E  3.16.0-rc6+ #64
[ 9352.785780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.785863] kworker/u16:3 D 88042fd943c0 0 6932 2 0x
[ 9352.785930] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 9352.785983]  88035f1bbb58 0002 880417e564c0 88035f1bbfd8
[ 9352.786072]  000143c0 000143c0 8800c1a03260 88042fd94cd8
[ 9352.786160]  88042ffb4be8 88035f1bbbe0 0002 81159930
[ 9352.786250] Call Trace:
[ 9352.786292]  [81159930] ? wait_on_page_read+0x60/0x60
[ 9352.786343]  [8170ee6d] io_schedule+0x9d/0x130
[ 9352.786393]  [8115993e] sleep_on_page+0xe/0x20
[ 9352.786443]  [8170f3e8] __wait_on_bit_lock+0x48/0xb0
[ 9352.786495]  [81159a4a] __lock_page+0x6a/0x70
[ 9352.786544]  [810b14a0] ? autoremove_wake_function+0x40/0x40
[ 9352.786607]  [c046711e] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9352.786669]  [c046b0c0] extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9352.786766]  [c046cd2d] extent_writepages+0x4d/0x70 [btrfs]
[ 9352.786828]  [c04506f0] ? btrfs_submit_direct+0x6a0/0x6a0 [btrfs]
[ 9352.786883]  [810b0d78] ? __wake_up_common+0x58/0x90
[ 9352.786943]  [c044e1d8] btrfs_writepages+0x28/0x30 [btrfs]
[ 9352.786997]  [811668ee] do_writepages+0x1e/0x40
[ 9352.787045]  [8115b409] __filemap_fdatawrite_range+0x59/0x60
[ 9352.787097]  [8115b4bc] filemap_flush+0x1c/0x20 [
Re: Blocked tasks on 3.15.1
On 07/22/2014 03:42 PM, Torbjørn wrote:
> [recap snipped]
>
> 3.16-rc6 with your patch on top still causes hangs here. No traces of
> "racing" in dmesg. The hang is on a btrfs raid0 consisting of 3 drives.
> Full stack is: sata - dmcrypt - btrfs raid0.
>
> The hang was caused by:
> 1. Several rsync -av --inplace --delete <source> <backup subvol>
> 2. btrfs subvolume snapshot -r <backup subvol> <backup snap>
>
> The rsync jobs are done one at a time; btrfs is stuck when trying to
> create the read-only snapshot.

The trace is similar, but you're stuck trying to read the free space cache. This one I saw earlier this morning, but I haven't seen these parts in the 3.15 bug reports. Maybe they are related though; I'll dig into the 3.15 bug reports again.

-chris
Re: Blocked tasks on 3.15.1
On 07/22/2014 09:50 PM, Chris Mason wrote:
> [recap snipped]
>
> The trace is similar, but you're stuck trying to read the free space
> cache. This one I saw earlier this morning, but I haven't seen these
> parts in the 3.15 bug reports. Maybe they are related though; I'll dig
> into the 3.15 bug reports again.

In case it was not clear, this hang was on a different btrfs volume than the 3.15 hang (but the same server). Earlier, the affected volume was readable during the hang; this time the volume is not readable either.

I'll keep the patched 3.16 running and see if I can trigger something similar to the 3.15 hang.

Thanks
--
Torbjørn
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
Stefan Behrens <sbehrens at giantdisaster.de> writes:
> Just read the man page. You could have used the replace tool after
> physically removing the failing device. [...] 'missing' _is_
> implemented in the kernel part of the replace code, but was simply
> forgotten in the user-mode part, at least it was forgotten in the man
> page.

Hi Stefan,

Thank you very much for the comprehensive info; I will opt to use replace next time.

Breaking news :-) from

Jul 19 14:41:36 microserver kernel: [ 1134.244007] btrfs: relocating block group 8974430633984 flags 68

to

Jul 22 16:54:54 microserver kernel: [268419.463433] btrfs: relocating block group 2991474081792 flags 65

The rebuild ended before counting down to ... So flight time was 3 days, and I see no more messages or btrfs processes utilizing CPU, so the rebuild seems done.

Just a few hours ago another disk showed some early trouble: accumulating Current_Pending_Sector, but no Reallocated_Sector_Ct yet.

TM
Re: Blocked tasks on 3.15.1
On Tuesday, 22 July 2014, 10:53:03, Chris Mason wrote:
> [recap snipped]
>
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device)? I don't care how many physical drives this maps to, just
> whether btrfs thinks there's more than one drive.

As I said before, I am using BTRFS RAID 1: two logical volumes on two distinct SSDs. The RAID is done directly in BTRFS, no SoftRAID here (which I wouldn't want to use with SSDs anyway).

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: Blocked tasks on 3.15.1
On 07/22/2014 05:13 PM, Martin Steigerwald wrote:
[snip -- same exchange as quoted above]
As I told before, I am using BTRFS RAID 1. Two logical volumes on two distinct SSDs. The RAID is directly in BTRFS, no SoftRAID here (which I wouldn't want to use with SSDs anyway).

When you say logical volumes, you mean LVM right? Just making sure I know all the pieces involved.

-chris
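Chris's question (does btrfs itself believe there is more than one device?) can be answered mechanically by counting devid lines in `btrfs filesystem show` output. A small sketch over inlined sample output (the label and sizes here are illustrative, not from a live system):

```shell
# Count how many devices btrfs believes belong to a filesystem by
# parsing `btrfs filesystem show` output (sample output inlined here).
fi_show='Label: none  uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
        Total devices 2 FS bytes used 6.39GiB
        devid 1 size 12.58GiB used 6.78GiB path /dev/sda6
        devid 2 size 12.58GiB used 6.78GiB path /dev/sdb1'
printf '%s\n' "$fi_show" | grep -c 'devid'
# prints: 2
```

On a real system the heredoc would be replaced by `btrfs filesystem show /home`; a count greater than 1 means btrfs treats the filesystem as multi-device regardless of the physical layout underneath.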
Re: btrfs kernel workqueues performance regression
On Tue, Jul 15, 2014 at 01:39:11PM -0400, Chris Mason wrote: On 07/15/2014 11:26 AM, Morten Stevens wrote: Hi, I see that btrfs is using kernel workqueues since Linux 3.15. After some tests I noticed performance regressions with fs_mark.

mount options: rw,relatime,compress=lzo,space_cache

fs_mark on kernel 3.14.9:

# fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
FSUse%  Count   Size   Files/sec  App Overhead
1       65536   51200  17731.4    723894
1       131072  51200  16832.6    685444
1       196608  51200  19604.5    652294
1       262144  51200  18663.6    630067
1       327680  51200  20112.2    692769

The results are really nice! compress=lzo performs very well.

fs_mark after upgrading to kernel 3.15.4:

# fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
FSUse%  Count   Size   Files/sec  App Overhead
0       65536   51200  10718.1    749540
0       131072  51200   8601.2    853050
0       196608  51200  11623.2    558546
0       262144  51200  11534.2    536342
0       327680  51200  11167.4    578562

That's really a big performance regression :( What do you think? It's easy to reproduce with fs_mark.

I wasn't able to trigger regressions here when we first merged it, but I was sure that something would pop up. fs_mark is sensitive to a few different factors outside just the worker threads, so it could easily be another change as well. With 16 threads, the btree locking also has a huge impact, and we've made changes there too.

FWIW, I ran my usual 16-way fsmark test last week on my sparse 500TB perf test rig on btrfs. It sucked, big time, much worse than it's sucked in the past. It didn't scale past a single thread - 1 thread got 24,000 files/s, 2 threads got 25,000 files/s, 16 threads got 22,000 files/s.
$ ./fs_mark -D 1 -S0 -n 10 -s 0 -L 32 -d /mnt/scratch/0
FSUse%  Count  Size  Files/sec  App Overhead
0  100  24808.8  686583

$ ./fs_mark -D 1 -S0 -n 10 -s 0 -L 32 -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d /mnt/scratch/6 -d /mnt/scratch/7 -d /mnt/scratch/8 -d /mnt/scratch/9 -d /mnt/scratch/10 -d /mnt/scratch/11 -d /mnt/scratch/12 -d /mnt/scratch/13 -d /mnt/scratch/14 -d /mnt/scratch/15
FSUse%  Count  Size  Files/sec  App Overhead
0  1600  23599.7  38047237

Last time I ran this (probably about 3.12 - btrfs was simply too broken when I last tried on 3.14) I got about 80,000 files/s, so this is a pretty significant regression. The 16-way run consumed most of the 16 CPUs in the system, and the perf top output showed this:

+ 44.48% [kernel] [k] _raw_spin_unlock_irqrestore
+ 28.60% [kernel] [k] queue_read_lock_slowpath
+ 14.34% [kernel] [k] queue_write_lock_slowpath
+  1.91% [kernel] [k] _raw_spin_unlock_irq
+  0.85% [kernel] [k] __do_softirq
+  0.45% [kernel] [k] do_raw_read_lock
+  0.43% [kernel] [k] do_raw_read_unlock
+  0.42% [kernel] [k] btrfs_search_slot
+  0.40% [kernel] [k] do_raw_spin_lock
+  0.35% [kernel] [k] btrfs_tree_read_unlock
+  0.33% [kernel] [k] do_raw_write_lock
+  0.30% [kernel] [k] btrfs_clear_lock_blocking_rw
+  0.29% [kernel] [k] btrfs_tree_read_lock

All the CPU time is basically spent in locking functions.

Cheers, Dave.
-- Dave Chinner da...@fromorbit.com
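To put a number on Morten's regression, the per-kernel averages and the relative drop can be computed directly from the Files/sec columns of the two runs quoted above (plain arithmetic, nothing btrfs-specific):

```shell
# Average Files/sec for each kernel and the relative drop, taken from the
# fs_mark results quoted above (3.14.9 vs 3.15.4, compress=lzo).
awk 'BEGIN {
    avg_3_14 = (17731.4 + 16832.6 + 19604.5 + 18663.6 + 20112.2) / 5
    avg_3_15 = (10718.1 +  8601.2 + 11623.2 + 11534.2 + 11167.4) / 5
    printf "3.14.9 avg: %.1f files/s\n", avg_3_14
    printf "3.15.4 avg: %.1f files/s\n", avg_3_15
    printf "regression: %.1f%%\n", (avg_3_14 - avg_3_15) / avg_3_14 * 100
}'
# prints:
# 3.14.9 avg: 18588.9 files/s
# 3.15.4 avg: 10728.8 files/s
# regression: 42.3%
```

Roughly a 40% drop in file-creation throughput, which matches the "really big performance regression" characterization.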
Re: [BUG] Quota Ignored On write problem still exist with 3.16-rc5
Hi Wang,

(2014/07/18 19:29), Wang Shilong wrote: On 07/18/2014 04:45 PM, Satoru Takeuchi wrote: Hi Josef, Chris,

I found the "Quota Ignored On write" problem still exists with 3.16-rc5, which Kevin reported before.

Kevin's report: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35292.html
The result of bisect: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35304.html

I guess this is because Josef's patch delayed qgroup accounting; it causes @refer and @excl to be updated very late... The patch may be an optimization to merge some delayed refs (for example), but it updates qgroup accounting when committing the transaction, which is very late; we may have accumulated a lot of data by then.

Thank you for your comment. I know of the code logic which caused this problem. However, what I want to say here is that this problem should be fixed as soon as possible. It is an important regression and we already know the root cause. If it's impossible to fix it by the 3.16 release, I consider that this patch should be reverted.

Thanks, Satoru

Thanks, Wang

I bisected and found the bad commit is the following patch.
===
commit fcebe4562dec83b3f8d3088d77584727b09130b2
Author: Josef Bacik jba...@fb.com
Date: Tue May 13 17:30:47 2014 -0700

    Btrfs: rework qgroup accounting
===

Josef, please take a look at this patch.

Reproducer: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35299.html

Could you tell me the progress of fixing this bug? In addition, could you fix it by 3.16?
command log:
===
# ./test.sh
+ uname -a
Linux luna.soft.fujitsu.com 3.16.0-rc5 #2 SMP Tue Jul 15 13:39:46 JST 2014 x86_64 x86_64 x86_64 GNU/Linux
+ df -T /test7
Filesystem  Type   1K-blocks  Used  Available  Use%  Mounted on
/dev/sdc7   btrfs   29296640  1536   27169536    1%  /test7
+ btrfs quota ena /test7
+ cd /test7
+ btrfs sub cre test
Create subvolume './test'
+ btrfs sub l -a /test7
ID 270 gen 66 top level 5 path test
+ btrfs qg lim 1G test    # limit test subvol to 1GB
+ btrfs qg show -pcre /test7
qgroupid  rfer   excl   max_rfer    max_excl  parent  child
--------  ----   ----   --------    --------  ------  -----
0/5       16384  16384  0           0         ---     ---
0/270     16384  16384  1073741824  0         ---     ---
+ dd if=/dev/zero of=test/file0 bs=1M count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 9.67876 s, 217 MB/s    # write 2GB. It's a bug!
+ sync
+ ls -lisaR /test7
/test7:
total 20
256      16 drwxr-xr-x   1 root root          8 Jul 18 15:12 .
  2       4 drwxr-xr-x. 43 root root       4096 Jul 16 08:34 ..
256       0 drwxr-xr-x   1 root root         10 Jul 18 15:17 test
/test7/test:
total 2048016
256       0 drwxr-xr-x 1 root root         10 Jul 18 15:17 .
256      16 drwxr-xr-x 1 root root          8 Jul 18 15:12 ..
257 2048000 -rw-r--r-- 1 root root 2097152000 Jul 18 15:17 file0
+ btrfs qg show -pcre /test7
qgroupid  rfer        excl        max_rfer    max_excl  parent  child
--------  ----        ----        --------    --------  ------  -----
0/5       16384       16384       0           0         ---     ---
0/270     2097168384  2097168384  1073741824  0         ---     ---
+ btrfs quota dis /test7
+ btrfs sub del test
Transaction commit: none (default)
Delete subvolume '/test7/test'
+ set +x
===
NOTE: The reproducer here (./test.sh) is a bit different from the above-mentioned one because of some reason.

Thanks, Satoru
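The buggy state in the final `qg show` output is easy to quantify: the subvolume's referenced bytes should never have been allowed past max_rfer. A quick check of the numbers above (plain arithmetic, not a btrfs tool):

```shell
# Compare the qgroup's referenced bytes against its configured limit,
# using the values from the final `btrfs qg show` output above.
awk 'BEGIN {
    rfer     = 2097168384   # bytes actually referenced by subvol 0/270
    max_rfer = 1073741824   # configured 1 GiB limit
    printf "limit exceeded by %d bytes (~%.1f GiB written vs 1 GiB limit)\n",
           rfer - max_rfer, rfer / 2^30
}'
# prints: limit exceeded by 1023426560 bytes (~2.0 GiB written vs 1 GiB limit)
```

The write should have failed with EDQUOT once the 1 GiB limit was reached; instead nearly double the limit landed on disk.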
Re: Blocked tasks on 3.15.1
On Tue, Jul 22, 2014 at 10:53 AM, Chris Mason c...@fb.com wrote: Thanks for the help in tracking this down everyone. We'll get there! Are you all running multi-disk systems (from a btrfs POV, more than one device?) I don't care how many physical drives this maps to, just does btrfs think there's more than one drive.

I've been away on vacation so I haven't been able to try your latest patch, but I can try whatever is out there starting this weekend. I was getting fairly consistent hangs during heavy IO (especially rsync) on 3.15 with lzo enabled. This is on raid1 across 5 drives, directly against the partitions themselves (no dmcrypt, mdadm, lvm, etc). I disabled lzo and haven't had problems since. I'm now running on mainline without issue, but I think I did see the hang on mainline when I tried enabling lzo again briefly.

Rich
Re: [PATCH RESEND 1/4] btrfs-progs: Check fstype in find_mount_root()
David, thanks for all the comments about the 'fi di' related patchset.

Original Message
Subject: Re: [PATCH RESEND 1/4] btrfs-progs: Check fstype in find_mount_root()
From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-23 03:15

On Thu, Jul 10, 2014 at 11:05:10AM +0800, Qu Wenruo wrote: When calling find_mount_root(), the caller in fact wants to find the mount point of *BTRFS*. So also check ent->fstype in find_mount_root() and output proper error messages if needed.

The utils.c functions should be mostly silent about the errors as this is the common code and it's up to the callers to print the messages. The existing printf in find_mount_root had appeared before the function was moved to utils.c.

Thanks for the info about the convention in utils.c. I'll update the patch and remove the printf from the original code and my patches.

This will suppress a lot of "Inappropriate ioctl for device" error messages.

Catching the error early is a good thing of course.

BTW, I did not see the patchset in the latest integration branch, so after all the updates to the patchset, should I resend it rebased on the latest integration branch?

Thanks, Qu.
Re: [PATCH v2] btrfs-progs: Add mount point output for 'btrfs fi df' command.
Original Message
Subject: Re: [PATCH v2] btrfs-progs: Add mount point output for 'btrfs fi df' command.
From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-23 01:56

On Wed, Jul 09, 2014 at 02:56:57PM +0800, Qu Wenruo wrote: Add mount point output for 'btrfs fi df'. Also, since the patch uses find_mount_root() to find the mount point, 'btrfs fi df' can now output a more meaningful error message when given a non-btrfs path.

If a non-btrfs path is passed, the "Mounted on" line is printed, followed by 2 "ERROR:" lines. I suggest to print it only if get_df succeeds, i.e. right before print_df.

Thanks for mentioning it, I'll update the patchset soon.

Thanks, Qu
BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
3.16.0-0.rc6.git0.1.fc21.1.x86_64
btrfs-progs 3.14.2

Fortunately this is a test system so it is dispensable. But in just an hour I ran into 5 bugs, and managed to apparently completely destroy a btrfs file system beyond repair, and it wasn't intentional.

1. mkfs.btrfs /dev/sda6    ## volume's life starts as a single device, on an SSD
2. btrfs device add /dev/sdb1 /    ## added an HDD partition
3. btrfs balance start -dconvert=raid1 -mconvert=raid1
4. clean shutdown, remove device 1 (leaving device 0)
5. poweron, mount degraded
6. gdm/gnome comes up very slowly, then I see a sad face graphic, with a message that there's only 60MB of space left.

# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda6    26G   13G    20M  100%  /
/dev/sda6    26G   13G    20M  100%  /home
/dev/sda6    26G   13G    20M  100%  /var
/dev/sda6    26G   13G    20M  100%  /boot

# btrfs fi df
Data, RAID1: total=6.00GiB, used=5.99GiB
System, RAID1: total=32.00MiB, used=32.00KiB
Metadata, RAID1: total=768.00MiB, used=412.41MiB
unknown, single: total=160.00MiB, used=0.00

# btrfs fi show
Label: 'Rawhide2' uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
Total devices 2 FS bytes used 6.39GiB
devid 1 size 12.58GiB used 6.78GiB path /dev/sda6
*** Some devices missing
Btrfs v3.14.2

BUG 1: The df command is clearly bogus six ways to Sunday. It's a 12.58 GiB partition, only 6.78GiB used, thus 5.8GiB free, yet df and apparently gvfs think it's full; maybe systemd too, because the journal wigged out and stopped logging events while it also kept stopping and starting. So whatever changes occurred to clean up the df reporting are very problematic, at best, when mounting degraded.

== so then he gets curious about replacing the missing disk ==

7. btrfs replace start 2 /dev/sdb1 /    ## this is a ~13GB partition that matches the size of the missing device

This completes, no disk activity for a little over a minute, and then I see a call trace with btrfs_replace implicated.
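The mismatch in BUG 1 can be spelled out numerically (plain arithmetic over the figures in the report; the GiB values come from the `btrfs fi show` output):

```shell
# Expected free space from the `btrfs fi show` numbers vs what df reported
# on the degraded mount.
awk 'BEGIN {
    size = 12.58; used = 6.78    # GiB, per btrfs fi show
    printf "expected free: %.2f GiB\n", size - used
    printf "df reported:   20 MiB available (Use%% = 100)\n"
}'
# prints:
# expected free: 5.80 GiB
# df reported:   20 MiB available (Use% = 100)
```

So statfs on the degraded mount is under-reporting available space by roughly three orders of magnitude, which explains why gvfs and the desktop declared the disk full.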
Unfortunately the system becomes so unstable at this point, I can't even capture a dmesg to a separate volume. After 30 minutes of unresponsive local shells, I force a poweroff.

8. Power on. Dropped to a dracut shell, as the btrfs volume will not mount:
[ 53.890761] rawhide kernel: BTRFS: failed to read the system array on sda6
[ 53.905058] rawhide kernel: BTRFS: open_ctree failed

9. mount with -o recovery, same message

10. Reboot using vbox pointed at these partitions as raw devices so I can better capture data, and not use a degraded fs as root; the devices are sdb and sdc.

# mount -o ro /dev/sdb /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so.
[ 216.819927] BTRFS: failed to read the system array on sdc
[ 216.835570] BTRFS: open_ctree failed

So it's the same message as in the dracut shell. Same message with ro,recovery.

11. mount -o degraded,ro /dev/sdb /mnt

This works. Somehow the replace hasn't completed on some level. Very weird. And not intuitive.

[root@localhost ~]# btrfs fi show
Label: 'Rawhide2' uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
Total devices 2 FS bytes used 6.39GiB
devid 0 size 12.58GiB used 6.78GiB path /dev/sdc
devid 1 size 12.58GiB used 6.78GiB path /dev/sdb
Btrfs v3.14.2

It does not show any missing devices. I vaguely recall that in the dracut shell, when booted bare metal, btrfs fi show did still show a missing device along with the original and replacement devices, i.e. the replace didn't complete. I suspect that my 'btrfs replace start 2' is wrong, that devid 2 did not exist, it was actually devid 0 and 1 like above; but the problem is that btrfs fi show does not show the devid for missing devices. I only saw devid 1 for the remaining device, and assumed the missing one was 2. So that's why I did 'btrfs replace start 2', yet I didn't get an error message. The replace started, but apparently didn't complete.
BUG 2: btrfs fi show needs to show the devid of the missing device.
BUG 3: btrfs replace start should fail when specifying a non-existent devid.
BUG 4: btrfs replace start can fail to complete (possibly related to bug 2 and 3).
BUG 4: When mounting -degraded (rw), I get a major oops resulting in a completely unresponsive system.

# mount -o degraded /dev/sdb /mnt
[ 16.466995] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 55.081687] BTRFS info (device sdb): allowing degraded mounts
[ 55.082107] BTRFS info (device sdb): disk space caching is enabled
[ 55.117702] SELinux: initialized (dev sdb, type btrfs), uses xattr
[ 55.117717] BTRFS: continuing dev_replace from missing disk (devid 2) to /dev/sdc @72%
[ 55.530810] BTRFS: dev_replace from missing disk (devid 2) to /dev/sdc) finished
[ 55.532149] BUG: unable to handle kernel
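When reconstructing which devid a replace was actually targeting, the dev_replace kernel messages are the most reliable record. A small sketch that pulls the devid out of a captured log (the sample line is taken from the messages above; on a live system the input would come from `dmesg`):

```shell
# Extract the devid that dev_replace was rebuilding, from kernel log text.
log='[   55.117717] BTRFS: continuing dev_replace from missing disk (devid 2) to /dev/sdc @72%'
printf '%s\n' "$log" | grep -o 'devid [0-9]*'
# prints: devid 2
```

This confirms after the fact that 'btrfs replace start 2' did address the missing device, even though btrfs fi show never displayed that devid.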
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
Interesting, if I remove /dev/sdc (the hdd), then this command works:

[root@localhost ~]# mount -o degraded,recovery /dev/sdb /mnt
[root@localhost ~]# btrfs replace status /mnt
72.1% done, 0 write errs, 0 uncorr. read errs
72.1% done, 0 write errs, 0 uncorr. read errs^C    ## above command hangs, but cancels with control-c
[root@localhost ~]# btrfs replace cancel /mnt
[root@localhost ~]# btrfs replace status /mnt
Started on 22.Jul 16:10:37, canceled on 22.Jul 19:41:23 at 0.0%, 0 write errs, 0 uncorr. read errs
[root@localhost ~]# btrfs balance start -dconvert=single -mconvert=single /mnt -f
Done, had to relocate 10 out of 10 chunks
[root@localhost ~]# btrfs fi show
Label: 'Rawhide2' uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
Total devices 2 FS bytes used 6.39GiB
devid 1 size 12.58GiB used 7.03GiB path /dev/sdb
*** Some devices missing
Btrfs v3.14.2
[root@localhost ~]# btrfs fi df /mnt
Data, single: total=6.00GiB, used=5.99GiB
System, single: total=32.00MiB, used=32.00KiB
Metadata, single: total=1.00GiB, used=412.00MiB
unknown, single: total=160.00MiB, used=0.00
[root@localhost ~]# btrfs device delete missing /mnt
[root@localhost ~]# btrfs fi show
Label: 'Rawhide2' uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
Total devices 1 FS bytes used 6.39GiB
devid 1 size 12.58GiB used 7.03GiB path /dev/sdb
Btrfs v3.14.2

So it's recovered and back to normal.

Chris Murphy
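Condensed, the recovery sequence that worked here was: mount degraded with recovery, cancel the stuck replace, convert the profiles back to single, then drop the missing device. A dry-run sketch of that sequence (it only prints the commands rather than executing them; the device names are the ones from this report and would differ on another system):

```shell
# Dry-run: print the recovery steps instead of executing them, so the
# sequence can be reviewed before touching a real degraded filesystem.
run() { echo "+ $*"; }

run mount -o degraded,recovery /dev/sdb /mnt
run btrfs replace cancel /mnt
run btrfs balance start -dconvert=single -mconvert=single /mnt -f
run btrfs device delete missing /mnt
```

Replacing `run` with nothing executes the steps for real; the balance back to single profiles is what makes `device delete missing` possible on a filesystem that no longer has enough devices for RAID1.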
Re: [BUG] Quota Ignored On write problem still exist with 3.16-rc5
Hi Satoru-san,

On 07/23/2014 08:53 AM, Satoru Takeuchi wrote:
[snip -- same report as quoted in full above]
However, what I want to say here is that this problem should be fixed as soon as possible. It is an important regression and we already know the root cause. If it's impossible to fix it by the 3.16 release, I consider that this patch should be reverted.

Since the btrfs quota function is under heavy development, it should be considered *broken*. I think we'd better disable the quota function (like the snapshot-aware defrag) until we really sit down and solve everything.

Thanks, Wang

[snip -- bisect result and reproducer links, quoted in full above]
[snip -- command log identical to the one in the original report above]
[BUG] bogus out of space reported when mounted raid1 degraded
On Jul 22, 2014, at 7:34 PM, Chris Murphy li...@colorremedies.com wrote: BUG 1: The df command is clearly bogus six ways to Sunday. It's a 12.58 GiB partition, only 6.78GiB used, thus 5.8GiB free, yet df and apparently gvfs think it's full, maybe systemd too because the journal wigged out and stopped logging events while also kept stopping and starting. So whatever changes occurred to clean up the df reporting, are very problematic at best when mounting degraded.

Used strace on df, think I found the problem so I put it all into a bug. https://bugzilla.kernel.org/show_bug.cgi?id=80951

Chris Murphy
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
This one (your bug #4) was likely caused by: commit 4cde9c59c2b8bb67d46d531b26cc73e39747 Author: Anand Jain anand.j...@oracle.com Date: Tue Jun 3 11:36:00 2014 +0800 btrfs: dev delete should remove sysfs entry and hopefully fixed by: commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9 Author: Eric Sandeen sand...@redhat.com Date: Mon Jul 7 12:34:49 2014 -0500 btrfs: test for valid bdev before kobj removal in btrfs_rm_device -Eric On 7/22/14, 8:34 PM, Chris Murphy wrote: BUG 4: When mounting -degraded (rw), I get a major oops resulting in a completely unresponsive system. # mount -o degraded /dev/sdb /mnt [ 16.466995] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs [ 55.081687] BTRFS info (device sdb): allowing degraded mounts [ 55.082107] BTRFS info (device sdb): disk space caching is enabled [ 55.117702] SELinux: initialized (dev sdb, type btrfs), uses xattr [ 55.117717] BTRFS: continuing dev_replace from missing disk (devid 2) to /dev/sdc @72% [ 55.530810] BTRFS: dev_replace from missing disk (devid 2) to /dev/sdc) finished [ 55.532149] BUG: unable to handle kernel NULL pointer dereference at 0088 [ 55.533087] IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs] [ 55.533087] PGD 0 [ 55.533087] Oops: [#1] SMP [ 55.533087] Modules linked in: cfg80211 rfkill btrfs snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev xor raid6_pq snd_pcm microcode snd_timer serio_raw parport_pc snd i2c_piix4 parport soundcore i2c_core xfs libcrc32c virtio_net virtio_pci virtio_ring ata_generic virtio pata_acpi [ 55.533087] CPU: 2 PID: 821 Comm: btrfs-devrepl Not tainted 3.16.0-0.rc6.git0.1.fc21.1.x86_64 #1 [ 55.533087] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 55.533087] task: 880099b5eca0 ti: 88009983c000 task.ti: 88009983c000 [ 55.533087] RIP: 0010:[a0268551] [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs] [ 55.533087] RSP: 0018:88009983fe08 EFLAGS: 00010286 [ 55.533087] RAX: RBX: RCX: bbb3527a6b299586 [ 
55.533087] RDX: 880036b6e410 RSI: 88009b4a2800 RDI: 880035f6cac0 [ 55.533087] RBP: 88009983fe10 R08: 880036b6e410 R09: 0234 [ 55.533087] R10: e8d01090 R11: 818675c0 R12: 880099a2cdc8 [ 55.533087] R13: 88009b4a2800 R14: 880099eaa000 R15: 880036acf200 [ 55.533087] FS: () GS:88009fb0() knlGS: [ 55.533087] CS: 0010 DS: ES: CR0: 8005003b [ 55.533087] CR2: 0088 CR3: 9aefe000 CR4: 06e0 [ 55.533087] Stack: [ 55.533087] 880099a2c000 88009983fe90 a02bf93d 880099a2c100 [ 55.533087] 880099a2ce38 0006baa5 0028 88009983fea0 [ 55.533087] 88009983fe58 2909d417 880099a2c000 2909d417 [ 55.533087] Call Trace: [ 55.533087] [a02bf93d] btrfs_dev_replace_finishing+0x32d/0x5c0 [btrfs] [ 55.533087] [a02c0130] ? btrfs_dev_replace_status+0x110/0x110 [btrfs] [ 55.533087] [a02c019d] btrfs_dev_replace_kthread+0x6d/0x130 [btrfs] [ 55.533087] [810b311a] kthread+0xea/0x100 [ 55.533087] [810b3030] ? insert_kthread_work+0x40/0x40 [ 55.533087] [8172253c] ret_from_fork+0x7c/0xb0 [ 55.533087] [810b3030] ? insert_kthread_work+0x40/0x40 [ 55.533087] Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85 f6 74 14 48 8b 46 78 48 8b 80 88 00 00 00 48 8b 70 38 e8 2f 23 01 e1 89 d8 5b 5d c3 [ 55.533087] RIP [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs] [ 55.533087] RSP 88009983fe08 [ 55.533087] CR2: 0088 [ 55.533087] ---[ end trace a34670f31a1db59e ]--- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
On Tue, Jul 22, 2014 at 07:34:58PM -0600, Chris Murphy wrote:
[snip -- full quote of the original bug report above]
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
On Jul 22, 2014, at 9:01 PM, Liu Bo bo.li@oracle.com wrote: so then he gets curious about replacing the missing disk== 7. btrfs replace start 2 /dev/sdb1 / ## this is a ~13GB partition that matches the size of the missing device This completes, no disk activity for a little over a minute, and then I see a call trace with btrfs_replace implicated. Unfortunately the system becomes so unstable at this point, I can't even capture a dmesg to a separate volume. After 30 minutes of unresponsive local shells, I force a poweroff. OK I've reproduced this original oops that causes the problem during device replace. The command above is correct, it is devid 2. Here's the trace that happens during rebuild. It's only slightly different than the -o rw,degraded trace. What I note is that it reports the device replace is finished, yet also at that time it barfs, probably before it finishes writing whatever's needed so that subsequent mounts can be done normally rather than with -o degraded. [ 423.512988] BTRFS: dev_replace from missing disk (devid 2) to /dev/sdb1 started [ 651.671835] BTRFS: dev_replace from missing disk (devid 2) to /dev/sdb1) finished [ 651.672485] BUG: unable to handle kernel NULL pointer dereference at 0088 [ 651.673144] IP: [a03da551] btrfs_kobj_rm_device+0x21/0x40 [btrfs] [ 651.673834] PGD 8723b067 PUD 8723c067 PMD 0 [ 651.674512] Oops: [#1] SMP [ 651.675184] Modules linked in: ccm xt_CHECKSUM ipt_MASQUERADE ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep nls_utf8 hfsplus arc4 b43 mac80211 x86_pkg_temp_thermal coretemp kvm_intel cfg80211 uvcvideo kvm ssb videobuf2_vmalloc iTCO_wdt crct10dif_pclmul videobuf2_memops videobuf2_core iTCO_vendor_support 
crc32_pclmul v4l2_common crc32c_intel videodev btusb ghash_clmulni_intel applesmc sdhci_pci input_polldev bluetooth media sdhci hid_appleir microcode bcm5974 rfkill mmc_core i2c_i801 bcma [ 651.677785] snd_hda_codec_cirrus lpc_ich snd_hda_codec_generic mfd_core snd_hda_codec_hdmi sbs sbshc snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm mei_me snd_timer apple_gmux snd mei apple_bl shpchp soundcore firewire_sbp2 btrfs xor raid6_pq i915 ttm i2c_algo_bit drm_kms_helper tg3 drm firewire_ohci ptp firewire_core pps_core i2c_core crc_itu_t video [ 651.680756] CPU: 0 PID: 1443 Comm: btrfs Not tainted 3.16.0-0.rc6.git0.1.fc21.1.x86_64 #1 [ 651.681816] Hardware name: Apple Inc. MacBookPro8,2/Mac-94245A3940C91C80, BIOSMBP81.88Z.0047.B27.1201241646 01/24/12 [ 651.682913] task: 8802546b62c0 ti: 880087254000 task.ti: 880087254000 [ 651.684030] RIP: 0010:[a03da551] [a03da551] btrfs_kobj_rm_device+0x21/0x40 [btrfs] [ 651.685190] RSP: 0018:880087257c80 EFLAGS: 00010286 [ 651.686346] RAX: RBX: RCX: dfc8a37487c2b3b9 [ 651.687517] RDX: 88026061f810 RSI: 88026061ce00 RDI: 88026130d0c0 [ 651.688705] RBP: 880087257c88 R08: 88026061f810 R09: 052e [ 651.689881] R10: 88026fa1cdc0 R11: 0001 R12: 88025f981dc8 [ 651.691059] R13: 88026061ce00 R14: 88026174d800 R15: 880262b31800 [ 651.692239] FS: 7f5b0225f880() GS:88026fa0() knlGS: [ 651.693439] CS: 0010 DS: ES: CR0: 80050033 [ 651.694638] CR2: 0088 CR3: 3f2b5000 CR4: 000407f0 [ 651.695850] Stack: [ 651.697053] 88025f981000 880087257d08 a043193d 88025f981100 [ 651.698301] 88025f981e38 000a3ea5 00ff8802 8802546b62c0 [ 651.699556] 810d7fa0 880087257cc8 880087257cc8 547e2838 [ 651.700824] Call Trace: [ 651.702099] [a043193d] btrfs_dev_replace_finishing+0x32d/0x5c0 [btrfs] [ 651.703397] [810d7fa0] ? abort_exclusive_wait+0xb0/0xb0 [ 651.704714] [a0431f52] btrfs_dev_replace_start+0x382/0x450 [btrfs] [ 651.706048] [a03faa8a] btrfs_ioctl+0x1caa/0x28f0 [btrfs] [ 651.707379] [811b4be6] ? 
handle_mm_fault+0x8d6/0xfd0 [ 651.708711] [8105be2c] ? __do_page_fault+0x29c/0x580 [ 651.710038] [81203187] ? cp_new_stat+0x157/0x190 [ 651.711361] [81212100] do_vfs_ioctl+0x2d0/0x4b0 [ 651.712683] [81212361] SyS_ioctl+0x81/0xa0 [ 651.714007] [817225e9] system_call_fastpath+0x16/0x1b [ 651.715332] Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85 f6 74 14 48 8b 46 78 48 8b 80 88 00 00 00 48 8b 70 38 e8 2f 03 ea e0 89 d8 5b 5d
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
On Jul 22, 2014, at 8:52 PM, Eric Sandeen sand...@redhat.com wrote: This one (your bug #4) was likely caused by: commit 4cde9c59c2b8bb67d46d531b26cc73e39747 Author: Anand Jain anand.j...@oracle.com Date: Tue Jun 3 11:36:00 2014 +0800 btrfs: dev delete should remove sysfs entry and hopefully fixed by: commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9 Author: Eric Sandeen sand...@redhat.com Date: Mon Jul 7 12:34:49 2014 -0500 btrfs: test for valid bdev before kobj removal in btrfs_rm_device

OK good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released; replace appears to be broken at the moment.

Chris Murphy
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
On Tue, Jul 22, 2014 at 09:28:52PM -0600, Chris Murphy wrote:
> On Jul 22, 2014, at 8:52 PM, Eric Sandeen sand...@redhat.com wrote:
> > This one (your bug #4) was likely caused by:
> > commit 4cde9c59c2b8bb67d46d531b26cc73e39747
> > Author: Anand Jain anand.j...@oracle.com
> > Date: Tue Jun 3 11:36:00 2014 +0800
> >     btrfs: dev delete should remove sysfs entry
> > and hopefully fixed by:
> > commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9
> > Author: Eric Sandeen sand...@redhat.com
> > Date: Mon Jul 7 12:34:49 2014 -0500
> >     btrfs: test for valid bdev before kobj removal in btrfs_rm_device
> OK good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released; replace appears to be broken at the moment.

Looks like they are not the same one, since you didn't use btrfs_rm_device. As we just skip adding a sysfs entry for a missing device (dev->bdev is NULL), we can do the same thing when removing a sysfs entry. Could you please try this?

-liubo

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 7869936..12e5355 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 	if (!fs_info->device_dir_kobj)
 		return -EINVAL;
 
-	if (one_device) {
+	if (one_device && one_device->bdev) {
 		disk = one_device->bdev->bd_part;
 		disk_kobj = &part_to_dev(disk)->kobj;
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
On Jul 22, 2014, at 9:36 PM, Liu Bo bo.li@oracle.com wrote: On Tue, Jul 22, 2014 at 09:28:52PM -0600, Chris Murphy wrote: On Jul 22, 2014, at 8:52 PM, Eric Sandeen sand...@redhat.com wrote: This one (your bug #4) was likely caused by: commit 4cde9c59c2b8bb67d46d531b26cc73e39747 Author: Anand Jain anand.j...@oracle.com Date: Tue Jun 3 11:36:00 2014 +0800 btrfs: dev delete should remove sysfs entry and hopefully fixed by: commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9 Author: Eric Sandeen sand...@redhat.com Date: Mon Jul 7 12:34:49 2014 -0500 btrfs: test for valid bdev before kobj removal in btrfs_rm_device OK good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released; replace appears to be broken at the moment. Looks like they are not the same one, since you didn't use btrfs_rm_device. As we just skip adding a sysfs entry for a missing device (dev->bdev is NULL), we can do the same thing when removing a sysfs entry. Could you please try this?

Normally yes, but not for a couple of weeks this time. While replace cancel worked, and the balance conversion back to the single profile worked, I forgot to immediately device delete missing, and instead I rebooted. Now I can't mount degraded, and I run into this old bug:

[ 71.064352] BTRFS info (device sdb): allowing degraded mounts
[ 71.064812] BTRFS info (device sdb): enabling auto recovery
[ 71.065210] BTRFS info (device sdb): disk space caching is enabled
[ 71.072068] BTRFS warning (device sdb): devid 2 missing
[ 71.097320] BTRFS: too many missing devices, writeable mount is not allowed
[ 71.116616] BTRFS: open_ctree failed

Since I can't mount degraded rw I can't make read-only snapshots, and can't btrfs send/receive the subvolumes, so this setup will need to be replaced. Not a big deal, just time, but maybe someone else can test it sooner than me.
Chris Murphy
Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
On 7/22/14, 10:36 PM, Liu Bo wrote: On Tue, Jul 22, 2014 at 09:28:52PM -0600, Chris Murphy wrote: On Jul 22, 2014, at 8:52 PM, Eric Sandeen sand...@redhat.com wrote: This one (your bug #4) was likely caused by: commit 4cde9c59c2b8bb67d46d531b26cc73e39747 Author: Anand Jain anand.j...@oracle.com Date: Tue Jun 3 11:36:00 2014 +0800 btrfs: dev delete should remove sysfs entry and hopefully fixed by: commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9 Author: Eric Sandeen sand...@redhat.com Date: Mon Jul 7 12:34:49 2014 -0500 btrfs: test for valid bdev before kobj removal in btrfs_rm_device OK good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released, replace appears to be broken at the moment. Looks that they are not the same one, since you didn't use a btrfs_rm_device, Oh, you're right - I'm sorry, I didn't look closely enough. -Eric
Re: [BUG] bogus out of space reported when mounted raid1 degraded
Chris Murphy posted on Tue, 22 Jul 2014 20:36:55 -0600 as excerpted: On Jul 22, 2014, at 7:34 PM, Chris Murphy li...@colorremedies.com wrote: BUG 1: The df command is clearly bogus six ways to Sunday. It's a 12.58 GiB partition, only 6.78 GiB used, thus 5.8 GiB free, yet df and apparently gvfs think it's full, maybe systemd too, because the journal wigged out and stopped logging events while it also kept stopping and starting. So whatever changes occurred to clean up the df reporting are very problematic at best when mounting degraded. Used strace on df, think I found the problem, so I put it all into a bug: https://bugzilla.kernel.org/show_bug.cgi?id=80951

Suggestion for improved bug summary/title: Current: df reports bogus filesystem usage when mounted degraded. Problems: While the bug's product is file system and its component is btrfs... 1) the summary doesn't mention btrfs, and 2) there's some ambiguity as to whether it's normal df or btrfs df. Proposed improved summaries/titles: #1: (Non-btrfs) df reports bogus btrfs usage with degraded mount. #2: With degraded btrfs mount, (non-btrfs) df reports bogus usage.

Meanwhile, to the problem at hand... There are two root issues here: The first is a variant of something already discussed in the FAQ and reasonably well known on the list: (non-btrfs) df is simply not accurate in many cases on a multi-device btrfs, because a multi-device btrfs breaks all the old rules and assumptions upon which it bases its reporting. There has been some debate about how it should work, but the basic problem is that there's no way to present all the information necessary to get a proper picture of the situation while continuing to keep output-format backward compatibility in order to prevent breaking the various scripts etc. that depend on the existing format.
The best way forward seems to be some sort of at-best-half-broken compromise regarding legacy df output, maintaining backward output-format compatibility and at least not breaking too badly in the legacy-assumption single-device filesystem case, but not really working so well in all the various multi-device btrfs cases, because the output format is simply too constrained to present the necessary information properly. With some work, it should be possible to make at least the most common multi-device btrfs cases not /entirely/ broken as well, although the old assumptions constrain output format such that there will always be corner-cases that don't present well -- for these, legacy df is just that, legacy, and a more appropriate tool is needed. And a two-device btrfs raid1 mounted degraded with one device missing is just such a corner-case, at least presently. Given the second root issue below, however, IMO the existing presentation was as accurate as could be expected under the circumstances.

The second half of the solution (still to root issue #1), then, is providing a more appropriate btrfs-specific tool free of these legacy assumptions and output-format constraints. Currently, the solution there actually ships as two different reports which must be taken together to get a proper picture of the situation, currently with some additional interpretation required as well. Of course I'm talking about btrfs filesystem show along with btrfs filesystem df. The biggest catch here is that "additional interpretation required" bit. There's a bit of it required in normal operation, but for the degraded-mount case knowledge of root issue #2 below is required for proper interpretation as well.

Which brings us to root issue #2: With btrfs raid1 the chunk-allocator policy forces allocation in pairs, with each chunk of the pair forced to a different device.
Since the btrfs in question is raid1 (both data and metadata) with two devices when undegraded, loss of a single device and a degraded mount mean the above chunk-allocation policy cannot succeed, as there's no second device available to write the mirror-chunk to. Note that the situation with a two-device raid1 with one missing is rather different than with a three-device raid1 with one missing, as in the latter case, and assuming there's still unallocated space left on all devices, a pair-chunk allocation could still succeed, since it could still allocate one chunk-mirror on each of the two remaining devices.

The critical bit to understand here is that (AFAIK) a degraded mount does *NOT* trigger a chunk-allocation-policy waiver, which means that with a two-device btrfs raid1 with a device missing, no additional chunks can be allocated, as the pair-chunks-at-a-time-allocated-on-different-devices policy cannot be fulfilled. (Pardon my yelling, but this is the critical bit...)

** ON BTRFS RAID1, TWO DEVICES MUST BE PRESENT IN ORDER TO ALLOCATE NEW CHUNKS. MOUNTING DEGRADED WITH A SINGLE DEVICE MEANS NO NEW CHUNK ALLOCATION, WHICH MEANS YOU'RE LIMITED TO FILLING UP EXISTING CHUNKS **

Conclusions in light of the above,
[PATCH 1/4] btrfs-progs: Remove fprintf() in find_mount_root().
find_mount_root() function in utils.c should not print the error string; the caller should be responsible for printing it.

This patch removes the only fprintf() in find_mount_root() and modifies the callers a little to use strerror() to prompt users.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 cmds-receive.c | 4 ++--
 cmds-send.c    | 4 ++--
 utils.c        | 6 +-----
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index 48380a5..72afe2a 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -980,9 +980,9 @@ static int do_receive(struct btrfs_receive *r, const char *tomnt, int r_fd,
 
 	ret = find_mount_root(dest_dir_full_path, &r->root_path);
 	if (ret < 0) {
-		ret = -EINVAL;
 		fprintf(stderr, "ERROR: failed to determine mount point "
-			"for %s\n", dest_dir_full_path);
+			"for %s: %s\n", dest_dir_full_path, strerror(-ret));
+		ret = -EINVAL;
 		goto out;
 	}
 	r->mnt_fd = open(r->root_path, O_RDONLY | O_NOATIME);
diff --git a/cmds-send.c b/cmds-send.c
index 9a73b32..48c3df4 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -356,9 +356,9 @@ static int init_root_path(struct btrfs_send *s, const char *subvol)
 
 	ret = find_mount_root(subvol, &s->root_path);
 	if (ret < 0) {
-		ret = -EINVAL;
 		fprintf(stderr, "ERROR: failed to determine mount point "
-			"for %s\n", subvol);
+			"for %s: %s\n", subvol, strerror(-ret));
+		ret = -EINVAL;
 		goto out;
 	}
diff --git a/utils.c b/utils.c
index 11250d9..2d0f18e 100644
--- a/utils.c
+++ b/utils.c
@@ -2422,12 +2422,8 @@ int find_mount_root(const char *path, char **mount_root)
 	}
 	endmntent(mnttab);
 
-	if (!longest_match) {
-		fprintf(stderr,
-			"ERROR: Failed to find mount root for path %s.\n",
-			path);
+	if (!longest_match)
 		return -ENOENT;
-	}
 
 	ret = 0;
 	*mount_root = realpath(longest_match, NULL);
-- 
2.0.2
[PATCH v3 4/4] btrfs-progs: Add mount point output for 'btrfs fi df'
Add mount point output for 'btrfs fi df'.

Also since the patch uses find_mount_root() to find the mount point, 'btrfs fi df' can now output a more meaningful error message when given a non-btrfs path.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
changelog:
v2: Call realpath() before find_mount_root() to deal with relative paths.
v3: Only output the mount point when get_df() succeeded.
---
 cmds-filesystem.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 108d9b7..ca6bcad 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -187,6 +187,8 @@ static int cmd_filesystem_df(int argc, char **argv)
 	int ret;
 	int fd;
 	char *path;
+	char *real_path = NULL;
+	char *mount_point = NULL;
 	DIR *dirstream = NULL;
 
 	if (check_argc_exact(argc, 2))
@@ -194,6 +196,29 @@ static int cmd_filesystem_df(int argc, char **argv)
 		usage(cmd_filesystem_df_usage);
 
 	path = argv[1];
 
+	real_path = realpath(path, NULL);
+	if (!real_path) {
+		fprintf(stderr,
+			"ERROR: Failed to resolve real path for %s: %s\n",
+			path, strerror(errno));
+		return 1;
+	}
+	ret = find_mount_root(real_path, &mount_point);
+	if (ret < 0) {
+		fprintf(stderr,
+			"ERROR: failed to determine mount point for %s: %s\n",
+			path, strerror(-ret));
+		free(real_path);
+		return 1;
+	}
+	if (ret > 0) {
+		fprintf(stderr,
+			"ERROR: %s does not belong to a btrfs mount point\n",
+			path);
+		free(real_path);
+		return 1;
+	}
+
 	fd = open_file_or_dir(path, &dirstream);
 	if (fd < 0) {
 		fprintf(stderr, "ERROR: can't access '%s'\n", path);
@@ -201,12 +226,15 @@ static int cmd_filesystem_df(int argc, char **argv)
 	}
 	ret = get_df(fd, &sargs);
 	if (!ret && sargs) {
+		printf("Mounted on: %s\n", mount_point);
 		print_df(sargs);
 		free(sargs);
 	} else {
 		fprintf(stderr, "ERROR: get_df failed %s\n", strerror(-ret));
 	}
 
+	free(real_path);
+	free(mount_point);
 	close_file_or_dir(fd, dirstream);
 	return !!ret;
 }
-- 
2.0.2
[PATCH 3/4] btrfs-progs: Fix wrong indent in btrfs-progs.
When editing cmds-filesystem.c, I found cmd_filesystem_df() uses 7 spaces as indent instead of 1 tab (or 8 spaces), which makes the indent quite embarrassing.

Such a problem is especially hard to detect when reviewing patches, since the leading '+' makes a tab only 7 spaces long, making 7 spaces look the same as a tab.

This patch fixes all the 7-space indents.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 cmds-filesystem.c | 79 +++
 ctree.h           | 15 ++-
 utils.c           | 10 +++
 3 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c index bf87bbe..108d9b7 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -114,23 +114,23 @@ static const char * const filesystem_cmd_group_usage[] = { }; static const char * const cmd_filesystem_df_usage[] = { - btrfs filesystem df path, - Show space usage information for a mount point, - NULL + btrfs filesystem df path, + Show space usage information for a mount point, + NULL }; static void print_df(struct btrfs_ioctl_space_args *sargs) { - u64 i; - struct btrfs_ioctl_space_info *sp = sargs-spaces; - - for (i = 0; i sargs-total_spaces; i++, sp++) { - printf(%s, %s: total=%s, used=%s\n, - group_type_str(sp-flags), - group_profile_str(sp-flags), - pretty_size(sp-total_bytes), - pretty_size(sp-used_bytes)); - } + u64 i; + struct btrfs_ioctl_space_info *sp = sargs-spaces; + + for (i = 0; i sargs-total_spaces; i++, sp++) { + printf(%s, %s: total=%s, used=%s\n, + group_type_str(sp-flags), + group_profile_str(sp-flags), + pretty_size(sp-total_bytes), + pretty_size(sp-used_bytes)); + } } static int get_df(int fd, struct btrfs_ioctl_space_args **sargs_ret) @@ -183,33 +183,32 @@ static int get_df(int fd, struct btrfs_ioctl_space_args **sargs_ret) static int cmd_filesystem_df(int argc, char **argv) { - struct btrfs_ioctl_space_args *sargs = NULL; - int ret; - int fd; - char *path; - DIR *dirstream = NULL; - - if (check_argc_exact(argc, 2)) - usage(cmd_filesystem_df_usage); - - path = argv[1]; - - fd = 
open_file_or_dir(path, dirstream); - if (fd 0) { - fprintf(stderr, ERROR: can't access '%s'\n, path); - return 1; - } - ret = get_df(fd, sargs); - - if (!ret sargs) { - print_df(sargs); - free(sargs); - } else { - fprintf(stderr, ERROR: get_df failed %s\n, strerror(-ret)); - } - - close_file_or_dir(fd, dirstream); - return !!ret; + struct btrfs_ioctl_space_args *sargs = NULL; + int ret; + int fd; + char *path; + DIR *dirstream = NULL; + + if (check_argc_exact(argc, 2)) + usage(cmd_filesystem_df_usage); + + path = argv[1]; + + fd = open_file_or_dir(path, dirstream); + if (fd 0) { + fprintf(stderr, ERROR: can't access '%s'\n, path); + return 1; + } + ret = get_df(fd, sargs); + if (!ret sargs) { + print_df(sargs); + free(sargs); + } else { + fprintf(stderr, ERROR: get_df failed %s\n, strerror(-ret)); + } + + close_file_or_dir(fd, dirstream); + return !!ret; } static int match_search_item_kernel(__u8 *fsid, char *mnt, char *label, diff --git a/ctree.h b/ctree.h index 35d3633..83d85b3 100644 --- a/ctree.h +++ b/ctree.h @@ -939,10 +939,10 @@ struct btrfs_block_group_cache { }; struct btrfs_extent_ops { - int (*alloc_extent)(struct btrfs_root *root, u64 num_bytes, - u64 hint_byte, struct btrfs_key *ins); - int (*free_extent)(struct btrfs_root *root, u64 bytenr, - u64 num_bytes); + int (*alloc_extent)(struct btrfs_root *root, u64 num_bytes, + u64 hint_byte, struct btrfs_key *ins); + int (*free_extent)(struct btrfs_root *root, u64 bytenr, + u64 num_bytes); }; struct btrfs_device; @@ -2117,9 +2117,10 @@ BTRFS_SETGET_STACK_FUNCS(stack_qgroup_limit_rsv_exclusive, static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb, struct btrfs_item *e) { - unsigned long offset; - offset = offsetof(struct btrfs_file_extent_item, disk_bytenr); - return btrfs_item_size(eb, e) - offset; + unsigned long offset; + + offset = offsetof(struct btrfs_file_extent_item, disk_bytenr); + return btrfs_item_size(eb, e) - offset; } /* this returns the number of file bytes represented 
by the
[PATCH v2 2/4] btrfs-progs: Check fstype in find_mount_root()
When calling find_mount_root(), the caller in fact wants to find the mount point of *BTRFS*. So also check ent->fstype in find_mount_root() and do the special error string output in the caller.

This will suppress a lot of "Inappropriate ioctl for device" error messages.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
changelog:
v2: move error message to caller.
---
 cmds-receive.c   |  7 +++++++
 cmds-send.c      | 14 ++++++++++++++
 cmds-subvolume.c |  7 +++++++
 utils.c          |  9 +++++++++
 4 files changed, 37 insertions(+)

diff --git a/cmds-receive.c b/cmds-receive.c
index 72afe2a..0644b59 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -985,6 +985,13 @@ static int do_receive(struct btrfs_receive *r, const char *tomnt, int r_fd,
 		ret = -EINVAL;
 		goto out;
 	}
+	if (ret > 0) {
+		fprintf(stderr,
+			"ERROR: %s doesn't belong to btrfs mount point\n",
+			dest_dir_full_path);
+		ret = -EINVAL;
+		goto out;
+	}
 	r->mnt_fd = open(r->root_path, O_RDONLY | O_NOATIME);
 	if (r->mnt_fd < 0) {
 		ret = -errno;
diff --git a/cmds-send.c b/cmds-send.c
index 48c3df4..d6b1855 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -361,6 +361,13 @@ static int init_root_path(struct btrfs_send *s, const char *subvol)
 		ret = -EINVAL;
 		goto out;
 	}
+	if (ret > 0) {
+		fprintf(stderr,
+			"ERROR: %s doesn't belong to btrfs mount point\n",
+			subvol);
+		ret = -EINVAL;
+		goto out;
+	}
 
 	s->mnt_fd = open(s->root_path, O_RDONLY | O_NOATIME);
 	if (s->mnt_fd < 0) {
@@ -628,6 +635,13 @@ int cmd_send(int argc, char **argv)
 			strerror(-ret));
 		goto out;
 	}
+	if (ret > 0) {
+		fprintf(stderr,
+			"ERROR: %s doesn't belong to btrfs mount point\n",
+			subvol);
+		ret = -EINVAL;
+		goto out;
+	}
 	if (strcmp(send.root_path, mount_root) != 0) {
 		ret = -EINVAL;
 		fprintf(stderr, "ERROR: all subvols must be from the "
diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 639fb10..64a66e3 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -986,6 +986,13 @@ static int cmd_subvol_show(int argc, char **argv)
 			"%s\n", fullpath, strerror(-ret));
 		goto out;
 	}
+	if (ret > 0) {
+		fprintf(stderr,
+			"ERROR: %s doesn't belong to btrfs mount point\n",
+			fullpath);
+		ret = -EINVAL;
+		goto out;
+	}
 	ret = 1;
 	svpath = get_subvol_name(mnt, fullpath);
diff --git a/utils.c b/utils.c
index 2d0f18e..b96d5b4 100644
--- a/utils.c
+++ b/utils.c
@@ -2390,6 +2390,9 @@ int lookup_ino_rootid(int fd, u64 *rootid)
 	return 0;
 }
 
+/* return 0 if a btrfs mount point is found
+ * return 1 if a mount point is found but not btrfs
+ * return <0 if something goes wrong */
 int find_mount_root(const char *path, char **mount_root)
 {
 	FILE *mnttab;
@@ -2397,6 +2400,7 @@ int find_mount_root(const char *path, char **mount_root)
 	struct mntent *ent;
 	int len;
 	int ret;
+	int not_btrfs;
 	int longest_matchlen = 0;
 	char *longest_match = NULL;
 
@@ -2417,6 +2421,7 @@ int find_mount_root(const char *path, char **mount_root)
 				free(longest_match);
 				longest_matchlen = len;
 				longest_match = strdup(ent->mnt_dir);
+				not_btrfs = strcmp(ent->mnt_type, "btrfs");
 			}
 		}
 	}
@@ -2424,6 +2429,10 @@ int find_mount_root(const char *path, char **mount_root)
 
 	if (!longest_match)
 		return -ENOENT;
+	if (not_btrfs) {
+		free(longest_match);
+		return 1;
+	}
 
 	ret = 0;
 	*mount_root = realpath(longest_match, NULL);
-- 
2.0.2