[PATCH] btrfs-progs: Add mount point output for 'btrfs fi df' command.
Add mount point output for 'btrfs fi df'. Also, since the patch uses
find_mount_root() to find the mount point, 'btrfs fi df' can now output a
more meaningful error message when given a non-btrfs path.

Signed-off-by: Qu Wenruo
---
This patch needs to be merged after the following patch:
btrfs-progs: Check fstype in find_mount_root()
---
 cmds-filesystem.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 4b2d27e..d571765 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -187,12 +187,22 @@ static int cmd_filesystem_df(int argc, char **argv)
 	int ret;
 	int fd;
 	char *path;
+	char *mount_point = NULL;
 	DIR *dirstream = NULL;

 	if (check_argc_exact(argc, 2))
 		usage(cmd_filesystem_df_usage);

 	path = argv[1];
+	ret = find_mount_root(path, &mount_point);
+	if (ret < 0) {
+		if (ret != -ENOENT)
+			fprintf(stderr, "ERROR: Failed to find mount root for path %s: %s\n",
+				path, strerror(-ret));
+		return 1;
+	}
+	printf("Mounted on: %s\n", mount_point);
+	free(mount_point);

 	fd = open_file_or_dir(path, &dirstream);
 	if (fd < 0) {
--
2.0.1
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] btrfs: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device
On Tue, 8 Jul 2014 12:08:19 +0800, Liu Bo wrote:
> On Tue, Jul 08, 2014 at 02:38:37AM +0800, Anand Jain wrote:
>> (for review comments pls).
>>
>> btrfs_scan_one_device() needs SB, instead of doing it from scratch could
>> use btrfs_get_bdev_and_sb()
>>
>> Signed-off-by: Anand Jain
>> ---
>>  fs/btrfs/volumes.c | 51 ++-
>>  1 file changed, 6 insertions(+), 45 deletions(-)
>>
>> [...]
>>
>> -	/* make sure our super fits in the page */
>> -	if (sizeof(*disk_super) > PAGE_CACHE_SIZE)
>> -		goto error_bdev_put;
>> -
>> -	/* make sure our super doesn't straddle pages on disk */
>> -	index = bytenr >> PAGE_CACHE_SHIFT;
>> -	if ((bytenr + sizeof(*disk_super) - 1) >> PAGE_CACHE_SHIFT != index)
>> -		goto error_bdev_put;
>
> Apparently btrfs_get_bdev_and_sb() lacks the above two checks, otherwise
> looks good.

In fact, our disk_super size is constant and <= the minimum page size (4K),
and we are sure that it is impossible for the super block to cross a block
boundary, so the above two checks are unnecessary.

Thanks
Miao

> thanks,
> -liubo
>
>> [...]
Re: [PATCH RFC] btrfs: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device
On Tue, Jul 08, 2014 at 02:38:37AM +0800, Anand Jain wrote:
> (for review comments pls).
>
> btrfs_scan_one_device() needs SB, instead of doing it from scratch could
> use btrfs_get_bdev_and_sb()
>
> Signed-off-by: Anand Jain
> ---
>  fs/btrfs/volumes.c | 51 ++-
>  1 file changed, 6 insertions(+), 45 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c166355..94e6131 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1053,14 +1053,11 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  {
>  	struct btrfs_super_block *disk_super;
>  	struct block_device *bdev;
> -	struct page *page;
> -	void *p;
>  	int ret = -EINVAL;
>  	u64 devid;
>  	u64 transid;
>  	u64 total_devices;
> -	u64 bytenr;
> -	pgoff_t index;
> +	struct buffer_head *bh;
>
>  	/*
>  	 * we would like to check all the supers, but that would make
> @@ -1068,44 +1065,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  	 * So, we need to add a special mount option to scan for
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */
> -	bytenr = btrfs_sb_offset(0);
>  	mutex_lock(&uuid_mutex);
>
> -	bdev = blkdev_get_by_path(path, flags, holder);
> -
> -	if (IS_ERR(bdev)) {
> -		ret = PTR_ERR(bdev);
> +	ret = btrfs_get_bdev_and_sb(path, flags, holder, 0, &bdev, &bh);
> +	if (ret)
>  		goto error;
> -	}
> -
> -	/* make sure our super fits in the device */
> -	if (bytenr + PAGE_CACHE_SIZE >= i_size_read(bdev->bd_inode))
> -		goto error_bdev_put;
> -
> -	/* make sure our super fits in the page */
> -	if (sizeof(*disk_super) > PAGE_CACHE_SIZE)
> -		goto error_bdev_put;
> -
> -	/* make sure our super doesn't straddle pages on disk */
> -	index = bytenr >> PAGE_CACHE_SHIFT;
> -	if ((bytenr + sizeof(*disk_super) - 1) >> PAGE_CACHE_SHIFT != index)
> -		goto error_bdev_put;

Apparently btrfs_get_bdev_and_sb() lacks the above two checks, otherwise
looks good.

thanks,
-liubo

> -
> -	/* pull in the page with our super */
> -	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
> -				   index, GFP_NOFS);
> -
> -	if (IS_ERR_OR_NULL(page))
> -		goto error_bdev_put;
> -
> -	p = kmap(page);
> -
> -	/* align our pointer to the offset of the super block */
> -	disk_super = p + (bytenr & ~PAGE_CACHE_MASK);
> -
> -	if (btrfs_super_bytenr(disk_super) != bytenr ||
> -	    btrfs_super_magic(disk_super) != BTRFS_MAGIC)
> -		goto error_unmap;
> +	disk_super = (struct btrfs_super_block *) bh->b_data;
>
>  	devid = btrfs_stack_device_id(&disk_super->dev_item);
>  	transid = btrfs_super_generation(disk_super);
> @@ -1125,13 +1090,9 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  		printk(KERN_CONT "devid %llu transid %llu %s\n", devid,
>  		       transid, path);
>  	}
>
> -error_unmap:
> -	kunmap(page);
> -	page_cache_release(page);
> -
> -error_bdev_put:
> +	brelse(bh);
>  	blkdev_put(bdev, flags);
> +
>  error:
>  	mutex_unlock(&uuid_mutex);
>  	return ret;
> --
> 2.0.0.257.g75cc6c6
Re: [RFC PATCH] Revert "btrfs: allow mounting btrfs subvolumes with different ro/rw options"
On 07/08/2014 04:43 AM, Duncan wrote:
> The remaining problem to deal with is that if say the root subvol (id=5)
> is mounted rw,subvolmode=rw, while a subvolume below it is mounted
> subvolmode=ro, then what happens if someone tries to make an edit in the
> portion of the filesystem visible in the subvolume, but from the parent,
> id=5/root in this case? Obviously if that modification is allowed from
> the parent, it'll change what's visible in the child subvolume as well,
> which would be rather unexpected.

The ro/rw status is a subvolume flag. So if a subvolume is marked rw (or
ro), it is writable (or not writable) in all the mounts. This flag is not
inheritable.

What could be strange is the following:

	# mount -o subvolid=5,rw /dev/sda1 /mnt/btrfs-root
	# btrfs subvol create /mnt/btrfs-root/subvolname/

then

	# touch /mnt/btrfs-root/subvolname/touch-file

succeeds; but

	# mount -o subvolid=5,rw /dev/sda1 /mnt/btrfs-root
	# btrfs subvol create /mnt/btrfs-root/subvolname/
	# mount -o subvol=subvolname,ro /dev/sda1 /mnt/btrfs-subvol

then

	# touch /mnt/btrfs-root/subvolname/touch-file2

fails.

--
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [PATCH RFC] btrfs: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device
On Tue, 8 Jul 2014 02:38:37 +0800, Anand Jain wrote:
> (for review comments pls).
>
> btrfs_scan_one_device() needs SB, instead of doing it from scratch could
> use btrfs_get_bdev_and_sb()
>
> Signed-off-by: Anand Jain
> ---
>  fs/btrfs/volumes.c | 51 ++-
>  1 file changed, 6 insertions(+), 45 deletions(-)
>
> [...]
>
> -	bdev = blkdev_get_by_path(path, flags, holder);
> -
> -	if (IS_ERR(bdev)) {
> -		ret = PTR_ERR(bdev);
> +	ret = btrfs_get_bdev_and_sb(path, flags, holder, 0, &bdev, &bh);
> +	if (ret)
>  		goto error;
> -	}
> -
> -	/* make sure our super fits in the device */
> -	if (bytenr + PAGE_CACHE_SIZE >= i_size_read(bdev->bd_inode))
> -		goto error_bdev_put;

I think moving this check into btrfs_get_bdev_and_sb() is better.
The other is OK.

Thanks
Miao

> [...]
Re: [RFC PATCH] Revert "btrfs: allow mounting btrfs subvolumes with different ro/rw options"
Goffredo Baroncelli posted on Mon, 07 Jul 2014 19:37:53 +0200 as excerpted:

> For "mounted RO" I mean the VFS flag, the "one" passed via the mount
> command. I say "one" as 1, because I am convinced that it has to act
> globally, e.g. on the whole filesystem; the flag should be set at the
> first mount, then it can be changed (only ?) issuing a "mount -o
> remount,rw/ro"
[...]
> So for each filesystem, there is a "global" ro/rw flag which acts on the
> whole filesystem. Clear and simple.
>
> Step 2: a more fine grained control of the subvolumes.
> We already have the capability to make a subvolume read-only/read-write
> doing
>
> 	# btrfs property set -t s /path/to/subvolume ro true
>
> or
>
> 	# btrfs property set -t s /path/to/subvolume ro false
>
> My idea is to use this flag. It could be done at mount time, for
> example:
>
> 	# mount -o subvolmode=ro,subvol=subvolname /dev/sda1 /
>
> (this example doesn't work, it is only my idea)
>
> So:
> - we should not add further code
> - the semantic is simple
> - the property is linked to the subvolume in an understandable way
>
> We should only add the subvolmode=ro option to the mount command.
>
> Further discussion needs to investigate the following cases:
> - if the filesystem is mounted as ro (mount -o ro), should mounting a
>   subvolume rw (mount -o subvolmode=rw ...) raise an error? (IMHO yes)
> - if the filesystem is mounted as ro (mount -o ro), should mounting the
>   filesystem a 2nd time rw (mount -o rw ...) raise an error? (IMHO yes)
> - if a subvolume is mounted rw (or ro), should mounting the same
>   subvolume a 2nd time as ro (or rw) raise an error? (IMHO yes)

Makes sense.

Assuming I'm following you correctly, then, no subvolumes could be rw if
the filesystem/vfs flag was set ro. Which would mean that in order to
mount any particular subvolume rw, the whole filesystem would have to be
rw.

Extending now: For simplicity and backward compatibility, if subvolmode
isn't set, it corresponds to the whole-fs/vfs mode. That way, setting
mount -o ro,... (or -o rw,...) with the first mount would naturally
propagate to all subsequent subvolume mounts, unless of course (1) all
subvolumes and the filesystem itself are umounted, after which a new
mount would be the first one again, or (2) a mount -o remount,... is done
that changes the whole-fs mode.

Further, if it happened that one wanted the first subvolume mounted to be
ro, but the filesystem as a whole rw so that other subvolumes could be
mounted rw, the following would accomplish that:

	mount -o rw,subvolmode=ro

That way, the subvol would be ro as desired, but the filesystem as a
whole would be rw, so other subvolumes could be successfully mounted rw.

I like the concept. =:^)

The remaining problem to deal with is that if say the root subvol (id=5)
is mounted rw,subvolmode=rw, while a subvolume below it is mounted
subvolmode=ro, then what happens if someone tries to make an edit in the
portion of the filesystem visible in the subvolume, but from the parent,
id=5/root in this case? Obviously if that modification is allowed from
the parent, it'll change what's visible in the child subvolume as well,
which would be rather unexpected.

I'd suggest that the snapshotting border rule should apply to writes as
well. Snapshots stop at subvolume borders, and writes should as well.
Attempting to write in a child subvolume should error out -- child
subvolumes are not part of a parent snapshot and shouldn't be writable
from the parent subvolume, either. Child-subvolume content should be
read-only because it's beyond the subvolume border. That would seem to be
the safest.

Altho I believe it's a change from current behavior, where it's possible
to write into any subvolume visible from the parent (that is, not covered
by an over-mount, perhaps even of the same subvolume that would otherwise
be visible in the same location from the parent subvolume), provided the
parent is writable.

Regardless, my biggest take-away from the discussion so far is that I'm
glad I decided to go with entirely separate filesystems, each on their
own partitions, so my ro vs writable mounts do exactly what I expect them
to do without me having to worry or think about it too much! That wasn't
the reason I did it -- I did it because I didn't want all my data eggs in
the same whole-filesystem basket, so that if a filesystem was damaged,
the damage was compartmentalized -- but now that I see all the subvolume
rw/ro implications discussed in this thread, I'm VERY glad I personally
don't have to worry about it, and it all simply "just works" for me,
because each filesystem is independent of the others, not simply a
subvolume!

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
On Mon, 7 Jul 2014 17:56:13 +0800, Anand Jain wrote:
>
> On 07/07/2014 12:22, Miao Xie wrote:
>> On Mon, 7 Jul 2014 12:04:09 +0800, Anand Jain wrote:
>>>> when one of the device path is missing btrfs_device name is null. So
>>>> this patch will check for that.
>>>>
>>>> stack:
>>>> BUG: unable to handle kernel NULL pointer dereference at 0010
>>>> IP: [] strlen+0x0/0x30
>>>> [] ? clone_fs_devices+0xaa/0x160 [btrfs]
>>>> [] btrfs_init_new_device+0x317/0xca0 [btrfs]
>>>> [] ? __kmalloc_track_caller+0x15a/0x1a0
>>>> [] btrfs_ioctl+0xaa3/0x2860 [btrfs]
>>>> [] ? handle_mm_fault+0x48c/0x9c0
>>>> [] ? __blkdev_put+0x171/0x180
>>>> [] ? __do_page_fault+0x4ac/0x590
>>>> [] ? blkdev_put+0x106/0x110
>>>> [] ? mntput+0x35/0x40
>>>> [] do_vfs_ioctl+0x460/0x4a0
>>>> [] ? fput+0xe/0x10
>>>> [] ? task_work_run+0xb3/0xd0
>>>> [] SyS_ioctl+0x57/0x90
>>>> [] ? do_page_fault+0xe/0x10
>>>> [] system_call_fastpath+0x16/0x1b
>>>>
>>>> reproducer:
>>>> mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
>>>> btrfstune -S 1 /dev/sdg1
>>>> modprobe -r btrfs && modprobe btrfs
>>>> mount -o degraded /dev/sdg1 /btrfs
>>>> btrfs dev add /dev/sdg3 /btrfs
>>>>
>>>> Signed-off-by: Anand Jain
>>>> Signed-off-by: Miao Xie
>>>> ---
>>>> Changelog v1->v2:
>>>> - Fix the problem that we forgot to set the missing flag for the
>>>>   cloned device
>>>> ---
>>>>  fs/btrfs/volumes.c | 25 -
>>>>  1 file changed, 16 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 1891541..4731bd6 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>>>>  		if (IS_ERR(device))
>>>>  			goto error;
>>>>
>>>> -		/*
>>>> -		 * This is ok to do without rcu read locked because we hold the
>>>> -		 * uuid mutex so nothing we touch in here is going to disappear.
>>>> -		 */
>>>> -		name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>>>> -		if (!name) {
>>>> -			kfree(device);
>>>> -			goto error;
>>>> +		if (orig_dev->missing) {
>>>> +			device->missing = 1;
>>>> +			fs_devices->missing_devices++;
>>>
>>> as mentioned in some places we just check name (for missing device)
>>> and don't set the missing flag so it better to ..
>>>
>>> 	if (orig_dev->missing || !orig_dev->name) {
>>> 		device->missing = 1;
>>> 		fs_devices->missing_devices++;
>>
>> I don't think we need check name pointer here because only missing
>> device doesn't have its own name. Or there is something wrong in the
>> code, so I add assert in else branch. Am I right?
>
> At few critical code, the below and I guess in the chunk/strips
> function as well, we don't make use of missing flag, but rather
> ->name.
>
> ------------------------------------------------------------
> btrfsic_process_superblock
> ::
> 	if (!device->bdev || !device->name)
> 		continue;
> ------------------------------------------------------------
>
> But here without !orig_dev->name check, is also good enough.

Right. According to the code, only a missing device doesn't have its own
name, that is, we can check whether a device is a missing device either by
the missing flag or by its name pointer. Maybe we can remove the missing
flag and check the device just by its name pointer. (In order to make the
code more readable, maybe we need to introduce a function to wrap the
missing-device check.)

Thanks
Miao

> Thanks, Anand
>
>>>> +		} else {
>>>> +			ASSERT(orig_dev->name);
>>>> +			/*
>>>> +			 * This is ok to do without rcu read locked because
>>>> +			 * we hold the uuid mutex so nothing we touch in here
>>>> +			 * is going to disappear.
>>>> +			 */
>>>> +			name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>>>> +			if (!name) {
>>>> +				kfree(device);
>>>> +				goto error;
>>>> +			}
>>>> +			rcu_assign_pointer(device->name, name);
>>>> +		}
>>>> -		rcu_assign_pointer(device->name, name);
>>>>  		list_add(&device->dev_list, &fs_devices->devices);
>>>>  		device->fs_devices = fs_devices;
>>>
>>> Thanks, Anand
Re: [PATCH 2/2] btrfs-progs: Add mount point check for 'btrfs fi df' command
-------- Original Message --------
Subject: Re: [PATCH 2/2] btrfs-progs: Add mount point check for 'btrfs fi df' command
From: Vikram Goyal
To: linux-btrfs@vger.kernel.org
Date: 2014-07-07 17:51

> On Fri, Jul 04, 2014 at 03:52:26PM +0200, David Sterba wrote:
>> On Fri, Jul 04, 2014 at 04:38:49PM +0800, Qu Wenruo wrote:
>>> 'btrfs fi df' command is currently able to be executed on any file/dir
>>> inside btrfs since it uses btrfs ioctl to get disk usage info.
>>> However it is somewhat confusing for some end users since normally
>>> such command should only be executed on a mount point.
>>
>> I disagree here, it's much more convenient to run 'fi df' anywhere and
>> get the output. The system 'df' command works the same way.
>
> Just to clarify, in case my earlier mail did not convey the idea
> properly.
>
> The basic difference between traditional df & btrfs fi df is that
> traditional df does not error out when no arg is given & outputs all the
> mounted FSes with their mount points. So to be consistent, btrfs fi df
> should output all BTRFSes with mount points if no arg is given.
>
> Btrfs fi df insists on an arg but does not clarify in its output whether
> the given arg is a path inside of a mount point or is the mount point
> itself, which can become transparent if the mount point is also shown in
> the output. IMO this is much better.

Cc David.

What about this idea? No extra warning, but output the mount point?

Since if calling find_mount_root(), it will check whether the mount point
is btrfs, which can provide a more meaningful error message than the
original "ERROR: couldn't get space info - Inappropriate ioctl for device"
error message.

Thanks,
Qu

> This is just a request & a pointer to an oversight/anomaly, but if the
> developers do not feel in resonance with it right now then I just wish
> that they keep it in mind, think about it & remove this confusion caused
> by btrfs fi df as, when & how they feel fit.
>
>> The 'fi df' command itself is not that user friendly and the numbers
>> need further interpretation. I'm using it heavily during debugging and
>> restricting it to the mountpoint seems too artificial, the tool can
>> cope with that.
>>
>> The 'fi usage' is supposed to give the user-friendly overview, but the
>> patchset is stuck because I found the numbers wrong or misleading under
>> some circumstances.
>>
>> I'll reread the thread that motivated this patch to see if there's
>> something to address.
>>
>> Thanks
Re: mount time of multi-disk arrays
On 7/7/2014 5:24 PM, André-Sebastian Liebe wrote:
> On 07/07/2014 03:54 PM, Konstantinos Skarlatos wrote:
>> On 7/7/2014 4:38 PM, André-Sebastian Liebe wrote:
>>> Hello List,
>>>
>>> can anyone tell me how much time is acceptable and assumable for a
>>> multi-disk btrfs array with classical hard disk drives to mount?
>>>
>>> I'm having a bit of trouble with my current systemd setup, because it
>>> couldn't mount my btrfs raid anymore after adding the 5th drive. With
>>> the 4 drive setup it failed to mount once in a few times. Now it fails
>>> every time because the default timeout of 1m 30s is reached and mount
>>> is aborted. My last 10 manual mounts took between 1m57s and 2m12s to
>>> finish.
>>
>> I have the exact same problem, and have to manually mount my large
>> multi-disk btrfs filesystems, so I would be interested in a solution as
>> well.
>
> Hi Konstantinos, you can work around this by manually creating a systemd
> mount unit.
>
> - First review the autogenerated systemd mount unit (systemctl show
>   .mount). You can get the unit name by issuing a 'systemctl' and
>   looking for your failed mount.
> - Then you have to take the needed values (After, Before, Conflicts,
>   RequiresMountsFor, Where, What, Options, Type, WantedBy) and put them
>   into a new systemd mount unit file (possibly under
>   /usr/lib/systemd/system/.mount).
> - Now just add the TimeoutSec with a large enough value below [Mount].
> - If you later want to automount your raid, add the WantedBy under
>   [Install].
> - Now issue a 'systemctl daemon-reload' and look for error messages in
>   syslog.
> - If there are no errors you can enable your manual mount entry by
>   'systemctl enable .mount' and safely comment out your old fstab entry
>   (systemd no longer generates autogenerated units).
>
> -- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---
> [Unit]
> Description=Mount /data/pool0
> After=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device systemd-journald.socket local-fs-pre.target system.slice -.mount
> Before=umount.target
> Conflicts=umount.target
> RequiresMountsFor=/data /dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb
>
> [Mount]
> Where=/data/pool0
> What=/dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb
> Options=rw,relatime,skip_balance,compress
> Type=btrfs
> TimeoutSec=3min
>
> [Install]
> WantedBy=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device
> -- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---

Hi André,
This unit file works for me, thank you for creating it! Can somebody put
it on the wiki?

> My hardware setup contains a
> - Intel Core i7 4770
> - Kernel 3.15.2-1-ARCH
> - 32GB RAM
> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
> - dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
>
> Thanks in advance
>
> André-Sebastian Liebe
> --
> # btrfs fi sh
> Label: 'apc01_pool0'  uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
> 	Total devices 5 FS bytes used 14.21TiB
> 	devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
> 	devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
> 	devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
> 	devid 4 size 3.64TiB used 2.86TiB path /dev/sde
> 	devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
> Btrfs v3.14.2-dirty
>
> # btrfs fi df /data/pool0/
> Data, single: total=14.28TiB, used=14.19TiB
> System, RAID1: total=8.00MiB, used=1.54MiB
> Metadata, RAID1: total=26.00GiB, used=20.20GiB
> unknown, single: total=512.00MiB, used=0.00

--
Konstantinos Skarlatos
Re: mount time of multi-disk arrays
On 7/7/2014 6:48 μμ, Duncan wrote: Konstantinos Skarlatos posted on Mon, 07 Jul 2014 16:54:05 +0300 as excerpted: On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote: can anyone tell me how much time is acceptable and assumable for a multi-disk btrfs array with classical hard disk drives to mount? I'm having a bit of trouble with my current systemd setup, because it couldn't mount my btrfs raid anymore after adding the 5th drive. With the 4 drive setup it failed to mount once in a few times. Now it fails everytime because the default timeout of 1m 30s is reached and mount is aborted. My last 10 manual mounts took between 1m57s and 2m12s to finish. I have the exact same problem, and have to manually mount my large multi-disk btrfs filesystems, so I would be interested in a solution as well. I don't have a direct answer, as my btrfs devices are all SSD, but... a) Btrfs, like some other filesystems, is designed not to need a pre-mount (or pre-rw-mount) fsck, because it does what /should/ be a quick-scan at mount-time. However, that isn't always as quick as it might be for a number of reasons: a1) Btrfs is still a relatively immature filesystem and certain operations are not yet optimized. In particular, multi-device btrfs operations tend to still be using a first-working-implementation type of algorithm instead of a well optimized for parallel operation algorithm, and thus often serialize access to multiple devices where a more optimized algorithm would parallelize operations across multiple devices at the same time. That will come, but it's not there yet. a2) Certain operations such as orphan cleanup ("orphans" are files that were deleted while they were in use and thus weren't fully deleted at the time; if they were still in use at unmount (remount-read-only), cleanup is done at mount-time) can delay mount as well. a3) Inode_cache mount option: Don't use this unless you can explain exactly WHY you are using it, preferably backed up with benchmark numbers, etc. 
It's useful only on 32-bit, generally high-file-activity server systems and has general-case problems, including long mount times and possible overflow issues that make it inappropriate for normal use. Unfortunately there's a lot of people out there using it that shouldn't be, and I even saw it listed on at least one distro (not mine!) wiki. =:^( a4) The space_cache mount option OTOH *IS* appropriate for normal use (and is in fact enabled by default these days), but particularly in improper shutdown cases can require rebuilding at mount time -- altho this should happen /after/ mount, the system will just be busy for some minutes, until the space-cache is rebuilt. But the IO from a space_cache rebuild on one filesystem could slow down the mounting of filesystems that mount after it, as well as the boot-time launching of other post- mount launched services. If you're seeing the time go up dramatically with the addition of more filesystem devices, however, and you do /not/ have inode_cache active, I'd guess it's mainly the not-yet-optimized multi-device operations. b) As with any systemd launched unit, however, there's systemd configuration mechanisms for working around specific unit issues, including timeout issues. Of course most systems continue to use fstab and let systemd auto-generate the mount units, and in fact that is recommended, but either with fstab or directly created mount units, there's a timeout configuration option that can be set. b1) The general systemd *.mount unit [Mount] section option appears to be TimeoutSec=. As is usual with systemd times, the default is seconds, or pass the unit(s, like "5min 20s"). b2) I don't see it /specifically/ stated, but with a bit of reading between the lines, the corresponding fstab option appears to be either x-systemd.timeoutsec= or x-systemd.TimeoutSec= (IOW I'm not sure of the case). 
You may also want to try x-systemd.device-timeout=, which /is/ specifically mentioned, altho that appears to be specifically the timeout for the device to appear, NOT for the filesystem to mount after it does. b3) See the systemd.mount (5) and systemd-fstab-generator (8) manpages for more, that being what the above is based on. Thanks for your detailed answer. A mount unit with a larger timeout works fine, maybe we should tell distro maintainers to up the limit for btrfs to 5 minutes or so? In my experience, mount time definitely grows as the filesystem grows older, and times out after snapshot count gets more than 500-1000 . I guess thats something that can be optimized in the future, but i believe stability is a much more urgent need now... So it might take a bit of experimentation to find the exact command, but based on the above anyway, it /should/ be pretty easy to tell systemd to wait a bit longer for that filesystem. When you find the right invocation, please reply with it here, as I'm sure there's others who will benefit as well. FWIW, I'm still on reiserfs for my spinning
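Tying the above together: with systemd generating mount units from fstab, the mount timeout can be raised with a drop-in for the generated unit. The unit and mount-point names below are illustrative only (the unit name must match the escaped mount path; see systemd-escape), and the exact option names should be checked against the systemd.mount(5) manpage for the installed systemd version; x-systemd.device-timeout= remains available in fstab for the device-appearance timeout:

```ini
# /etc/systemd/system/data-pool0.mount.d/timeout.conf
# Drop-in raising the mount timeout for the unit systemd generates
# from the fstab entry for /data/pool0 (example path).
[Mount]
TimeoutSec=5min
```

After adding the drop-in, run `systemctl daemon-reload` so systemd picks up the new timeout.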
Re: [v3.10.y][v3.11.y][v3.12.y][v3.13.y][v3.14.y][PATCH 1/1][V2] ALSA: usb-audio: Prevent printk ratelimiting from spamming kernel log while DEBUG not defined
On Sat, Jun 21, 2014 at 12:48:27PM -0700, Greg KH wrote: > On Sat, Jun 21, 2014 at 01:05:53PM +0100, Ben Hutchings wrote: > > On Fri, 2014-06-20 at 14:21 -0400, Joseph Salisbury wrote: > > [...] > > > I looked at this some more. It seems like my v2 backport may be the > > > most suitable for the releases mentioned in the subject line, but I'd > > > like to get additional feedback. > > > > > > The lines added by commit a5065eb just get removed by commit b7a77235. > > > Also, if I apply commit a5065eb, it will also require a backport to pull > > > in just a piece of code(Remove snd_printk() and add dev_dbg()) from > > > another prior commit(0ba41d9). No backport would be needed at all if I > > > cherry-pick 0ba41d9, but that commit seems to have too may changes for a > > > stable release. > > > > Keep the changes squashed together if you like, but do include both > > commit hashes and commit messages. > > No, I don't want to see "squashed together" patches, please keep them as > close to the original patch as possible. It saves time in the long run, > trust me... And since no one did this work for me, I had to do it myself... {grumble} -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: __btrfs_mod_ref should always use no_quota
From: Josef Bacik Before I extended the no_quota arg to btrfs_dec/inc_ref because I didn't understand how snapshot delete was using it and assumed that we needed the quota operations there. With Mark's work this has turned out to be not the case, we _always_ need to use no_quota for btrfs_dec/inc_ref, so just drop the argument and make __btrfs_mod_ref call it's process function with no_quota set always. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Mark Fasheh --- fs/btrfs/ctree.c | 20 ++-- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/extent-tree.c | 24 +++- 3 files changed, 23 insertions(+), 25 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index d99d965..d9e0ce0 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -280,9 +280,9 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, WARN_ON(btrfs_header_generation(buf) > trans->transid); if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID) - ret = btrfs_inc_ref(trans, root, cow, 1, 1); + ret = btrfs_inc_ref(trans, root, cow, 1); else - ret = btrfs_inc_ref(trans, root, cow, 0, 1); + ret = btrfs_inc_ref(trans, root, cow, 0); if (ret) return ret; @@ -1035,14 +1035,14 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, if ((owner == root->root_key.objectid || root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && !(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) { - ret = btrfs_inc_ref(trans, root, buf, 1, 1); + ret = btrfs_inc_ref(trans, root, buf, 1); BUG_ON(ret); /* -ENOMEM */ if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) { - ret = btrfs_dec_ref(trans, root, buf, 0, 1); + ret = btrfs_dec_ref(trans, root, buf, 0); BUG_ON(ret); /* -ENOMEM */ - ret = btrfs_inc_ref(trans, root, cow, 1, 1); + ret = btrfs_inc_ref(trans, root, cow, 1); BUG_ON(ret); /* -ENOMEM */ } new_flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; @@ -1050,9 +1050,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) - ret = 
btrfs_inc_ref(trans, root, cow, 1, 1); + ret = btrfs_inc_ref(trans, root, cow, 1); else - ret = btrfs_inc_ref(trans, root, cow, 0, 1); + ret = btrfs_inc_ref(trans, root, cow, 0); BUG_ON(ret); /* -ENOMEM */ } if (new_flags != 0) { @@ -1069,11 +1069,11 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) { if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) - ret = btrfs_inc_ref(trans, root, cow, 1, 1); + ret = btrfs_inc_ref(trans, root, cow, 1); else - ret = btrfs_inc_ref(trans, root, cow, 0, 1); + ret = btrfs_inc_ref(trans, root, cow, 0); BUG_ON(ret); /* -ENOMEM */ - ret = btrfs_dec_ref(trans, root, buf, 1, 1); + ret = btrfs_dec_ref(trans, root, buf, 1); BUG_ON(ret); /* -ENOMEM */ } clean_tree_block(trans, root, buf); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4896d7a..56f280f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3307,9 +3307,9 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes, u64 min_alloc_size, u64 empty_size, u64 hint_byte, struct btrfs_key *ins, int is_data); int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, - struct extent_buffer *buf, int full_backref, int no_quota); + struct extent_buffer *buf, int full_backref); int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, - struct extent_buffer *buf, int full_backref, int no_quota); + struct extent_buffer *buf, int full_backref); int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 flags, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tr
btrfs: add trace for qgroup accounting
We want this to debug qgroup changes on live systems. Signed-off-by: Mark Fasheh Reviewed-by: Josef Bacik --- fs/btrfs/qgroup.c| 3 +++ fs/btrfs/super.c | 1 + include/trace/events/btrfs.h | 56 3 files changed, 60 insertions(+) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index cf5aead..a9f0f05 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1290,6 +1290,7 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans, oper->seq = atomic_inc_return(&fs_info->qgroup_op_seq); INIT_LIST_HEAD(&oper->elem.list); oper->elem.seq = 0; + trace_btrfs_qgroup_record_ref(oper); ret = insert_qgroup_oper(fs_info, oper); if (ret) { /* Shouldn't happen so have an assert for developers */ @@ -1909,6 +1910,8 @@ static int btrfs_qgroup_account(struct btrfs_trans_handle *trans, ASSERT(is_fstree(oper->ref_root)); + trace_btrfs_qgroup_account(oper); + switch (oper->type) { case BTRFS_QGROUP_OPER_ADD_EXCL: case BTRFS_QGROUP_OPER_SUB_EXCL: diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 4662d92..ca7836c 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -60,6 +60,7 @@ #include "backref.h" #include "tests/btrfs-tests.h" +#include "qgroup.h" #define CREATE_TRACE_POINTS #include diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 4ee4e30..b8774b3 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -23,6 +23,7 @@ struct map_lookup; struct extent_buffer; struct btrfs_work; struct __btrfs_workqueue; +struct btrfs_qgroup_operation; #define show_ref_type(type)\ __print_symbolic(type, \ @@ -1119,6 +1120,61 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy, TP_ARGS(wq) ); +#define show_oper_type(type) \ + __print_symbolic(type, \ + { BTRFS_QGROUP_OPER_ADD_EXCL, "OPER_ADD_EXCL" }, \ + { BTRFS_QGROUP_OPER_ADD_SHARED, "OPER_ADD_SHARED" },\ + { BTRFS_QGROUP_OPER_SUB_EXCL, "OPER_SUB_EXCL" }, \ + { BTRFS_QGROUP_OPER_SUB_SHARED, "OPER_SUB_SHARED" }) + +DECLARE_EVENT_CLASS(btrfs_qgroup_oper, + + TP_PROTO(struct 
btrfs_qgroup_operation *oper), + + TP_ARGS(oper), + + TP_STRUCT__entry( + __field(u64, ref_root ) + __field(u64, bytenr) + __field(u64, num_bytes ) + __field(u64, seq ) + __field(int, type ) + __field(u64, elem_seq ) + ), + + TP_fast_assign( + __entry->ref_root = oper->ref_root; + __entry->bytenr = oper->bytenr, + __entry->num_bytes = oper->num_bytes; + __entry->seq= oper->seq; + __entry->type = oper->type; + __entry->elem_seq = oper->elem.seq; + ), + + TP_printk("ref_root = %llu, bytenr = %llu, num_bytes = %llu, " + "seq = %llu, elem.seq = %llu, type = %s", + (unsigned long long)__entry->ref_root, + (unsigned long long)__entry->bytenr, + (unsigned long long)__entry->num_bytes, + (unsigned long long)__entry->seq, + (unsigned long long)__entry->elem_seq, + show_oper_type(__entry->type)) +); + +DEFINE_EVENT(btrfs_qgroup_oper, btrfs_qgroup_account, + + TP_PROTO(struct btrfs_qgroup_operation *oper), + + TP_ARGS(oper) +); + +DEFINE_EVENT(btrfs_qgroup_oper, btrfs_qgroup_record_ref, + + TP_PROTO(struct btrfs_qgroup_operation *oper), + + TP_ARGS(oper) +); + #endif /* _TRACE_BTRFS_H */ /* This part must be outside protection */ -- 1.8.4.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] btrfs: qgroup fixes for btrfs_drop_snapshot V3
Hi, the following patches try to fix a long-outstanding issue with qgroups and snapshot deletion. The core problem is that btrfs_drop_snapshot will skip shared extents during its tree walk. This results in an inconsistent qgroup state once the drop is processed. The first patch adds some tracing which I found very useful in debugging qgroup operations. The second patch is an actual fix to the problem. A third patch, from Josef, is also added. We need this because it fixes at least one set of inconsistencies qgroups can get to via drop_snapshot. With this version of the patch series, I can no longer reproduce qgroup inconsistencies via drop_snapshot on my test disks. Changes from last patch set: - search on bytenr and root, but not seq, in btrfs_record_ref when we're looking for existing qgroup operations. Changes before that (V1-V2): - remove extra extent_buffer_uptodate call from account_shared_subtree() - catch return values for the accounting calls now and do the right thing (log an error and tell the user to rescan) - remove the loop on roots in qgroup_subtree_accounting and just use the nnodes member to make our first decision. - Don't queue up the subtree root for a change (the code in drop_snapshot handles qgroup updates for this block). - only walk subtrees if we're actually in the DROP_REFERENCE stage and we're going to call free_extent - account leaf items for level zero blocks that we are dropping in walk_up_proc General qgroups TODO: - We need an xfstest for the drop_snapshot case, otherwise I'm concerned that we can easily regress from bugs introduced via seemingly unrelated patches. This stuff can be fragile. - I already have a script that creates and removes a level 1 tree to introduce an inconsistency. I think adapting that is probably a good first step. The script can be found at: http://zeniv.linux.org.uk/~mfasheh/create-btrfs-trees.sh Please don't make fun of my poor shell scripting skills :) - qgroup items are not deleted after drop_snapshot.
They stay orphaned, on disk, often with nonzero values in their count fields. This is something for another patch. Josef and I have some ideas for how to deal with this: - Just zero them out at the end of drop_snapshot (maybe in the future we could actually then delete them from disk?) - update btrfs_subtree_accounting() to remove bytes from the being-deleted qgroups so they wind up as zero on disk (this is preferable but might not be practical) - we need at least a rescan to be kicked off when adding parent qgroups. Otherwise, the newly added groups start with the wrong information. Quite possibly the rescan itself might need to be updated (I haven't tested this enough). - qgroup hierarchies in general don't seem quite implemented yet. Once we fix the previous items the code to update their counts will probably need some love. Please review, thanks. Diffstat follows, --Mark fs/btrfs/ctree.c | 20 +-- fs/btrfs/ctree.h |4 fs/btrfs/extent-tree.c | 285 +-- fs/btrfs/qgroup.c| 168 + fs/btrfs/qgroup.h|1 fs/btrfs/super.c |1 include/trace/events/btrfs.h | 57 7 files changed, 511 insertions(+), 25 deletions(-)
btrfs: qgroup: account shared subtrees during snapshot delete
During its tree walk, btrfs_drop_snapshot() will skip any shared subtrees it encounters. This is incorrect when we have qgroups turned on as those subtrees need to have their contents accounted. In particular, the case we're concerned with is when removing our snapshot root leaves the subtree with only one root reference. In those cases we need to find the last remaining root and add each extent in the subtree to the corresponding qgroup exclusive counts. This patch implements the shared subtree walk and a new qgroup operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of this type is encountered during qgroup accounting, we search for any root references to that extent and in the case that we find only one reference left, we go ahead and do the math on it's exclusive counts. Signed-off-by: Mark Fasheh Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 261 +++ fs/btrfs/qgroup.c| 165 +++ fs/btrfs/qgroup.h| 1 + include/trace/events/btrfs.h | 3 +- 4 files changed, 429 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 46f39bf..3f43e9a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7324,6 +7324,220 @@ reada: wc->reada_slot = slot; } +static int account_leaf_items(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct extent_buffer *eb) +{ + int nr = btrfs_header_nritems(eb); + int i, extent_type, ret; + struct btrfs_key key; + struct btrfs_file_extent_item *fi; + u64 bytenr, num_bytes; + + for (i = 0; i < nr; i++) { + btrfs_item_key_to_cpu(eb, &key, i); + + if (key.type != BTRFS_EXTENT_DATA_KEY) + continue; + + fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item); + /* filter out non qgroup-accountable extents */ + extent_type = btrfs_file_extent_type(eb, fi); + + if (extent_type == BTRFS_FILE_EXTENT_INLINE) + continue; + + bytenr = btrfs_file_extent_disk_bytenr(eb, fi); + if (!bytenr) + continue; + + num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi); + + ret = 
btrfs_qgroup_record_ref(trans, root->fs_info, + root->objectid, + bytenr, num_bytes, + BTRFS_QGROUP_OPER_SUB_SUBTREE, 0); + if (ret) + return ret; + } + return 0; +} + +/* + * Walk up the tree from the bottom, freeing leaves and any interior + * nodes which have had all slots visited. If a node (leaf or + * interior) is freed, the node above it will have it's slot + * incremented. The root node will never be freed. + * + * At the end of this function, we should have a path which has all + * slots incremented to the next position for a search. If we need to + * read a new node it will be NULL and the node above it will have the + * correct slot selected for a later read. + * + * If we increment the root nodes slot counter past the number of + * elements, 1 is returned to signal completion of the search. + */ +static int adjust_slots_upwards(struct btrfs_root *root, + struct btrfs_path *path, int root_level) +{ + int level = 0; + int nr, slot; + struct extent_buffer *eb; + + if (root_level == 0) + return 1; + + while (level <= root_level) { + eb = path->nodes[level]; + nr = btrfs_header_nritems(eb); + path->slots[level]++; + slot = path->slots[level]; + if (slot >= nr || level == 0) { + /* +* Don't free the root - we will detect this +* condition after our loop and return a +* positive value for caller to stop walking the tree. +*/ + if (level != root_level) { + btrfs_tree_unlock_rw(eb, path->locks[level]); + path->locks[level] = 0; + + free_extent_buffer(eb); + path->nodes[level] = NULL; + path->slots[level] = 0; + } + } else { + /* +* We have a valid slot to walk back down +* from. Stop here so caller can process these +* new nodes. +*/ + break; + } + + level++; + } + + eb = path->nodes[root_level]; + if (path->slots[root_l
[PATCH 2/2] btrfs: syslog when quota is disabled
Offline investigation of issues needs to know when quota was disabled. Signed-off-by: Anand Jain --- fs/btrfs/ioctl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index bb4a498..fd29978 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4221,6 +4221,8 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg) break; case BTRFS_QUOTA_CTL_DISABLE: ret = btrfs_quota_disable(trans, root->fs_info); + if (!ret) + btrfs_info(root->fs_info, "quota is disabled"); break; default: ret = -EINVAL; -- 2.0.0.257.g75cc6c6
[PATCH 1/2] btrfs: syslog when quota is enabled
We must log to syslog when the btrfs working configuration changes, so as to support offline investigation of issues. --- fs/btrfs/ioctl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 016a5eb..bb4a498 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4216,6 +4216,8 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg) switch (sa->cmd) { case BTRFS_QUOTA_CTL_ENABLE: ret = btrfs_quota_enable(trans, root->fs_info); + if (!ret) + btrfs_info(root->fs_info, "quota is enabled"); break; case BTRFS_QUOTA_CTL_DISABLE: ret = btrfs_quota_disable(trans, root->fs_info); -- 2.0.0.257.g75cc6c6
[PATCH RFC] btrfs: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device
(for review comments pls). btrfs_scan_one_device() needs SB, instead of doing it from scratch could use btrfs_get_bdev_and_sb() Signed-off-by: Anand Jain --- fs/btrfs/volumes.c | 51 ++- 1 file changed, 6 insertions(+), 45 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c166355..94e6131 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1053,14 +1053,11 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, { struct btrfs_super_block *disk_super; struct block_device *bdev; - struct page *page; - void *p; int ret = -EINVAL; u64 devid; u64 transid; u64 total_devices; - u64 bytenr; - pgoff_t index; + struct buffer_head *bh; /* * we would like to check all the supers, but that would make @@ -1068,44 +1065,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - bytenr = btrfs_sb_offset(0); mutex_lock(&uuid_mutex); - bdev = blkdev_get_by_path(path, flags, holder); - - if (IS_ERR(bdev)) { - ret = PTR_ERR(bdev); + ret = btrfs_get_bdev_and_sb(path, flags, holder, 0, &bdev, &bh); + if (ret) goto error; - } - - /* make sure our super fits in the device */ - if (bytenr + PAGE_CACHE_SIZE >= i_size_read(bdev->bd_inode)) - goto error_bdev_put; - - /* make sure our super fits in the page */ - if (sizeof(*disk_super) > PAGE_CACHE_SIZE) - goto error_bdev_put; - - /* make sure our super doesn't straddle pages on disk */ - index = bytenr >> PAGE_CACHE_SHIFT; - if ((bytenr + sizeof(*disk_super) - 1) >> PAGE_CACHE_SHIFT != index) - goto error_bdev_put; - - /* pull in the page with our super */ - page = read_cache_page_gfp(bdev->bd_inode->i_mapping, - index, GFP_NOFS); - - if (IS_ERR_OR_NULL(page)) - goto error_bdev_put; - - p = kmap(page); - - /* align our pointer to the offset of the super block */ - disk_super = p + (bytenr & ~PAGE_CACHE_MASK); - - if (btrfs_super_bytenr(disk_super) != bytenr || - 
btrfs_super_magic(disk_super) != BTRFS_MAGIC) - goto error_unmap; + disk_super = (struct btrfs_super_block *) bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); transid = btrfs_super_generation(disk_super); @@ -1125,13 +1090,9 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, printk(KERN_CONT "devid %llu transid %llu %s\n", devid, transid, path); } - -error_unmap: - kunmap(page); - page_cache_release(page); - -error_bdev_put: + brelse(bh); blkdev_put(bdev, flags); + error: mutex_unlock(&uuid_mutex); return ret; -- 2.0.0.257.g75cc6c6 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: test for valid bdev before kobj removal in btrfs_rm_device
commit 4cd btrfs: dev delete should remove sysfs entry added a btrfs_kobj_rm_device, which dereferences device->bdev... right after we check whether device->bdev might be NULL. I don't honestly know if it's possible to have a NULL device->bdev here, but assuming that it is (given the test), we need to move the kobject removal to be under that test. (Coverity spotted this) Signed-off-by: Eric Sandeen --- If it's not possible for bdev to be null, then the test should just be removed, but that's above my current btrfs pay grade. ;) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 6104676..6cb82f6 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1680,11 +1680,11 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) if (device->bdev == root->fs_info->fs_devices->latest_bdev) root->fs_info->fs_devices->latest_bdev = next_device->bdev; - if (device->bdev) + if (device->bdev) { device->fs_devices->open_devices--; - - /* remove sysfs entry */ - btrfs_kobj_rm_device(root->fs_info, device); + /* remove sysfs entry */ + btrfs_kobj_rm_device(root->fs_info, device); + } call_rcu(&device->rcu, free_device);
Re: [RFC PATCH] Revert "btrfs: allow mounting btrfs subvolumes with different ro/rw options"
On 07/07/2014 03:46 AM, Qu Wenruo wrote: > [... cut ...] >> >> So to me it seems reasonable to have different rw/ro status between >> the btrfs root and a btrfs subvolume. As a use case, think of a system which >> hosts several guests in containers. Each guest has its own subvolume >> as root filesystem. A user would mount the btrfs root RO in order >> to see all the subvolumes but at the same time avoid changing a >> file by mistake; when a guest has to be started, its root >> filesystem/subvolume can be mounted RW. > You caught me. Yes, the use case seems quite reasonable since > currently you need to mount btrfs to get the subvolume list (the only > offline method seems to be btrfs-debug-tree, but end-users won't use it > anyway) and it's good admin behavior to mount it ro if there is no need to > write. >> >> On the other side, I understand that this could lead to >> unexpected behaviour because with other filesystems it is >> impossible to mount only a part as RW. In this BTRFS would be >> different. >> >> Following the "least surprise" principle, I prefer that the *mount* >> RO/RW flag acts globally: the filesystem has only one status. It is >> possible to change it only globally. >> >> In order to have subvolumes with different RO/RW status we >> should rely on a different flag. I have to point out that the >> subvolume already has the concept of a read-only status. >> >> We could adopt the following rules: >> - if the filesystem is mounted RO then all the subvolumes (even >> the id=5) are RO >> - if a subvolume is marked RO, then it is RO >> - otherwise a subvolume is RW > I'm confused by rule 1. When mentioning 'mounted RO', do you mean > mounting subvolume id=5 RO? Also you mentioned using a different RO/RW > flag independent from the VFS RO/RW flags, so I am also confused: > when mentioning RO, did you mean VFS RO or the new btrfs RO/RW > flags? For "mounted RO" I mean the VFS flag, the "one" passed via the mount command.
I say "one" as 1, because I am convinced that it has to act globally, i.e. on the whole filesystem; the flag should be set at the first mount, then it can be changed (only?) by issuing a "mount -o remount,rw/ro" For example, the following commands # mount -o subvol=subvolname,ro /dev/sda1 /mnt/btrfs-subvol # mount -o subvolid=5 /dev/sda1 /mnt/btrfs-root cause the following ones # touch /mnt/btrfs-subvol/touch-a-file # touch /mnt/btrfs-root/touch-a-file2 to fail; and the following commands # mount -o subvol=subvolname,ro /dev/sda1 /mnt/btrfs-subvol # mount -o subvolid=5 /dev/sda1 /mnt/btrfs-root # mount -o remount,rw /mnt/btrfs-subvol cause the following ones # touch /mnt/btrfs-subvol/touch-a-file # touch /mnt/btrfs-root/touch-a-file2 to succeed. So for each filesystem there is a "global" ro/rw flag which acts on the whole filesystem. Clear and simple. Step 2: more fine-grained control of the subvolumes. We already have the capability to make a subvolume read-only/read-write by doing # btrfs property set -t s /path/to/subvolume ro true or # btrfs property set -t s /path/to/subvolume ro false My idea is to use this flag. It could be done at mount time, for example: # mount -o subvolmode=ro,subvol=subvolname /dev/sda1 / (this example doesn't work, it is only my idea) So: - we should not add further code - the semantics are simple - the property is linked to the subvolume in an understandable way We should only add the subvolmode=ro option to the mount command. Further discussion needs to investigate the following cases: - if the filesystem is mounted as ro (mount -o ro), should mounting a subvolume rw (mount -o subvolmode=rw ...) raise an error? (IMHO yes) - if the filesystem is mounted as ro (mount -o ro), should mounting the filesystem a 2nd time rw (mount -o rw ...) raise an error? (IMHO yes) - if a subvolume is mounted rw (or ro), should mounting the same subvolume a 2nd time as ro (or rw) raise an error?
(IMHO yes) BR G.Baroncelli >> >> Moreover we can add further rules to inherit the subvolume RO/RW >> status at creation time (even though it makes sense only for >> snapshots). We could use an xattr for that. >> >> Finally I would like to point out that relying on the parent/child >> relationship between subvolumes is very dangerous. With the >> exception of subvolid=5, which is the only root one, it is very >> easy to move subvolumes up and down. I have to point this out >> because I read in another email that someone likes the idea of >> having a RO subvolume because its parent is marked RO. But a >> subvolume may also be mounted by id and not by its path (and/or >> name). So relying on the parent/child relationship would >> break the "least surprise" principle. >> >> My 2 ¢ BR G.Baroncelli > Oh, I forgot that users can mv subvolumes just like normal dirs. In > this case it will certainly make an ro/rw disaster if we rely on the parent > ro/rw status. :( > > Thanks,
Re: mount time of multi-disk arrays
On 07/07/2014 04:14 PM, Austin S Hemmelgarn wrote: > On 2014-07-07 09:54, Konstantinos Skarlatos wrote: >> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote: >>> Hello List, >>> >>> can anyone tell me how much time is acceptable and assumable for a >>> multi-disk btrfs array with classical hard disk drives to mount? >>> >>> I'm having a bit of trouble with my current systemd setup, because it >>> couldn't mount my btrfs raid anymore after adding the 5th drive. With >>> the 4 drive setup it failed to mount once in a few times. Now it fails >>> everytime because the default timeout of 1m 30s is reached and mount is >>> aborted. >>> My last 10 manual mounts took between 1m57s and 2m12s to finish. >> I have the exact same problem, and have to manually mount my large >> multi-disk btrfs filesystems, so I would be interested in a solution as >> well. >> >>> My hardware setup contains a >>> - Intel Core i7 4770 >>> - Kernel 3.15.2-1-ARCH >>> - 32GB RAM >>> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm) >>> - dev 5 is a 4TB Wstern Digital WDC WD40EFRX (5400rpm) >>> >>> Thanks in advance >>> >>> André-Sebastian Liebe >>> -- >>> >>> >>> # btrfs fi sh >>> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb >>> Total devices 5 FS bytes used 14.21TiB >>> devid1 size 3.64TiB used 2.86TiB path /dev/sdd >>> devid2 size 3.64TiB used 2.86TiB path /dev/sdc >>> devid3 size 3.64TiB used 2.86TiB path /dev/sdf >>> devid4 size 3.64TiB used 2.86TiB path /dev/sde >>> devid5 size 3.64TiB used 2.88TiB path /dev/sdb >>> >>> Btrfs v3.14.2-dirty >>> >>> # btrfs fi df /data/pool0/ >>> Data, single: total=14.28TiB, used=14.19TiB >>> System, RAID1: total=8.00MiB, used=1.54MiB >>> Metadata, RAID1: total=26.00GiB, used=20.20GiB >>> unknown, single: total=512.00MiB, used=0.00 > This is interesting, I actually did some profiling of the mount timings > for a bunch of different configurations of 4 (identical other than > hardware age) 1TB Seagate disks. 
One of the arrangements I tested was > Data using single profile and Metadata/System using RAID1. Based on the > results I got, and what you are reporting, the mount time doesn't scale > linearly in proportion to the amount of storage space. > > You might want to try the RAID10 profile for Metadata, of the > configurations I tested, the fastest used Single for Data and RAID10 for > Metadata/System. Switching Metadata from raid1 to raid10 reduced mount times from roughly 120s to 38s! > > Also, based on the System chunk usage, I'm guessing that you have a LOT > of subvolumes/snapshots, and I do know that having very large (100+) > numbers of either does slow down the mount command (I don't think that > we cache subvolume information between mount invocations, so it has to > re-parse the system chunks for each individual mount). No, I had to remove the one and only snapshot to recover from a 'no space left on device' to regain metadata space (http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html) -- André-Sebastian Liebe
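For reference, the raid1-to-raid10 metadata conversion reported above can be performed online with a balance filter. The mount point below is illustrative, and the operation rewrites every metadata chunk, so it can take a while on a large array:

```
# Convert only the metadata chunks to RAID10 (data chunks are left untouched).
# Converting the system chunks too would additionally need -sconvert=raid10,
# which btrfs-progs only accepts together with -f (force).
btrfs balance start -mconvert=raid10 /data/pool0

# Check the resulting profiles afterwards:
btrfs fi df /data/pool0
```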
Re: mount time of multi-disk arrays
As a point of reference, my BTRFS filesystem with 11 x 21TB devices in RAID0 with space cache enabled takes about 4 minutes to mount after a clean unmount. There is a decent amount of variation in the amount of time (has been as low as 3 minutes or taken 5 minutes or longer). These devices are all connected via 10gb iscsi. Mount time seems to have not increased relative to the number of devices (so far). I think that back when we had only 6 devices, it still took roughly that amount of time.

-ben

-- Benjamin O'Connor TechOps Systems Administrator TripAdvisor Media Group be...@tripadvisor.com c. 617-312-9072
Re: mount time of multi-disk arrays
Konstantinos Skarlatos posted on Mon, 07 Jul 2014 16:54:05 +0300 as excerpted: > On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote: >> >> can anyone tell me how much time is acceptable and assumable for a >> multi-disk btrfs array with classical hard disk drives to mount? >> >> I'm having a bit of trouble with my current systemd setup, because it >> couldn't mount my btrfs raid anymore after adding the 5th drive. With >> the 4 drive setup it failed to mount once in a few times. Now it fails >> everytime because the default timeout of 1m 30s is reached and mount is >> aborted. >> My last 10 manual mounts took between 1m57s and 2m12s to finish. > I have the exact same problem, and have to manually mount my large > multi-disk btrfs filesystems, so I would be interested in a solution as > well. I don't have a direct answer, as my btrfs devices are all SSD, but... a) Btrfs, like some other filesystems, is designed not to need a pre-mount (or pre-rw-mount) fsck, because it does what /should/ be a quick-scan at mount-time. However, that isn't always as quick as it might be for a number of reasons: a1) Btrfs is still a relatively immature filesystem and certain operations are not yet optimized. In particular, multi-device btrfs operations tend to still be using a first-working-implementation type of algorithm instead of a well optimized for parallel operation algorithm, and thus often serialize access to multiple devices where a more optimized algorithm would parallelize operations across multiple devices at the same time. That will come, but it's not there yet. a2) Certain operations such as orphan cleanup ("orphans" are files that were deleted while they were in use and thus weren't fully deleted at the time; if they were still in use at unmount (remount-read-only), cleanup is done at mount-time) can delay mount as well. a3) Inode_cache mount option: Don't use this unless you can explain exactly WHY you are using it, preferably backed up with benchmark numbers, etc. 
It's useful only on 32-bit, generally high-file-activity server systems and has general-case problems, including long mount times and possible overflow issues that make it inappropriate for normal use. Unfortunately there's a lot of people out there using it that shouldn't be, and I even saw it listed on at least one distro (not mine!) wiki. =:^( a4) The space_cache mount option OTOH *IS* appropriate for normal use (and is in fact enabled by default these days), but particularly in improper shutdown cases can require rebuilding at mount time -- altho this should happen /after/ mount, the system will just be busy for some minutes, until the space-cache is rebuilt. But the IO from a space_cache rebuild on one filesystem could slow down the mounting of filesystems that mount after it, as well as the boot-time launching of other post- mount launched services. If you're seeing the time go up dramatically with the addition of more filesystem devices, however, and you do /not/ have inode_cache active, I'd guess it's mainly the not-yet-optimized multi-device operations. b) As with any systemd launched unit, however, there's systemd configuration mechanisms for working around specific unit issues, including timeout issues. Of course most systems continue to use fstab and let systemd auto-generate the mount units, and in fact that is recommended, but either with fstab or directly created mount units, there's a timeout configuration option that can be set. b1) The general systemd *.mount unit [Mount] section option appears to be TimeoutSec=. As is usual with systemd times, the default is seconds, or pass the unit(s, like "5min 20s"). b2) I don't see it /specifically/ stated, but with a bit of reading between the lines, the corresponding fstab option appears to be either x-systemd.timeoutsec= or x-systemd.TimeoutSec= (IOW I'm not sure of the case). 
You may also want to try x-systemd.device-timeout=, which /is/ specifically mentioned, altho that appears to be specifically the timeout for the device to appear, NOT for the filesystem to mount after it does. b3) See the systemd.mount (5) and systemd-fstab-generator (8) manpages for more, that being what the above is based on. So it might take a bit of experimentation to find the exact command, but based on the above anyway, it /should/ be pretty easy to tell systemd to wait a bit longer for that filesystem. When you find the right invocation, please reply with it here, as I'm sure there's others who will benefit as well. FWIW, I'm still on reiserfs for my spinning rust (only btrfs on my ssds), but I expect I'll switch them to btrfs at some point, so I may well use the information myself. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
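For the fstab route Duncan describes, a concrete line might look like the following sketch. The UUID and mount options are taken from the poster's filesystem earlier in the thread; x-systemd.device-timeout= is the documented option, while the exact name of a per-mount timeout option was still uncertain at this point, so treat the value and placement here as an illustration, not a verified recipe:

```
# /etc/fstab -- give the multi-device btrfs extra time before systemd gives up
UUID=066141c6-16ca-4a30-b55c-e606b90ad0fb  /data/pool0  btrfs  rw,relatime,compress,x-systemd.device-timeout=5min  0 0
```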
Re: [PATCH 2/2] btrfs-progs: Add mount point check for 'btrfs fi df' command
On 7/4/14, 8:52 AM, David Sterba wrote: > On Fri, Jul 04, 2014 at 04:38:49PM +0800, Qu Wenruo wrote: >> 'btrfs fi df' command is currently able to be executed on any file/dir >> inside btrfs since it uses btrfs ioctl to get disk usage info. >> >> However it is somewhat confusing for some end users since normally such >> command should only be executed on a mount point. > > I disagree here, it's much more convenient to run 'fi df' anywhere and > get the output. The system 'df' command works the same way. I agree with that, and said as much in the original bug filed @Fedora. -Eric
[PATCHv2] btrfs compression: merge inflate and deflate z_streams
Hello, This patch reduces zlib compression memory usage by `merging' inflate and deflate streams into a single stream. -- v2: rebased-on linux-next rc4 20140707 Sergey Senozhatsky (1): btrfs compression: merge inflate and deflate z_streams fs/btrfs/zlib.c | 138 1 file changed, 68 insertions(+), 70 deletions(-) -- 2.0.1.612.gea98109
[PATCHv2] btrfs compression: merge inflate and deflate z_streams
`struct workspace' used for zlib compression contains two zlib z_stream-s: `def_strm' used in zlib_compress_pages(), and `inf_strm' used in zlib_decompress/zlib_decompress_biovec(). None of these functions use `inf_strm' and `def_strm' simultaneously, meaning that for every compress/decompress operation we need only one z_stream (out of two available).

`inf_strm' and `def_strm' are different in size of ->workspace. For the inflate stream we vmalloc() zlib_inflate_workspacesize() bytes, for the deflate stream - zlib_deflate_workspacesize() bytes. On my system zlib returns the following workspace sizes, correspondingly: 42312 and 268104 (+ guard pages).

Keep only one `z_stream' in `struct workspace' and use it for both compression and decompression. Hence, instead of vmalloc() of two z_stream->workspace-s, allocate only one of size: max(zlib_deflate_workspacesize(), zlib_inflate_workspacesize())

Reviewed-by: David Sterba Signed-off-by: Sergey Senozhatsky --- fs/btrfs/zlib.c | 138 1 file changed, 68 insertions(+), 70 deletions(-) diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c index b67d8fc..fa56a56 100644 --- a/fs/btrfs/zlib.c +++ b/fs/btrfs/zlib.c @@ -33,8 +33,7 @@ #include "compression.h" struct workspace { - z_stream inf_strm; - z_stream def_strm; + z_stream strm; char *buf; struct list_head list; }; @@ -43,8 +42,7 @@ static void zlib_free_workspace(struct list_head *ws) { struct workspace *workspace = list_entry(ws, struct workspace, list); - vfree(workspace->def_strm.workspace); - vfree(workspace->inf_strm.workspace); + vfree(workspace->strm.workspace); kfree(workspace->buf); kfree(workspace); } @@ -52,17 +50,17 @@ static void zlib_free_workspace(struct list_head *ws) static struct list_head *zlib_alloc_workspace(void) { struct workspace *workspace; + int workspacesize; workspace = kzalloc(sizeof(*workspace), GFP_NOFS); if (!workspace) return ERR_PTR(-ENOMEM); - workspace->def_strm.workspace = vmalloc(zlib_deflate_workspacesize( - MAX_WBITS, MAX_MEM_LEVEL)); -
workspace->inf_strm.workspace = vmalloc(zlib_inflate_workspacesize()); + workspacesize = max(zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL), + zlib_inflate_workspacesize()); + workspace->strm.workspace = vmalloc(workspacesize); workspace->buf = kmalloc(PAGE_CACHE_SIZE, GFP_NOFS); - if (!workspace->def_strm.workspace || - !workspace->inf_strm.workspace || !workspace->buf) + if (!workspace->strm.workspace || !workspace->buf) goto fail; INIT_LIST_HEAD(&workspace->list); @@ -96,14 +94,14 @@ static int zlib_compress_pages(struct list_head *ws, *total_out = 0; *total_in = 0; - if (Z_OK != zlib_deflateInit(&workspace->def_strm, 3)) { + if (Z_OK != zlib_deflateInit(&workspace->strm, 3)) { printk(KERN_WARNING "BTRFS: deflateInit failed\n"); ret = -EIO; goto out; } - workspace->def_strm.total_in = 0; - workspace->def_strm.total_out = 0; + workspace->strm.total_in = 0; + workspace->strm.total_out = 0; in_page = find_get_page(mapping, start >> PAGE_CACHE_SHIFT); data_in = kmap(in_page); @@ -117,25 +115,25 @@ static int zlib_compress_pages(struct list_head *ws, pages[0] = out_page; nr_pages = 1; - workspace->def_strm.next_in = data_in; - workspace->def_strm.next_out = cpage_out; - workspace->def_strm.avail_out = PAGE_CACHE_SIZE; - workspace->def_strm.avail_in = min(len, PAGE_CACHE_SIZE); + workspace->strm.next_in = data_in; + workspace->strm.next_out = cpage_out; + workspace->strm.avail_out = PAGE_CACHE_SIZE; + workspace->strm.avail_in = min(len, PAGE_CACHE_SIZE); - while (workspace->def_strm.total_in < len) { - ret = zlib_deflate(&workspace->def_strm, Z_SYNC_FLUSH); + while (workspace->strm.total_in < len) { + ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH); if (ret != Z_OK) { printk(KERN_DEBUG "BTRFS: deflate in loop returned %d\n", ret); - zlib_deflateEnd(&workspace->def_strm); + zlib_deflateEnd(&workspace->strm); ret = -EIO; goto out; } /* we're making it bigger, give up */ - if (workspace->def_strm.total_in > 8192 && - workspace->def_strm.total_in < - 
workspace->def_strm.total_out) { + if (workspace->strm.total_in > 8192 && + workspace->strm.total_in < + workspace->strm.total_o
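Using the workspace sizes quoted in the changelog above (42312 bytes for the inflate stream, 268104 for deflate, ignoring guard pages), the per-workspace saving from the merge works out as follows — a quick back-of-the-envelope check, not part of the patch itself:

```shell
inflate_ws=42312    # zlib_inflate_workspacesize() on the author's system
deflate_ws=268104   # zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL)
before=$(( inflate_ws + deflate_ws ))   # two separate vmalloc()ed workspaces
# after the merge: a single workspace of max(deflate_ws, inflate_ws)
if [ "$deflate_ws" -gt "$inflate_ws" ]; then after=$deflate_ws; else after=$inflate_ws; fi
echo "before: $before bytes, after: $after bytes, saved: $(( before - after ))"
# → before: 310416 bytes, after: 268104 bytes, saved: 42312
```

So each workspace shrinks by the size of the (smaller) inflate workspace.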
[PATCH] Btrfs-progs: fix Segmentation fault of btrfs-convert
We recently merged a memory leak fix which fails xfstests/btrfs/012. The cause is that it only frees @fs_devices but leaves it on the global fs_uuid list, which causes a 'Segmentation fault' when running the btrfs-convert command. This fixes the problem. Signed-off-by: Liu Bo --- volumes.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/volumes.c b/volumes.c index a61928c..8b827fa 100644 --- a/volumes.c +++ b/volumes.c @@ -184,11 +184,17 @@ again: seed_devices = fs_devices->seed; fs_devices->seed = NULL; if (seed_devices) { + struct btrfs_fs_devices *orig; + + orig = fs_devices; fs_devices = seed_devices; + list_del(&orig->list); + free(orig); goto again; + } else { + list_del(&fs_devices->list); + free(fs_devices); } - - free(fs_devices); return 0; } -- 1.8.1.4
Re: [PATCH] btrfs compression: merge inflate and deflate z_streams
On (07/01/14 16:44), David Sterba wrote:
> On Tue, Jul 01, 2014 at 12:32:10AM +0900, Sergey Senozhatsky wrote:
> > `struct workspace' used for zlib compression contains two zlib z_stream-s: `def_strm' used in zlib_compress_pages(), and `inf_strm' used in zlib_decompress/zlib_decompress_biovec(). None of these functions use `inf_strm' and `def_strm' simultaneously, meaning that for every compress/decompress operation we need only one z_stream (out of two available).
> >
> > `inf_strm' and `def_strm' are different in size of ->workspace. For inflate stream we vmalloc() zlib_inflate_workspacesize() bytes, for deflate stream - zlib_deflate_workspacesize() bytes. On my system zlib returns the following workspace sizes, correspondingly: 42312 and 268104 (+ guard pages).
> >
> > Keep only one `z_stream' in `struct workspace' and use it for both compression and decompression. Hence, instead of vmalloc() of two z_stream->workspace-s, allocate only one of size: max(zlib_deflate_workspacesize(), zlib_inflate_workspacesize())
> >
> > Signed-off-by: Sergey Senozhatsky
>
> Reviewed-by: David Sterba

Hello, the patch does not apply against linux-next rc4-20140707 due to 130d5b415a091e. The unhappy hunk is:

+ if (workspace->strm.total_in > 8192 && + workspace->strm.total_in < + workspace->strm.total_out) { ret = -EIO;

now it should be:

+ if (workspace->strm.total_in > 8192 && + workspace->strm.total_in < + workspace->strm.total_out) { ret = -E2BIG;

I'll rebase and resend. -ss
Re: mount time of multi-disk arrays
On 07/07/2014 03:54 PM, Konstantinos Skarlatos wrote:
> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>> Hello List,
>>
>> can anyone tell me how much time is acceptable and assumable for a multi-disk btrfs array with classical hard disk drives to mount?
>>
>> I'm having a bit of trouble with my current systemd setup, because it couldn't mount my btrfs raid anymore after adding the 5th drive. With the 4 drive setup it failed to mount once in a few times. Now it fails every time because the default timeout of 1m 30s is reached and mount is aborted.
>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
> I have the exact same problem, and have to manually mount my large multi-disk btrfs filesystems, so I would be interested in a solution as well.

Hi Konstantinos, you can work around this by manually creating a systemd mount unit.
- First review the autogenerated systemd mount unit (systemctl show .mount). You can get the unit name by issuing a 'systemctl' and looking for your failed mount.
- Then take the needed values (After, Before, Conflicts, RequiresMountsFor, Where, What, Options, Type, WantedBy) and put them into a new systemd mount unit file (possibly under /usr/lib/systemd/system/.mount ).
- Now just add TimeoutSec with a large enough value below [Mount].
- If you later want to automount your raid, add the WantedBy under [Install].
- Now issue a 'systemctl daemon-reload' and look for error messages in syslog.
- If there are no errors you can enable your manual mount entry with 'systemctl enable .mount' and safely comment out your old fstab entry (so systemd no longer autogenerates a unit for it).
-- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---
[Unit]
Description=Mount /data/pool0
After=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device systemd-journald.socket local-fs-pre.target system.slice -.mount
Before=umount.target
Conflicts=umount.target
RequiresMountsFor=/data /dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb

[Mount]
Where=/data/pool0
What=/dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb
Options=rw,relatime,skip_balance,compress
Type=btrfs
TimeoutSec=3min

[Install]
WantedBy=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device
-- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---

>
>> My hardware setup contains a
>> - Intel Core i7 4770
>> - Kernel 3.15.2-1-ARCH
>> - 32GB RAM
>> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
>> - dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
>>
>> Thanks in advance
>>
>> André-Sebastian Liebe
>>
>> # btrfs fi sh
>> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
>> Total devices 5 FS bytes used 14.21TiB
>> devid1 size 3.64TiB used 2.86TiB path /dev/sdd
>> devid2 size 3.64TiB used 2.86TiB path /dev/sdc
>> devid3 size 3.64TiB used 2.86TiB path /dev/sdf
>> devid4 size 3.64TiB used 2.86TiB path /dev/sde
>> devid5 size 3.64TiB used 2.88TiB path /dev/sdb
>>
>> Btrfs v3.14.2-dirty
>>
>> # btrfs fi df /data/pool0/
>> Data, single: total=14.28TiB, used=14.19TiB
>> System, RAID1: total=8.00MiB, used=1.54MiB
>> Metadata, RAID1: total=26.00GiB, used=20.20GiB
>> unknown, single: total=512.00MiB, used=0.00
>
> -- Konstantinos Skarlatos

-- André-Sebastian Liebe
Re: mount time of multi-disk arrays
On 2014-07-07 09:54, Konstantinos Skarlatos wrote: > On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote: >> Hello List, >> >> can anyone tell me how much time is acceptable and assumable for a >> multi-disk btrfs array with classical hard disk drives to mount? >> >> I'm having a bit of trouble with my current systemd setup, because it >> couldn't mount my btrfs raid anymore after adding the 5th drive. With >> the 4 drive setup it failed to mount once in a few times. Now it fails >> everytime because the default timeout of 1m 30s is reached and mount is >> aborted. >> My last 10 manual mounts took between 1m57s and 2m12s to finish. > I have the exact same problem, and have to manually mount my large > multi-disk btrfs filesystems, so I would be interested in a solution as > well. > >> >> My hardware setup contains a >> - Intel Core i7 4770 >> - Kernel 3.15.2-1-ARCH >> - 32GB RAM >> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm) >> - dev 5 is a 4TB Wstern Digital WDC WD40EFRX (5400rpm) >> >> Thanks in advance >> >> André-Sebastian Liebe >> -- >> >> >> # btrfs fi sh >> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb >> Total devices 5 FS bytes used 14.21TiB >> devid1 size 3.64TiB used 2.86TiB path /dev/sdd >> devid2 size 3.64TiB used 2.86TiB path /dev/sdc >> devid3 size 3.64TiB used 2.86TiB path /dev/sdf >> devid4 size 3.64TiB used 2.86TiB path /dev/sde >> devid5 size 3.64TiB used 2.88TiB path /dev/sdb >> >> Btrfs v3.14.2-dirty >> >> # btrfs fi df /data/pool0/ >> Data, single: total=14.28TiB, used=14.19TiB >> System, RAID1: total=8.00MiB, used=1.54MiB >> Metadata, RAID1: total=26.00GiB, used=20.20GiB >> unknown, single: total=512.00MiB, used=0.00 This is interesting, I actually did some profiling of the mount timings for a bunch of different configurations of 4 (identical other than hardware age) 1TB Seagate disks. One of the arrangements I tested was Data using single profile and Metadata/System using RAID1. 
Based on the results I got, and what you are reporting, the mount time doesn't scale linearly in proportion to the amount of storage space. You might want to try the RAID10 profile for Metadata; of the configurations I tested, the fastest used Single for Data and RAID10 for Metadata/System. Also, based on the System chunk usage, I'm guessing that you have a LOT of subvolumes/snapshots, and I do know that having very large (100+) numbers of either does slow down the mount command (I don't think that we cache subvolume information between mount invocations, so it has to re-parse the system chunks for each individual mount).
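For anyone wanting to try Austin's suggestion on an existing filesystem: a profile change is normally done in place with a balance filter. A sketch, assuming a reasonably recent btrfs-progs; the mount point is the poster's, and -f is required because the command explicitly operates on system chunks:

```
# Convert metadata and system chunks from raid1 to raid10, leaving data untouched
btrfs balance start -mconvert=raid10 -sconvert=raid10 -f /data/pool0
```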
Re: mount time of multi-disk arrays
On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
> Hello List, can anyone tell me how much time is acceptable and assumable for a multi-disk btrfs array with classical hard disk drives to mount? I'm having a bit of trouble with my current systemd setup, because it couldn't mount my btrfs raid anymore after adding the 5th drive. With the 4 drive setup it failed to mount once in a few times. Now it fails every time because the default timeout of 1m 30s is reached and mount is aborted. My last 10 manual mounts took between 1m57s and 2m12s to finish.

I have the exact same problem, and have to manually mount my large multi-disk btrfs filesystems, so I would be interested in a solution as well.

-- Konstantinos Skarlatos
mount time of multi-disk arrays
Hello List,

can anyone tell me how much time is acceptable and assumable for a multi-disk btrfs array with classical hard disk drives to mount?

I'm having a bit of trouble with my current systemd setup, because it couldn't mount my btrfs raid anymore after adding the 5th drive. With the 4 drive setup it failed to mount once in a few times. Now it fails every time because the default timeout of 1m 30s is reached and mount is aborted. My last 10 manual mounts took between 1m57s and 2m12s to finish.

My hardware setup contains a
- Intel Core i7 4770
- Kernel 3.15.2-1-ARCH
- 32GB RAM
- dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
- dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)

Thanks in advance

André-Sebastian Liebe
--

# btrfs fi sh
Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
Total devices 5 FS bytes used 14.21TiB
devid1 size 3.64TiB used 2.86TiB path /dev/sdd
devid2 size 3.64TiB used 2.86TiB path /dev/sdc
devid3 size 3.64TiB used 2.86TiB path /dev/sdf
devid4 size 3.64TiB used 2.86TiB path /dev/sde
devid5 size 3.64TiB used 2.88TiB path /dev/sdb

Btrfs v3.14.2-dirty

# btrfs fi df /data/pool0/
Data, single: total=14.28TiB, used=14.19TiB
System, RAID1: total=8.00MiB, used=1.54MiB
Metadata, RAID1: total=26.00GiB, used=20.20GiB
unknown, single: total=512.00MiB, used=0.00
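The numbers in the report make the failure mode plain — even the fastest of those manual mounts overruns systemd's default timeout. A quick check of the arithmetic:

```shell
default_timeout=$(( 1*60 + 30 ))  # systemd default: 1m 30s = 90 s
fastest_mount=$(( 1*60 + 57 ))    # best of the last 10 manual mounts: 117 s
slowest_mount=$(( 2*60 + 12 ))    # worst: 132 s
if [ "$fastest_mount" -gt "$default_timeout" ]; then
    echo "even the fastest mount (${fastest_mount}s) exceeds the ${default_timeout}s default"
fi
```

So every mount attempt is guaranteed to be aborted unless the timeout is raised.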
Re: qgroup destroy / assign
Wang, Yes, that certainly helps me make more sense of it. I was able to get the qgroup assigning to work properly. I guess the next question would be whether it would be a valid feature to implement automatic qgroup deletion when a subvolume is destroyed. I suppose in order to help alleviate issues with that, perhaps it may also be useful to require user-created qgroups to be at least level 1. That way it would be trivial to detect qgroups that were created for subvolumes, as they would all be level 0. I don't think this would cause any issues since you can't assign a subvolume to another qgroup from what I can tell, only a qgroup to another qgroup.

-Kevin

On 07/06/2014 08:57 PM, Wang Shilong wrote:
> Hi Kevin,
>
> On 07/05/2014 05:10 AM, Kevin Brandstatter wrote:
>> how are qgroups accounted for? Are they specifically tied to one subvolume on creation?
> Qgroup implementation is also a little confusing for me at first :-)
>
> Yes, a qgroup is created automatically tied to one subvolume on creation with the same objectid.
>
> To implement qgroup grouping, you may want to do something like the following:
>
>          [1/1]
>          /   \
>         /     \
>    sub1(5)  subv2(257)
>
>> If so, is it possible to auto delete relevant qgroups on deletion of the subvolume?
> I suppose so; according to the latest qgroup patches in flight, a subvolume qgroup should be destroyed safely when it has finished sub-tree space accounting.
>
>> also, how exactly does qgroup assign work? I haven't been able to get it to work at all.
>> in btrfs-progs cmds-qgroup.c
>> if ((args.src >> 48) >= (args.dst >> 48)) {
>>     fprintf(stderr, "ERROR: bad relation requested '%s'\n", path);
>>     return 1;
>> }
> Oh, this is to implement a strict-level qgroup hierarchy, which means a u64 is divided into two parts, 16 bits for the level and the rest for the id.
>
> So we require that the parent qgroup's level be greater than the child qgroup's. That is the code you see above.
>
> You could create a qgroup relation like this:
>
> # btrfs qgroup assign 256 1/1
>
> Hopefully, this could help you.

>> always seems to fail. I tried creating another qgroup id 1000, and assigning it to a sub, and vice versa, as well as assigning the sub to the root, and vice versa, as well as one subvol to another.
>> The fixme comment leads me to believe that the src should be a path not a qgroup ("FIXME src should accept subvol path")
>> but the progs let me create a qgroup without a subvol, which makes sense if you want to be able to have some meta-qgroup for a bunch of subvols.
>> Further on noticing that a sub create also creates a qgroup with the same id as the subvol, it would seem that the qgroup is tied to the subvol via this shared id.
>>
>> -Kevin Brandstatter
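Wang's description of the qgroupid layout — a u64 with the level in the top 16 bits and the id in the low 48 — can be checked with shell arithmetic. A sketch of how "1/1" and "256" encode, mirroring the (args.src >> 48) >= (args.dst >> 48) check quoted above:

```shell
parent=$(( (1 << 48) | 1 ))   # qgroup "1/1": level 1, id 1 → 281474976710657
child=256                     # qgroup "0/256": level 0 (a subvolume's auto-created qgroup)
echo "parent level: $(( parent >> 48 )), child level: $(( child >> 48 ))"
# 'btrfs qgroup assign 256 1/1' passes the check because 0 < 1:
if [ $(( child >> 48 )) -lt $(( parent >> 48 )) ]; then
    echo "assign 256 -> 1/1 allowed"
fi
```

This is why the "src" of an assign must always sit at a strictly lower level than the "dst".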
[PATCH] Btrfs: set error return value in btrfs_get_blocks_direct
We were returning with 0 (success) because we weren't extracting the error code from em (PTR_ERR(em)). Fix it. Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6b65fab..8a946c0 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6998,8 +6998,10 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, block_start, len, orig_block_len, ram_bytes, type); - if (IS_ERR(em)) + if (IS_ERR(em)) { + ret = PTR_ERR(em); goto unlock_err; + } } ret = btrfs_add_ordered_extent_dio(inode, start, -- 1.9.1
Re: [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
On 07/07/2014 12:22, Miao Xie wrote:

On Mon, 7 Jul 2014 12:04:09 +0800, Anand Jain wrote:

when one of the device paths is missing, the btrfs_device name is null. So this patch will check for that.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0010
IP: [] strlen+0x0/0x30
[] ? clone_fs_devices+0xaa/0x160 [btrfs]
[] btrfs_init_new_device+0x317/0xca0 [btrfs]
[] ? __kmalloc_track_caller+0x15a/0x1a0
[] btrfs_ioctl+0xaa3/0x2860 [btrfs]
[] ? handle_mm_fault+0x48c/0x9c0
[] ? __blkdev_put+0x171/0x180
[] ? __do_page_fault+0x4ac/0x590
[] ? blkdev_put+0x106/0x110
[] ? mntput+0x35/0x40
[] do_vfs_ioctl+0x460/0x4a0
[] ? fput+0xe/0x10
[] ? task_work_run+0xb3/0xd0
[] SyS_ioctl+0x57/0x90
[] ? do_page_fault+0xe/0x10
[] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain
Signed-off-by: Miao Xie
---
Changelog v1->v2:
- Fix the problem that we forgot to set the missing flag for the cloned device
---
 fs/btrfs/volumes.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1891541..4731bd6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 		if (IS_ERR(device))
 			goto error;

-		/*
-		 * This is ok to do without rcu read locked because we hold the
-		 * uuid mutex so nothing we touch in here is going to disappear.
-		 */
-		name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
-		if (!name) {
-			kfree(device);
-			goto error;
+		if (orig_dev->missing) {
+			device->missing = 1;
+			fs_devices->missing_devices++;

As mentioned, in some places we just check the name (for a missing device) and don't set the missing flag, so it is better to do:

	if (orig_dev->missing || !orig_dev->name) {
		device->missing = 1;
		fs_devices->missing_devices++;

I don't think we need to check the name pointer here, because only a missing device doesn't have its own name. Or there is something wrong in the code, so I added an assert in the else branch. Am I right?

At a few critical code paths (the below, and I guess in the chunk/stripe functions as well) we don't make use of the missing flag, but rather ->name:
- btrfsic_process_superblock :: if (!device->bdev || !device->name) continue;
But here, even without the !orig_dev->name check, it is good enough.

Thanks, Anand

+		} else {
+			ASSERT(orig_dev->name);
+			/*
+			 * This is ok to do without rcu read locked because
+			 * we hold the uuid mutex so nothing we touch in here
+			 * is going to disappear.
+			 */
+			name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
+			if (!name) {
+				kfree(device);
+				goto error;
+			}
+			rcu_assign_pointer(device->name, name);
 		}
-		rcu_assign_pointer(device->name, name);

 		list_add(&device->dev_list, &fs_devices->devices);
 		device->fs_devices = fs_devices;

Thanks, Anand
Re: [PATCH 2/2] btrfs-progs: Add mount point check for 'btrfs fi df' command
On Fri, Jul 04, 2014 at 03:52:26PM +0200, David Sterba wrote:

On Fri, Jul 04, 2014 at 04:38:49PM +0800, Qu Wenruo wrote:

The 'btrfs fi df' command is currently able to be executed on any file/dir inside btrfs, since it uses a btrfs ioctl to get disk usage info. However, it is somewhat confusing for some end users, since normally such a command should only be executed on a mount point.

I disagree here, it's much more convenient to run 'fi df' anywhere and get the output. The system 'df' command works the same way.

Just to clarify, in case my earlier mail did not convey the idea properly: the basic difference between traditional df and 'btrfs fi df' is that traditional df does not error out when no arg is given, and outputs all the mounted filesystems with their mount points. So, to be consistent, 'btrfs fi df' should output all btrfs filesystems with mount points if no arg is given. 'btrfs fi df' insists on an arg, but does not clarify in its output whether the given arg is a path inside a mount point or the mount point itself; this could become transparent if the mount point were also shown in the output. This is just a request and a pointer to an oversight/anomaly, but if the developers do not feel in resonance with it right now, then I just wish that they keep it in mind, think about it, and remove this confusion caused by 'btrfs fi df' as, when, and how they see fit.

The 'fi df' command itself is not that user friendly, and the numbers need further interpretation. I'm using it heavily during debugging, and restricting it to the mountpoint seems too artificial; the tool can cope with that. The 'fi usage' is supposed to give the user-friendly overview, but the patchset is stuck because I found the numbers wrong or misleading under some circumstances. I'll reread the thread that motivated this patch to see if there's something to address.

Thanks

--
vikram...
-- Rule of Life #1 -- Never get separated from your luggage.
Re: [PATCH 1/2] btrfs: fix null pointer dereference in clone_fs_devices when name is null
It's a pity that the patch has been merged into the upstream kernel. Let's correct our miss before the next merge.

What I found were new bugs; those are not related to this patch.

BTW, I sent some patches to fix the problems with the seed device (including the updated patch of this one). Could you try them and confirm whether or not they fix the problems you described above?

[PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
[PATCH 8/9] Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs
[PATCH 9/9] Btrfs: fix writing data into the seed filesystem

The first one is the updated patch of this one. With 8,9/9 it fixes the new bugs as well.

Thanks, Anand
Re: btrfs loopback problems
On Mon, 7 Jul 2014 11:20:30 AM, Qu Wenruo wrote:
> As Chris Mason mentioned, fixed in the following patch:
> https://patchwork.kernel.org/patch/4143821/

That should probably go to -stable (if it hasn't already), especially as 3.14 is a new LTS kernel.

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC