Re: [PATCH] fstests: common/rc: fix device still mounted error with SCRATCH_DEV_POOL
On Mon, Jan 15, 2018 at 11:10:20PM -0800, Liu Bo wrote:
> On Mon, Jan 15, 2018 at 02:22:28PM +0800, Eryu Guan wrote:
> > On Fri, Jan 12, 2018 at 06:04:59PM -0700, Liu Bo wrote:
> > > One of the btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a
> > > non-SCRATCH_DEV device as the first one when doing mkfs, and this makes
> > > _require_scratch{_nocheck} fail to umount $SCRATCH_MNT since it checks
> > > the mount point with SCRATCH_DEV only, so it finds nothing to umount
> > > and the following tests complain about 'device still mounted'-like
> > > errors.
> > >
> > > Introduce a helper to address this special case where both btrfs and
> > > the scratch dev pool are in use.
> > >
> > > Signed-off-by: Liu Bo
> >
> > Hmm, I didn't see this problem, I ran btrfs/011 then another test that
> > uses $SCRATCH_DEV, and the second test ran fine too. Can you please
> > provide more details?
>
> Sure, I was using 4 devices of 2500M each. btrfs/011 bailed out when
> doing a cp due to ENOSPC, then _fail was called to abort the test, and
> the mount point was by then associated with a device other than
> SCRATCH_DEV, so _require_scratch_nocheck in btrfs/012 was not able to
> umount SCRATCH_MNT.

Yeah, that's the exact case I described below. I think adding
"_scratch_umount >/dev/null 2>&1" in _cleanup() would resolve your issue.

> > Anyway, I think we should fix btrfs/011 to either not use $SCRATCH_DEV
> > in replace operations (AFAIK, other btrfs replace tests do this) or
> > umount all devices before exit. And I noticed btrfs/011 does umount
> > $SCRATCH_MNT at the end of workout(), so usually all should be fine
> > (perhaps it would leave a device mounted if interrupted in the middle of
> > a test run, because _cleanup() doesn't do umount).
>
> That's true, if you want, I could fix all btrfs replace tests to
> umount SCRATCH_MNT right before exit.
I think only the tests that replace $SCRATCH_DEV (as btrfs/011 does) need
fixes; _require_scratch would umount $SCRATCH_MNT for the other tests.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
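The _cleanup() change suggested in the thread could look roughly like the sketch below. This is a hypothetical override for btrfs/011, not the committed fix: `$UMOUNT_PROG`, `$SCRATCH_MNT` and `$tmp` are standard fstests variables, and unmounting by mount point (rather than by $SCRATCH_DEV) is the detail that matters after a device replace.

```shell
# Hypothetical sketch of the suggested fix: have btrfs/011's _cleanup()
# unmount the mount *point*, so an aborted run (_fail after ENOSPC) does
# not leave a pool device mounted for the following test.
_cleanup()
{
    cd /
    rm -f $tmp.*
    # After a replace, $SCRATCH_MNT may be backed by a pool device other
    # than $SCRATCH_DEV, so unmount by mount point (this is essentially
    # what _scratch_unmount does).
    $UMOUNT_PROG $SCRATCH_MNT > /dev/null 2>&1
}
```

Since the function is installed before the trap handler runs, an interrupted test would then clean up after itself instead of leaving the next test to fail with "device still mounted".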
[PATCH v2] btrfs: Remove btrfs_inode::delayed_iput_count
delayed_iput_count was supposed to be used to implement, well, delayed
iput. The idea is that we keep accumulating the number of iputs we do
until eventually the inode is deleted. Turns out we never really
switched the delayed_iput_count from 0 to 1, hence all conditional
code relying on the value of that member being different than 0 was
never executed. This, as it turns out, didn't cause any problem due
to the simple fact that the generic inode's i_count member was always
used to count the number of iputs. So let's just remove the unused
member and all unused code. This patch essentially provides no
functional changes.

While at it, also add proper documentation for btrfs_add_delayed_iput.

Signed-off-by: Nikolay Borisov
---
v2: Add function documentation to make it clear how delayed_iput works
and uses vfs_inode::i_count

 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/inode.c       | 26 ++++++++++++--------------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 63f0ccc92a71..f527e99c9f8d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -195,7 +195,6 @@ struct btrfs_inode {

 	/* Hook into fs_info->delayed_iputs */
 	struct list_head delayed_iput;
-	long delayed_iput_count;

 	/*
 	 * To avoid races between lockless (i_mutex not held) direct IO writes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 029399593049..7f568b05b8fd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3243,6 +3243,15 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 			   start, (size_t)(end - start + 1));
 }

+/* btrfs_add_delayed_iput - perform a delayed iput on @inode
+ *
+ * @inode: The inode we want to perform iput on
+ *
+ * This function uses the generic vfs_inode::i_count to track whether we
+ * should just decrement it (in case it's > 1) or if this is the last
+ * iput then link the inode to the delayed iput machinery. Delayed iputs
+ * are processed at transaction commit time/superblock commit/cleaner kthread
+ */
 void btrfs_add_delayed_iput(struct inode *inode)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -3252,12 +3261,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
 		return;

 	spin_lock(&fs_info->delayed_iput_lock);
-	if (binode->delayed_iput_count == 0) {
-		ASSERT(list_empty(&binode->delayed_iput));
-		list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
-	} else {
-		binode->delayed_iput_count++;
-	}
+	ASSERT(list_empty(&binode->delayed_iput));
+	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
 	spin_unlock(&fs_info->delayed_iput_lock);
 }

@@ -3270,13 +3275,7 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)

 		inode = list_first_entry(&fs_info->delayed_iputs,
 				struct btrfs_inode, delayed_iput);
-		if (inode->delayed_iput_count) {
-			inode->delayed_iput_count--;
-			list_move_tail(&inode->delayed_iput,
-					&fs_info->delayed_iputs);
-		} else {
-			list_del_init(&inode->delayed_iput);
-		}
+		list_del_init(&inode->delayed_iput);
 		spin_unlock(&fs_info->delayed_iput_lock);
 		iput(&inode->vfs_inode);
 		spin_lock(&fs_info->delayed_iput_lock);
@@ -9424,7 +9423,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->dir_index = 0;
 	ei->last_unlink_trans = 0;
 	ei->last_log_commit = 0;
-	ei->delayed_iput_count = 0;

 	spin_lock_init(&ei->lock);
 	ei->outstanding_extents = 0;
--
2.7.4
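The simplification in the patch can be seen in a toy model (Python, purely illustrative, not kernel code): once the object's own reference count does all the counting, the delayed-iput list only ever needs to hold each inode once, so the per-inode counter and the list_move_tail requeueing both disappear.

```python
import threading

class Inode:
    """Toy stand-in for btrfs_inode: i_count models vfs_inode::i_count."""
    def __init__(self, name):
        self.name = name
        self.i_count = 1
        self.on_list = False  # models list_empty(&binode->delayed_iput)

class FsInfo:
    """Toy stand-in for btrfs_fs_info's delayed-iput bookkeeping."""
    def __init__(self):
        self.lock = threading.Lock()
        self.delayed_iputs = []

def add_delayed_iput(fs, inode):
    # Mirrors btrfs_add_delayed_iput() after the patch: if ours is not
    # the last reference, just drop it; otherwise queue the inode
    # exactly once and defer the final put.
    with fs.lock:
        if inode.i_count > 1:
            inode.i_count -= 1
            return
        assert not inode.on_list  # the ASSERT(list_empty(...)) analogue
        inode.on_list = True
        fs.delayed_iputs.append(inode)

def run_delayed_iputs(fs):
    # Mirrors btrfs_run_delayed_iputs(): drain the list and perform the
    # deferred final "iput" (here just a refcount drop) for each entry.
    with fs.lock:
        pending, fs.delayed_iputs = fs.delayed_iputs, []
    for inode in pending:
        inode.on_list = False
        inode.i_count -= 1  # the final iput
```

With this shape it is clear why the removed `delayed_iput_count` branch was dead: the refcount check already guarantees an inode reaches the queueing path at most once before the list is drained.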
Re: [PATCH] fstests: common/rc: fix device still mounted error with SCRATCH_DEV_POOL
On Mon, Jan 15, 2018 at 02:22:28PM +0800, Eryu Guan wrote:
> On Fri, Jan 12, 2018 at 06:04:59PM -0700, Liu Bo wrote:
> > One of the btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a
> > non-SCRATCH_DEV device as the first one when doing mkfs, and this makes
> > _require_scratch{_nocheck} fail to umount $SCRATCH_MNT since it checks
> > the mount point with SCRATCH_DEV only, so it finds nothing to umount
> > and the following tests complain about 'device still mounted'-like
> > errors.
> >
> > Introduce a helper to address this special case where both btrfs and
> > the scratch dev pool are in use.
> >
> > Signed-off-by: Liu Bo
>
> Hmm, I didn't see this problem, I ran btrfs/011 then another test that
> uses $SCRATCH_DEV, and the second test ran fine too. Can you please
> provide more details?

Sure, I was using 4 devices of 2500M each. btrfs/011 bailed out when
doing a cp due to ENOSPC, then _fail was called to abort the test, and
the mount point was by then associated with a device other than
SCRATCH_DEV, so _require_scratch_nocheck in btrfs/012 was not able to
umount SCRATCH_MNT.

> Anyway, I think we should fix btrfs/011 to either not use $SCRATCH_DEV
> in replace operations (AFAIK, other btrfs replace tests do this) or
> umount all devices before exit. And I noticed btrfs/011 does umount
> $SCRATCH_MNT at the end of workout(), so usually all should be fine
> (perhaps it would leave a device mounted if interrupted in the middle of
> a test run, because _cleanup() doesn't do umount).

That's true, if you want, I could fix all btrfs replace tests to
umount SCRATCH_MNT right before exit.

thanks,
-liubo
Re: Recommendations for balancing as part of regular maintenance?
On Mon, Jan 15, 2018 at 11:23 AM, Tom Worster wrote:
> On 13 Jan 2018, at 17:09, Chris Murphy wrote:
>
>> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn wrote:
>>
>>> To that end, I propose the following text for the FAQ:
>>>
>>> Q: Do I need to run a balance regularly?
>>>
>>> A: While not strictly necessary for normal operations, running a
>>> filtered balance regularly can help prevent your filesystem from
>>> ending up with ENOSPC issues. The following command run daily on each
>>> BTRFS volume should be more than sufficient for most users:
>>>
>>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>>
>> Daily? Seems excessive.
>>
>> I've got multiple Btrfs file systems that I haven't balanced, full or
>> partial, in a year, and I have no problems. One is a laptop which
>> accumulates snapshots until roughly 25% free space remains and then
>> most of the snapshots are deleted, except the most recent few, all at
>> one time. I'm not experiencing any problems so far. The other is a NAS
>> and its multiple copies, with maybe 100-200 snapshots. One backup
>> volume is 99% full, there's no more unallocated free space, and I
>> delete snapshots only to make room for btrfs send/receive to keep
>> pushing the most recent snapshot from the main volume to the backup.
>> Again, no problems.
>>
>> I really think suggestions this broad are just going to paper over
>> bugs or design flaws; we won't see as many bug reports and then real
>> problems won't get fixed.
>
> This is just an answer to a FAQ. This is not Austin or anyone else
> trying to tell you or anyone else that you should do this. It should be
> clear that there is an implied caveat along the lines of: "There are
> other ways to manage allocation besides regular balancing. This
> recommendation is a For-Dummies-kinda default that should work well
> enough if you don't have another strategy better adapted to your
> situation." If this implication is not obvious enough then we can add
> something explicit.

It's an upstream answer to a frequently asked question. It's rather
official, or about as close as it gets to it.

>> I also think the time-based method is too subjective. What about the
>> layout means a balance is needed? And if it's really a suggestion, why
>> isn't there a cron or systemd unit that just does this for the user,
>> in btrfs-progs, working and enabled by default?
>
> As a newcomer to BTRFS, I was astonished to learn that it demands each
> user figure out some workaround for what is, in my judgement, a
> required but missing feature, i.e. a defect, a bug. At present the docs
> are pretty confusing for someone trying to deal with it on their own.
>
> Unless some better fix is in the works, this _should_ be a systemd unit
> or something. Until then, please put it in the FAQ.

At least openSUSE has had a systemd unit for a long time now, but last
time I checked (a bit over a year ago) it's disabled by default. Why?
And insofar as I'm aware, openSUSE users aren't having big problems
related to lack of balancing; they have problems due to the lack of
balancing combined with schizo snapper defaults, which are these days
masked somewhat by turning on quotas so snapper can be more accurate
about cleaning up.

Basically the scripted balance tells me two things:

a. Something is broken (still).
b. None of the developers has time to investigate coherent bug reports
about a. and fix/refine it, and therefore papering over the problem is
all we have. Basically it's a sledgehammer approach.

The main person working on ENOSPC stuff is Josef, so I'd run this by him
and make sure this papering over bugs is something he agrees with.

>> I really do not like all this hand holding of Btrfs, it's not going to
>> make it better.
>
> Maybe it won't but, absent better proposals, and given the nature of
> the problem, this kind of hand-holding is only fair to the user.

This is hardly the biggest gotcha with Btrfs. I'm fine with the idea of
papering over design flaws and long-standing bugs with user space
workarounds. I just want everyone on the same page about it, so it's not
some big surprise it's happening. As far as I know, none of the
developers regularly looks at the Btrfs wiki. And I think the best way
of communicating:

a. this is busted, and it sucks
b. here's a proposed user space workaround, so users aren't so pissed off

is to try and get it into btrfs-progs, and enabled by default, because
that will get in front of at least one developer.

--
Chris Murphy
Re: invalid files names, btrfs check can't repair it
On 2018年01月16日 12:51, Qu Wenruo wrote:
> Now the problems are all located:
>
> For file "2f3f379b2a3d7499471edb74869efe-1948311.d", it's the problem of
> its DIR_ITEM having the wrong type:
>
> --
> item 14 key (57648595 DIR_ITEM 3363354030) itemoff 3053 itemsize 70
>         location key (57923894 INODE_ITEM 0) type DIR_ITEM.33
>                                                   ^^^
> --
>
> There is an unexpected type DIR_ITEM and a special number 33 here.
>
> Despite that, the file is completely fine.
>
> For file "454bf066ddfbf42e0f3b77ea71c82f-878732.o"
> the problem is its namelen.
>
> --
> item 13 key (57648595 DIR_ITEM 3331247447) itemoff 3123 itemsize 69
>         location key (58472210 INODE_ITEM 0) type FILE
>         transid 89418 data_len 0 name_len 8231
>                                           Insane
>         name: 454bf066ddfbf42e0f3b77ea71c82f-878732.oq
> --
>
> Despite that, it should be fine.
>
> I'm not 100% sure if repair can really handle it well.
> But I could craft a temporary fix based on btrfs-corrupt-block (I know
> the name is scary).
> And you may need to compile btrfs-progs with my patch.

I just assumed it's the fs_root, and pushed the hard-coded fix branch to
my github:
https://github.com/adam900710/btrfs-progs/tree/hard_coded_fix_for_sebastian

Usage:
./btrfs-corrupt-block -X <device>

Just as the commit message says, if anything goes wrong, it will not
touch the fs at all. So it should be somewhat safe to use.

And if something goes wrong, it will cause a backtrace and abort; that's
by design and you don't need to panic.

Example: no dir_item in my btrfs
--
./btrfs-corrupt-block -X /dev/data/btrfs
ERROR: corrupted DIR_ITEM not found
extent buffer leak: start 4227072 len 16384
extent_io.c:607: free_extent_buffer_internal: BUG_ON `eb->flags & EXTENT_DIRTY` triggered, value 1
./btrfs-corrupt-block(+0x251df)[0x5623e003a1df]
./btrfs-corrupt-block(free_extent_buffer_nocache+0x1f)[0x5623e003aac1]
./btrfs-corrupt-block(extent_io_tree_cleanup+0x6d)[0x5623e003ab33]
./btrfs-corrupt-block(btrfs_cleanup_all_caches+0x76)[0x5623e0027747]
./btrfs-corrupt-block(close_ctree_fs_info+0x111)[0x5623e0028027]
./btrfs-corrupt-block(main+0x3f5)[0x5623e00551df]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7fdce4e0ff4a]
./btrfs-corrupt-block(_start+0x2a)[0x5623e001ee3a]
Aborted (core dumped)
--

And if the above error happens, please paste the error output and
provide the subvolume id.

Thanks,
Qu

> The only remaining thing I need is the subvolume id which contains the
> corrupted files.
>
> Since there is no other hit, I assume it's the root subvolume (5), but I
> still need the extra confirmation since the fix will be hard-coded.
>
> Thanks,
> Qu
Re: invalid files names, btrfs check can't repair it
Now the problems are all located:

For file "2f3f379b2a3d7499471edb74869efe-1948311.d", it's the problem of
its DIR_ITEM having the wrong type:

--
item 14 key (57648595 DIR_ITEM 3363354030) itemoff 3053 itemsize 70
        location key (57923894 INODE_ITEM 0) type DIR_ITEM.33
                                                  ^^^
--

There is an unexpected type DIR_ITEM and a special number 33 here.

Despite that, the file is completely fine.

For file "454bf066ddfbf42e0f3b77ea71c82f-878732.o"
the problem is its namelen.

--
item 13 key (57648595 DIR_ITEM 3331247447) itemoff 3123 itemsize 69
        location key (58472210 INODE_ITEM 0) type FILE
        transid 89418 data_len 0 name_len 8231
                                          Insane
        name: 454bf066ddfbf42e0f3b77ea71c82f-878732.oq
--

Despite that, it should be fine.

I'm not 100% sure if repair can really handle it well.
But I could craft a temporary fix based on btrfs-corrupt-block (I know
the name is scary).
And you may need to compile btrfs-progs with my patch.

The only remaining thing I need is the subvolume id which contains the
corrupted files.

Since there is no other hit, I assume it's the root subvolume (5), but I
still need the extra confirmation since the fix will be hard-coded.

Thanks,
Qu
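The name_len corruption above can be spotted mechanically: a DIR_ITEM's fixed header plus its name plus its data must fit inside the leaf item. A small sketch of that check (Python, illustrative only; the 30-byte header is the on-disk size of struct btrfs_dir_item: a 17-byte location key, 8-byte transid, 2-byte data_len, 2-byte name_len and 1-byte type):

```python
DIR_ITEM_HEADER = 30  # sizeof(struct btrfs_dir_item) on disk:
                      # key (17) + transid (8) + data_len (2)
                      # + name_len (2) + type (1)

def dir_item_name_len_sane(itemsize, data_len, name_len):
    """Header, name and data must all fit inside the leaf item."""
    return DIR_ITEM_HEADER + name_len + data_len <= itemsize

def recoverable_name_len(itemsize, data_len):
    """For a single-entry dir item, the name length the item size implies."""
    return itemsize - DIR_ITEM_HEADER - data_len
```

For item 13 above (itemsize 69, data_len 0), the recorded name_len of 8231 is clearly insane, while the implied value is 69 - 30 = 39, which happens to be exactly the length of "454bf066ddfbf42e0f3b77ea71c82f-878732.o" — consistent with only the name_len field being corrupted.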
Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()
Hi David,

Would you please queue this patch to the devel branch?

This is a small enough, but quite important fix when handling dump tree
output.

Thanks,
Qu

On 2017年10月30日 16:20, Qu Wenruo wrote:
>
> On 2017年10月30日 16:10, Misono, Tomohiro wrote:
>> In print_inode_item(), the argument order of sequence and flags is
>> reversed:
>>
>> printf("... sequence %llu flags 0x%llx(%s)\n",
>>        ...
>>        (unsigned long long)btrfs_inode_flags(eb,ii),
>>        (unsigned long long)btrfs_inode_sequence(eb, ii),
>>        ...)
>>
>> So, just fix it.
>>
>> Signed-off-by: Tomohiro Misono
>
> Reviewed-by: Qu Wenruo
>
> Thanks,
> Qu
>
>> ---
>>  print-tree.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/print-tree.c b/print-tree.c
>> index 3c585e3..8abd760 100644
>> --- a/print-tree.c
>> +++ b/print-tree.c
>> @@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb,
>>  		btrfs_inode_uid(eb, ii),
>>  		btrfs_inode_gid(eb, ii),
>>  		(unsigned long long)btrfs_inode_rdev(eb,ii),
>> -		(unsigned long long)btrfs_inode_flags(eb,ii),
>>  		(unsigned long long)btrfs_inode_sequence(eb, ii),
>> +		(unsigned long long)btrfs_inode_flags(eb,ii),
>>  		flags_str);
>>  	print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n");
>>  	print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n");
Re: big volumes only work reliable with ssd_spread
Stefan Priebe - Profihost AG posted on Mon, 15 Jan 2018 10:55:42 +0100 as
excerpted:

> since around two or three years I'm using btrfs for incremental VM
> backups.
>
> some data:
> - volume size 60TB
> - around 2000 subvolumes
> - each differential backup stacks on top of a subvolume
> - compress-force=zstd
> - space_cache=v2
> - no quota / qgroup
>
> this works fine since kernel 4.14 except that I need ssd_spread as an
> option. If I do not use ssd_spread I always end up with very slow
> performance and a single kworker process using 100% CPU after some days.
>
> With ssd_spread those boxes have run fine for around 6 months. Is this
> something expected? I haven't found any hint regarding such an impact.

My understanding of the technical details is "limited" as I'm not a dev,
and I expect you'll get a more technically accurate response later, but
sometimes a first not particularly technical response can be helpful as
long as it's not /wrong/. (And if it is, this is a good way to have my
understanding corrected as well. =:^)

With that caveat, based on my understanding of what I've seen on-list...

The kernel v4.14 ssd mount-option changes apparently primarily affected
data, not metadata. Apparently, ssd_spread has a heavier metadata effect,
and the v4.14 changes moved additional (I believe metadata) functionality
to ssd_spread that had originally been part of ssd as well.

There has been some discussion of metadata tweaks similar to those in
4.14 for the ssd option with data, but they weren't deemed as
demonstrably needed as the ssd option tweaks and needed further
discussion, so they were put off until the effect of the 4.14 tweaks
could be gauged in more widespread use, after which they were to be
reconsidered, if necessary.

Meanwhile, in the discussion I saw, Chris Mason mentioned that Facebook
is using ssd_spread for various reasons there, so it's well-tested with
their deployments, which I'd assume have many of the same qualities
yours do, thus implying that your observations about ssd_spread are no
accident.

In fact, if I interpreted Chris's comments correctly, they use
ssd_spread on very large multi-layered non-ssd storage arrays, in part
because the larger layout-alignment optimizations make sense there as
well as on ssds. That would appear to be precisely what you are seeing.
=:^)

If that's the case, then arguably the option is misnamed and the
ssd_spread name may well at some point be deprecated in favor of
something more descriptive of its actual function and target devices.
Purely my own speculation here, but perhaps something like vla_spread
(very-large-array)?

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: how to make a cache directory nodatacow while also excluded from snapshots?
On 16.01.2018 00:56, Dave wrote:
> I want to exclude my ~/.cache directory from snapshots. The obvious
> way to do this is to mount a btrfs subvolume at that location.
>
> However, I also want the ~/.cache directory to be nodatacow. Since the
> parent volume is COW, I believe it isn't possible to mount the
> subvolume with different mount options.
>
> What's the solution for achieving both of these goals?
>
> I tried this without success:
>
> chattr +C ~/.cache
>
> Since ~/.cache is a btrfs subvolume, apparently that doesn't work.
>
> lsattr ~/.cache
>
> returns nothing.

Try creating a file under ~/.cache and check its attributes.
New Year's donation of €4,800,000
--
Hello,

You have a donation of 4,800,000.00 euros. I won the America Lottery in
America worth 40 million dollars, and I am giving part of it to five
lucky people and charity houses in memory of my late wife, who died of
cancer. Contact me for further details: (tomcrist2...@gmail.com)
how to make a cache directory nodatacow while also excluded from snapshots?
I want to exclude my ~/.cache directory from snapshots. The obvious
way to do this is to mount a btrfs subvolume at that location.

However, I also want the ~/.cache directory to be nodatacow. Since the
parent volume is COW, I believe it isn't possible to mount the
subvolume with different mount options.

What's the solution for achieving both of these goals?

I tried this without success:

chattr +C ~/.cache

Since ~/.cache is a btrfs subvolume, apparently that doesn't work.

lsattr ~/.cache

returns nothing.
Re: [PATCH v4 2/4] btrfs: cleanup btrfs_mount() using btrfs_mount_root()
On Fri, Jan 12, 2018 at 06:14:40PM +0800, Anand Jain wrote:
>
> Misono,
>
> This change is causing a subsequent (subvol) mount to fail when the
> device option is specified. The simplest example of the failure is:
>
>   mkfs.btrfs -qf /dev/sdc /dev/sdb
>   mount -o device=/dev/sdb /dev/sdc /btrfs
>   mount -o device=/dev/sdb /dev/sdc /btrfs1
>     mount: /dev/sdc is already mounted or /btrfs1 busy
>
> Looks like
>   blkdev_get_by_path() <-- is failing.
>   btrfs_scan_one_device()
>   btrfs_parse_early_options()
>   btrfs_mount()
>
> Which is due to different holders (viz. btrfs_root_fs_type and
> btrfs_fs_type), one used for vfs_mount and the other for scan,
> so they form different holders and can't allow the EXCL open which
> is needed for both scan and open.

This looks close to what I see in the random test failures. I've
reverted your patch "btrfs: optimize move uuid_mutex closer to the
critical section" as I bisected to it. The uuid mutex around
blkdev_get_by_path probably protected the concurrent mount and scan so
they did not ask for EXCL at the same time.

Reverting (or removing the patch from the current misc-next queue) is
simpler for me ATM as I want to get to a stable base now; we can add it
back later once we understand the issue with the mount/scan.
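The holder mismatch described above can be illustrated with a toy model of exclusive block-device claiming (Python, purely illustrative, not kernel code): an exclusive claim fails with EBUSY unless the claimant presents the same holder cookie as the current owner, which is why scanning with one holder (btrfs_root_fs_type) while the device is already claimed under another (btrfs_fs_type) collides.

```python
class BlockDevice:
    """Toy model of an exclusive (FMODE_EXCL-style) device claim:
    at most one holder cookie owns the device at a time."""
    def __init__(self):
        self.holder = None

    def get_exclusive(self, holder):
        # A second exclusive open succeeds only for the *same* holder;
        # any other holder gets EBUSY, as in the mount failure above.
        if self.holder is not None and self.holder is not holder:
            raise OSError("EBUSY")
        self.holder = holder

# Stand-ins for the two holder cookies named in the report. In the real
# code these are two distinct struct file_system_type objects.
btrfs_fs_type = object()       # holder used by the mount path
btrfs_root_fs_type = object()  # holder used by the scan path
```

In this model the fix is evident: either both paths must present the same holder, or something (like the uuid_mutex the reverted patch moved) must keep the two exclusive opens from overlapping in time.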
Re: Recommendations for balancing as part of regular maintenance?
On 13 Jan 2018, at 17:09, Chris Murphy wrote:

> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn wrote:
>
>> To that end, I propose the following text for the FAQ:
>>
>> Q: Do I need to run a balance regularly?
>>
>> A: While not strictly necessary for normal operations, running a
>> filtered balance regularly can help prevent your filesystem from
>> ending up with ENOSPC issues. The following command run daily on each
>> BTRFS volume should be more than sufficient for most users:
>>
>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>
> Daily? Seems excessive.
>
> I've got multiple Btrfs file systems that I haven't balanced, full or
> partial, in a year, and I have no problems. One is a laptop which
> accumulates snapshots until roughly 25% free space remains and then
> most of the snapshots are deleted, except the most recent few, all at
> one time. I'm not experiencing any problems so far. The other is a NAS
> and its multiple copies, with maybe 100-200 snapshots. One backup
> volume is 99% full, there's no more unallocated free space, and I
> delete snapshots only to make room for btrfs send/receive to keep
> pushing the most recent snapshot from the main volume to the backup.
> Again, no problems.
>
> I really think suggestions this broad are just going to paper over
> bugs or design flaws; we won't see as many bug reports and then real
> problems won't get fixed.

This is just an answer to a FAQ. This is not Austin or anyone else
trying to tell you or anyone else that you should do this. It should be
clear that there is an implied caveat along the lines of: "There are
other ways to manage allocation besides regular balancing. This
recommendation is a For-Dummies-kinda default that should work well
enough if you don't have another strategy better adapted to your
situation." If this implication is not obvious enough then we can add
something explicit.

> I also think the time-based method is too subjective. What about the
> layout means a balance is needed? And if it's really a suggestion, why
> isn't there a cron or systemd unit that just does this for the user,
> in btrfs-progs, working and enabled by default?

As a newcomer to BTRFS, I was astonished to learn that it demands each
user figure out some workaround for what is, in my judgement, a required
but missing feature, i.e. a defect, a bug. At present the docs are
pretty confusing for someone trying to deal with it on their own.

Unless some better fix is in the works, this _should_ be a systemd unit
or something. Until then, please put it in the FAQ.

> I really do not like all this hand holding of Btrfs, it's not going to
> make it better.

Maybe it won't but, absent better proposals, and given the nature of the
problem, this kind of hand-holding is only fair to the user.

Tom
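The systemd unit floated in this thread could be sketched as a template service plus timer along the following lines. This is purely a hypothetical illustration: the unit names and paths are assumptions, not anything shipped by btrfs-progs or a distribution, and the balance filters are simply the ones from the proposed FAQ text.

```ini
# /etc/systemd/system/btrfs-balance@.service  (hypothetical)
[Unit]
Description=Filtered btrfs balance on %f

[Service]
Type=oneshot
ExecStart=/usr/bin/btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10 %f

# /etc/systemd/system/btrfs-balance@.timer  (hypothetical)
[Unit]
Description=Daily filtered btrfs balance on %f

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

The instance name is the systemd-escaped mountpoint (the `%f` specifier expands it back to a path), so one would enable it per filesystem with something like `systemctl enable --now btrfs-balance@home.timer` for /home.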
Re: [PATCH] btrfs: Remove btrfs_inode::delayed_iput_count
On Mon, Jan 15, 2018 at 10:16:54AM -0700, Edmund Nadolski wrote:
>
> On 01/15/2018 05:31 AM, Nikolay Borisov wrote:
> > delayed_iput_count was supposed to be used to implement, well, delayed
> > iput. The idea is that we keep accumulating the number of iputs we do
> > until eventually the inode is deleted. Turns out we never really
> > switched the delayed_iput_count from 0 to 1, hence all conditional
> > code relying on the value of that member being different than 0 was
> > never executed. This, as it turns out, didn't cause any problem due
> > to the simple fact that the generic inode's i_count member was always
> > used to count the number of iputs. So let's just remove the unused
> > member and all unused code. This patch essentially provides no
> > functional changes.
> >
> > Signed-off-by: Nikolay Borisov
>
> Since the 8089fe62c6 changelog mentions the need for a count, it might
> be nice to include a brief code comment about the i_count effect.

Agreed.

> Reviewed-by: Edmund Nadolski

Reviewed-by: David Sterba
Re: [PATCH] btrfs: Remove btrfs_inode::delayed_iput_count
On 01/15/2018 05:31 AM, Nikolay Borisov wrote:
> delayed_iput_count was supposed to be used to implement, well, delayed
> iput. The idea is that we keep accumulating the number of iputs we do
> until eventually the inode is deleted. Turns out we never really
> switched the delayed_iput_count from 0 to 1, hence all conditional
> code relying on the value of that member being different than 0 was
> never executed. This, as it turns out, didn't cause any problem due
> to the simple fact that the generic inode's i_count member was always
> used to count the number of iputs. So let's just remove the unused
> member and all unused code. This patch essentially provides no
> functional changes.
>
> Signed-off-by: Nikolay Borisov

Since the 8089fe62c6 changelog mentions the need for a count, it might
be nice to include a brief code comment about the i_count effect.

Reviewed-by: Edmund Nadolski

> ---
>  fs/btrfs/btrfs_inode.h |  1 -
>  fs/btrfs/inode.c       | 17 +++--------------
>  2 files changed, 3 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 63f0ccc92a71..f527e99c9f8d 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -195,7 +195,6 @@ struct btrfs_inode {
>
>  	/* Hook into fs_info->delayed_iputs */
>  	struct list_head delayed_iput;
> -	long delayed_iput_count;
>
>  	/*
>  	 * To avoid races between lockless (i_mutex not held) direct IO writes
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 029399593049..2225f613516c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3252,12 +3252,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
>  		return;
>
>  	spin_lock(&fs_info->delayed_iput_lock);
> -	if (binode->delayed_iput_count == 0) {
> -		ASSERT(list_empty(&binode->delayed_iput));
> -		list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
> -	} else {
> -		binode->delayed_iput_count++;
> -	}
> +	ASSERT(list_empty(&binode->delayed_iput));
> +	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
>  	spin_unlock(&fs_info->delayed_iput_lock);
>  }
>
> @@ -3270,13 +3266,7 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
>
>  		inode = list_first_entry(&fs_info->delayed_iputs,
>  				struct btrfs_inode, delayed_iput);
> -		if (inode->delayed_iput_count) {
> -			inode->delayed_iput_count--;
> -			list_move_tail(&inode->delayed_iput,
> -					&fs_info->delayed_iputs);
> -		} else {
> -			list_del_init(&inode->delayed_iput);
> -		}
> +		list_del_init(&inode->delayed_iput);
>  		spin_unlock(&fs_info->delayed_iput_lock);
>  		iput(&inode->vfs_inode);
>  		spin_lock(&fs_info->delayed_iput_lock);
> @@ -9424,7 +9414,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
>  	ei->dir_index = 0;
>  	ei->last_unlink_trans = 0;
>  	ei->last_log_commit = 0;
> -	ei->delayed_iput_count = 0;
>
>  	spin_lock_init(&ei->lock);
>  	ei->outstanding_extents = 0;
Re: [PATCH] fstests: common/rc: fix device still mounted error with SCRATCH_DEV_POOL
On Mon, Jan 15, 2018 at 02:22:28PM +0800, Eryu Guan wrote:
> On Fri, Jan 12, 2018 at 06:04:59PM -0700, Liu Bo wrote:
> > One of the btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a
> > non-SCRATCH_DEV device as the first one when doing mkfs, and this makes
> > _require_scratch{_nocheck} fail to umount $SCRATCH_MNT since it checks
> > the mount point with SCRATCH_DEV only, so it finds nothing to umount
> > and the following tests complain about 'device still mounted'-like
> > errors.
> >
> > Introduce a helper to address this special case where both btrfs and
> > the scratch dev pool are in use.
> >
> > Signed-off-by: Liu Bo
>
> Hmm, I didn't see this problem, I ran btrfs/011 then another test that
> uses $SCRATCH_DEV, and the second test ran fine too. Can you please
> provide more details?
>
> Anyway, I think we should fix btrfs/011 to either not use $SCRATCH_DEV
> in replace operations (AFAIK, other btrfs replace tests do this) or
> umount all devices before exit. And I noticed btrfs/011 does umount
> $SCRATCH_MNT at the end of workout(), so usually all should be fine
> (perhaps it would leave a device mounted if interrupted in the middle of
> a test run, because _cleanup() doesn't do umount).

In my case I saw lots of test failures (btrfs/ 012 068 071 074 116 136
138 152 154 155 ...), some of them repeatedly but not reliably. This
could have been triggered by a patch in my testing branch, but I can't
tell for sure due to the inaccurate fstests checks. The common problem
was that the scratch device appeared as mounted.

We discussed that with Bo; I was suspecting some of our changes that
could theoretically leave some data in flight after umount. Bo found the
potential problems in fstests, so I'll redo all the testing again with
updated fstests.
Re: btrfs subvolume mount with different options
Thanks, chattr +C is what I am currently using. Also you already answered my next question, why it is not possible to set the +C attribute on an existing file :) Yours sincerely, Konstantin V. Gavrilenko - Original Message - From: "Roman Mamedov" To: "Konstantin V. Gavrilenko" Cc: "Linux fs Btrfs" Sent: Friday, 12 January, 2018 9:37:49 PM Subject: Re: btrfs subvolume mount with different options On Fri, 12 Jan 2018 17:49:38 + (GMT) "Konstantin V. Gavrilenko" wrote: > Hi list, > > just wondering whether it is possible to mount two subvolumes with different > mount options, i.e. > > | > |- /a defaults,compress-force=lza You can use different compression algorithms across the filesystem (including none), via "btrfs property" on directories or subvolumes. They are inherited down the tree. $ mkdir test $ sudo btrfs prop set test compression zstd $ echo abc > test/def $ sudo btrfs prop get test/def compression compression=zstd But it appears this doesn't provide a way to apply compress-force. > |- /b defaults,nodatacow Nodatacow can be applied to any dir/subvolume recursively, or any file (as long as it's created but not written yet) via chattr +C. -- With respect, Roman
Re: Recommendations for balancing as part of regular maintenance?
On 2018-01-13 17:09, Chris Murphy wrote: On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn wrote: To that end, I propose the following text for the FAQ: Q: Do I need to run a balance regularly? A: While not strictly necessary for normal operations, running a filtered balance regularly can help prevent your filesystem from ending up with ENOSPC issues. The following command run daily on each BTRFS volume should be more than sufficient for most users: `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10` Daily? Seems excessive. For handling of chunks that are only 25% full and capping it at 10 chunks processed each for data and metadata? That's only (assuming I remember the max chunk size correctly) about 15GB of data being moved at the absolute most, and that will likely only happen in pathologically bad cases. It should usually be either nothing or about 768MB being shuffled around, and even on traditional hard drives that should complete insanely fast (barring impact from very large numbers of snapshots or use of qgroups). If there are no chunks that match (or only one chunk), this finishes in at most a second with near-zero disk I/O. If exactly two match (which should be the common case for most users when it matches at all), it should take at most a few seconds to complete, even on traditional hard drives. If more match, it will of course take longer, but it should be pretty rare that more than two match. Given that, it really doesn't seem all that excessive to me. As a point of comparison, automated X.509 certificate renewal checks via certbot take more resources to perform when there's not a renewal due than this balance command takes when there's nothing to work on, and it's absolutely standard to run the X.509 checks daily despite the fact that weekly checks would still give no worse security (certbot will renew things well before they expire).
I've got multiple Btrfs file systems that I haven't balanced, full or partial, in a year. And I have no problems. One is a laptop which accumulates snapshots until roughly 25% free space remains and then most of the snapshots are deleted, except the most recent few, all at one time. I'm not experiencing any problems so far. The other is a NAS and its multiple copies, with maybe 100-200 snapshots. One backup volume is 99% full, there's no more unallocated free space, I delete snapshots only to make room for btrfs send/receive to keep pushing the most recent snapshot from the main volume to the backup. Again no problems. In the first case, you're dealing with a special configuration that makes most of this irrelevant most of the time (as I'm assuming things change _enough_ between snapshots that dumping most of them will completely empty out most of the chunks they were stored in). In the second I'd have to say you've been lucky. I've personally never run a volume that close to full with BTRFS without balancing regularly and not had some kind of issue. I really think suggestions this broad are just going to paper over bugs or design flaws, we won't see as many bug reports and then real problems won't get fixed. So maybe we should fix things so that this is never needed? Yes, it's a workaround for a well known and documented design flaw (and yes, I consider the whole two-level allocator's handling of free space exhaustion to be a design flaw), but I don't see any patches forthcoming to fix it, so if we want to keep users around, we need to provide some way for them to mitigate the problems it can cause (otherwise we won't find any bugs because we won't have any users). I also think the time-based method is too subjective. What about the layout means a balance is needed? And if it's really a suggestion, why isn't there a cron or systemd unit that just does this for the user, in btrfs-progs, working and enabled by default?
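For illustration, such a unit pair might look like the following. These files are hypothetical, nothing like them ships with btrfs-progs today, and the balance filters are simply the ones proposed earlier in the thread:

```ini
# /etc/systemd/system/btrfs-balance.service  (hypothetical sketch)
[Unit]
Description=Filtered btrfs balance of /

[Service]
Type=oneshot
ExecStart=/usr/bin/btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10 /

# /etc/systemd/system/btrfs-balance.timer  (hypothetical sketch)
[Unit]
Description=Daily filtered btrfs balance

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enabling would be `systemctl enable --now btrfs-balance.timer`; one service/timer pair per mounted btrfs volume would be needed, or a templated unit taking the mount point as the instance name.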
I really do not like all this hand holding of Btrfs, it's not going to make it better. For a filesystem you really have two generic possibilities for use cases: 1. It's designed for general purpose usage. Doesn't really excel at any thing in particular, but isn't really bad at anything either. 2. It's designed for a very specific use case. Does an amazing job for that particular use case and possibly for some similar ones, and may or may not do a reasonable job for other use cases. Your comments here seem to imply that BTRFS falls under the second case, which is odd since most everything else I've seen implies that BTRFS fits the first case (or is trying to at least). In either case though, you need to provide something to deal with this particular design flaw. In the first case, you _need_ to make it as easy as possible for people who have no understanding of computers to use. While needing balances from time to time is not exactly in-line with that, requiring people to try and judge based on the
[PATCH] btrfs: Remove btrfs_inode::delayed_iput_count
delayed_iput_count was supposed to be used to implement, well, delayed iput. The idea is that we keep accumulating the number of iputs we do until eventually the inode is deleted. Turns out we never really switched the delayed_iput_count from 0 to 1, hence all conditional code relying on the value of that member being different than 0 was never executed. This, as it turns out, didn't cause any problem due to the simple fact that the generic inode's i_count member was always used to count the number of iputs. So let's just remove the unused member and all unused code. This patch essentially provides no functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/inode.c       | 17 +++--------------
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 63f0ccc92a71..f527e99c9f8d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -195,7 +195,6 @@ struct btrfs_inode {

 	/* Hook into fs_info->delayed_iputs */
 	struct list_head delayed_iput;
-	long delayed_iput_count;

 	/*
 	 * To avoid races between lockless (i_mutex not held) direct IO writes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 029399593049..2225f613516c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3252,12 +3252,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
 		return;

 	spin_lock(&fs_info->delayed_iput_lock);
-	if (binode->delayed_iput_count == 0) {
-		ASSERT(list_empty(&binode->delayed_iput));
-		list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
-	} else {
-		binode->delayed_iput_count++;
-	}
+	ASSERT(list_empty(&binode->delayed_iput));
+	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
 	spin_unlock(&fs_info->delayed_iput_lock);
 }

@@ -3270,13 +3266,7 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)

 		inode = list_first_entry(&fs_info->delayed_iputs,
 				struct btrfs_inode, delayed_iput);
-		if (inode->delayed_iput_count) {
-			inode->delayed_iput_count--;
-			list_move_tail(&inode->delayed_iput,
-					&fs_info->delayed_iputs);
-		} else {
-			list_del_init(&inode->delayed_iput);
-		}
+		list_del_init(&inode->delayed_iput);
 		spin_unlock(&fs_info->delayed_iput_lock);
 		iput(&inode->vfs_inode);
 		spin_lock(&fs_info->delayed_iput_lock);
@@ -9424,7 +9414,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->dir_index = 0;
 	ei->last_unlink_trans = 0;
 	ei->last_log_commit = 0;
-	ei->delayed_iput_count = 0;
 	spin_lock_init(&ei->lock);
 	ei->outstanding_extents = 0;
-- 
2.7.4
Re: invalid files names, btrfs check can't repair it
On 2018-01-15 12:23:05 [+0800], Qu Wenruo wrote: > Right, I'll fix it soon. > > And BTW what makes the output different from the original one? > > Sebastian, did you do extra writes or other operations to the fs after > previous btrfs check? Well, the filesystem is in use but there should be no writes to it since the initial `check' output. The `check' invalidates the space cache, which is then rebuilt; not sure if this has any effect. Those two magic files are in a subfolder of ccache and ccache shouldn't look into it at all. > Thanks, > Qu Sebastian
Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525
On 2018年01月15日 20:08, Ilan Schwarts wrote: > Thanks for detailed information ! > Its a legacy code for kernel module i maintain.. dont talk to me about > ancient when i need to maintain it to systems like solaris 8 or RHEL4 > 2.6.9 :( Well, that's unfortunate, I mean really unfortunate... Despite that, if sticking to device number (dev_t), I think the one in super_block->s_dev won't help much. Especially it can change when btrfs tries to add/delete devices. So it will be a very hard time for you to trace device number for btrfs. Thanks, Qu > > > > On Mon, Jan 15, 2018 at 12:01 PM, Qu Wenruo wrote: >> >> >> On 2018年01月15日 17:24, Ilan Schwarts wrote: >>> Qu, >>> Given inode, i get the fsid via: inode->i_sb->s_dev; >>> this return dev_t and not u8/u16 >> >> That's just a device number. >> >> Not really useful in btrfs, since btrfs is a multi-device filesystem. >> >> Thanks, >> Qu >> >>> >>> >>> On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo wrote: On 2018年01月14日 18:32, Ilan Schwarts wrote: > Thank you for clarification. > Just 2 quick questions, > 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ? They can. So to really locate an inode in btrfs, you need: fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number. fsid can be fetched from superblock as mentioned in previous reply. subvolume id can be obtained from BTRFS_I(inode)->root. And normally root is what you need. If you really want the number, then either BTRFS_I(inode)->root->objectid or BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id. > 2. Why fsInfo fsid return u8 and the traditional file system return > dev_t, usually 32 integer ? As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t, same as btrfs. For ext4 it's ext4_super_block->s_uuid[16] And for xfs, it's xfs_sb->sb_uuid. I don't know how you get the dev_t parameter. 
Thanks, Qu > > > On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo > wrote: >> >> >> On 2018年01月14日 18:13, Ilan Schwarts wrote: >>> both btrfs filesystems will have same fsid ? >>> >>> >>> On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts >>> wrote: But both filesystems will have same fsid? On Jan 14, 2018 12:04, "Nikolay Borisov" wrote: > > > > On 14.01.2018 12:02, Ilan Schwarts wrote: >> First of all, Thanks for response ! >> So if i have 2 btrfs file system on the same machine (not your >> everyday scenario, i know) >> >> Not a problem, the 2 filesystems will have 2 different fsid. >> >> (And it's my everyday scenario, since fstests neeeds TEST_DEV and >> SCRATCH_DEV_POOL) >> >> Lets say a file is created on device A, the file gets inode number X >> is it possible on device B to have inode number X also ? >> or each device has its own Inode number range ? >> >> Forget the mess about device. >> >> Inode is bounded to a filesystem, not bounded to a device. >> >> Just traditional filesytems are normally bounded to a single device. >> (Although even traditional filesystems can have external journal devices) >> >> So there is nothing to do with device at all. >> >> And you can have same inode numbers in different filesystems, but >> BTRFS_I(inode)->root->fs_info will point to different fs_infos, with >> different fsid. >> >> So return to your initial question: >>> both btrfs filesystems will have same fsid ? >> >> No, different filesystems will have different fsid. >> >> (Unless you're SUUUPER lucky to have 2 filesystems with >> same fsid) >> >> Thanks, >> Qu >> >> > > Of course it is possible. Inodes are guaranteed to be unique only > across > filesystem instances. In your case you are going to have 2 fs > instances. > >> >> I need to create unique identifier for a file, I need to understand >> if >> the identifier would be: GlobalFSID_DeviceID_Inode or DeviceID_Inode >> is enough. 
>> >> Thanks >> >> >> >> >> >> On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo >> wrote: >>> >>> >>> On 2018年01月14日 16:33, Ilan Schwarts wrote: Hello btrfs developers/users, I was wondering regarding to fetching the correct fsid on btrfs from the context of a kernel module. >>> >>> There are two IDs for btrfs. (in fact
Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525
Thanks for detailed information ! Its a legacy code for kernel module i maintain.. dont talk to me about ancient when i need to maintain it to systems like solaris 8 or RHEL4 2.6.9 :( On Mon, Jan 15, 2018 at 12:01 PM, Qu Wenruowrote: > > > On 2018年01月15日 17:24, Ilan Schwarts wrote: >> Qu, >> Given inode, i get the fsid via: inode->i_sb->s_dev; >> this return dev_t and not u8/u16 > > That's just a device number. > > Not really useful in btrfs, since btrfs is a multi-device filesystem. > > Thanks, > Qu > >> >> >> On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo wrote: >>> >>> >>> On 2018年01月14日 18:32, Ilan Schwarts wrote: Thank you for clarification. Just 2 quick questions, 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ? >>> >>> They can. >>> >>> So to really locate an inode in btrfs, you need: >>> >>> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number. >>> >>> fsid can be feteched from superblock as mentioned in previous reply. >>> >>> subvolume id can be get from BTRFS_I(inode)->root. >>> And normally root is what you need. >>> >>> If you really want the number, then either >>> BTRFS_I(inode)->root->objectid or >>> BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id. >>> 2. Why fsInfo fsid return u8 and the traditional file system return dev_t, usually 32 integer ? >>> >>> As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t, >>> same as btrfs. >>> >>> For ext4 it's ext4_super_block->s_uuid[16] >>> And for xfs, it's xfs_sb->sb_uuid. >>> >>> I don't know how you get the dev_t parameter. >>> >>> Thanks, >>> Qu >>> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo wrote: > > > On 2018年01月14日 18:13, Ilan Schwarts wrote: >> both btrfs filesystems will have same fsid ? >> >> >> On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts wrote: >>> But both filesystems will have same fsid? 
>>> >>> On Jan 14, 2018 12:04, "Nikolay Borisov" wrote: On 14.01.2018 12:02, Ilan Schwarts wrote: > First of all, Thanks for response ! > So if i have 2 btrfs file system on the same machine (not your > everyday scenario, i know) > > Not a problem, the 2 filesystems will have 2 different fsid. > > (And it's my everyday scenario, since fstests neeeds TEST_DEV and > SCRATCH_DEV_POOL) > > Lets say a file is created on device A, the file gets inode number X > is it possible on device B to have inode number X also ? > or each device has its own Inode number range ? > > Forget the mess about device. > > Inode is bounded to a filesystem, not bounded to a device. > > Just traditional filesytems are normally bounded to a single device. > (Although even traditional filesystems can have external journal devices) > > So there is nothing to do with device at all. > > And you can have same inode numbers in different filesystems, but > BTRFS_I(inode)->root->fs_info will point to different fs_infos, with > different fsid. > > So return to your initial question: >> both btrfs filesystems will have same fsid ? > > No, different filesystems will have different fsid. > > (Unless you're SUUUPER lucky to have 2 filesystems with > same fsid) > > Thanks, > Qu > > Of course it is possible. Inodes are guaranteed to be unique only across filesystem instances. In your case you are going to have 2 fs instances. > > I need to create unique identifier for a file, I need to understand if > the identifier would be: GlobalFSID_DeviceID_Inode or DeviceID_Inode > is enough. > > Thanks > > > > > > On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo > wrote: >> >> >> On 2018年01月14日 16:33, Ilan Schwarts wrote: >>> Hello btrfs developers/users, >>> >>> I was wondering regarding to fetching the correct fsid on btrfs from >>> the context of a kernel module. >> >> There are two IDs for btrfs. (in fact more, but you properly won't >> need >> the extra ids) >> >> FSID: Global one, one fs one FSID. 
>> Device ID: Bonded to device, each device will have one. >> >> So in case of 2 devices btrfs, each device will has its own device >> id, >> while both of the devices have the same fsid. >> >> And I think you're talking about the global fsid instead of device >> id. >> >>> if on suse11.3 kernel 3.0.101-0.47.71-default in order to get fsid,
Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525
On 2018年01月15日 17:24, Ilan Schwarts wrote: > Qu, > Given inode, i get the fsid via: inode->i_sb->s_dev; > this return dev_t and not u8/u16 That's just a device number. Not really useful in btrfs, since btrfs is a multi-device filesystem. Thanks, Qu > > > On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruowrote: >> >> >> On 2018年01月14日 18:32, Ilan Schwarts wrote: >>> Thank you for clarification. >>> Just 2 quick questions, >>> 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ? >> >> They can. >> >> So to really locate an inode in btrfs, you need: >> >> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number. >> >> fsid can be feteched from superblock as mentioned in previous reply. >> >> subvolume id can be get from BTRFS_I(inode)->root. >> And normally root is what you need. >> >> If you really want the number, then either >> BTRFS_I(inode)->root->objectid or >> BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id. >> >>> 2. Why fsInfo fsid return u8 and the traditional file system return >>> dev_t, usually 32 integer ? >> >> As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t, >> same as btrfs. >> >> For ext4 it's ext4_super_block->s_uuid[16] >> And for xfs, it's xfs_sb->sb_uuid. >> >> I don't know how you get the dev_t parameter. >> >> Thanks, >> Qu >> >>> >>> >>> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo wrote: On 2018年01月14日 18:13, Ilan Schwarts wrote: > both btrfs filesystems will have same fsid ? > > > On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts wrote: >> But both filesystems will have same fsid? >> >> On Jan 14, 2018 12:04, "Nikolay Borisov" wrote: >>> >>> >>> >>> On 14.01.2018 12:02, Ilan Schwarts wrote: First of all, Thanks for response ! So if i have 2 btrfs file system on the same machine (not your everyday scenario, i know) Not a problem, the 2 filesystems will have 2 different fsid. 
(And it's my everyday scenario, since fstests neeeds TEST_DEV and SCRATCH_DEV_POOL) Lets say a file is created on device A, the file gets inode number X is it possible on device B to have inode number X also ? or each device has its own Inode number range ? Forget the mess about device. Inode is bounded to a filesystem, not bounded to a device. Just traditional filesytems are normally bounded to a single device. (Although even traditional filesystems can have external journal devices) So there is nothing to do with device at all. And you can have same inode numbers in different filesystems, but BTRFS_I(inode)->root->fs_info will point to different fs_infos, with different fsid. So return to your initial question: > both btrfs filesystems will have same fsid ? No, different filesystems will have different fsid. (Unless you're SUUUPER lucky to have 2 filesystems with same fsid) Thanks, Qu >>> >>> Of course it is possible. Inodes are guaranteed to be unique only across >>> filesystem instances. In your case you are going to have 2 fs instances. >>> I need to create unique identifier for a file, I need to understand if the identifier would be: GlobalFSID_DeviceID_Inode or DeviceID_Inode is enough. Thanks On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo wrote: > > > On 2018年01月14日 16:33, Ilan Schwarts wrote: >> Hello btrfs developers/users, >> >> I was wondering regarding to fetching the correct fsid on btrfs from >> the context of a kernel module. > > There are two IDs for btrfs. (in fact more, but you properly won't > need > the extra ids) > > FSID: Global one, one fs one FSID. > Device ID: Bonded to device, each device will have one. > > So in case of 2 devices btrfs, each device will has its own device id, > while both of the devices have the same fsid. > > And I think you're talking about the global fsid instead of device id. 
> >> if on suse11.3 kernel 3.0.101-0.47.71-default in order to get fsid, I >> do the following: >> convert inode struct to btrfs_inode struct (use btrfsInode = >> BTRFS_I(inode)), then from btrfs_inode struct i go to root field, and >> from root i take anon_dev or anon_super.s_dev. >> struct btrfs_inode *btrfsInode; >> btrfsInode = BTRFS_I(inode); >>btrfsInode->root->anon_super.s_devor >>btrfsInode->root->anon_dev- depend on
Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525
On 2018年01月15日 17:05, Ilan Schwarts wrote: > Qu, Thank you very much for detailed response. > > I would like to understand something, on VFS, it is guaranteed that in > a given filesystem, only 1 inode number will be used, it is unique.> In > btrfs, you say the inode uniqueness is per volume, each volume has > its own inode space, How is it possible ? Not 100% sure on how VFS should handle an inode, but since each filesystem has its own interfaces to handle inode allocation/drop/evict and etc, I don't believe VFS will be so stupid as to just use a u64 inode number to distinguish different inodes (if VFS really needs to). And since each inode is allocated by the implementing fs, it's completely fine for two VFS inodes to have the same inode number. But in fact such two inodes will still be different as their fs-specific inode structures are still different. > > Thats why when I execute "stat /somepath/file" I receive fsid that > looks like "36h/54d" but from kernel code, If I examine struct > fs_info->fsid i get 52. That's not FSID!! That's device! Define what you really need first. For FSID, that should be something in lsblk output like: ├─nvme0n1p1 vfat 5188-EF6C /boot ├─system-root xfs 07179caf-b406-4357-8cd8-3268c6238fb6 / This is FSID! ID to identify a fs. And each fs can have their own FSID schema, just as you can see, FAT32 FSID is only 4 bytes, while XFS has 16 bytes FSID. So there is no generic way to get a fsid. And the "false" fsid is just the device, just like the stat command shows: stat /mnt/btrfs/ File: /mnt/btrfs/ Size: 6 Blocks: 0 IO Block: 4096 directory Device: fe00h/65024d Inode: 75732 Links: 2 ^^ And for btrfs, I'm not sure if the device of an inode has any real meaning. > I call this Physical/Virtual, physical is the real id - 52, and > virtual is 36h/54d, because this what btrfs implementation returns.. > from where is that property taken ? Don't call it that; check the man page of stat first. 
And I really don't understand how you get the wrong understanding of fsid. There is even no string "fsid" in include/linux/fs.h. > > The inode number inode->i_ino is the same from both userspace (stat > ...) and kernel code. > > If on the same filesystem (52) Check the correct man pages. That 52 is your major and minor block device number of the device for the containing fs. >, you say, same inode number can be > used, as long as they are on different volumes - Is it possible ? So in short, yes. > Doesn't it break the VFS inode uniqueness ? No. VFS inode is just part of the fs-specific inode structure. In btrfs' case, the VFS inode is just btrfs_inode->vfs_inode. So even if vfs_inodes have the same inode number, they are still different inodes. > Is there also a virtual/physical inode numbers ? That's device. And yes it's possible to get that device number. But almost meaningless for btrfs, since btrfs can be across several disks. (This needs to understand btrfs chunk mapping first) > and if so, is it > possible to get from kernel structure ? For what reason? It's not that useful in btrfs. > because inode->i_ino always > return what stat returns.. unlike fsid as i wrote above. I think you should build a correct understanding of what an inode/filesystem is. And most importantly, read newer kernel source instead of some ancient random vendor-specific kernel source. Thanks, Qu > > Thanks !! > > > > > > On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo wrote: >> >> >> On 2018年01月14日 18:32, Ilan Schwarts wrote: >>> Thank you for clarification. >>> Just 2 quick questions, >>> 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ? >> >> They can. >> >> So to really locate an inode in btrfs, you need: >> >> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number. >> >> fsid can be feteched from superblock as mentioned in previous reply. >> >> subvolume id can be get from BTRFS_I(inode)->root. >> And normally root is what you need. 
>> >> If you really want the number, then either >> BTRFS_I(inode)->root->objectid or >> BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id. >> >>> 2. Why fsInfo fsid return u8 and the traditional file system return >>> dev_t, usually 32 integer ? >> >> As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t, >> same as btrfs. >> >> For ext4 it's ext4_super_block->s_uuid[16] >> And for xfs, it's xfs_sb->sb_uuid. >> >> I don't know how you get the dev_t parameter. >> >> Thanks, >> Qu >> >>> >>> >>> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo wrote: On 2018年01月14日 18:13, Ilan Schwarts wrote: > both btrfs filesystems will have same fsid ? > > > On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts wrote: >> But both
big volumes only work reliable with ssd_spread
Hello, for around two or three years I have been using btrfs for incremental VM backups. Some data: - volume size 60TB - around 2000 subvolumes - each differential backup stacks on top of a subvolume - compress-force=zstd - space_cache=v2 - no quota / qgroups This has worked fine since kernel 4.14, except that I need ssd_spread as an option. If I do not use ssd_spread I always end up with very slow performance and a single kworker process using 100% CPU after some days. With ssd_spread those boxes have run fine for around 6 months. Is this something expected? I haven't found any hint regarding such an impact. Thanks! Greets, Stefan
Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525
Qu, Given inode, i get the fsid via: inode->i_sb->s_dev; this return dev_t and not u8/u16 On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruowrote: > > > On 2018年01月14日 18:32, Ilan Schwarts wrote: >> Thank you for clarification. >> Just 2 quick questions, >> 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ? > > They can. > > So to really locate an inode in btrfs, you need: > > fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number. > > fsid can be feteched from superblock as mentioned in previous reply. > > subvolume id can be get from BTRFS_I(inode)->root. > And normally root is what you need. > > If you really want the number, then either > BTRFS_I(inode)->root->objectid or > BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id. > >> 2. Why fsInfo fsid return u8 and the traditional file system return >> dev_t, usually 32 integer ? > > As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t, > same as btrfs. > > For ext4 it's ext4_super_block->s_uuid[16] > And for xfs, it's xfs_sb->sb_uuid. > > I don't know how you get the dev_t parameter. > > Thanks, > Qu > >> >> >> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo wrote: >>> >>> >>> On 2018年01月14日 18:13, Ilan Schwarts wrote: both btrfs filesystems will have same fsid ? On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts wrote: > But both filesystems will have same fsid? > > On Jan 14, 2018 12:04, "Nikolay Borisov" wrote: >> >> >> >> On 14.01.2018 12:02, Ilan Schwarts wrote: >>> First of all, Thanks for response ! >>> So if i have 2 btrfs file system on the same machine (not your >>> everyday scenario, i know) >>> >>> Not a problem, the 2 filesystems will have 2 different fsid. >>> >>> (And it's my everyday scenario, since fstests neeeds TEST_DEV and >>> SCRATCH_DEV_POOL) >>> >>> Lets say a file is created on device A, the file gets inode number X >>> is it possible on device B to have inode number X also ? >>> or each device has its own Inode number range ? 
>>> >>> Forget the mess about device. >>> >>> Inode is bounded to a filesystem, not bounded to a device. >>> >>> Just traditional filesytems are normally bounded to a single device. >>> (Although even traditional filesystems can have external journal devices) >>> >>> So there is nothing to do with device at all. >>> >>> And you can have same inode numbers in different filesystems, but >>> BTRFS_I(inode)->root->fs_info will point to different fs_infos, with >>> different fsid. >>> >>> So return to your initial question: both btrfs filesystems will have same fsid ? >>> >>> No, different filesystems will have different fsid. >>> >>> (Unless you're SUUUPER lucky to have 2 filesystems with >>> same fsid) >>> >>> Thanks, >>> Qu >>> >>> >> >> Of course it is possible. Inodes are guaranteed to be unique only across >> filesystem instances. In your case you are going to have 2 fs instances. >> >>> >>> I need to create unique identifier for a file, I need to understand if >>> the identifier would be: GlobalFSID_DeviceID_Inode or DeviceID_Inode >>> is enough. >>> >>> Thanks >>> >>> >>> >>> >>> >>> On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo >>> wrote: On 2018年01月14日 16:33, Ilan Schwarts wrote: > Hello btrfs developers/users, > > I was wondering regarding to fetching the correct fsid on btrfs from > the context of a kernel module. There are two IDs for btrfs. (in fact more, but you properly won't need the extra ids) FSID: Global one, one fs one FSID. Device ID: Bonded to device, each device will have one. So in case of 2 devices btrfs, each device will has its own device id, while both of the devices have the same fsid. And I think you're talking about the global fsid instead of device id. > if on suse11.3 kernel 3.0.101-0.47.71-default in order to get fsid, I > do the following: > convert inode struct to btrfs_inode struct (use btrfsInode = > BTRFS_I(inode)), then from btrfs_inode struct i go to root field, and > from root i take anon_dev or anon_super.s_dev. 
> struct btrfs_inode *btrfsInode;
> btrfsInode = BTRFS_I(inode);
> btrfsInode->root->anon_super.s_dev or
> btrfsInode->root->anon_dev - depending on the kernel.

The most direct method would be:
btrfs_inode->root->fs_info->fsid.
(For newer kernels, as I'm not familiar with older kernels)

Or from the superblock:
btrfs_inode->root->fs_info->super_copy->fsid.
(The most reliable one, no matter
Re: [PATCH 0/7] Misc btrfs-progs cleanups/fixes
On 5.12.2017 10:39, Nikolay Borisov wrote:
> Here is a series doing some minor code cleanups, hopefully making the code
> more idiomatic and easier to follow. They should be pretty low-risk and
> introduce no functional changes (patches 1-5).
>
> The last 2 patches deal with a regression of btrfs rescue super-recovery.
> Turns out this has been broken for some time. Patch 6 introduces a regression
> test which hopefully will prevent further occurrences, and patch 7 fixes the
> actual bug.
>
> Nikolay Borisov (7):
>   btrfs-progs: Explictly state test.sh must be executable
>   btrfs-progs: Factor out common print_device_info
>   btrfs-progs: Remove recover_get_good_super
>   btrfs-progs: Use list_for_each_entry in write_dev_all_supers
>   btrfs-progs: Document logic of btrfs_read_dev_super
>   btrfs-progs: Add test for super block recovery
>   btrfs-progs: Fix super-recovery
>
>  chunk-recover.c                                  | 18 ---
>  disk-io.c                                        | 21 ++--
>  super-recover.c                                  | 28 ++-
>  tests/README.md                                  |  4 +-
>  tests/fsck-tests/029-superblock-recovery/test.sh | 64 ++++++++++++++
>  utils.c                                          | 18 +++
>  utils.h                                          |  3 ++
>  7 files changed, 110 insertions(+), 46 deletions(-)
>  create mode 100755 tests/fsck-tests/029-superblock-recovery/test.sh
>
> --

Gentle ping, since I'd like to get this into the next btrfs-progs version,
especially the "fix super-recovery" patch.
Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525
Qu,
Thank you very much for the detailed response.

I would like to understand something. On VFS it is guaranteed that, in a
given filesystem, an inode number is used only once; it is unique. In btrfs,
you say the inode uniqueness is per subvolume, and each subvolume has its
own inode number space. How is that possible?

That's why, when I execute "stat /somepath/file", I receive an fsid that
looks like "36h/54d", but from kernel code, if I examine struct
fs_info->fsid, I get 52. I call this physical/virtual: physical is the real
id, 52, and virtual is 36h/54d, because this is what the btrfs
implementation returns. Where is that property taken from?

The inode number inode->i_ino is the same from both user space (stat ...)
and kernel code. If, on the same filesystem (52), the same inode number can
be used as long as the files are in different subvolumes - is that possible?
Doesn't it break the VFS inode uniqueness? Is there also a virtual/physical
inode number, and if so, is it possible to get it from a kernel structure?
Because inode->i_ino always returns what stat returns, unlike the fsid as I
wrote above.

Thanks !!

On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo wrote:
>
> On 2018年01月14日 18:32, Ilan Schwarts wrote:
>> Thank you for clarification.
>> Just 2 quick questions,
>> 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ?
>
> They can.
>
> So to really locate an inode in btrfs, you need:
>
> fsid (locate the fs) -> subvolume id (locate the subvolume) -> inode number.
>
> fsid can be fetched from the superblock as mentioned in the previous reply.
>
> The subvolume id can be got from BTRFS_I(inode)->root.
> And normally root is what you need.
>
> If you really want the number, then either
> BTRFS_I(inode)->root->objectid or
> BTRFS_I(inode)->root->root_key.objectid will give you the u64 subvolume id.
>
>> 2. Why does fsInfo fsid return u8 while traditional filesystems return
>> dev_t, usually a 32-bit integer ?
>
> As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t,
> same as btrfs.
Re: invalid files names, btrfs check can't repair it
On 2018-01-15 09:26:27 [+0800], Qu Wenruo wrote:
> Please run the following command too:
>
> # btrfs inspect dump-tree | grep -C20 \(57923894

~# btrfs inspect dump-tree /dev/sdb4 | grep -C20 \(57923894
		ctime 1515602448.66422211 (2018-01-10 17:40:48)
		mtime 1515602448.66422211 (2018-01-10 17:40:48)
		otime 1513266995.540343055 (2017-12-14 16:56:35)
	item 10 key (57643872 INODE_REF 682) itemoff 3363 itemsize 13
		index 58 namelen 3 name: tmp
	item 11 key (57648595 INODE_ITEM 0) itemoff 3203 itemsize 160
		generation 89045 transid 89423 size 8350 nbytes 0
		block group 0 mode 40755 links 1 uid 1000 gid 1000 rdev 0
		sequence 0 flags 0xc90(none)
		atime 1513267009.164686143 (2017-12-14 16:56:49)
		ctime 1513868329.753150507 (2017-12-21 15:58:49)
		mtime 1513868329.753150507 (2017-12-21 15:58:49)
		otime 1513267009.164686143 (2017-12-14 16:56:49)
	item 12 key (57648595 INODE_REF 57643659) itemoff 3192 itemsize 11
		index 113 namelen 1 name: d
	item 13 key (57648595 DIR_ITEM 3331247447) itemoff 3123 itemsize 69
		location key (58472210 INODE_ITEM 0) type FILE
		transid 89418 data_len 0 name_len 8231
		name: 454bf066ddfbf42e0f3b77ea71c82f-878732.oq
	item 14 key (57648595 DIR_ITEM 3363354030) itemoff 3053 itemsize 70
		location key (57923894 INODE_ITEM 0) type DIR_ITEM.33
		transid 89142 data_len 0 name_len 40
		name: 2f3f379b2a3d7499471edb74869efe-1948311.d
	item 15 key (57648595 DIR_INDEX 435) itemoff 2983 itemsize 70
		location key (57923894 INODE_ITEM 0) type FILE
		transid 89142 data_len 0 name_len 40
		name: 2f3f379b2a3d7499471edb74869efe-1948311.d
	item 16 key (57648595 DIR_INDEX 1137) itemoff 2914 itemsize 69
		location key (58472210 INODE_ITEM 0) type FILE
		transid 89418 data_len 0 name_len 39
		name: 454bf066ddfbf42e0f3b77ea71c82f-878732.o
	item 17 key (57923894 INODE_ITEM 0) itemoff 2754 itemsize 160
		generation 89142 transid 89142 size 36092 nbytes 36864
		block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
		sequence 0 flags 0x91(none)
		atime 1513278413.460486168 (2017-12-14 20:06:53)
		ctime 1513278413.460486168 (2017-12-14 20:06:53)
		mtime 1513278413.460486168 (2017-12-14 20:06:53)
		otime 1513278413.460486168 (2017-12-14 20:06:53)
	item 18 key (57923894 INODE_REF 57648595) itemoff 2704 itemsize 50
		index 435 namelen 40 name: 2f3f379b2a3d7499471edb74869efe-1948311.d
	item 19 key (57923894 EXTENT_DATA 0) itemoff 2651 itemsize 53
		generation 89142 type 1 (regular)
		extent data disk byte 123290755072 nr 36864
		extent data offset 0 nr 36864 ram 36864
		extent compression 0 (none)
	item 20 key (58191388 INODE_ITEM 0) itemoff 2491 itemsize 160
		generation 89259 transid 89259 size 395280 nbytes 397312
		block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
		sequence 0 flags 0xa4(none)
		atime 1513332325.477020047 (2017-12-15 11:05:25)
		ctime 1513332325.477020047 (2017-12-15 11:05:25)
		mtime 1513332325.477020047 (2017-12-15 11:05:25)
		otime 1513332325.477020047 (2017-12-15 11:05:25)
	item 21 key (58191388 INODE_REF 40424284) itemoff 2470 itemsize 21
		index 1094 namelen 11 name: bzImage.tsc
leaf 146426621952 items 46 free space 11 generation 89680 owner 5
leaf 146426621952 flags 0x1(WRITTEN) backref revision 1
fs uuid b3bfb56e-d445-4335-93f0-c1fb2d1f6df1
chunk uuid 732d73c9-d037-4406-8dcb-dfa101bc5a9b
	item 0 key (58191388 EXTENT_DATA 0) itemoff 3942 itemsize 53
		generation 87303 type 1 (regular)

> > transid 89142 data_len 0 name_len 40
> > name: 2f3f379b2a3d7499471edb74869efe-1948311.d
> > item 16 key (57648595 DIR_INDEX 1137) itemoff 2914 itemsize 69
> > location key (58472210 INODE_ITEM 0) type FILE
>
> And this command too:
>
> # btrfs inspect dump-tree | grep -C20 \(58472210

~# btrfs inspect dump-tree /dev/sdb4 | grep -C20 \(58472210
		generation 89044 transid 89699 size 0 nbytes 0
		block group 0 mode 40755 links 1 uid 1000 gid 1000 rdev 0
		sequence 0 flags 0x603b3(none)
		atime 1513266995.540343055 (2017-12-14 16:56:35)
		ctime 1515602448.66422211 (2018-01-10 17:40:48)
		mtime 1515602448.66422211 (2018-01-10 17:40:48)
		otime 1513266995.540343055 (2017-12-14 16:56:35)
	item 10 key (57643872
Re: [PATCH v4 2/4] btrfs: cleanup btrfs_mount() using btrfs_mount_root()
On 2018/01/12 19:14, Anand Jain wrote: > > Misono, > > This change is causing subsequent (subvol) mount to fail when device > option is specified. The simplest eg for failure is .. > mkfs.btrfs -qf /dev/sdc /dev/sdb > mount -o device=/dev/sdb /dev/sdc /btrfs > mount -o device=/dev/sdb /dev/sdc /btrfs1 >mount: /dev/sdc is already mounted or /btrfs1 busy > >Looks like > blkdev_get_by_path() <-- is failing. > btrfs_scan_one_device() > btrfs_parse_early_options() > btrfs_mount() > > Which is due to different holders (viz. btrfs_root_fs_type and > btrfs_fs_type) one is used for vfs_mount and other for scan, > so they form different holders and can't let EXCL open which > is needed for both scan and open. > > Thanks, Anand Thanks for the reporting. I'm sorry but I will be busy today and tomorrow, and the investigation will be after Wednesday. Regards, Tomohiro Misono > > > On 12/14/2017 04:25 PM, Misono, Tomohiro wrote: >> Cleanup btrfs_mount() by using btrfs_mount_root(). This avoids getting >> btrfs_mount() called twice in mount path. >> >> Old btrfs_mount() will do: >> 0. VFS layer calls vfs_kern_mount() with registered file_system_type >> (for btrfs, btrfs_fs_type). btrfs_mount() is called on the way. >> 1. btrfs_parse_early_options() parses "subvolid=" mount option and set the >> value to subvol_objectid. Otherwise, subvol_objectid has the initial >> value of 0 >> 2. check subvol_objectid is 5 or not. Assume this time id is not 5, then >> btrfs_mount() returns by calling mount_subvol() >> 3. In mount_subvol(), original mount options are modified to contain >> "subvolid=0" in setup_root_args(). Then, vfs_kern_mount() is called with >> btrfs_fs_type and new options >> 4. btrfs_mount() is called again >> 5. btrfs_parse_early_options() parses "subvolid=0" and set 5 (instead of 0) >> to subvol_objectid >> 6. check subvol_objectid is 5 or not. This time id is 5 and mount_subvol() >> is not called. btrfs_mount() finishes mounting a root >> 7. 
(in mount_subvol()) with using a return vale of vfs_kern_mount(), it >> calls mount_subtree() >> 8. return subvolume's dentry >> >> Reusing the same file_system_type (and btrfs_mount()) for vfs_kern_mount() >> is the cause of complication. >> >> Instead, new btrfs_mount() will do: >> 1. parse subvol id related options for later use in mount_subvol() >> 2. mount device's root by calling vfs_kern_mount() with >> btrfs_root_fs_type, which is not registered to VFS by >> register_filesystem(). As a result, btrfs_mount_root() is called >> 3. return by calling mount_subvol() >> >> The code of 2. is moved from the first part of mount_subvol(). >> >> Signed-off-by: Tomohiro Misono>> --- >> fs/btrfs/super.c | 193 >> +++ >> 1 file changed, 65 insertions(+), 128 deletions(-) >> >> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >> index 14189ad47466..ce93d87b2a69 100644 >> --- a/fs/btrfs/super.c >> +++ b/fs/btrfs/super.c >> @@ -66,6 +66,11 @@ >> #include >> >> static const struct super_operations btrfs_super_ops; >> +/* >> + * btrfs_root_fs_type is used internally while >> + * btrfs_fs_type is used for VFS layer. >> + * See the comment at btrfs_mount for more detail. 
>> + */ >> static struct file_system_type btrfs_root_fs_type; >> static struct file_system_type btrfs_fs_type; >> >> @@ -1404,48 +1409,11 @@ static char *setup_root_args(char *args) >> >> static struct dentry *mount_subvol(const char *subvol_name, u64 >> subvol_objectid, >> int flags, const char *device_name, >> - char *data) >> + char *data, struct vfsmount *mnt) >> { >> struct dentry *root; >> -struct vfsmount *mnt = NULL; >> -char *newargs; >> int ret; >> >> -newargs = setup_root_args(data); >> -if (!newargs) { >> -root = ERR_PTR(-ENOMEM); >> -goto out; >> -} >> - >> -mnt = vfs_kern_mount(_fs_type, flags, device_name, newargs); >> -if (PTR_ERR_OR_ZERO(mnt) == -EBUSY) { >> -if (flags & SB_RDONLY) { >> -mnt = vfs_kern_mount(_fs_type, flags & ~SB_RDONLY, >> - device_name, newargs); >> -} else { >> -mnt = vfs_kern_mount(_fs_type, flags | SB_RDONLY, >> - device_name, newargs); >> -if (IS_ERR(mnt)) { >> -root = ERR_CAST(mnt); >> -mnt = NULL; >> -goto out; >> -} >> - >> -down_write(>mnt_sb->s_umount); >> -ret = btrfs_remount(mnt->mnt_sb, , NULL); >> -up_write(>mnt_sb->s_umount); >> -if (ret < 0) { >> -