Re: [PATCH] fstests: common/rc: fix device still mounted error with SCRATCH_DEV_POOL

2018-01-15 Thread Eryu Guan
On Mon, Jan 15, 2018 at 11:10:20PM -0800, Liu Bo wrote:
> On Mon, Jan 15, 2018 at 02:22:28PM +0800, Eryu Guan wrote:
> > On Fri, Jan 12, 2018 at 06:04:59PM -0700, Liu Bo wrote:
> > > One of btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a
> > > non-SCRATCH_DEV device as the first one when doing mkfs, and this makes
> > > _require_scratch{_nocheck} fail to umount $SCRATCH_MNT since it checks
> > > the mount point against SCRATCH_DEV only, so it finds nothing to umount
> > > and the following tests fail with 'device still mounted'-like errors.
> > > 
> > > Introduce a helper to address this special case where both btrfs and
> > > the scratch dev pool are in use.
> > > 
> > > Signed-off-by: Liu Bo 
> > 
> > Hmm, I didn't see this problem; I ran btrfs/011 and then another test
> > that uses $SCRATCH_DEV, and the second test ran fine too. Can you
> > please provide more details?
> 
> Sure, I was using four 2500M devices; btrfs/011 bailed out during a cp
> due to ENOSPC, then _fail was called to abort the test, and the mount
> point was by then associated with a device other than SCRATCH_DEV, so
> _require_scratch_nocheck in btrfs/012 was not able to umount SCRATCH_MNT.

Yeah, that's exactly the case I described below. I think adding
_scratch_umount >/dev/null 2>&1 to _cleanup() would resolve your issue.

> 
> > 
> > Anyway, I think we should fix btrfs/011 to either not use $SCRATCH_DEV
> > in replace operations (AFAIK, other btrfs replace tests do this) or
> > umount all devices before exit. And I noticed btrfs/011 does umount
> > $SCRATCH_MNT at the end of workout(), so usually all should be fine
> > (perhaps it would leave a device mounted if interrupted in the middle of
> > a test run, because _cleanup() doesn't do umount).
> 
> That's true, if you want, I could fix all btrfs replace tests to
> umount SCRATCH_MNT right before exit.

I think only the tests that replace $SCRATCH_DEV (as btrfs/011 does)
need fixes; _require_scratch would umount $SCRATCH_MNT for the other
tests.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: Remove btrfs_inode::delayed_iput_count

2018-01-15 Thread Nikolay Borisov
delayed_iput_count was supposed to be used to implement, well, delayed
iput. The idea is that we keep accumulating the number of iputs we do
until eventually the inode is deleted. Turns out we never really
switched delayed_iput_count from 0 to 1, hence all conditional
code relying on the value of that member being different from 0 was
never executed. This, as it turns out, didn't cause any problems, due
to the simple fact that the generic inode's i_count member was always
used to count the number of iputs. So let's just remove the unused
member and all unused code. This patch introduces no functional
changes. While at it, also add proper documentation for
btrfs_add_delayed_iput.

Signed-off-by: Nikolay Borisov 
---
v2: Add function documentation to make it clear how delayed_iput works and 
uses vfs_inode::i_count

 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/inode.c   | 26 ++++++++++++--------------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 63f0ccc92a71..f527e99c9f8d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -195,7 +195,6 @@ struct btrfs_inode {
 
/* Hook into fs_info->delayed_iputs */
struct list_head delayed_iput;
-   long delayed_iput_count;
 
/*
 * To avoid races between lockless (i_mutex not held) direct IO writes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 029399593049..7f568b05b8fd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3243,6 +3243,15 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
  start, (size_t)(end - start + 1));
 }
 
+/**
+ * btrfs_add_delayed_iput - perform a delayed iput on @inode
+ * @inode: the inode we want to perform iput on
+ *
+ * This function uses the generic vfs_inode::i_count to track whether we
+ * should just decrement it (if it's > 1) or, if this is the last iput,
+ * link the inode into the delayed iput machinery. Delayed iputs are
+ * processed at transaction commit time/superblock commit/cleaner kthread.
+ */
 void btrfs_add_delayed_iput(struct inode *inode)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -3252,12 +3261,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
 		return;
 
 	spin_lock(&fs_info->delayed_iput_lock);
-	if (binode->delayed_iput_count == 0) {
-		ASSERT(list_empty(&binode->delayed_iput));
-		list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
-	} else {
-		binode->delayed_iput_count++;
-	}
+	ASSERT(list_empty(&binode->delayed_iput));
+	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
 	spin_unlock(&fs_info->delayed_iput_lock);
 }
 
@@ -3270,13 +3275,7 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
 
 		inode = list_first_entry(&fs_info->delayed_iputs,
 				struct btrfs_inode, delayed_iput);
-		if (inode->delayed_iput_count) {
-			inode->delayed_iput_count--;
-			list_move_tail(&inode->delayed_iput,
-					&fs_info->delayed_iputs);
-		} else {
-			list_del_init(&inode->delayed_iput);
-		}
+		list_del_init(&inode->delayed_iput);
 		spin_unlock(&fs_info->delayed_iput_lock);
 		iput(&inode->vfs_inode);
 		spin_lock(&fs_info->delayed_iput_lock);
@@ -9424,7 +9423,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->dir_index = 0;
 	ei->last_unlink_trans = 0;
 	ei->last_log_commit = 0;
-	ei->delayed_iput_count = 0;
 
 	spin_lock_init(&ei->lock);
 	ei->outstanding_extents = 0;
-- 
2.7.4



Re: [PATCH] fstests: common/rc: fix device still mounted error with SCRATCH_DEV_POOL

2018-01-15 Thread Liu Bo
On Mon, Jan 15, 2018 at 02:22:28PM +0800, Eryu Guan wrote:
> On Fri, Jan 12, 2018 at 06:04:59PM -0700, Liu Bo wrote:
> > One of btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a
> > non-SCRATCH_DEV device as the first one when doing mkfs, and this makes
> > _require_scratch{_nocheck} fail to umount $SCRATCH_MNT since it checks
> > the mount point against SCRATCH_DEV only, so it finds nothing to umount
> > and the following tests fail with 'device still mounted'-like errors.
> > 
> > Introduce a helper to address this special case where both btrfs and
> > the scratch dev pool are in use.
> > 
> > Signed-off-by: Liu Bo 
> 
> Hmm, I didn't see this problem; I ran btrfs/011 and then another test
> that uses $SCRATCH_DEV, and the second test ran fine too. Can you
> please provide more details?

Sure, I was using four 2500M devices; btrfs/011 bailed out during a cp
due to ENOSPC, then _fail was called to abort the test, and the mount
point was by then associated with a device other than SCRATCH_DEV, so
_require_scratch_nocheck in btrfs/012 was not able to umount SCRATCH_MNT.

> 
> Anyway, I think we should fix btrfs/011 to either not use $SCRATCH_DEV
> in replace operations (AFAIK, other btrfs replace tests do this) or
> umount all devices before exit. And I noticed btrfs/011 does umount
> $SCRATCH_MNT at the end of workout(), so usually all should be fine
> (perhaps it would leave a device mounted if interrupted in the middle of
> a test run, because _cleanup() doesn't do umount).

That's true, if you want, I could fix all btrfs replace tests to
umount SCRATCH_MNT right before exit.

thanks,
-liubo


Re: Recommendations for balancing as part of regular maintenance?

2018-01-15 Thread Chris Murphy
On Mon, Jan 15, 2018 at 11:23 AM, Tom Worster  wrote:
> On 13 Jan 2018, at 17:09, Chris Murphy wrote:
>
>> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
>>  wrote:
>>
>>
>>> To that end, I propose the following text for the FAQ:
>>>
>>> Q: Do I need to run a balance regularly?
>>>
>>> A: While not strictly necessary for normal operations, running a filtered
>>> balance regularly can help prevent your filesystem from ending up with
>>> ENOSPC issues.  The following command run daily on each BTRFS volume
>>> should
>>> be more than sufficient for most users:
>>>
>>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>>
>>
>>
>> Daily? Seems excessive.
>>
>> I've got multiple Btrfs file systems that I haven't balanced, full or
>> partial, in a year. And I have no problems. One is a laptop which
>> accumulates snapshots until roughly 25% free space remains and then
>> most of the snapshots are deleted, except the most recent few, all at
>> one time. I'm not experiencing any problems so far. The other is a NAS
>> and its multiple copies, with maybe 100-200 snapshots. One backup
>> volume is 99% full, there's no more unallocated free space, I delete
>> snapshots only to make room for btrfs send receive to keep pushing the
>> most recent snapshot from the main volume to the backup. Again no
>> problems.
>>
>> I really think suggestions this broad are just going to paper over
>> bugs or design flaws, we won't see as many bug reports and then real
>> problems won't get fixed.
>
>
> This is just an answer to a FAQ. This is not Austin or anyone else trying
> to tell you or anyone else that you should do this. It should be clear that
> there is an implied caveat along the lines of: "There are other ways to
> manage allocation besides regular balancing. This recommendation is a
> For-Dummies-kinda default that should work well enough if you don't have
> another strategy better adapted to your situation." If this implication is
> not obvious enough then we can add something explicit.

It's an upstream answer to a frequently asked question. It's rather
official, or about as close as it gets to it.



>
>
>> I also think the time-based method is too subjective. What about the
>> layout means a balance is needed? And if it's really a suggestion, why
>> isn't there a cron job or systemd unit that just does this for the user,
>> in btrfs-progs, working and enabled by default?
>
>
> As a newcomer to BTRFS, I was astonished to learn that it demands each user
> figure out some workaround for what is, in my judgement, a required but
> missing feature, i.e. a defect, a bug. At present the docs are pretty
> confusing for someone trying to deal with it on their own.
>
> Unless some better fix is in the works, this _should_ be a systemd unit or
> something. Until then, please put it in FAQ.

openSUSE has had a systemd unit for a long time now, but the last
time I checked (a bit over a year ago) it was disabled by default. Why?

And insofar as I'm aware, openSUSE users aren't having big problems
related to lack of balancing, they have problems due to the lack of
balancing combined with schizo snapper defaults, which are these days
masked somewhat by turning on quotas so snapper can be more accurate
about cleaning up.

Basically the scripted balance tells me two things:
a. Something is broken (still)
b. None of the developers has time to investigate coherent bug reports
about a. and fix/refine it.

And therefore papering over the problem is all we have. Basically it's
a sledgehammer approach.

The main person working on ENOSPC stuff is Josef, so I'd run this by
him and make sure this papering over of bugs is something he agrees with.




>
>
>> I really do not like
>> all this hand holding of Btrfs, it's not going to make it better.
>
>
> Maybe it won't but, absent better proposals, and given the nature of the
> problem, this kind of hand-holding is only fair to the user.


This is hardly the biggest gotcha with Btrfs. I'm fine with the idea
of papering over design flaws and long-standing bugs with user-space
workarounds. I just want everyone on the same page about it, so it's
not some big surprise it's happening. As far as I know, none of the
developers regularly looks at the Btrfs wiki.

And I think the best way of communicating
a. this is busted, and it sucks
b. here's a proposed user space workaround, so users aren't so pissed off
is to try to get it into btrfs-progs, enabled by default, because
that will get it in front of at least one developer.



-- 
Chris Murphy


Re: invalid files names, btrfs check can't repair it

2018-01-15 Thread Qu Wenruo


On 2018-01-16 12:51, Qu Wenruo wrote:
> Now the problems are all located:
> 
> For file "2f3f379b2a3d7499471edb74869efe-1948311.d", the problem is that
> its DIR_ITEM has the wrong type:
> 
> --
> item 14 key (57648595 DIR_ITEM 3363354030) itemoff 3053 itemsize 70
> location key (57923894 INODE_ITEM 0) type DIR_ITEM.33
>   ^^^
> --
> 
> There is an unexpected type DIR_ITEM and a spurious number 33 here.
> 
> Despite that, the file is completely fine.
> 
> 
> For file "454bf066ddfbf42e0f3b77ea71c82f-878732.o",
> the problem is its name_len.
> 
> --
> item 13 key (57648595 DIR_ITEM 3331247447) itemoff 3123 itemsize 69
> location key (58472210 INODE_ITEM 0) type FILE
> transid 89418 data_len 0 name_len 8231
>    Insane
> name: 454bf066ddfbf42e0f3b77ea71c82f-878732.oq
> --
> 
> Despite that, it should be fine.
> 
> I'm not 100% sure whether repair can really handle it well, but I
> could craft a temporary fix based on btrfs-corrupt-block (I know the
> name is scary). You may need to compile btrfs-progs with my patch.

I just assume it's fs_root, and pushed the hard-coded fix branch to my
github:
https://github.com/adam900710/btrfs-progs/tree/hard_coded_fix_for_sebastian

Usage:
./btrfs-corrupt-block -X <device>

Just as the commit message says, if anything goes wrong it will not
touch the fs at all, so it should be reasonably safe to use.

If something does go wrong, it will print a backtrace and abort; that
is by design, so there is no need to panic:

Example: No dir_item in my btrfs
--
./btrfs-corrupt-block -X /dev/data/btrfs
ERROR: corrupted DIR_ITEM not found
extent buffer leak: start 4227072 len 16384
extent_io.c:607: free_extent_buffer_internal: BUG_ON `eb->flags &
EXTENT_DIRTY` triggered, value 1
./btrfs-corrupt-block(+0x251df)[0x5623e003a1df]
./btrfs-corrupt-block(free_extent_buffer_nocache+0x1f)[0x5623e003aac1]
./btrfs-corrupt-block(extent_io_tree_cleanup+0x6d)[0x5623e003ab33]
./btrfs-corrupt-block(btrfs_cleanup_all_caches+0x76)[0x5623e0027747]
./btrfs-corrupt-block(close_ctree_fs_info+0x111)[0x5623e0028027]
./btrfs-corrupt-block(main+0x3f5)[0x5623e00551df]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7fdce4e0ff4a]
./btrfs-corrupt-block(_start+0x2a)[0x5623e001ee3a]
Aborted (core dumped)
--

If the above error happens, please paste the error output and provide
the subvolume id.

Thanks,
Qu

> 
> The only remaining thing I need is the subvolume id which contains the
> corrupted files.
> 
> Since there is no other hit, I assume it's the root subvolume (5), but I
> still need explicit confirmation since the fix will be hard-coded.
> 
> Thanks,
> Qu
> 





Re: invalid files names, btrfs check can't repair it

2018-01-15 Thread Qu Wenruo
Now the problems are all located:

For file "2f3f379b2a3d7499471edb74869efe-1948311.d", the problem is that
its DIR_ITEM has the wrong type:

--
item 14 key (57648595 DIR_ITEM 3363354030) itemoff 3053 itemsize 70
location key (57923894 INODE_ITEM 0) type DIR_ITEM.33
  ^^^
--

There is an unexpected type DIR_ITEM and a spurious number 33 here.

Despite that, the file is completely fine.


For file "454bf066ddfbf42e0f3b77ea71c82f-878732.o",
the problem is its name_len.

--
item 13 key (57648595 DIR_ITEM 3331247447) itemoff 3123 itemsize 69
location key (58472210 INODE_ITEM 0) type FILE
transid 89418 data_len 0 name_len 8231
   Insane
name: 454bf066ddfbf42e0f3b77ea71c82f-878732.oq
--

Despite that, it should be fine.

I'm not 100% sure whether repair can really handle it well, but I could
craft a temporary fix based on btrfs-corrupt-block (I know the name is
scary). You may need to compile btrfs-progs with my patch.

The only remaining thing I need is the subvolume id which contains the
corrupted files.

Since there is no other hit, I assume it's the root subvolume (5), but I
still need explicit confirmation since the fix will be hard-coded.

Thanks,
Qu





Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()

2018-01-15 Thread Qu Wenruo
Hi David,

Would you please queue this patch to the devel branch?

This is a small but quite important fix when handling dump-tree
output.

Thanks,
Qu

On 2017-10-30 16:20, Qu Wenruo wrote:
> 
> 
> On 2017-10-30 16:10, Misono, Tomohiro wrote:
>> In print_inode_item(), the argument order of sequence and flags is
>> reversed:
>>
>> printf("... sequence %llu flags 0x%llx(%s)\n",
>>  ... 
>> (unsigned long long)btrfs_inode_flags(eb,ii),
>> (unsigned long long)btrfs_inode_sequence(eb, ii),
>>  ...)
>>
>> So, just fix it.
>>
>> Signed-off-by: Tomohiro Misono 
> 
> Reviewed-by: Qu Wenruo 
> 
> Thanks,
> Qu
> 
>> ---
>>  print-tree.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/print-tree.c b/print-tree.c
>> index 3c585e3..8abd760 100644
>> --- a/print-tree.c
>> +++ b/print-tree.c
>> @@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb,
>> btrfs_inode_uid(eb, ii),
>> btrfs_inode_gid(eb, ii),
>> (unsigned long long)btrfs_inode_rdev(eb,ii),
>> -   (unsigned long long)btrfs_inode_flags(eb,ii),
>> (unsigned long long)btrfs_inode_sequence(eb, ii),
>> +   (unsigned long long)btrfs_inode_flags(eb,ii),
>> flags_str);
>>  print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n");
>>  print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n");
>>
> 





Re: big volumes only work reliable with ssd_spread

2018-01-15 Thread Duncan
Stefan Priebe - Profihost AG posted on Mon, 15 Jan 2018 10:55:42 +0100 as
excerpted:

> since around two or three years i'm using btrfs for incremental VM
> backups.
> 
> some data:
> - volume size 60TB
> - around 2000 subvolumes
> - each differential backup stacks on top of a subvolume
> - compress-force=zstd
> - space_cache=v2
> - no quota / qgroups
> 
> this works fine since kernel 4.14, except that I need ssd_spread as an
> option. If I do not use ssd_spread I always end up with very slow
> performance and a single kworker process using 100% CPU after some days.
> 
> With ssd_spread those boxes have run fine for around 6 months. Is this
> expected? I haven't found any hint regarding such an impact.

My understanding of the technical details is "limited" as I'm not a dev, 
and I expect you'll get a more technically accurate response later, but 
sometimes a first not particularly technical response can be helpful as 
long as it's not /wrong/.  (And if it is this is a good way to have my 
understanding corrected as well. =:^)  With that caveat, based on my 
understanding of what I've seen on-list...

The kernel v4.14 ssd mount-option changes apparently primarily affected 
data, not metadata.  Apparently, ssd_spread has a heavier metadata 
effect, and the v4.14 changes moved additional (I believe metadata) 
functionality to ssd-spread that had originally been part of ssd as 
well.  There has been some discussion of metadata tweaks similar to those 
in 4.14 for the ssd option with data, but they weren't deemed as 
demonstrably needed as the ssd option tweaks and needed further 
discussion, so were put off until the effect of the 4.14 tweaks could be 
gauged in more widespread use, after which they were to be reconsidered, 
if necessary.

Meanwhile, in the discussion I saw, Chris Mason mentioned that Facebook 
is using ssd-spread for various reasons there, so it's well-tested with 
their deployments, which I'd assume have many of the same qualities yours 
do, thus implying that your observations about ssd-spread are no accident.

In fact, if I interpreted Chris's comments correctly, they use ssd_spread 
on very large multi-layered non-ssd storage arrays, in part because the 
larger layout-alignment optimizations make sense there as well as on 
ssds.  That would appear to be precisely what you are seeing. =:^)  If 
that's the case, then arguably the option is misnamed and the ssd_spread 
name may well at some point be deprecated in favor of something more 
descriptive of its actual function and target devices.  Purely my own 
speculation here, but perhaps something like vla_spread (very-large-array)?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: how to make a cache directory nodatacow while also excluded from snapshots?

2018-01-15 Thread Andrei Borzenkov
On 16.01.2018 00:56, Dave wrote:
> I want to exclude my ~/.cache directory from snapshots. The obvious
> way to do this is to mount a btrfs subvolume at that location.
> 
> However, I also want the ~/.cache directory to be nodatacow. Since the
> parent volume is COW, I believe it isn't possible to mount the
> subvolume with different mount options.
> 
> What's the solution for achieving both of these goals?
> 
> I tried this without success:
> 
> chattr +C ~/.cache
> 
> Since ~/.cache is a btrfs subvolume, apparently that doesn't work.
> 
> lsattr ~/.cache
> 
> returns nothing.

Try creating a file under ~/.cache and check its attributes.


New Year's donation of €4,800,000

2018-01-15 Thread Tom Crist


-- 
 Hello, you have received a donation of 4,800,000.00 euros. I won the America
Lottery in America, worth 40 million dollars, and I am giving part of it to
five lucky people and charitable houses in memory of my late wife, who died
of cancer. Contact me for further details: (tomcrist2...@gmail.com)


how to make a cache directory nodatacow while also excluded from snapshots?

2018-01-15 Thread Dave
I want to exclude my ~/.cache directory from snapshots. The obvious
way to do this is to mount a btrfs subvolume at that location.

However, I also want the ~/.cache directory to be nodatacow. Since the
parent volume is COW, I believe it isn't possible to mount the
subvolume with different mount options.

What's the solution for achieving both of these goals?

I tried this without success:

chattr +C ~/.cache

Since ~/.cache is a btrfs subvolume, apparently that doesn't work.

lsattr ~/.cache

returns nothing.


Re: [PATCH v4 2/4] btrfs: cleanup btrfs_mount() using btrfs_mount_root()

2018-01-15 Thread David Sterba
On Fri, Jan 12, 2018 at 06:14:40PM +0800, Anand Jain wrote:
> 
> Misono,
> 
>   This change is causing subsequent (subvol) mount to fail when device
>   option is specified. The simplest example of the failure is:
> mkfs.btrfs -qf /dev/sdc /dev/sdb
> mount -o device=/dev/sdb /dev/sdc /btrfs
> mount -o device=/dev/sdb /dev/sdc /btrfs1
>    mount: /dev/sdc is already mounted or /btrfs1 busy
> 
>    Looks like
>      blkdev_get_by_path()  <-- is failing.
>      btrfs_scan_one_device()
>      btrfs_parse_early_options()
>      btrfs_mount()
> 
>   Which is due to different holders (viz. btrfs_root_fs_type and
>   btrfs_fs_type): one is used for the vfs mount and the other for the
>   scan, so they form different holders and the EXCL open, which both
>   scan and mount need, cannot succeed.

This looks close to what I see in the random test failures. I've
reverted your patch "btrfs: optimize move uuid_mutex closer to the
critical section" as I bisected to it. The uuid mutex around
blkdev_get_by_path() probably protected the concurrent mount and scan so
they did not ask for the EXCL open at the same time.

Reverting (or removing) the patch from the current misc-next queue is
simpler for me ATM as I want to get to a stable base now; we can add it
back later if we understand the issue with the mount/scan.


Re: Recommendations for balancing as part of regular maintenance?

2018-01-15 Thread Tom Worster

On 13 Jan 2018, at 17:09, Chris Murphy wrote:


> On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn wrote:
>
>> To that end, I propose the following text for the FAQ:
>>
>> Q: Do I need to run a balance regularly?
>>
>> A: While not strictly necessary for normal operations, running a filtered
>> balance regularly can help prevent your filesystem from ending up with
>> ENOSPC issues.  The following command run daily on each BTRFS volume
>> should be more than sufficient for most users:
>>
>> `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`
>
> Daily? Seems excessive.
>
> I've got multiple Btrfs file systems that I haven't balanced, full or
> partial, in a year. And I have no problems. One is a laptop which
> accumulates snapshots until roughly 25% free space remains and then
> most of the snapshots are deleted, except the most recent few, all at
> one time. I'm not experiencing any problems so far. The other is a NAS
> and its multiple copies, with maybe 100-200 snapshots. One backup
> volume is 99% full, there's no more unallocated free space, I delete
> snapshots only to make room for btrfs send receive to keep pushing the
> most recent snapshot from the main volume to the backup. Again no
> problems.
>
> I really think suggestions this broad are just going to paper over
> bugs or design flaws, we won't see as many bug reports and then real
> problems won't get fixed.

This is just an answer to a FAQ. This is not Austin or anyone else
trying to tell you or anyone else that you should do this. It should be
clear that there is an implied caveat along the lines of: "There are
other ways to manage allocation besides regular balancing. This
recommendation is a For-Dummies-kinda default that should work well
enough if you don't have another strategy better adapted to your
situation." If this implication is not obvious enough then we can add
something explicit.

> I also think the time-based method is too subjective. What about the
> layout means a balance is needed? And if it's really a suggestion, why
> isn't there a cron job or systemd unit that just does this for the
> user, in btrfs-progs, working and enabled by default?

As a newcomer to BTRFS, I was astonished to learn that it demands each
user figure out some workaround for what is, in my judgement, a required
but missing feature, i.e. a defect, a bug. At present the docs are
pretty confusing for someone trying to deal with it on their own.

Unless some better fix is in the works, this _should_ be a systemd unit
or something. Until then, please put it in the FAQ.

> I really do not like
> all this hand holding of Btrfs, it's not going to make it better.

Maybe it won't but, absent better proposals, and given the nature of the
problem, this kind of hand-holding is only fair to the user.


Tom


Re: [PATCH] btrfs: Remove btrfs_inode::delayed_iput_count

2018-01-15 Thread David Sterba
On Mon, Jan 15, 2018 at 10:16:54AM -0700, Edmund Nadolski wrote:
> 
> 
> On 01/15/2018 05:31 AM, Nikolay Borisov wrote:
> > delayed_iput_count was supposed to be used to implement, well, delayed
> > iput. The idea is that we keep accumulating the number of iputs we do
> > until eventually the inode is deleted. Turns out we never really
> > switched the delayed_iput_count from 0 to 1, hence all conditional
> > code relying on the value of that member being different than 0 was
> > never executed. This, as it turns out, didn't cause any problem due
> > to the simple fact that the generic inode's i_count member was always
> > used to count the number of iputs. So let's just remove the unused
> > member and all unused code. This patch essentially provides no
> > functional changes.
> > 
> > Signed-off-by: Nikolay Borisov 
> 
> Since the 8089fe62c6 changelog mentions the need for a count, it might
> be nice to include a brief code comment about the i_count effect.

Agreed.

> Reviewed-by: Edmund Nadolski 

Reviewed-by: David Sterba 


Re: [PATCH] btrfs: Remove btrfs_inode::delayed_iput_count

2018-01-15 Thread Edmund Nadolski


On 01/15/2018 05:31 AM, Nikolay Borisov wrote:
> delayed_iput_count was supposed to be used to implement, well, delayed
> iput. The idea is that we keep accumulating the number of iputs we do
> until eventually the inode is deleted. Turns out we never really
> switched the delayed_iput_count from 0 to 1, hence all conditional
> code relying on the value of that member being different than 0 was
> never executed. This, as it turns out, didn't cause any problem due
> to the simple fact that the generic inode's i_count member was always
> used to count the number of iputs. So let's just remove the unused
> member and all unused code. This patch essentially provides no
> functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Since the 8089fe62c6 changelog mentions the need for a count, it might
be nice to include a brief code comment about the i_count effect.

Reviewed-by: Edmund Nadolski 


> ---
>  fs/btrfs/btrfs_inode.h |  1 -
>  fs/btrfs/inode.c   | 17 +++--------------
>  2 files changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 63f0ccc92a71..f527e99c9f8d 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -195,7 +195,6 @@ struct btrfs_inode {
>  
>   /* Hook into fs_info->delayed_iputs */
>   struct list_head delayed_iput;
> - long delayed_iput_count;
>  
>   /*
>* To avoid races between lockless (i_mutex not held) direct IO writes
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 029399593049..2225f613516c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3252,12 +3252,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
> 		return;
>  
> 	spin_lock(&fs_info->delayed_iput_lock);
> -	if (binode->delayed_iput_count == 0) {
> -		ASSERT(list_empty(&binode->delayed_iput));
> -		list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
> -	} else {
> -		binode->delayed_iput_count++;
> -	}
> +	ASSERT(list_empty(&binode->delayed_iput));
> +	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
> 	spin_unlock(&fs_info->delayed_iput_lock);
>  }
>  
> @@ -3270,13 +3266,7 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
>  
> 		inode = list_first_entry(&fs_info->delayed_iputs,
> 				struct btrfs_inode, delayed_iput);
> -		if (inode->delayed_iput_count) {
> -			inode->delayed_iput_count--;
> -			list_move_tail(&inode->delayed_iput,
> -					&fs_info->delayed_iputs);
> -		} else {
> -			list_del_init(&inode->delayed_iput);
> -		}
> +		list_del_init(&inode->delayed_iput);
> 		spin_unlock(&fs_info->delayed_iput_lock);
> 		iput(&inode->vfs_inode);
> 		spin_lock(&fs_info->delayed_iput_lock);
> @@ -9424,7 +9414,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
>   ei->dir_index = 0;
>   ei->last_unlink_trans = 0;
>   ei->last_log_commit = 0;
> - ei->delayed_iput_count = 0;
>  
>   spin_lock_init(>lock);
>   ei->outstanding_extents = 0;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: common/rc: fix device still mounted error with SCRATCH_DEV_POOL

2018-01-15 Thread David Sterba
On Mon, Jan 15, 2018 at 02:22:28PM +0800, Eryu Guan wrote:
> On Fri, Jan 12, 2018 at 06:04:59PM -0700, Liu Bo wrote:
> > One of btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a 
> > non-SCRATCH_DEV
> > device as the first one when doing mkfs, and this makes
> > _require_scratch{_nocheck} fail to umount $SCRATCH_MNT since it checks mount
> > point with SCRATCH_DEV only, and for sure it finds nothing to umount and the
> > following tests complain about 'device still mounted' alike errors.
> > 
> > Introduce a helper to address this special case where both btrfs and scratch
> > dev pool are in use.
> > 
> > Signed-off-by: Liu Bo 
> 
> Hmm, I didn't see this problem, I ran btrfs/011 then another test that
> uses $SCRATCH_DEV, and the second test ran fine too. Can you please
> provide more details?
> 
> Anyway, I think we should fix btrfs/011 to either not use $SCRATCH_DEV
> in replace operations (AFAIK, other btrfs replace tests do this) or
> umount all devices before exit. And I noticed btrfs/011 does umount
> $SCRATCH_MNT at the end of workout(), so usually all should be fine
> (perhaps it would leave a device mounted if interrupted in the middle of
> test run, because _cleanup() doesn't do umount).

In my case I saw lots of test failures (btrfs/ 012 068 071 074 116 136
138 152 154 155 ...), some of them repeatedly but not reliably. This
could have been triggered by a patch in my testing branch, but I can't
tell for sure due to the inaccurate fstests checks. The common problem
was that the scratch device appeared as mounted.

We discussed that with Bo, I was suspecting some of our changes that
could theoretically leave some data in flight after umount. Bo found the
potential problems in fstests so I'll redo all the testing again with
updated fstests.


Re: btrfs subvolume mount with different options

2018-01-15 Thread Konstantin V. Gavrilenko
Thanks, chattr +C is what I am currently using.
Also you already answered my next question, why it is not possible to set the +C 
attribute on an existing file :)


Yours sincerely,
Konstantin V. Gavrilenko


- Original Message -
From: "Roman Mamedov" 
To: "Konstantin V. Gavrilenko" 
Cc: "Linux fs Btrfs" 
Sent: Friday, 12 January, 2018 9:37:49 PM
Subject: Re: btrfs subvolume mount with different options

On Fri, 12 Jan 2018 17:49:38 + (GMT)
"Konstantin V. Gavrilenko"  wrote:

> Hi list,
> 
> just wondering whether it is possible to mount two subvolumes with different 
> mount options, i.e.
> 
> |
> |- /a  defaults,compress-force=lza

You can use different compression algorithms across the filesystem
(including none), via "btrfs properties" on directories or subvolumes. They
are inherited down the tree.

$ mkdir test
$ sudo btrfs prop set test compression zstd
$ echo abc > test/def
$ sudo btrfs prop get test/def compression
compression=zstd

But it appears this doesn't provide a way to apply compress-force.

> |- /b  defaults,nodatacow

Nodatacow can be applied to any dir/subvolume recursively, or any file (as long 
as it's created but not
written yet) via chattr +C.
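Roman's chattr route can be wrapped in a small script. A minimal sketch, with the directory path, the filesystem-type guard, and the file names being illustrative additions rather than anything from the thread:

```shell
#!/bin/sh
# Sketch: mark a directory nodatacow so files created in it afterwards
# inherit +C. chattr +C only has an effect on btrfs, so check the
# filesystem type first and bail out politely elsewhere.
DIR="${1:-/mnt/btrfs/nocow}"   # hypothetical example path

if [ "$(stat -f -c %T "$DIR" 2>/dev/null)" != "btrfs" ]; then
    echo "skip: $DIR is not on btrfs"
    exit 0
fi

chattr +C "$DIR"         # applies to files created in DIR from now on
touch "$DIR/newfile"     # empty at creation time, so +C sticks
lsattr -d "$DIR" "$DIR/newfile"
```

As the thread notes, running `chattr +C` on a file that already contains data has no reliable effect, which is why the flag goes on the directory before the file is created.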

-- 
With respect,
Roman


Re: Recommendations for balancing as part of regular maintenance?

2018-01-15 Thread Austin S. Hemmelgarn

On 2018-01-13 17:09, Chris Murphy wrote:

On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn
 wrote:



To that end, I propose the following text for the FAQ:

Q: Do I need to run a balance regularly?

A: While not strictly necessary for normal operations, running a filtered
balance regularly can help prevent your filesystem from ending up with
ENOSPC issues.  The following command run daily on each BTRFS volume should
be more than sufficient for most users:

`btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10`



Daily? Seems excessive.
For handling of chunks that are only 25% full and capping it at 10 
chunks processed each for data and metadata?  That's only (assuming I 
remember the max chunk size correctly) about 15GB of data being moved at 
the absolute most, and that will likely only happen in pathologically 
bad cases.  In most cases it should be either nothing (in most cases) or 
about 768MB being shuffled around, and even on traditional hard drives 
that should complete insanely fast (barring impact from very large 
numbers of snapshots or use of qgroups).


If there are no chunks that match (or only one chunk), this finishes in 
at most a second with near zero disk I/O.  If exactly two match (which 
should be the common case for most users when it matches at all), it 
should take at most a few seconds to complete, even on traditional hard 
drives.  If more match, it will of course take longer, but it should be 
pretty rare that more than two match.


Given that, it really doesn't seem all that excessive to me.  As a point 
of comparison, automated X.509 certificate renewal checks via certbot 
take more resources to perform when there's not a renewal due than this 
balance command takes when there's nothing to work on, and it's 
absolutely standard to run the X.509 checks daily despite the fact that 
weekly checks would still give no worse security (certbot will renew 
things well before they expire).
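As a concrete shape for such a daily job, here is a hedged sketch that runs the proposed filtered balance over every mounted btrfs volume. The dry-run guard (`RUN=1`) is my addition, not part of the proposal:

```shell
#!/bin/sh
# Sketch of a daily maintenance job applying the filtered balance from
# the proposed FAQ text to every mounted btrfs volume. It only prints
# the commands unless RUN=1 is set, so it can be inspected safely first.
FILTERS="-dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10"

awk '$3 == "btrfs" { print $2 }' /proc/self/mounts | sort -u |
while read -r mnt; do
    if [ "$RUN" = "1" ]; then
        btrfs balance start $FILTERS "$mnt"
    else
        echo "would run: btrfs balance start $FILTERS $mnt"
    fi
done
```

Dropped into /etc/cron.daily (or a systemd timer) with RUN=1, this matches the "run daily on each BTRFS volume" suggestion above.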


I've got multiple Btrfs file systems that I haven't balanced, full or
partial, in a year. And I have no problems. One is a laptop which
accumulates snapshots until roughly 25% free space remains and then
most of the snapshots are deleted, except the most recent few, all at
one time. I'm not experiencing any problems so far. The other is a NAS
and it's multiple copies, with maybe 100-200 snapshots. One backup
volume is 99% full, there's no more unallocated free space, I delete
snapshots only to make room for btrfs send receive to keep pushing the
most recent snapshot from the main volume to the backup. Again no
problems.
In the first case, you're dealing with a special configuration that 
makes most of this irrelevant most of the time (as I'm assuming things 
change _enough_ between snapshots that dumping most of them will 
completely empty out most of the chunks they were stored in).


In the second I'd have to say you've been lucky.  I've personally never 
run a volume that close to full with BTRFS without balancing regularly 
and not had some kind of issue.


I really think suggestions this broad are just going to paper over
bugs or design flaws, we won't see as many bug reports and then real
problems won't get fixed.
So maybe we should fix things so that this is never needed?  Yes, it's a 
workaround for a well known and documented design flaw (and yes, I 
consider the whole two-level allocator's handling of free space 
exhaustion to be a design flaw), but I don't see any patches forthcoming 
to fix it, so if we want to keep users around, we need to provide some 
way for them to mitigate the problems it can cause (otherwise we won't 
find any bugs because we won't have any users).


I also think the time-based method is too subjective. What about the
layout means a balance is needed? And if it's really a suggestion, why
isn't there a cron or systemd unit that just does this for the user,
in btrfs-progs, working and enabled by default? I really do not like
all this hand holding of Btrfs, it's not going to make it better.

For a filesystem you really have two generic possibilities for use cases:

1. It's designed for general-purpose usage.  Doesn't really excel at 
anything in particular, but isn't really bad at anything either.
2. It's designed for a very specific use case.  Does an amazing job for 
that particular use case and possibly for some similar ones, and may or 
may not do a reasonable job for other use cases.


Your comments here seem to imply that BTRFS falls under the second case, 
which is odd since most everything else I've seen implies that BTRFS 
fits the first case (or is trying to at least).  In either case though, 
you need to provide something to deal with this particular design flaw.


In the first case, you _need_ to make it as easy as possible for people 
who have no understanding of computers to use.  While needing balances 
from time to time is not exactly in-line with that, requiring people to 
try and judge based on the 

[PATCH] btrfs: Remove btrfs_inode::delayed_iput_count

2018-01-15 Thread Nikolay Borisov
delayed_iput_count was supposed to be used to implement, well, delayed
iput. The idea is that we keep accumulating the number of iputs we do
until eventually the inode is deleted. Turns out we never really
switched the delayed_iput_count from 0 to 1, hence all conditional
code relying on the value of that member being different than 0 was
never executed. This, as it turns out, didn't cause any problem due
to the simple fact that the generic inode's i_count member was always
used to count the number of iputs. So let's just remove the unused
member and all unused code. This patch essentially provides no
functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/inode.c   | 17 +++--
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 63f0ccc92a71..f527e99c9f8d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -195,7 +195,6 @@ struct btrfs_inode {
 
/* Hook into fs_info->delayed_iputs */
struct list_head delayed_iput;
-   long delayed_iput_count;
 
/*
 * To avoid races between lockless (i_mutex not held) direct IO writes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 029399593049..2225f613516c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3252,12 +3252,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
return;
 
 	spin_lock(&fs_info->delayed_iput_lock);
-	if (binode->delayed_iput_count == 0) {
-		ASSERT(list_empty(&binode->delayed_iput));
-		list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
-	} else {
-		binode->delayed_iput_count++;
-	}
+	ASSERT(list_empty(&binode->delayed_iput));
+	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
 	spin_unlock(&fs_info->delayed_iput_lock);
 }
 
@@ -3270,13 +3266,7 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
 
 		inode = list_first_entry(&fs_info->delayed_iputs,
 				struct btrfs_inode, delayed_iput);
-		if (inode->delayed_iput_count) {
-			inode->delayed_iput_count--;
-			list_move_tail(&inode->delayed_iput,
-					&fs_info->delayed_iputs);
-		} else {
-			list_del_init(&inode->delayed_iput);
-		}
+		list_del_init(&inode->delayed_iput);
 		spin_unlock(&fs_info->delayed_iput_lock);
 		iput(&inode->vfs_inode);
 		spin_lock(&fs_info->delayed_iput_lock);
@@ -9424,7 +9414,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->dir_index = 0;
 	ei->last_unlink_trans = 0;
 	ei->last_log_commit = 0;
-	ei->delayed_iput_count = 0;
 
 	spin_lock_init(&ei->lock);
 	ei->outstanding_extents = 0;
-- 
2.7.4



Re: invalid files names, btrfs check can't repair it

2018-01-15 Thread Sebastian Andrzej Siewior
On 2018-01-15 12:23:05 [+0800], Qu Wenruo wrote:
> Right, I'll fix it soon.
> 
> And BTW what makes the output different from the original one?
> 
> Sebastian, did you do an extra write or other operation to the fs after
> previous btrfs check?

Well the filesystem is in use but there should be no writes to it since
initial `check' output. The `check' invalidates the space cache that is
rebuilt, not sure if this has any effect. Those two magic files are in a
subfolder of ccache and ccache shouldn't look into it at all.

> Thanks,
> Qu

Sebastian


Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525

2018-01-15 Thread Qu Wenruo


On 2018年01月15日 20:08, Ilan Schwarts wrote:
> Thanks for detailed information !
> Its a legacy code for kernel module i maintain.. dont talk to me about
> ancient when i need to maintain it to systems like solaris 8 or RHEL4
> 2.6.9 :(

Well, that's unfortunate, I mean really unfortunate...

Despite that, if sticking to the device number (dev_t), I think the one in
super_block->s_dev won't help much.
Especially since it can change when btrfs tries to add/delete devices.

So it will be a very hard time for you to trace device number for btrfs.

Thanks,
Qu

> 
> 
> 
> On Mon, Jan 15, 2018 at 12:01 PM, Qu Wenruo  wrote:
>>
>>
>> On 2018年01月15日 17:24, Ilan Schwarts wrote:
>>> Qu,
>>> Given inode, i get the fsid via: inode->i_sb->s_dev;
>>> this return dev_t and not u8/u16
>>
>> That's just a device number.
>>
>> Not really useful in btrfs, since btrfs is a multi-device filesystem.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>> On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo  wrote:


 On 2018年01月14日 18:32, Ilan Schwarts wrote:
> Thank you for clarification.
> Just 2 quick questions,
> 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ?

 They can.

 So to really locate an inode in btrfs, you need:

 fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number.

 fsid can be fetched from the superblock as mentioned in previous reply.

 subvolume id can be obtained from BTRFS_I(inode)->root.
 And normally root is what you need.

 If you really want the number, then either
 BTRFS_I(inode)->root->objectid or
 BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume 
 id.

> 2. Why fsInfo fsid return u8 and the traditional file system return
> dev_t, usually 32 integer ?

 As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t,
 same as btrfs.

 For ext4 it's ext4_super_block->s_uuid[16]
 And for xfs, it's xfs_sb->sb_uuid.

 I don't know how you get the dev_t parameter.

 Thanks,
 Qu

>
>
> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo  
> wrote:
>>
>>
>> On 2018年01月14日 18:13, Ilan Schwarts wrote:
>>> both btrfs filesystems will have same fsid ?
>>>
>>>
>>> On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts  
>>> wrote:
 But both filesystems will have same fsid?

 On Jan 14, 2018 12:04, "Nikolay Borisov"  wrote:
>
>
>
> On 14.01.2018 12:02, Ilan Schwarts wrote:
>> First of all, Thanks for response !
>> So if i have 2 btrfs file system on the same machine (not your
>> everyday scenario, i know)
>>
>> Not a problem, the 2 filesystems will have 2 different fsid.
>>
>> (And it's my everyday scenario, since fstests needs TEST_DEV and
>> SCRATCH_DEV_POOL)
>>
>> Lets say a file is created on device A, the file gets inode number X
>> is it possible on device B to have inode number X also ?
>> or each device has its own Inode number range ?
>>
>> Forget the mess about device.
>>
>> Inode is bounded to a filesystem, not bounded to a device.
>>
>> Just traditional filesystems are normally bound to a single device.
>> (Although even traditional filesystems can have external journal devices)
>>
>> So there is nothing to do with device at all.
>>
>> And you can have same inode numbers in different filesystems, but
>> BTRFS_I(inode)->root->fs_info will point to different fs_infos, with
>> different fsid.
>>
>> So return to your initial question:
>>> both btrfs filesystems will have same fsid ?
>>
>> No, different filesystems will have different fsid.
>>
>> (Unless you're SUUUPER lucky to have 2 filesystems with
>> same fsid)
>>
>> Thanks,
>> Qu
>>
>>
>
> Of course it is possible. Inodes are guaranteed to be unique only 
> across
> filesystem instances. In your case you are going to have 2 fs 
> instances.
>
>>
>> I need to create unique identifier for a file, I need to understand 
>> if
>> the identifier would be: GlobalFSID_DeviceID_Inode or DeviceID_Inode
>> is enough.
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo 
>> wrote:
>>>
>>>
>>> On 2018年01月14日 16:33, Ilan Schwarts wrote:
 Hello btrfs developers/users,

 I was wondering regarding to fetching the correct fsid on btrfs 
 from
 the context of a kernel module.
>>>
>>> There are two IDs for btrfs. (in fact 

Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525

2018-01-15 Thread Ilan Schwarts
Thanks for detailed information !
Its a legacy code for kernel module i maintain.. dont talk to me about
ancient when i need to maintain it to systems like solaris 8 or RHEL4
2.6.9 :(



On Mon, Jan 15, 2018 at 12:01 PM, Qu Wenruo  wrote:
>
>
> On 2018年01月15日 17:24, Ilan Schwarts wrote:
>> Qu,
>> Given inode, i get the fsid via: inode->i_sb->s_dev;
>> this return dev_t and not u8/u16
>
> That's just a device number.
>
> Not really useful in btrfs, since btrfs is a multi-device filesystem.
>
> Thanks,
> Qu
>
>>
>>
>> On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年01月14日 18:32, Ilan Schwarts wrote:
 Thank you for clarification.
 Just 2 quick questions,
 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ?
>>>
>>> They can.
>>>
>>> So to really locate an inode in btrfs, you need:
>>>
>>> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number.
>>>
>>> fsid can be fetched from the superblock as mentioned in previous reply.
>>>
>>> subvolume id can be obtained from BTRFS_I(inode)->root.
>>> And normally root is what you need.
>>>
>>> If you really want the number, then either
>>> BTRFS_I(inode)->root->objectid or
>>> BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id.
>>>
 2. Why fsInfo fsid return u8 and the traditional file system return
 dev_t, usually 32 integer ?
>>>
>>> As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t,
>>> same as btrfs.
>>>
>>> For ext4 it's ext4_super_block->s_uuid[16]
>>> And for xfs, it's xfs_sb->sb_uuid.
>>>
>>> I don't know how you get the dev_t parameter.
>>>
>>> Thanks,
>>> Qu
>>>


 On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo  wrote:
>
>
> On 2018年01月14日 18:13, Ilan Schwarts wrote:
>> both btrfs filesystems will have same fsid ?
>>
>>
>> On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts  wrote:
>>> But both filesystems will have same fsid?
>>>
>>> On Jan 14, 2018 12:04, "Nikolay Borisov"  wrote:



 On 14.01.2018 12:02, Ilan Schwarts wrote:
> First of all, Thanks for response !
> So if i have 2 btrfs file system on the same machine (not your
> everyday scenario, i know)
>
> Not a problem, the 2 filesystems will have 2 different fsid.
>
> (And it's my everyday scenario, since fstests needs TEST_DEV and
> SCRATCH_DEV_POOL)
>
> Lets say a file is created on device A, the file gets inode number X
> is it possible on device B to have inode number X also ?
> or each device has its own Inode number range ?
>
> Forget the mess about device.
>
> Inode is bounded to a filesystem, not bounded to a device.
>
> Just traditional filesystems are normally bound to a single device.
> (Although even traditional filesystems can have external journal devices)
>
> So there is nothing to do with device at all.
>
> And you can have same inode numbers in different filesystems, but
> BTRFS_I(inode)->root->fs_info will point to different fs_infos, with
> different fsid.
>
> So return to your initial question:
>> both btrfs filesystems will have same fsid ?
>
> No, different filesystems will have different fsid.
>
> (Unless you're SUUUPER lucky to have 2 filesystems with
> same fsid)
>
> Thanks,
> Qu
>
>

 Of course it is possible. Inodes are guaranteed to be unique only 
 across
 filesystem instances. In your case you are going to have 2 fs 
 instances.

>
> I need to create unique identifier for a file, I need to understand if
> the identifier would be: GlobalFSID_DeviceID_Inode or DeviceID_Inode
> is enough.
>
> Thanks
>
>
>
>
>
> On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo 
> wrote:
>>
>>
>> On 2018年01月14日 16:33, Ilan Schwarts wrote:
>>> Hello btrfs developers/users,
>>>
>>> I was wondering regarding to fetching the correct fsid on btrfs from
>>> the context of a kernel module.
>>
>> There are two IDs for btrfs. (in fact more, but you properly won't 
>> need
>> the extra ids)
>>
>> FSID: Global one, one fs one FSID.
>> Device ID: Bonded to device, each device will have one.
>>
>> So in case of 2 devices btrfs, each device will has its own device 
>> id,
>> while both of the devices have the same fsid.
>>
>> And I think you're talking about the global fsid instead of device 
>> id.
>>
>>> if on suse11.3 kernel 3.0.101-0.47.71-default in order to get fsid, 

Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525

2018-01-15 Thread Qu Wenruo


On 2018年01月15日 17:24, Ilan Schwarts wrote:
> Qu,
> Given inode, i get the fsid via: inode->i_sb->s_dev;
> this return dev_t and not u8/u16

That's just a device number.

Not really useful in btrfs, since btrfs is a multi-device filesystem.

Thanks,
Qu

> 
> 
> On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo  wrote:
>>
>>
>> On 2018年01月14日 18:32, Ilan Schwarts wrote:
>>> Thank you for clarification.
>>> Just 2 quick questions,
>>> 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ?
>>
>> They can.
>>
>> So to really locate an inode in btrfs, you need:
>>
>> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number.
>>
>> fsid can be fetched from the superblock as mentioned in previous reply.
>>
>> subvolume id can be obtained from BTRFS_I(inode)->root.
>> And normally root is what you need.
>>
>> If you really want the number, then either
>> BTRFS_I(inode)->root->objectid or
>> BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id.
>>
>>> 2. Why fsInfo fsid return u8 and the traditional file system return
>>> dev_t, usually 32 integer ?
>>
>> As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t,
>> same as btrfs.
>>
>> For ext4 it's ext4_super_block->s_uuid[16]
>> And for xfs, it's xfs_sb->sb_uuid.
>>
>> I don't know how you get the dev_t parameter.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo  wrote:


 On 2018年01月14日 18:13, Ilan Schwarts wrote:
> both btrfs filesystems will have same fsid ?
>
>
> On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts  wrote:
>> But both filesystems will have same fsid?
>>
>> On Jan 14, 2018 12:04, "Nikolay Borisov"  wrote:
>>>
>>>
>>>
>>> On 14.01.2018 12:02, Ilan Schwarts wrote:
 First of all, Thanks for response !
 So if i have 2 btrfs file system on the same machine (not your
 everyday scenario, i know)

 Not a problem, the 2 filesystems will have 2 different fsid.

 (And it's my everyday scenario, since fstests needs TEST_DEV and
 SCRATCH_DEV_POOL)

 Lets say a file is created on device A, the file gets inode number X
 is it possible on device B to have inode number X also ?
 or each device has its own Inode number range ?

 Forget the mess about device.

 Inode is bounded to a filesystem, not bounded to a device.

 Just traditional filesystems are normally bound to a single device.
 (Although even traditional filesystems can have external journal devices)

 So there is nothing to do with device at all.

 And you can have same inode numbers in different filesystems, but
 BTRFS_I(inode)->root->fs_info will point to different fs_infos, with
 different fsid.

 So return to your initial question:
> both btrfs filesystems will have same fsid ?

 No, different filesystems will have different fsid.

 (Unless you're SUUUPER lucky to have 2 filesystems with
 same fsid)

 Thanks,
 Qu


>>>
>>> Of course it is possible. Inodes are guaranteed to be unique only across
>>> filesystem instances. In your case you are going to have 2 fs instances.
>>>

 I need to create unique identifier for a file, I need to understand if
 the identifier would be: GlobalFSID_DeviceID_Inode or DeviceID_Inode
 is enough.

 Thanks





 On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo 
 wrote:
>
>
> On 2018年01月14日 16:33, Ilan Schwarts wrote:
>> Hello btrfs developers/users,
>>
>> I was wondering regarding to fetching the correct fsid on btrfs from
>> the context of a kernel module.
>
> There are two IDs for btrfs. (in fact more, but you properly won't 
> need
> the extra ids)
>
> FSID: Global one, one fs one FSID.
> Device ID: Bonded to device, each device will have one.
>
> So in case of 2 devices btrfs, each device will has its own device id,
> while both of the devices have the same fsid.
>
> And I think you're talking about the global fsid instead of device id.
>
>> if on suse11.3 kernel 3.0.101-0.47.71-default in order to get fsid, I
>> do the following:
>> convert inode struct to btrfs_inode struct (use btrfsInode =
>> BTRFS_I(inode)), then from btrfs_inode struct i go to root field, and
>> from root i take anon_dev or anon_super.s_dev.
>> struct btrfs_inode *btrfsInode;
>> btrfsInode = BTRFS_I(inode);
>>btrfsInode->root->anon_super.s_devor
>>btrfsInode->root->anon_dev- depend on 

Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525

2018-01-15 Thread Qu Wenruo


On 2018年01月15日 17:05, Ilan Schwarts wrote:
> Qu, Thank you very much for detailed response.
> 
> I would like to understand something, on VFS, it is guaranteed that in
> a given filesystem, only 1 inode number will be used, it is unique.
> In btrfs, you say the inode uniqueness is per volume, each volume has
> its own inode space. How is it possible ?


Not 100% sure on how VFS should handle an inode, but since each
filesystem has its own interfaces to handle inode allocation/drop/evict
and etc, I don't believe VFS will be so stupid to just use a u64 inode
number to distinguish different inodes (if VFS really needs to).

And since each inode is allocated by the implementing fs, it's completely
fine for two VFS inodes to have the same inode number.
But in fact such two inodes will still be different as their fs-specific
inode structures are still different.
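Qu's point that two VFS inodes may legitimately share an inode number, as long as they belong to different filesystems, is easy to observe with pseudo-filesystems: on most Linux systems the roots of procfs and sysfs both report inode 1, and only the containing filesystem (device) differs. A small illustrative check, not from the thread:

```shell
# Same inode number on two different filesystems: the kernel tells the
# inodes apart by their containing filesystem, not by the number alone.
for d in /proc /sys; do
    stat -c 'path=%n dev=%D ino=%i' "$d" 2>/dev/null \
        || echo "path=$d not available"
done
```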

> 
> Thats why when I execute "stat /somepath/file" I receive fsid that
> looks like "36h/54d" but from kernel code, If I examine struct
> fs_info->fsid i get 52.

That's not FSID!!

That's device!

Define what you really need first.

For FSID, that should be something in lsblk output like:
├─nvme0n1p1     vfat  5188-EF6C                             /boot
  ├─system-root xfs   07179caf-b406-4357-8cd8-3268c6238fb6  /
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      This is FSID! ID to identify a fs.

And each fs can have its own FSID schema, just as you can see: the FAT32
FSID is only 4 bytes, while XFS has a 16-byte FSID.

So there is no generic way to get a fsid.

And the "false" fsid is just device, just like stat command shows:
stat  /mnt/btrfs/
  File: /mnt/btrfs/
  Size: 6   Blocks: 0  IO Block: 4096   directory
Device: fe00h/65024dInode: 75732   Links: 2
^^

And for btrfs, I'm not pretty sure if the device of an inode has any
real meaning.
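The device-number vs. FSID distinction above can be checked with plain coreutils: `stat` prints the mount's device number (the `Device:` field), while `stat -f` prints the statfs `f_fsid`; neither is the 16-byte btrfs FSID, which only tools like `btrfs filesystem show` report. A small illustration (the path is just an example):

```shell
P="${1:-/}"    # any path works; / is just an example
stat  -c 'device number: %D  (major/minor of the containing mount)' "$P"
stat -f -c 'statfs fsid:   %i  (what f_fsid reports, not the btrfs FSID)' "$P"
# The real 16-byte btrfs FSID would come from, e.g.:
#   btrfs filesystem show "$P"    # requires a btrfs mount and root
```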

> I call this Physical/Virtual, physical is the real id - 52, and
> virtual is 36h/54d, because this is what the btrfs implementation returns..
> from where is that property taken ?

Don't call it, check all man pages of stat first.

And I really don't understand how you get the wrong understanding of fsid.
There is even no string "fsid" in include/linux/fs.h.

> 
> The inode number inode->i_ino is the same from both userspace (stat
> ...) and kernel code.
> 
> If on the same filesystem (52)

Check the correct man pages.
That 52 is your major and minor block device number of the device for
the containing fs.

>, you say, same inode number can be
> used, as long as they are on different volumes - Is it possible ?

So in short, yes.

> Doesn't it break the VFS inode uniqueness ?

No. VFS inode is just part of fs-specific inode structure.
In btrfs' case, VFS inode is just btrfs_inode->vfs_inode.

So even vfs_inode have same inode number, they are still different inodes.

> Is there also a virtual/physical inode numbers ?

That's device. And yes it's possible to get that device number.
But almost meaningless for btrfs, since btrfs can be across several disks.

(This needs to understand btrfs chunk mapping first)

> and if so, is it
> possible to get from kernel structure ?

For what reason? It's not that useful in btrfs.

> because inode->i_ino always
> return what stat returns.. unlike fsid as i wrote above.

I think you should build a correct understanding of what an
inode/filesystem is.

And most importantly, read newer kernel sources instead of some ancient,
vendor-specific kernel source.

Thanks,
Qu

> 
> Thanks !!
> 
> 
> 
> 
> 
> On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo  wrote:
>>
>>
>> On 2018年01月14日 18:32, Ilan Schwarts wrote:
>>> Thank you for clarification.
>>> Just 2 quick questions,
>>> 1. Sub volumes - 2 sub volumes cannot have 2 same inode numbers ?
>>
>> They can.
>>
>> So to really locate an inode in btrfs, you need:
>>
>> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number.
>>
>> fsid can be fetched from the superblock as mentioned in previous reply.
>>
>> subvolume id can be obtained from BTRFS_I(inode)->root.
>> And normally root is what you need.
>>
>> If you really want the number, then either
>> BTRFS_I(inode)->root->objectid or
>> BTRFS_I(inode)->root->root_key->objectid will give you the u64 subvolume id.
>>
>>> 2. Why fsInfo fsid return u8 and the traditional file system return
>>> dev_t, usually 32 integer ?
>>
>> As far as I found in xfs or ext4, their fsid is still u8[16] or uuid_t,
>> same as btrfs.
>>
>> For ext4 it's ext4_super_block->s_uuid[16]
>> And for xfs, it's xfs_sb->sb_uuid.
>>
>> I don't know how you get the dev_t parameter.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo  wrote:


 On 2018年01月14日 18:13, Ilan Schwarts wrote:
> both btrfs filesystems will have same fsid ?
>
>
> On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts  wrote:
>> But both 

big volumes only work reliable with ssd_spread

2018-01-15 Thread Stefan Priebe - Profihost AG
Hello,

for around two or three years I've been using btrfs for incremental VM backups.

some data:
- volume size 60TB
- around 2000 subvolumes
- each differential backup stacks on top of a subvolume
- compress-force=zstd
- space_cache=v2
- no quota / qgroups

This works fine since kernel 4.14, except that I need ssd_spread as an
option. If I do not use ssd_spread I always end up with very slow
performance and a single kworker process using 100% CPU after some days.

With ssd_spread those boxes have run fine for around 6 months. Is this
something expected? I haven't found any hint regarding such an impact.

Thanks!

Greets,
Stefan


Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525

2018-01-15 Thread Ilan Schwarts
Qu,
Given an inode, I get the fsid via: inode->i_sb->s_dev;
this returns a dev_t, not a u8[16].



On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo  wrote:
>
>
> On 2018-01-14 18:32, Ilan Schwarts wrote:
>> Thank you for clarification.
>> Just 2 quick questions,
>> 1. Subvolumes - is it possible for 2 subvolumes to have the same inode number?
>
> They can.
>
> So to really locate an inode in btrfs, you need:
>
> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number.
>
> fsid can be fetched from the superblock, as mentioned in the previous reply.
>
> subvolume id can be obtained from BTRFS_I(inode)->root.
> And normally root is what you need.
>
> If you really want the number, then
> BTRFS_I(inode)->root->root_key.objectid (root_key is an embedded
> struct, not a pointer) will give you the u64 subvolume id.
>
>> 2. Why does fs_info's fsid return u8[16] while a traditional filesystem
>> returns dev_t, usually a 32-bit integer?
>
> As far as I can tell, for xfs and ext4 the fsid is still u8[16] or uuid_t,
> same as btrfs.
>
> For ext4 it's ext4_super_block->s_uuid[16]
> And for xfs, it's xfs_sb->sb_uuid.
>
> I don't know how you get the dev_t parameter.
>
> Thanks,
> Qu
>
>>
>>
>> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018-01-14 18:13, Ilan Schwarts wrote:
 both btrfs filesystems will have same fsid ?


 On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts  wrote:
> But both filesystems will have same fsid?
>
> On Jan 14, 2018 12:04, "Nikolay Borisov"  wrote:
>>
>>
>>
>> On 14.01.2018 12:02, Ilan Schwarts wrote:
>>> First of all, Thanks for response !
>>> So if I have 2 btrfs filesystems on the same machine (not your
>>> everyday scenario, I know)
>>>
>>> Not a problem, the 2 filesystems will have 2 different fsid.
>>>
>>> (And it's my everyday scenario, since fstests needs TEST_DEV and
>>> SCRATCH_DEV_POOL)
>>>
>>> Let's say a file is created on device A and gets inode number X.
>>> Is it possible for a file on device B to have inode number X as well,
>>> or does each device have its own inode number range?
>>>
>>> Forget the mess about device.
>>>
>>> An inode is bound to a filesystem, not to a device.
>>>
>>> It's just that traditional filesystems are normally bound to a single device.
>>> (Although even traditional filesystems can have external journal devices)
>>>
>>> So there is nothing to do with device at all.
>>>
>>> And you can have same inode numbers in different filesystems, but
>>> BTRFS_I(inode)->root->fs_info will point to different fs_infos, with
>>> different fsid.
>>>
>>> So return to your initial question:
 both btrfs filesystems will have same fsid ?
>>>
>>> No, different filesystems will have different fsid.
>>>
>>> (Unless you're SUUUPER lucky to have 2 filesystems with
>>> same fsid)
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>
>> Of course it is possible. Inode numbers are guaranteed to be unique only
>> within a filesystem instance. In your case you are going to have 2 fs instances.
>>
>>>
>>> I need to create a unique identifier for a file, and I need to understand
>>> whether the identifier must be GlobalFSID_DeviceID_Inode or whether
>>> DeviceID_Inode is enough.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo 
>>> wrote:


 On 2018-01-14 16:33, Ilan Schwarts wrote:
> Hello btrfs developers/users,
>
> I was wondering regarding to fetching the correct fsid on btrfs from
> the context of a kernel module.

 There are two IDs for btrfs. (in fact more, but you probably won't need
 the extra ids)

 FSID: Global one, one fs one FSID.
 Device ID: Bound to a device; each device will have one.

 So in the case of a 2-device btrfs, each device will have its own device id,
 while both of the devices have the same fsid.

 And I think you're talking about the global fsid instead of device id.

> if on suse11.3 kernel 3.0.101-0.47.71-default in order to get fsid, I
> do the following:
> convert inode struct to btrfs_inode struct (use btrfsInode =
> BTRFS_I(inode)), then from btrfs_inode struct i go to root field, and
> from root i take anon_dev or anon_super.s_dev.
> struct btrfs_inode *btrfsInode;
> btrfsInode = BTRFS_I(inode);
> btrfsInode->root->anon_super.s_dev or
> btrfsInode->root->anon_dev, depending on the kernel.

 The most direct method would be:

 btrfs_inode->root->fs_info->fsid.
 (For newer kernel, as I'm not familiar with older kernels)

 Or from superblock:
 btrfs_inode->root->fs_info->super_copy->fsid.
 (The most reliable one, no matter 
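For completeness, the global fsid Qu describes is also visible from userspace: each mounted btrfs filesystem gets a directory named after its fsid under /sys/fs/btrfs. A small sketch (Python for brevity; the sysfs layout is an assumption about reasonably recent kernels, and the UUID-shape filter is a heuristic):

```python
import os

def list_btrfs_fsids(sysfs="/sys/fs/btrfs"):
    """Return the global fsid (UUID) of every mounted btrfs filesystem.

    Sysfs exposes one directory per mounted filesystem, named after the
    same fsid that fs_info->fsid holds in the kernel.  Non-fsid entries
    (such as the "features" directory) are filtered out by shape.
    """
    try:
        entries = os.listdir(sysfs)
    except FileNotFoundError:
        return []  # btrfs module not loaded, or no such sysfs tree
    return sorted(e for e in entries if len(e) == 36 and e.count("-") == 4)
```

On a machine with one mounted btrfs filesystem this should return a single UUID matching what `btrfs filesystem show` prints.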

Re: [PATCH 0/7] Misc btrfs-progs cleanups/fixes

2018-01-15 Thread Nikolay Borisov


On  5.12.2017 10:39, Nikolay Borisov wrote:
> Here is a series doing some minor code cleanups, hopefully making the code 
> more idiomatic and easier to follow. They should be pretty low-risk and 
> introduce no functional changes (patches 1-5). 
> 
> The last 2 patches deal with a regression in btrfs rescue super-recovery. 
> It turns out this has been broken for some time. Patch 6 introduces a regression
> test which will hopefully prevent further occurrences, and patch 7 fixes the
> actual bug. 
> 
> Nikolay Borisov (7):
>   btrfs-progs: Explictly state test.sh must be executable
>   btrfs-progs: Factor out common print_device_info
>   btrfs-progs: Remove recover_get_good_super
>   btrfs-progs: Use list_for_each_entry in write_dev_all_supers
>   btrfs-progs: Document logic of btrfs_read_dev_super
>   btrfs-progs: Add test for super block recovery
>   btrfs-progs: Fix super-recovery
> 
>  chunk-recover.c  | 18 ---
>  disk-io.c| 21 ++--
>  super-recover.c  | 28 ++-
>  tests/README.md  |  4 +-
>  tests/fsck-tests/029-superblock-recovery/test.sh | 64 
> 
>  utils.c  | 18 +++
>  utils.h  |  3 ++
>  7 files changed, 110 insertions(+), 46 deletions(-)
>  create mode 100755 tests/fsck-tests/029-superblock-recovery/test.sh


Gentle ping, since I'd like to get this into the next btrfs-progs version,
especially the "fix super-recovery" patch.

> 


Re: Fwd: Fwd: Question regarding to Btrfs patchwork /2831525

2018-01-15 Thread Ilan Schwarts
Qu, thank you very much for the detailed response.

I would like to understand something. At the VFS level, it is
guaranteed that within a given filesystem each inode number is unique.
In btrfs, you say inode uniqueness is per subvolume - each subvolume has
its own inode number space. How is that possible?

That's why when I execute "stat /somepath/file" I receive an fsid that
looks like "36h/54d", but from kernel code, if I examine struct
fs_info->fsid, I get 52.
I call this physical/virtual: physical is the real id (52) and
virtual is 36h/54d, because that is what the btrfs implementation returns.
Where is that property taken from?

The inode number inode->i_ino is the same from both userspace (stat
...) and kernel code.

You say that on the same filesystem (52) the same inode number can be
used, as long as the files are in different subvolumes - is that right?
Doesn't it break VFS inode uniqueness?
Are there also virtual/physical inode numbers? And if so, is it
possible to get them from a kernel structure? Because inode->i_ino always
returns what stat returns, unlike the fsid as I wrote above.

Thanks !!
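As a userspace illustration of what resolves the apparent conflict (a sketch, not btrfs-specific kernel code): the VFS never sees duplicate (device, inode) pairs, because btrfs reports each subvolume with its own anonymous st_dev - the anon_dev mentioned earlier in the thread:

```python
import os

def file_identity(path):
    """Identify a file by the (st_dev, st_ino) pair from stat(2).

    On btrfs, every subvolume is reported with its own anonymous
    st_dev, so inode numbers may repeat across subvolumes without this
    pair ever colliding.  Note that anonymous device numbers are not
    stable across reboots; a persistent identifier should combine the
    filesystem UUID and subvolume id with the inode number instead.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

This is why `stat` output looks "virtual" compared to fs_info->fsid: the dev_t it shows for a subvolume is an anonymous number, not the filesystem's on-disk identity.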





On Sun, Jan 14, 2018 at 12:44 PM, Qu Wenruo  wrote:
>
>
> On 2018-01-14 18:32, Ilan Schwarts wrote:
>> Thank you for clarification.
>> Just 2 quick questions,
>> 1. Subvolumes - is it possible for 2 subvolumes to have the same inode number?
>
> They can.
>
> So to really locate an inode in btrfs, you need:
>
> fsid (locate the fs) -> subvolume id (locate subvolume) -> inode number.
>
> fsid can be fetched from the superblock, as mentioned in the previous reply.
>
> subvolume id can be obtained from BTRFS_I(inode)->root.
> And normally root is what you need.
>
> If you really want the number, then
> BTRFS_I(inode)->root->root_key.objectid (root_key is an embedded
> struct, not a pointer) will give you the u64 subvolume id.
>
>> 2. Why does fs_info's fsid return u8[16] while a traditional filesystem
>> returns dev_t, usually a 32-bit integer?
>
> As far as I can tell, for xfs and ext4 the fsid is still u8[16] or uuid_t,
> same as btrfs.
>
> For ext4 it's ext4_super_block->s_uuid[16]
> And for xfs, it's xfs_sb->sb_uuid.
>
> I don't know how you get the dev_t parameter.
>
> Thanks,
> Qu
>
>>
>>
>> On Sun, Jan 14, 2018 at 12:22 PM, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018-01-14 18:13, Ilan Schwarts wrote:
 both btrfs filesystems will have same fsid ?


 On Sun, Jan 14, 2018 at 12:06 PM, Ilan Schwarts  wrote:
> But both filesystems will have same fsid?
>
> On Jan 14, 2018 12:04, "Nikolay Borisov"  wrote:
>>
>>
>>
>> On 14.01.2018 12:02, Ilan Schwarts wrote:
>>> First of all, Thanks for response !
>>> So if I have 2 btrfs filesystems on the same machine (not your
>>> everyday scenario, I know)
>>>
>>> Not a problem, the 2 filesystems will have 2 different fsid.
>>>
>>> (And it's my everyday scenario, since fstests needs TEST_DEV and
>>> SCRATCH_DEV_POOL)
>>>
>>> Let's say a file is created on device A and gets inode number X.
>>> Is it possible for a file on device B to have inode number X as well,
>>> or does each device have its own inode number range?
>>>
>>> Forget the mess about device.
>>>
>>> An inode is bound to a filesystem, not to a device.
>>>
>>> It's just that traditional filesystems are normally bound to a single device.
>>> (Although even traditional filesystems can have external journal devices)
>>>
>>> So there is nothing to do with device at all.
>>>
>>> And you can have same inode numbers in different filesystems, but
>>> BTRFS_I(inode)->root->fs_info will point to different fs_infos, with
>>> different fsid.
>>>
>>> So return to your initial question:
 both btrfs filesystems will have same fsid ?
>>>
>>> No, different filesystems will have different fsid.
>>>
>>> (Unless you're SUUUPER lucky to have 2 filesystems with
>>> same fsid)
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>
>> Of course it is possible. Inode numbers are guaranteed to be unique only
>> within a filesystem instance. In your case you are going to have 2 fs instances.
>>
>>>
>>> I need to create a unique identifier for a file, and I need to understand
>>> whether the identifier must be GlobalFSID_DeviceID_Inode or whether
>>> DeviceID_Inode is enough.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Jan 14, 2018 at 11:13 AM, Qu Wenruo 
>>> wrote:


 On 2018-01-14 16:33, Ilan Schwarts wrote:
> Hello btrfs developers/users,
>
> I was wondering regarding to fetching the correct fsid on btrfs from
> the context of a kernel module.

 There are two IDs for btrfs. (in fact more, but you probably won't need
 the extra ids)

 FSID: Global one, one fs one FSID.
 Device ID: Bound to a device; each device will have one.

 So in the case of a 2-device btrfs, each device will have its own device 

Re: invalid files names, btrfs check can't repair it

2018-01-15 Thread Sebastian Andrzej Siewior
On 2018-01-15 09:26:27 [+0800], Qu Wenruo wrote:
> Please run the following command too:
> 
> # btrfs inspect dump-tree  | grep -C20 \(57923894

~# btrfs inspect dump-tree /dev/sdb4 | grep -C20 \(57923894
ctime 1515602448.66422211 (2018-01-10 17:40:48)
mtime 1515602448.66422211 (2018-01-10 17:40:48)
otime 1513266995.540343055 (2017-12-14 16:56:35)
item 10 key (57643872 INODE_REF 682) itemoff 3363 itemsize 13
index 58 namelen 3 name: tmp
item 11 key (57648595 INODE_ITEM 0) itemoff 3203 itemsize 160
generation 89045 transid 89423 size 8350 nbytes 0
block group 0 mode 40755 links 1 uid 1000 gid 1000 rdev 0
sequence 0 flags 0xc90(none)
atime 1513267009.164686143 (2017-12-14 16:56:49)
ctime 1513868329.753150507 (2017-12-21 15:58:49)
mtime 1513868329.753150507 (2017-12-21 15:58:49)
otime 1513267009.164686143 (2017-12-14 16:56:49)
item 12 key (57648595 INODE_REF 57643659) itemoff 3192 itemsize 11
index 113 namelen 1 name: d
item 13 key (57648595 DIR_ITEM 3331247447) itemoff 3123 itemsize 69
location key (58472210 INODE_ITEM 0) type FILE
transid 89418 data_len 0 name_len 8231
name: 454bf066ddfbf42e0f3b77ea71c82f-878732.oq
item 14 key (57648595 DIR_ITEM 3363354030) itemoff 3053 itemsize 70
location key (57923894 INODE_ITEM 0) type DIR_ITEM.33
transid 89142 data_len 0 name_len 40
name: 2f3f379b2a3d7499471edb74869efe-1948311.d
item 15 key (57648595 DIR_INDEX 435) itemoff 2983 itemsize 70
location key (57923894 INODE_ITEM 0) type FILE
transid 89142 data_len 0 name_len 40
name: 2f3f379b2a3d7499471edb74869efe-1948311.d
item 16 key (57648595 DIR_INDEX 1137) itemoff 2914 itemsize 69
location key (58472210 INODE_ITEM 0) type FILE
transid 89418 data_len 0 name_len 39
name: 454bf066ddfbf42e0f3b77ea71c82f-878732.o
item 17 key (57923894 INODE_ITEM 0) itemoff 2754 itemsize 160
generation 89142 transid 89142 size 36092 nbytes 36864
block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
sequence 0 flags 0x91(none)
atime 1513278413.460486168 (2017-12-14 20:06:53)
ctime 1513278413.460486168 (2017-12-14 20:06:53)
mtime 1513278413.460486168 (2017-12-14 20:06:53)
otime 1513278413.460486168 (2017-12-14 20:06:53)
item 18 key (57923894 INODE_REF 57648595) itemoff 2704 itemsize 50
index 435 namelen 40 name: 
2f3f379b2a3d7499471edb74869efe-1948311.d
item 19 key (57923894 EXTENT_DATA 0) itemoff 2651 itemsize 53
generation 89142 type 1 (regular)
extent data disk byte 123290755072 nr 36864
extent data offset 0 nr 36864 ram 36864
extent compression 0 (none)
item 20 key (58191388 INODE_ITEM 0) itemoff 2491 itemsize 160
generation 89259 transid 89259 size 395280 nbytes 397312
block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
sequence 0 flags 0xa4(none)
atime 1513332325.477020047 (2017-12-15 11:05:25)
ctime 1513332325.477020047 (2017-12-15 11:05:25)
mtime 1513332325.477020047 (2017-12-15 11:05:25)
otime 1513332325.477020047 (2017-12-15 11:05:25)
item 21 key (58191388 INODE_REF 40424284) itemoff 2470 itemsize 21
index 1094 namelen 11 name: bzImage.tsc
leaf 146426621952 items 46 free space 11 generation 89680 owner 5
leaf 146426621952 flags 0x1(WRITTEN) backref revision 1
fs uuid b3bfb56e-d445-4335-93f0-c1fb2d1f6df1
chunk uuid 732d73c9-d037-4406-8dcb-dfa101bc5a9b
item 0 key (58191388 EXTENT_DATA 0) itemoff 3942 itemsize 53
generation 87303 type 1 (regular)

> > transid 89142 data_len 0 name_len 40
> > name: 2f3f379b2a3d7499471edb74869efe-1948311.d
> > item 16 key (57648595 DIR_INDEX 1137) itemoff 2914 itemsize 69
> > location key (58472210 INODE_ITEM 0) type FILE
> 
> And this command too:
> 
> # btrfs inspect dump-tree  | grep -C20 \(58472210

~# btrfs inspect dump-tree /dev/sdb4 | grep -C20 \(58472210
generation 89044 transid 89699 size 0 nbytes 0
block group 0 mode 40755 links 1 uid 1000 gid 1000 rdev 0
sequence 0 flags 0x603b3(none)
atime 1513266995.540343055 (2017-12-14 16:56:35)
ctime 1515602448.66422211 (2018-01-10 17:40:48)
mtime 1515602448.66422211 (2018-01-10 17:40:48)
otime 1513266995.540343055 (2017-12-14 16:56:35)
item 10 key (57643872 
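What makes item 13 in the earlier dump invalid can be stated mechanically: a DIR_ITEM's on-disk header (location key, transid, data_len, name_len, type) occupies 30 bytes, so name_len plus data_len must fit in the remaining item space. Item 16 (itemsize 69, name_len 39) passes; item 13 (itemsize 69, name_len 8231) cannot. A sketch of that check (the 30-byte header size is my reading of struct btrfs_dir_item and worth double-checking):

```python
# Sanity check for DIR_ITEM entries as printed by dump-tree.
DIR_ITEM_HEADER = 30  # 17-byte disk key + u64 transid + u16 data_len
                      # + u16 name_len + u8 type

def dir_item_fits(itemsize, name_len, data_len=0):
    """True if the claimed name/data lengths fit inside the item."""
    return DIR_ITEM_HEADER + name_len + data_len <= itemsize
```

Applied to the dump above: dir_item_fits(69, 39) holds for item 16, while dir_item_fits(69, 8231) fails for item 13, which is the corruption btrfs check is tripping over.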

Re: [PATCH v4 2/4] btrfs: cleanup btrfs_mount() using btrfs_mount_root()

2018-01-15 Thread Misono, Tomohiro
On 2018/01/12 19:14, Anand Jain wrote:
> 
> Misono,
> 
>   This change is causing subsequent (subvol) mount to fail when device
> option is specified. The simplest example of the failure is:
> mkfs.btrfs -qf /dev/sdc /dev/sdb
> mount -o device=/dev/sdb /dev/sdc /btrfs
> mount -o device=/dev/sdb /dev/sdc /btrfs1
>mount: /dev/sdc is already mounted or /btrfs1 busy
> 
>Looks like
>  blkdev_get_by_path() <-- is failing.
>  btrfs_scan_one_device()
>  btrfs_parse_early_options()
>  btrfs_mount()
> 
>   Which is due to the different holders (viz. btrfs_root_fs_type and
>   btrfs_fs_type): one is used for the mount and the other for the scan,
>   so they form different holders, and the EXCL open that is needed
>   for both the scan and the open cannot be taken.
> 
> Thanks, Anand

Thanks for the report.
I'm sorry, but I will be busy today and tomorrow; the investigation will be
after Wednesday.

Regards,
Tomohiro Misono

> 
> 
> On 12/14/2017 04:25 PM, Misono, Tomohiro wrote:
>> Cleanup btrfs_mount() by using btrfs_mount_root(). This avoids getting
>> btrfs_mount() called twice in mount path.
>>
>> Old btrfs_mount() will do:
>> 0. VFS layer calls vfs_kern_mount() with registered file_system_type
>> (for btrfs, btrfs_fs_type). btrfs_mount() is called on the way.
>> 1. btrfs_parse_early_options() parses "subvolid=" mount option and set the
>> value to subvol_objectid. Otherwise, subvol_objectid has the initial
>> value of 0
>> 2. check subvol_objectid is 5 or not. Assume this time id is not 5, then
>> btrfs_mount() returns by calling mount_subvol()
>> 3. In mount_subvol(), original mount options are modified to contain
>> "subvolid=0" in setup_root_args(). Then, vfs_kern_mount() is called with
>> btrfs_fs_type and new options
>> 4. btrfs_mount() is called again
>> 5. btrfs_parse_early_options() parses "subvolid=0" and sets 5 (instead of 0)
>> to subvol_objectid
>> 6. check subvol_objectid is 5 or not. This time id is 5 and mount_subvol()
>> is not called. btrfs_mount() finishes mounting a root
>> 7. (in mount_subvol()) using the return value of vfs_kern_mount(), it
>> calls mount_subtree()
>> 8. return subvolume's dentry
>>
>> Reusing the same file_system_type (and btrfs_mount()) for vfs_kern_mount()
>> is the cause of complication.
>>
>> Instead, new btrfs_mount() will do:
>> 1. parse subvol id related options for later use in mount_subvol()
>> 2. mount device's root by calling vfs_kern_mount() with
>> btrfs_root_fs_type, which is not registered to VFS by
>> register_filesystem(). As a result, btrfs_mount_root() is called
>> 3. return by calling mount_subvol()
>>
>> The code of 2. is moved from the first part of mount_subvol().
>>
>> Signed-off-by: Tomohiro Misono 
>> ---
>>   fs/btrfs/super.c | 193 
>> +++
>>   1 file changed, 65 insertions(+), 128 deletions(-)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 14189ad47466..ce93d87b2a69 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -66,6 +66,11 @@
>>   #include 
>>   
>>   static const struct super_operations btrfs_super_ops;
>> +/*
>> + * btrfs_root_fs_type is used internally while
>> + * btrfs_fs_type is used for VFS layer.
>> + * See the comment at btrfs_mount for more detail.
>> + */
>>   static struct file_system_type btrfs_root_fs_type;
>>   static struct file_system_type btrfs_fs_type;
>>   
>> @@ -1404,48 +1409,11 @@ static char *setup_root_args(char *args)
>>   
>>   static struct dentry *mount_subvol(const char *subvol_name, u64 
>> subvol_objectid,
>> int flags, const char *device_name,
>> -   char *data)
>> +   char *data, struct vfsmount *mnt)
>>   {
>>  struct dentry *root;
>> -struct vfsmount *mnt = NULL;
>> -char *newargs;
>>  int ret;
>>   
>> -newargs = setup_root_args(data);
>> -if (!newargs) {
>> -root = ERR_PTR(-ENOMEM);
>> -goto out;
>> -}
>> -
>> -mnt = vfs_kern_mount(&btrfs_fs_type, flags, device_name, newargs);
>> -if (PTR_ERR_OR_ZERO(mnt) == -EBUSY) {
>> -if (flags & SB_RDONLY) {
>> -mnt = vfs_kern_mount(&btrfs_fs_type, flags & ~SB_RDONLY,
>> - device_name, newargs);
>> -} else {
>> -mnt = vfs_kern_mount(&btrfs_fs_type, flags | SB_RDONLY,
>> - device_name, newargs);
>> -if (IS_ERR(mnt)) {
>> -root = ERR_CAST(mnt);
>> -mnt = NULL;
>> -goto out;
>> -}
>> -
>> -down_write(&mnt->mnt_sb->s_umount);
>> -ret = btrfs_remount(mnt->mnt_sb, &flags, NULL);
>> -up_write(&mnt->mnt_sb->s_umount);
>> -if (ret < 0) {
>> -