[PATCH v3] Btrfs: Show a warning message if one of objectid reaches its highest value

2016-03-08 Satoru Takeuchi
It's better to show a warning message in the exceptional case
where an objectid (in most cases, an inode number) reaches its
highest value. For example, if the inode cache is off and this
happens, we can't create any more files even if there are not many files.
This message makes it easier to detect such a problem.

Signed-off-by: Satoru Takeuchi 
---
This patch can be applied to 4.5-rc7

V3:
- Show this message every time this problem is hit,
  for simplicity (thanks to the comment from Goffredo).
- Address Naota's comment
  - Keep the error code as is.
V2:
- Address Filipe's comment
  - Use btrfs_warn() instead of WARN_ONCE()
  - Print the id of the tree
---
 fs/btrfs/inode-map.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index e50316c..5d97ee7 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -556,6 +556,9 @@ int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
 	mutex_lock(&root->objectid_mutex);
 
 	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
+		btrfs_warn(root->fs_info,
+			   "The objectid of root %llu reaches its highest value.\n",
+			   root->root_key.objectid);
 		ret = -ENOSPC;
 		goto out;
 	}
-- 
2.5.0
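To make the failure mode in the commit message concrete, here is a small user-space model of the patched logic. It is a hypothetical sketch, not kernel code: `struct fake_root`, `fake_find_free_objectid()` and `FAKE_LAST_FREE_OBJECTID` are invented names, the limit is deliberately tiny, `fprintf()` stands in for `btrfs_warn()`, and the kernel's `objectid_mutex` is omitted because the sketch is single-threaded. Objectids are handed out by incrementing `highest_objectid` and are never given back (as when the inode cache is off), so once the limit is reached every call warns and fails with `-ENOSPC`, no matter how few files currently exist.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for BTRFS_LAST_FREE_OBJECTID, kept tiny for the demo. */
#define FAKE_LAST_FREE_OBJECTID 260ULL

/* Hypothetical model of the two struct btrfs_root fields used here. */
struct fake_root {
	uint64_t root_objectid;    /* plays the role of root->root_key.objectid */
	uint64_t highest_objectid; /* last objectid handed out */
};

/*
 * Mirrors the v3 logic in user space. The kernel holds
 * root->objectid_mutex around this; the single-threaded sketch omits it.
 */
static int fake_find_free_objectid(struct fake_root *root, uint64_t *objectid)
{
	assert(root != NULL && objectid != NULL);

	if (root->highest_objectid >= FAKE_LAST_FREE_OBJECTID) {
		/* fprintf() stands in for btrfs_warn(); printed on every hit */
		fprintf(stderr, "warning: the objectid of root %" PRIu64
			" reaches its highest value\n", root->root_objectid);
		return -ENOSPC;
	}
	/* Numbers only grow: without the inode cache, freed ones are not reused. */
	*objectid = ++root->highest_objectid;
	return 0;
}
```

For example, starting from `highest_objectid = 256`, four calls succeed and hand out 257 to 260; every later call prints the warning and returns `-ENOSPC`, which matches the v3 choice of warning on every hit.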
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: Show a warning message if one of objectid reaches its highest value

2016-03-08 Satoru Takeuchi

On 2016/03/09 11:32, Naohiro Aota wrote:
> 2016-03-07 12:05 GMT+09:00 Satoru Takeuchi :
>> - It's better to show a warning message for the exceptional case
>>   that one of objectid (in most case, inode number) reaches its
>>   highest value. Show this message only once to avoid filling
>>   dmesg with it.
>> - EOVERFLOW is more proper return value for this case.
>>   ENOSPC is for "No space left on device" case and objectid isn't
>>   related to any device.
>
> I have a concern about EOVERFLOW. The value returned here will
> go through to user space via btrfs_find_free_ino() and btrfs_create().
> It means that "creat" and "mkdir" can now return EOVERFLOW when they
> fail to assign a new inode number. Such behavior would differ from
> other file systems and could confuse user space programs.
>
> Also, I don't think EOVERFLOW as described in "creat(2)" (or open(2))
> suits this case. As far as I read the following excerpt from the
> creat(2) man page, returning ENOSPC is the better option here.


Reading the man page, I think ENOSPC doesn't explain this case
either. It's not related to pathname.

man 2 creat:

ENOSPC pathname was to be created but the device
containing pathname has no room for the new file.


However, I agree that ENOSPC is better than EOVERFLOW,
because existing code has returned the former value
in this case.

The next patch will keep the error code as is.

Thank you for your comment, Naota.

Satoru




>> ENOSPC: pathname was to be created but the device containing pathname has no
>> room for the new file.
>> EOVERFLOW:
>>   pathname refers to a regular file that is too large to be opened.
>> (snip)
>>
>> Signed-off-by: Satoru Takeuchi
>> ---
>> This patch can be applied to 4.5-rc7
>> ---
>>   fs/btrfs/inode-map.c | 10 +-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
>> index e50316c..f5e3228 100644
>> --- a/fs/btrfs/inode-map.c
>> +++ b/fs/btrfs/inode-map.c
>> @@ -556,7 +556,15 @@ int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
>>  	mutex_lock(&root->objectid_mutex);
>>
>>  	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
>> -		ret = -ENOSPC;
>> +		static bool __warned = false;
>> +
>> +		if (unlikely(!__warned)) {
>> +			btrfs_warn(root->fs_info,
>> +				   "The objectid of root %llu reaches its highest value.\n",
>> +				   root->root_key.objectid);
>> +			__warned = true;
>> +		}
>> +		ret = -EOVERFLOW;
>>  		goto out;
>>  	}
>>
>> --
>> 2.5.0



Re: [PATCH v2] Btrfs: Show a warning message if one of objectid reaches its highest value

2016-03-08 Satoru Takeuchi
On 2016/03/09 4:24, Goffredo Baroncelli wrote:
> On 2016-03-07 04:05, Satoru Takeuchi wrote:
>> - It's better to show a warning message for the exceptional case
>> that one of objectid (in most case, inode number) reaches its
>> highest value. Show this message only once to avoid filling
>> dmesg with it.
>> - EOVERFLOW is more proper return value for this case.
>> ENOSPC is for "No space left on device" case and objectid isn't
>> related to any device.
>>
>> Signed-off-by: Satoru Takeuchi 
>> ---
>> This patch can be applied to 4.5-rc7
>> ---
>>   fs/btrfs/inode-map.c | 10 +-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
>> index e50316c..f5e3228 100644
>> --- a/fs/btrfs/inode-map.c
>> +++ b/fs/btrfs/inode-map.c
>> @@ -556,7 +556,15 @@ int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
>>  	mutex_lock(&root->objectid_mutex);
>>
>>  	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
>> -		ret = -ENOSPC;
>> +		static bool __warned = false;
> 
> 
> Please don't use a static GLOBAL variable. I suggest moving to a "per
> filesystem" variable, for two main reasons:
>
> 1) in the (very unlikely) case where two different filesystems reach
> BTRFS_LAST_FREE_OBJECTID, the first error will hide the second one.
>
> 2) if you umount and remount the filesystem, the error is not shown anymore;
> a module unload/load or a reboot is required.
> If something strange happens, one of the first things that the user does is
> to umount/remount the filesystem. But then the error is not shown anymore,
> which could complicate the diagnosis of the problem.

OK, I see.

I reconsidered what this patch should be, since it's
overkill to prepare a per-filesystem variable just
for this warning message.

I'll resend a patch which shows this message
every time this condition is hit, instead of
showing it just once.

Thanks,
Satoru

> 
> 
> 
> 
>> +
>> +		if (unlikely(!__warned)) {
>> +			btrfs_warn(root->fs_info,
>> +				   "The objectid of root %llu reaches its highest value.\n",
>> +				   root->root_key.objectid);
>> +			__warned = true;
>> +		}
>> +		ret = -EOVERFLOW;
>>  		goto out;
>>  	}
>>
> 
> 


Re: [PATCH v2] Btrfs: Show a warning message if one of objectid reaches its highest value

2016-03-08 Naohiro Aota
2016-03-07 12:05 GMT+09:00 Satoru Takeuchi :
> - It's better to show a warning message for the exceptional case
>   that one of objectid (in most case, inode number) reaches its
>   highest value. Show this message only once to avoid filling
>   dmesg with it.
> - EOVERFLOW is more proper return value for this case.
>   ENOSPC is for "No space left on device" case and objectid isn't
>   related to any device.

I have a concern about EOVERFLOW. The value returned here will
go through to user space via btrfs_find_free_ino() and btrfs_create().
It means that "creat" and "mkdir" can now return EOVERFLOW when they
fail to assign a new inode number. Such behavior would differ from
other file systems and could confuse user space programs.

Also, I don't think EOVERFLOW as described in "creat(2)" (or open(2))
suits this case. As far as I read the following excerpt from the
creat(2) man page, returning ENOSPC is the better option here.

> ENOSPC: pathname was to be created but the device containing pathname has no
> room for the new file.
> EOVERFLOW:
>   pathname refers to a regular file that is too large to be opened.
> (snip)

> Signed-off-by: Satoru Takeuchi 
> ---
> This patch can be applied to 4.5-rc7
> ---
>  fs/btrfs/inode-map.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index e50316c..f5e3228 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -556,7 +556,15 @@ int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
>  	mutex_lock(&root->objectid_mutex);
>
>  	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> -		ret = -ENOSPC;
> +		static bool __warned = false;
> +
> +		if (unlikely(!__warned)) {
> +			btrfs_warn(root->fs_info,
> +				   "The objectid of root %llu reaches its highest value.\n",
> +				   root->root_key.objectid);
> +			__warned = true;
> +		}
> +		ret = -EOVERFLOW;
>  		goto out;
>  	}
>
> --
> 2.5.0


[PATCH] btrfs-progs: util: Fix a wrong unit of pretty_size

2016-03-08 Qu Wenruo
If the parameter for pretty_size() is smaller than the default base (1024),
pretty_size() outputs the wrong unit.
For example, pretty_size(1008) outputs '0.98B', not '1008B' or
'0.98KiB'.

The cause is that, for the default base and auto-detected unit, base
is 1024 but num_divs is still 0, so the final result is still divided
by base, causing the bug.

Fix it by checking num_divs in the default case and, if num_divs is 0,
changing base to 1.

Signed-off-by: Qu Wenruo 
---
 utils.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/utils.c b/utils.c
index f814017..52237db 100644
--- a/utils.c
+++ b/utils.c
@@ -1695,6 +1695,13 @@ int pretty_size_snprintf(u64 size, char *str, size_t str_size, unsigned unit_mod
 			size /= mult;
 			num_divs++;
 		}
+		/*
+		 * If the value is smaller than base, we didn't do any
+		 * division, in that case, base should be 1, not original
+		 * base, or the unit will be wrong
+		 */
+		if (num_divs == 0)
+			base = 1;
 	}
 
 	if (num_divs >= ARRAY_SIZE(unit_suffix_binary)) {
-- 
2.7.2
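The effect of the fix can be reproduced with a stripped-down model of the auto-detect branch. This is a hypothetical sketch rather than the btrfs-progs function: `pretty_size_sketch()` and its `apply_fix` switch are invented for illustration, and only binary (1024-based) units are modelled. As the description says, `last_size / base` is what gets printed next to the suffix chosen by `num_divs`, so when no division happened the value must be divided by 1, not 1024.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Binary suffixes, indexed by the number of divisions performed. */
static const char *const suffix_binary[] = {
	"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
};

/*
 * Hypothetical, simplified model of the auto-detect ("default") branch
 * of pretty_size_snprintf(); apply_fix toggles the patch's behaviour.
 */
static int pretty_size_sketch(uint64_t size, char *str, size_t str_size,
			      int apply_fix)
{
	const uint64_t mult = 1024;
	uint64_t base = 1024;      /* default base before the loop */
	uint64_t last_size = size; /* value printed together with the suffix */
	int num_divs = 0;

	while (size >= mult) {
		last_size = size;
		size /= mult;
		num_divs++;
	}
	/* The fix: no division happened, so print the raw byte count. */
	if (apply_fix && num_divs == 0)
		base = 1;

	assert(num_divs < (int)(sizeof(suffix_binary) / sizeof(suffix_binary[0])));
	return snprintf(str, str_size, "%.2f%s", (float)last_size / base,
			suffix_binary[num_divs]);
}
```

In this sketch, 1008 prints as `0.98B` with `apply_fix` off and as `1008.00B` with it on (the `%.2f` format keeps two decimals), while 2048 prints `2.00KiB` either way because the fix only touches the `num_divs == 0` case.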





Re: [PATCH] Documentation: btrfs: remove usage specific information

2016-03-08 Satoru Takeuchi
On 2016/03/09 2:04, David Sterba wrote:
> The document in the kernel sources is yet another place where the
> documentation would need to be updated, while it is not the primary
> source. We actively maintain the wiki pages.
> 
> Signed-off-by: David Sterba 
> ---
>   Documentation/filesystems/btrfs.txt | 261 ++--
>   1 file changed, 11 insertions(+), 250 deletions(-)
> 
> diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
> index c772b47e7ef0..f9dad22d95ce 100644
> --- a/Documentation/filesystems/btrfs.txt
> +++ b/Documentation/filesystems/btrfs.txt
> @@ -1,20 +1,10 @@
> -
>   BTRFS
>   =
>   
...
> -* btrfs-convert: in-place conversion from ext2/3/4 filesystems
> +  https://btrfs.wiki.kernel.org
>   
> -* btrfs-image: dump filesystem metadata for debugging
> +that maintains information about administration tasks, frequently asked
> +questions, use cases, mount options, comprehensible changelogs, features,
> +manual pages, source code repositories, contacts etc.

Regarding mount options, we also have "man 5 btrfs", and it's
newer than the wiki page's list.

https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg34545.html
commit 26341f734d6d ("btrfs-progs: add mount options to btrfs-mount.5")

Last update:
 - Btrfs wiki -> Mount options: Sep 19 2015
 - btrfs-man5.asciidoc: Jan 11 2016

So, how about referring to "man 5 btrfs" from
"Btrfs wiki -> Mount options -> List of options"
and/or from this document?

Thanks,
Satoru


[PATCH] Fix broken 'device scan' arguments parsing

2016-03-08 Yauhen Kharuzhy
commit 52179e4fea41e55f31c92cd033a0b53a5107b4f4 ('btrfs-progs: unify argc
min/max checking') broke the 'btrfs device scan' command when no argument
is given. Fix this.

Signed-off-by: Yauhen Kharuzhy 
---
 cmds-device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-device.c b/cmds-device.c
index cb470af..94ffdc5 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -273,7 +273,7 @@ static int cmd_device_scan(int argc, char **argv)
 	if (all && check_argc_max(argc - optind, 1))
 		usage(cmd_device_scan_usage);
 
-	if (all || argc - optind == 1) {
+	if (all || argc - optind == 0) {
 		printf("Scanning for Btrfs filesystems\n");
 		ret = btrfs_scan_lblkid();
 		error_on(ret, "error %d while scanning", ret);
-- 
2.5.0
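The fixed condition can be read as follows: after option parsing, `argc - optind` is the number of device paths left on the command line, and a bare `btrfs device scan` with no paths should scan everything. The two helpers below are a hypothetical sketch of that branch condition only (`scan_all_old()` and `scan_all_fixed()` are invented names, not btrfs-progs functions; `all` stands for the all-devices flag). They show that the pre-fix test `== 1` skipped the full scan exactly when no device was given.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers modelling only the branch condition in
 * cmd_device_scan(). After getopt() has consumed the options,
 * argc - optind is the number of device paths left on the command line.
 */
static bool scan_all_old(bool all, int argc, int optind)
{
	assert(argc >= optind);
	return all || argc - optind == 1;   /* condition before the fix */
}

static bool scan_all_fixed(bool all, int argc, int optind)
{
	assert(argc >= optind);
	/* all-devices flag, or no device path given: scan every device */
	return all || argc - optind == 0;
}
```

For example, with no options and no paths, `argc == optind`, so the old condition is false and the fixed one is true.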



Re: Btrfsck memory usage reduce idea

2016-03-08 Satoru Takeuchi

On 2016/03/08 17:46, Qu Wenruo wrote:
> Satoru Takeuchi wrote on 2016/03/08 17:28 +0900:
>> Hi Qu,
>>
>> On 2016/03/07 14:42, Qu Wenruo wrote:
>>> Hi,
>>>
>>> As many already know, "btrfs check" is a memory eater.
>>>
>>> The problem is that btrfsck checks the extent tree in a very
>>> comprehensive way:
>>> 1) Create an extent_ref for each extent item with backref
>>> 2) Iterate all other trees to add extent refs
>>> 3) If an extent_ref has all its refs/backrefs matched, it's deleted.
>>>
>>> The method is good: it can find any extent mismatch problem when
>>> checking the extent tree (although it has to iterate the whole fs).
>>> But a large enough filesystem may have tegas of extents, and
>>> memory is easily eaten up.
>>>
>>>
>>> We hope to fix it with the following method:
>>> 1) Only check extent backrefs when iterating the extent tree
>>> Unlike the current implementation, we check one extent item and its
>>> backrefs only.
>>>
>>> If a backref can't be reached, then it's an error and is output (or
>>> we try to fix it).
>>> After iterating all backrefs of an extent item, all related memory is
>>> freed and we won't bother recording anything for later use.
>>>
>>> That's to say, we only care about backref mismatches when checking
>>> the extent tree.
>>> Cases like a missing EXTENT_ITEM for some extent are not checked here.
>>>
>>> 2) Check extent refs while iterating other trees
>>> We only check forward refs while iterating one tree.
>>>
>>> In this step, we only check forward refs, so we can find the remaining
>>> problems, like a missing EXTENT_ITEM for a given extent.
>>>
>>> Any further advice/suggestions? Or is anyone already doing such
>>> work?
>>
>> Thank you for your effort. I have some basic questions.
>>
>> 1. Could you tell me what you'd like to do?
>>
>>    a) Provide exactly the same function as the current
>>       implementation in another, more efficient way.
>
> Same function, but less efficient.
> It may take a longer time and more IO, but less memory.

I see.

> And some error messages will be output at a different time.
> E.g., the error message for a missing backref may be output at fs tree
> checking time, instead of extent tree checking time.

It's OK if, in the end, all the same error messages as the current
implementation are shown.

>>    b) Replace the current implementation with a quicker
>>       one that provides limited function.
>>    c) Any other
>>
>> 2. Do you have an estimate of how long the
>>    new algorithm takes compared with the current one?
>
> It depends on the fs hierarchy. But in all cases, IO will be more than
> with the original implementation.
>
> The most efficient case would be one subvolume and no deduped files
> (which means one file extent refers to one data extent, with no in-band
> or out-of-band dedup).
>
> In that case, the old implementation iterates the whole metadata twice,
> and the new implementation iterates the whole metadata twice plus extra.
>
> In the worst case, like in-band dedup with multiple almost identical
> snapshots, things will be much, much slower, with more IO and more tree
> searches, maybe O(n^2) or more. But memory usage should not be much
> different, though.
>
> In short, we use more IO to trade for memory.
>
> Anyway, for a large fs, a comprehensive fsck can't take a short time.

Got it.

Thanks,
Satoru

> Thanks,
> Qu
>
>> # Of course, "currently not sure" is OK at this stage.
>>
>> I'm interested in it because there is a trade-off
>> between speed and memory consumption in many cases,
>> and btrfsck takes a very long time on a large filesystem.
>>
>> Thanks,
>> Satoru
>>
>>> Thanks,
>>> Qu










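Qu's two-pass plan can be illustrated with a toy model. Everything here is a hypothetical sketch, not btrfs-progs code: a reference is reduced to a `(bytenr, root)` pair, the fs trees are flattened into one array of forward refs, and a linear scan stands in for each tree search. Pass 1 walks the extent items and checks each backref against the forward refs; pass 2 walks the forward refs and checks that a matching extent item and backref exist (an assumption about how pass 2 would look). No per-extent records survive between iterations, so extra memory stays constant while the number of searches grows, which is the IO-for-memory trade described in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy reference: tree `root` refers to the extent at `bytenr`. */
struct ref {
	uint64_t bytenr;
	uint64_t root;
};

/* Hypothetical toy extent item: an extent plus the backrefs recorded for it. */
struct extent_item {
	uint64_t bytenr;
	const struct ref *backrefs;
	size_t nr_backrefs;
};

/* A linear scan stands in for a tree search (i.e. extra IO on a real fs). */
static int has_ref(const struct ref *refs, size_t nr, const struct ref *want)
{
	for (size_t i = 0; i < nr; i++)
		if (refs[i].bytenr == want->bytenr && refs[i].root == want->root)
			return 1;
	return 0;
}

static const struct extent_item *find_extent(const struct extent_item *ext,
					     size_t nr_ext, uint64_t bytenr)
{
	for (size_t i = 0; i < nr_ext; i++)
		if (ext[i].bytenr == bytenr)
			return &ext[i];
	return NULL;
}

/* Returns the number of problems found; keeps no per-extent records. */
static int check_low_memory(const struct extent_item *ext, size_t nr_ext,
			    const struct ref *fwd, size_t nr_fwd)
{
	int errors = 0;

	assert(ext != NULL || nr_ext == 0);
	assert(fwd != NULL || nr_fwd == 0);

	/* Pass 1: walk the extent tree; every backref needs a forward ref. */
	for (size_t i = 0; i < nr_ext; i++) {
		for (size_t j = 0; j < ext[i].nr_backrefs; j++) {
			const struct ref *b = &ext[i].backrefs[j];

			if (!has_ref(fwd, nr_fwd, b)) {
				printf("extent %llu: backref from root %llu has no forward ref\n",
				       (unsigned long long)b->bytenr,
				       (unsigned long long)b->root);
				errors++;
			}
		}
	}

	/* Pass 2: walk the other trees; every forward ref needs an extent
	 * item carrying a matching backref (catches a missing EXTENT_ITEM). */
	for (size_t k = 0; k < nr_fwd; k++) {
		const struct extent_item *e = find_extent(ext, nr_ext, fwd[k].bytenr);

		if (!e || !has_ref(e->backrefs, e->nr_backrefs, &fwd[k])) {
			printf("root %llu: ref to extent %llu has no matching extent item/backref\n",
			       (unsigned long long)fwd[k].root,
			       (unsigned long long)fwd[k].bytenr);
			errors++;
		}
	}
	return errors;
}
```

The nested searches are what make the worst case (many snapshots sharing extents) approach the O(n^2) behaviour Qu mentions, while the memory footprint does not depend on the number of extents.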


Re: btrfs and containers

2016-03-08 Chris Murphy
On Tue, Mar 8, 2016 at 12:58 PM, Liu Bo  wrote:
> On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
>> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger  
>> wrote:
>> > Hi,
>> >
>> > I have been running systemd-nspawn containers on top of a btrfs
>> > filesystem for a while now.
>> >
>> > This works great: Snapshots are a huge help to manage containers!
>> >
>> > But today I ran btrfs subvol list . *inside* a container. To my
>> > surprise I got a list of *all* subvolumes on that drive. That is
>> > basically a complete list of containers running on the machine. I do
>> > not want to have that kind of information exposed to my containers.
>> >
>> > Is there a way to stop btrfs from listing subvolumes "above" the
>> > current location? So that "btrfs subvol list /" in a container will
>> > only show subvolumes that are set up in the container?
>
> That's a good question.
>
> It looks like "btrfs subvolume list -o" matches the need here.
>
>>
>> I'm not sure whether this is something that goes in Btrfs proper,
>> since this is presumably a privileged container? The same thing
>> happens with Docker containers. One way to do this is if it's not
>> privileged, as non-root can't list subvolumes. I think some work is
>> needed to make it possible for users to list subvolumes they own.
>> Right now a user can create a subvolume but then now list or get
>> information on it. By default they can't delete it either unless a
>> special mount option is used. So I think there's work that's needed
>> one way or another, and maybe in more than one part.
>
> Unfortunately, all the variants of btrfs subvolume list are built on top of
> the TREE_SEARCH ioctl, which requires CAP_SYS_ADMIN.
>
> So what we need here might be to teach 'btrfs sub list' to recognize
> the container's CAP_SYS_XXX (if this is possible).


Yes, it's a bit peculiar that I can create subvolumes and snapshot them,
but can't 'btrfs sub list/show' them.

It's an open question why the user needs a subvolume, but I'm not
necessarily thinking of a human user, but rather some service, maybe
httpd. Or maybe, with the xdg-app stuff the Gnome folks are
working on, it makes sense to encapsulate applications and their
updates in their own subvolume. *shrug*  I'm open to the idea that the
use case needs to be more compelling and detailed in order to get the
implementation right.


-- 
Chris Murphy


Re: btrfs and containers

2016-03-08 Liu Bo
On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger  wrote:
> > Hi,
> >
> > I have been running systemd-nspawn containers on top of a btrfs
> > filesystem for a while now.
> >
> > This works great: Snapshots are a huge help to manage containers!
> >
> > But today I ran btrfs subvol list . *inside* a container. To my
> > surprise I got a list of *all* subvolumes on that drive. That is
> > basically a complete list of containers running on the machine. I do
> > not want to have that kind of information exposed to my containers.
> >
> > Is there a way to stop btrfs from listing subvolumes "above" the
> > current location? So that "btrfs subvol list /" in a container will
> > only show subvolumes that are set up in the container?

That's a good question.

It looks like "btrfs subvolume list -o" matches the need here.

> 
> I'm not sure whether this is something that goes in Btrfs proper,
> since this is presumably a privileged container? The same thing
> happens with Docker containers. One way to do this is if it's not
> privileged, as non-root can't list subvolumes. I think some work is
> needed to make it possible for users to list subvolumes they own.
> Right now a user can create a subvolume but then not list or get
> information on it. By default they can't delete it either unless a
> special mount option is used. So I think there's work that's needed
> one way or another, and maybe in more than one part.

Unfortunately, all the variants of btrfs subvolume list are built on top of
the TREE_SEARCH ioctl, which requires CAP_SYS_ADMIN.

So what we need here might be to teach 'btrfs sub list' to recognize
the container's CAP_SYS_XXX (if this is possible).

Thanks,

-liubo

> 
> -- 
> Chris Murphy


Re: [PATCH v2] Btrfs: Show a warning message if one of objectid reaches its highest value

2016-03-08 Goffredo Baroncelli
On 2016-03-07 04:05, Satoru Takeuchi wrote:
> - It's better to show a warning message for the exceptional case
>that one of objectid (in most case, inode number) reaches its
>highest value. Show this message only once to avoid filling
>dmesg with it.
> - EOVERFLOW is more proper return value for this case.
>ENOSPC is for "No space left on device" case and objectid isn't
>related to any device.
> 
> Signed-off-by: Satoru Takeuchi 
> ---
> This patch can be applied to 4.5-rc7
> ---
>   fs/btrfs/inode-map.c | 10 +-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index e50316c..f5e3228 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -556,7 +556,15 @@ int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
>  	mutex_lock(&root->objectid_mutex);
> 
>  	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> -		ret = -ENOSPC;
> +		static bool __warned = false;


Please don't use a static GLOBAL variable. I suggest moving to a "per
filesystem" variable, for two main reasons:

1) in the (very unlikely) case where two different filesystems reach
BTRFS_LAST_FREE_OBJECTID, the first error will hide the second one.

2) if you umount and remount the filesystem, the error is not shown anymore;
a module unload/load or a reboot is required.
If something strange happens, one of the first things that the user does is
to umount/remount the filesystem. But then the error is not shown anymore,
which could complicate the diagnosis of the problem.




> +
> +		if (unlikely(!__warned)) {
> +			btrfs_warn(root->fs_info,
> +				   "The objectid of root %llu reaches its highest value.\n",
> +				   root->root_key.objectid);
> +			__warned = true;
> +		}
> +		ret = -EOVERFLOW;
>  		goto out;
>  	}
> 
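Goffredo's alternative, a warn-once flag kept per filesystem instead of in a static variable, can be sketched as follows. This is a hypothetical illustration only (v3 of the patch instead warns on every hit): `struct fake_fs_info` and `warn_objectid_exhausted_once()` are invented names, and a plain `bool` is used for brevity where kernel code would more likely need an atomic bit operation such as `test_and_set_bit()` to be safe against concurrent callers. Both of his points show up in the usage: a second filesystem still gets its own warning, and a remount (a fresh instance) warns again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical per-mount state. A fresh instance is created on every
 * mount, so the flag is reset by umount/mount, unlike a static variable.
 */
struct fake_fs_info {
	const char *id;       /* e.g. the device name used in the message */
	bool objectid_warned; /* has this filesystem already warned? */
};

/* Warn at most once per mounted filesystem; returns true if it printed. */
static bool warn_objectid_exhausted_once(struct fake_fs_info *fs,
					 unsigned long long root_objectid)
{
	assert(fs != NULL);

	if (fs->objectid_warned)
		return false;
	fs->objectid_warned = true;
	fprintf(stderr, "%s: the objectid of root %llu reaches its highest value\n",
		fs->id, root_objectid);
	return true;
}
```

With a `static bool` instead of the per-instance field, the second filesystem and the remounted one would both stay silent, which is exactly the diagnosis problem described above.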


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


[PATCH] Documentation: btrfs: remove usage specific information

2016-03-08 David Sterba
The document in the kernel sources is yet another place where the
documentation would need to be updated, while it is not the primary
source. We actively maintain the wiki pages.

Signed-off-by: David Sterba 
---
 Documentation/filesystems/btrfs.txt | 261 ++--
 1 file changed, 11 insertions(+), 250 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index c772b47e7ef0..f9dad22d95ce 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -1,20 +1,10 @@
-
 BTRFS
 =
 
-Btrfs is a copy on write filesystem for Linux aimed at
-implementing advanced features while focusing on fault tolerance,
-repair and easy administration. Initially developed by Oracle, Btrfs
-is licensed under the GPL and open for contribution from anyone.
-
-Linux has a wealth of filesystems to choose from, but we are facing a
-number of challenges with scaling to the large storage subsystems that
-are becoming common in today's data centers. Filesystems need to scale
-in their ability to address and manage large storage, and also in
-their ability to detect, repair and tolerate errors in the data stored
-on disk.  Btrfs is under heavy development, and is not suitable for
-any uses other than benchmarking and review. The Btrfs disk format is
-not yet finalized.
+Btrfs is a copy on write filesystem for Linux aimed at implementing advanced
+features while focusing on fault tolerance, repair and easy administration.
+Jointly developed by several companies, licensed under the GPL and open for
+contribution from anyone.
 
 The main Btrfs features include:
 
@@ -28,243 +18,14 @@ The main Btrfs features include:
 * Checksums on data and metadata (multiple algorithms available)
 * Compression
 * Integrated multiple device support, with several raid algorithms
-* Online filesystem check (not yet implemented)
-* Very fast offline filesystem check
-* Efficient incremental backup and FS mirroring (not yet implemented)
+* Offline filesystem check
+* Efficient incremental backup and FS mirroring
 * Online filesystem defragmentation
 
+For more information please refer to the wiki
 
-Mount Options
-=
-
-When mounting a btrfs filesystem, the following option are accepted.
-Options with (*) are default options and will not show in the mount options.
-
-  alloc_start=
-   Debugging option to force all block allocations above a certain
-   byte threshold on each block device.  The value is specified in
-   bytes, optionally with a K, M, or G suffix, case insensitive.
-   Default is 1MB.
-
-  noautodefrag(*)
-  autodefrag
-   Disable/enable auto defragmentation.
-   Auto defragmentation detects small random writes into files and queue
-   them up for the defrag process.  Works best for small files;
-   Not well suited for large database workloads.
-
-  check_int
-  check_int_data
-  check_int_print_mask=
-   These debugging options control the behavior of the integrity checking
-   module (the BTRFS_FS_CHECK_INTEGRITY config option required).
-
-   check_int enables the integrity checker module, which examines all
-   block write requests to ensure on-disk consistency, at a large
-   memory and CPU cost.
-
-   check_int_data includes extent data in the integrity checks, and
-   implies the check_int option.
-
-   check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
-   as defined in fs/btrfs/check-integrity.c, to control the integrity
-   checker module behavior.
-
-   See comments at the top of fs/btrfs/check-integrity.c for more info.
-
-  commit=
-   Set the interval of periodic commit, 30 seconds by default. Higher
-   values defer data being synced to permanent storage with obvious
-   consequences when the system crashes. The upper bound is not forced,
-   but a warning is printed if it's more than 300 seconds (5 minutes).
-
-  compress
-  compress=
-  compress-force
-  compress-force=
-   Control BTRFS file data compression.  Type may be specified as "zlib"
-   "lzo" or "no" (for no compression, used for remounting).  If no type
-   is specified, zlib is used.  If compress-force is specified,
-   all files will be compressed, whether or not they compress well.
-   If compression is enabled, nodatacow and nodatasum are disabled.
-
-  degraded
-   Allow mounts to continue with missing devices.  A read-write mount may
-   fail with too many devices missing, for example if a stripe member
-   is completely missing.
-
-  device=
-   Specify a device during mount so that ioctls on the control device
-   can be avoided.  Especially useful when trying to mount a multi-device
-   setup as root.  May be specified multiple times for multiple devices.
-
-  nodiscard(*)
-  discard
-   Disable/enable discard mount option.
-   

Re: btrfs-progs and btrfs(8) inconsistencies

2016-03-08 David Sterba
Let me answer here for the whole thread. There are a lot of issues to
cover; I think the GitHub way would work here, so the project you've
created for the purpose seems OK to me.

https://github.com/btrfs8-revamp

I expect way more discussion than actual patches and coding; the web
platform offers commenting at exact locations in files, and issue
tracking works better than over mail.

Initially we'd cover all the stuff mentioned in this thread: naming,
conventions etc. After we agree, we can start changing code in btrfs-progs.

We can ask a wider audience for feedback on the mailing list if needed,
possibly with a draft that has already been reviewed.

After we write comprehensive guidelines for the UI, we can add them to
btrfs-progs. I really want to avoid high churn in that repository for
things that are not related to code. For that the revamp repository is
good; the only requirement is a GitHub account.


[PATCH][btrfs-progs] populate fs with small dataset for convert-tests

2016-03-08 Lakshmipathi.G
Signed-off-by: Lakshmipathi.G 
---
 tests/convert-tests.sh | 84 +++---
 1 file changed, 80 insertions(+), 4 deletions(-)

diff --git a/tests/convert-tests.sh b/tests/convert-tests.sh
index 0bfb41f..c1f3de0 100644
--- a/tests/convert-tests.sh
+++ b/tests/convert-tests.sh
@@ -10,16 +10,91 @@ LANG=C
 SCRIPT_DIR=$(dirname $(readlink -f $0))
 TOP=$(readlink -f $SCRIPT_DIR/../)
 RESULTS="$TOP/tests/convert-tests-results.txt"
+DATASET_SIZE="50" # how many files to create.
 
 source $TOP/tests/common
 
 rm -f $RESULTS
 
 setup_root_helper
-prepare_test_dev 256M
+prepare_test_dev 512M
 
 CHECKSUMTMP=$(mktemp --tmpdir btrfs-progs-convert.XX)
 
+generate_dataset() {
+
+	dataset_type="$1"
+	dirpath=$TEST_MNT/$dataset_type
+	mkdir -p $dirpath
+
+	case $dataset_type in
+		small)
+			for num in $(seq 1 $DATASET_SIZE);do
+				run_check $SUDO_HELPER dd if=/dev/urandom of=$dirpath/$dataset_type.$num bs=10K \
+				count=1 1>/dev/null 2>&1
+			done
+			;;
+
+		hardlink)
+			for num in $(seq 1 $DATASET_SIZE);do
+				run_check $SUDO_HELPER touch $dirpath/$dataset_type.$num
+				run_check $SUDO_HELPER ln $dirpath/$dataset_type.$num $dirpath/hlink.$num
+			done
+			;;
+
+		symlink)
+			for num in $(seq 1 $DATASET_SIZE);do
+				run_check $SUDO_HELPER touch $dirpath/$dataset_type.$num
+				run_check $SUDO_HELPER ln -s $dirpath/$dataset_type.$num $dirpath/slink.$num
+			done
+			;;
+
+		brokenlink)
+			for num in $(seq 1 $DATASET_SIZE);do
+				run_check $SUDO_HELPER ln -s $dirpath/$dataset_type.$num $dirpath/blink.$num
+			done
+			;;
+
+		perm)
+			for modes in $(seq 1 );do
+				if [[ "$modes" == *9* ]] || [[ "$modes" == *8* ]]
+				then
+					continue;
+				else
+					run_check $SUDO_HELPER touch $dirpath/$dataset_type.$modes
+					run_check $SUDO_HELPER chmod $modes $dirpath/$dataset_type.$modes
+				fi
+			done
+			;;
+
+		sparse)
+			for num in $(seq 1 $DATASET_SIZE);do
+				run_check $SUDO_HELPER dd if=/dev/urandom of=$dirpath/$dataset_type.$num bs=10K \
+				count=1 1>/dev/null 2>&1
+				run_check $SUDO_HELPER truncate -s 500K $dirpath/$dataset_type.$num
+				run_check $SUDO_HELPER dd if=/dev/urandom of=$dirpath/$dataset_type.$num bs=10K \
+				oflag=append conv=notrunc count=1 1>/dev/null 2>&1
+				run_check $SUDO_HELPER truncate -s 800K $dirpath/$dataset_type.$num
+			done
+			;;
+
+		acls)
+			for num in $(seq 1 $DATASET_SIZE);do
+				run_check $SUDO_HELPER touch $dirpath/$dataset_type.$num
+				run_check $SUDO_HELPER setfacl -m "u:root:x" $dirpath/$dataset_type.$num
+				run_check $SUDO_HELPER setfattr -n user.foo -v bar$num $dirpath/$dataset_type.$num
+			done
+			;;
+	esac
+}
+
+populate_fs() {
+
+	for dataset_type in 'small' 'hardlink' 'symlink' 'brokenlink' 'perm' 'sparse' 'acls' ; do
+		generate_dataset "$dataset_type"
+	done
+}
+
 convert_test() {
local features
local nodesize
@@ -39,15 +114,16 @@ convert_test() {
# when test image is on NFS and would not be writable for root
run_check truncate -s 0 $TEST_DEV
# 256MB is the smallest acceptable btrfs image.
-   run_check truncate -s 256M $TEST_DEV
+   run_check truncate -s 512M $TEST_DEV
run_check $* -F $TEST_DEV
 
# create a file to check btrfs-convert can convert regular file
# correct
run_check_mount_test_dev
+   populate_fs;
run_check $SUDO_HELPER dd if=/dev/zero of=$TEST_MNT/test bs=$nodesize \
count=1 1>/dev/null 2>&1
-   run_check_stdout md5sum $TEST_MNT/test > $CHECKSUMTMP
+   run_check_stdout find $TEST_MNT -type f ! -name 'image' -exec md5sum {} \+ > $CHECKSUMTMP
run_check_umount_test_dev
 
run_check $TOP/btrfs-convert ${features:+-O 

Re: [PATCH 1/2] Btrfs: make mapping->writeback_index point to the last written page

2016-03-08 Thread Holger Hoffstätte
On 03/08/16 01:56, Liu Bo wrote:
> If a sequential writer is writing in the middle of a page, it just
> redirties the last written page by continuing from it.
> 
> In that case, writeback can end up seeking back to that first redirtied
> page after writing all the pages at the end of the file, because btrfs sets
> mapping->writeback_index to one past the current page.
> 
> For non-COW filesystems the cost is only an extra seek, while for COW
> filesystems such as btrfs it means unnecessary fragmentation.
> 
> To avoid it, we just need to continue writeback from the last written page.
> 
> This also makes btrfs behave like write_cache_pages(), i.e. bail
> out immediately if there is an error in writepage().
> 
> https://www.spinics.net/lists/linux-btrfs/msg52628.html
> 
> Reported-by: Holger Hoffstätte 
> Signed-off-by: Liu Bo 
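The effect of where writeback_index points can be illustrated with a toy model. This is a minimal sketch, not btrfs's or the kernel's actual writeback code: `NPAGES`, `writeback_pass()` and the boolean dirty-page array are hypothetical, and a single cyclic scan over one file stands in for range_cyclic writeback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file size, in pages, for this toy model. */
#define NPAGES 10

/*
 * Toy model of one cyclic writeback pass over a single file.
 * Dirty pages are written starting at *wb_index up to the last page, then
 * the scan wraps around to page 0. The write order is stored in order[]
 * and the number of pages written is returned.
 *
 * keep_last == false: the next pass starts one page past the last written
 *                     page (the behaviour the patch describes as current).
 * keep_last == true:  the next pass starts at the last written page
 *                     (the behaviour the patch proposes).
 */
static size_t writeback_pass(bool dirty[NPAGES], size_t *wb_index,
			     bool keep_last, size_t order[NPAGES])
{
	size_t start = *wb_index;
	size_t n = 0, last = start;

	assert(start < NPAGES);
	for (size_t k = 0; k < NPAGES; k++) {
		size_t i = (start + k) % NPAGES;

		if (!dirty[i])
			continue;
		dirty[i] = false;	/* "write" the page */
		order[n++] = i;
		last = i;
	}
	if (n > 0)
		*wb_index = keep_last ? last : (last + 1) % NPAGES;
	return n;
}
```

Starting from index 0 with pages 0-5 dirty, then redirtying page 5 together with pages 6-9, a second pass with `keep_last == false` writes 6, 7, 8, 9 and only then wraps back to 5; with `keep_last == true` it writes 5 through 9 in order.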

Very nice, seems to work as advertised. Can't speak for the data integrity
implications, but for the functionality of both patches:

Tested-by: Holger Hoffstätte 

Thank you!
Holger



Re: btrfs and containers

2016-03-08 Thread Austin S. Hemmelgarn

On 2016-03-07 17:55, Tobias Hunger wrote:

> Hi,
>
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
>
> This works great: Snapshots are a huge help to manage containers!
>
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.
>
> Is there a way to stop btrfs from listing subvolumes "above" the
> current location? So that "btrfs subvol list /" in a container will
> only show subvolumes that are set up in the container?

There is not currently a way to do this.  My personal recommendation 
until there is would be to use LVM or something similar and have each 
container on its own FS (this has other advantages too, like being able 
to use seed devices to quickly spin up containers in a known state).


Ideally though, we should be checking the current root directory when in 
a mount namespace, and not listing subvolumes outside that tree.
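The filtering suggested above could look roughly like the following. It is a minimal sketch under stated assumptions: `path_within_root()` and `filter_subvols()` are hypothetical helpers, not btrfs-progs functions, and subvolume paths are assumed to be absolute paths from the top-level subvolume with no trailing slash. Finding the caller's root inside a mount namespace is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: is the subvolume at `path` inside the tree rooted at
 * `root`? Both are absolute paths from the top-level subvolume without a
 * trailing slash (except "/" itself).
 */
static bool path_within_root(const char *path, const char *root)
{
	size_t len;

	assert(path != NULL && root != NULL);
	if (strcmp(root, "/") == 0)
		return true;
	len = strlen(root);
	if (strncmp(path, root, len) != 0)
		return false;
	/* Compare whole components only: "/ct1" must not match "/ct10". */
	return path[len] == '\0' || path[len] == '/';
}

/* Keeps only the subvolumes visible from `root`; returns how many were kept. */
static size_t filter_subvols(const char *const paths[], size_t n,
			     const char *root, const char *out[])
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (path_within_root(paths[i], root))
			out[kept++] = paths[i];
	return kept;
}
```

With the subvolumes `/containers/ct1`, `/containers/ct1/var` and `/containers/ct2` and the root `/containers/ct1`, only the first two are kept; the component check also stops `/containers/ct10` from matching.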




Re: Btrfsck memory usage reduce idea

2016-03-08 Thread Qu Wenruo



Satoru Takeuchi wrote on 2016/03/08 17:28 +0900:

> Hi Qu,
>
> On 2016/03/07 14:42, Qu Wenruo wrote:
>
>> Hi,
>>
>> As many already know, "btrfs check" is a memory eater.
>>
>> The problem is that btrfsck checks the extent tree in a very
>> comprehensive way:
>> 1) Create an extent_ref for each extent item with its backrefs
>> 2) Iterate all other trees to add extent refs
>> 3) Once all refs/backrefs of an extent_ref match, it is deleted.
>>
>> The method is good: it can find any extent mismatch problem when
>> checking the extent tree (although it has already iterated the whole fs).
>> But a large enough filesystem may have a huge number of extents, and
>> memory is easily eaten up.
>>
>>
>> We hope to fix it in the following way:
>> 1) Only check extent backrefs when iterating the extent tree
>> Unlike the current implementation, we check one extent item and its
>> backrefs only.
>>
>> If one backref can't be reached, then it's an error and is reported (or
>> we try to fix it).
>> After iterating all backrefs of an extent item, all related memory is
>> freed and we won't bother recording anything for later use.
>>
>> That's to say, we only care about the backref mismatch case when checking
>> the extent tree.
>> Cases like a missing EXTENT_ITEM for some extent are not checked here.
>>
>> 2) Check extent refs while iterating other trees
>> We only check forward refs while iterating each tree.
>>
>> In this step, we only check forward refs, so we can find the remaining
>> problems, like a missing EXTENT_ITEM for a given extent.
>>
>> Any further advice/suggestions? Or is anyone already doing such
>> work?


> Thank you for your effort. I have basic questions.
>
> 1. Could you tell me what you'd like to do?
>
> a) Provide exactly the same function as the current
>    implementation in another, more efficient way.


The same function, but less efficient.
It may take longer and do more IO, but use less memory.

Also, some error messages will be output at a different time.
E.g., the error message for a missing backref may be output at fs tree
checking time, instead of extent tree checking time.



> b) Replace the current implementation with a quicker
>    one that provides limited functionality.
> c) Any other
>
> 2. Do you have an estimate of how long the
> new algorithm takes compared with the current one?


It depends on the fs hierarchy. But in all cases, there will be more IO
than with the original implementation.


The most efficient case would be one subvolume and no deduplicated files
(which means one file extent refers to one data extent, with no in-band or
out-of-band dedup).

In that case, the old implementation iterates the whole metadata twice,
and the new implementation iterates the whole metadata twice plus extra
searches.

The worst case, like in-band dedup with multiple almost identical
snapshots, will be much, much slower: more IO, more tree searches,
maybe O(n^2) or worse. Memory usage should not be much different, though.


In short, we use more IO to trade for less memory.

Anyway, for a large fs, a comprehensive fsck can't finish in a short
time.
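The proposed two-pass check can be modeled in a few lines to make the memory/IO trade-off concrete. This is a toy sketch, not btrfs-progs code: the structs, the simplified (root, inode) backref key and the linear searches that stand in for tree lookups are all assumptions. Pass 1 walks the extent items and verifies each backref; pass 2 walks the forward refs and catches a missing EXTENT_ITEM. No per-extent state survives between items, but every backref and forward ref costs a search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of the metadata. */
struct backref { uint64_t root, ino; };            /* stored in an EXTENT_ITEM */
struct extent_item { uint64_t bytenr; const struct backref *refs; size_t nrefs; };
struct file_extent { uint64_t root, ino, bytenr; }; /* forward ref in an fs tree */

/* Stand-in for a search in the fs trees: does (root, ino) point at bytenr? */
static int fs_tree_has_ref(const struct file_extent *fe, size_t nfe,
			   uint64_t root, uint64_t ino, uint64_t bytenr)
{
	for (size_t i = 0; i < nfe; i++)
		if (fe[i].root == root && fe[i].ino == ino && fe[i].bytenr == bytenr)
			return 1;
	return 0;
}

/* Stand-in for a search in the extent tree for a matching backref. */
static int extent_tree_has_backref(const struct extent_item *ei, size_t nei,
				   const struct file_extent *fe)
{
	for (size_t i = 0; i < nei; i++) {
		if (ei[i].bytenr != fe->bytenr)
			continue;
		for (size_t j = 0; j < ei[i].nrefs; j++)
			if (ei[i].refs[j].root == fe->root && ei[i].refs[j].ino == fe->ino)
				return 1;
	}
	return 0;
}

/* Returns the number of mismatches; keeps no state between extent items. */
static size_t check_low_memory(const struct extent_item *ei, size_t nei,
			       const struct file_extent *fe, size_t nfe)
{
	size_t errors = 0;

	assert((ei != NULL || nei == 0) && (fe != NULL || nfe == 0));
	/* Pass 1: walk the extent tree; every backref must be reachable. */
	for (size_t i = 0; i < nei; i++)
		for (size_t j = 0; j < ei[i].nrefs; j++)
			if (!fs_tree_has_ref(fe, nfe, ei[i].refs[j].root,
					     ei[i].refs[j].ino, ei[i].bytenr))
				errors++;
	/* Pass 2: walk the fs trees; each forward ref needs a matching backref. */
	for (size_t k = 0; k < nfe; k++)
		if (!extent_tree_has_backref(ei, nei, &fe[k]))
			errors++;
	return errors;
}
```

With one EXTENT_ITEM at 4096 referenced by (5, 257) and one at 8192 referenced by (5, 258) and (256, 258), dropping the (256, 258) forward ref yields one error from pass 1, while adding a forward ref to an extent at 12288 that has no EXTENT_ITEM yields one error from pass 2.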


Thanks,
Qu


> # Of course, "currently not sure" is OK at this stage.
>
> I'm interested in it because there is a trade-off
> between speed and memory consumption in many cases,
> and btrfsck takes a very long time with a large filesystem.
>
> Thanks,
> Satoru



>> Thanks,
>> Qu





Re: Btrfsck memory usage reduce idea

2016-03-08 Thread Satoru Takeuchi

Hi Qu,

On 2016/03/07 14:42, Qu Wenruo wrote:

> Hi,
>
> As many already know, "btrfs check" is a memory eater.
>
> The problem is that btrfsck checks the extent tree in a very comprehensive way:
> 1) Create an extent_ref for each extent item with its backrefs
> 2) Iterate all other trees to add extent refs
> 3) Once all refs/backrefs of an extent_ref match, it is deleted.
>
> The method is good: it can find any extent mismatch problem when checking the
> extent tree (although it has already iterated the whole fs).
> But a large enough filesystem may have a huge number of extents, and memory is
> easily eaten up.
>
>
> We hope to fix it in the following way:
> 1) Only check extent backrefs when iterating the extent tree
> Unlike the current implementation, we check one extent item and its
> backrefs only.
>
> If one backref can't be reached, then it's an error and is reported (or
> we try to fix it).
> After iterating all backrefs of an extent item, all related memory is
> freed and we won't bother recording anything for later use.
>
> That's to say, we only care about the backref mismatch case when checking
> the extent tree.
> Cases like a missing EXTENT_ITEM for some extent are not checked here.
>
> 2) Check extent refs while iterating other trees
> We only check forward refs while iterating each tree.
>
> In this step, we only check forward refs, so we can find the remaining
> problems, like a missing EXTENT_ITEM for a given extent.
>
> Any further advice/suggestions? Or is anyone already doing such work?


Thank you for your effort. I have basic questions.

1. Could you tell me what you'd like to do?

   a) Provide exactly the same function as the current
      implementation in another, more efficient way.
   b) Replace the current implementation with a quicker
      one that provides limited functionality.
   c) Any other

2. Do you have an estimate of how long the
   new algorithm takes compared with the current one?

   # Of course, "currently not sure" is OK at this stage.

   I'm interested in it because there is a trade-off
   between speed and memory consumption in many cases,
   and btrfsck takes a very long time with a large filesystem.

Thanks,
Satoru



> Thanks,
> Qu


