Re: snapshot deletion / unmount slowness

2013-03-11 Thread Roman Mamedov
On Sun, 10 Mar 2013 22:31:08 -0700
Michael Johnson - MJ m...@revmj.com wrote:

> What I now suspect is going on is that while deleting the snapshots
> was quick, that probably kicks off a background thread which actually
> does the heavy lifting.

Exactly that: snapshot deletion only syncs on unmount, and there is no
other way to ensure it is complete.

If you have some patience and let it unmount properly and then remount it, you
may find that you have gained much more free space, due to all the snapshots
being actually deleted and the space they were occupying freed only just now.

-- 
With respect,
Roman


Re: snapshot deletion / unmount slowness

2013-03-11 Thread Liu Bo
On Mon, Mar 11, 2013 at 12:11:43PM +0600, Roman Mamedov wrote:
> On Sun, 10 Mar 2013 22:31:08 -0700
> Michael Johnson - MJ m...@revmj.com wrote:
>
> > What I now suspect is going on is that while deleting the snapshots
> > was quick, that probably kicks off a background thread which actually
> > does the heavy lifting.
>
> Exactly that: snapshot deletion only syncs on unmount, and there is no
> other way to ensure it is complete.
>
> If you have some patience and let it unmount properly and then remount it, you
> may find that you have gained much more free space, due to all the snapshots
> being actually deleted and the space they were occupying freed only just now.

A recent commit (fa6ac8765c48a06dfed914e8c8c3a903f9d313a0,
"Btrfs: fix cleaner thread not working with inode cache option")
may improve the situation.

You may want to try it.

thanks,
liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-11 Thread Hugo Mills
On Sun, Mar 10, 2013 at 11:55:10PM +, sam tygier wrote:
> On 09/03/13 20:31, Hugo Mills wrote:
> >    Some time ago, and occasionally since, we've discussed altering the
> > RAID-n terminology to change it to an nCmSpP format, where n is the
> > number of copies, m is the number of (data) devices in a stripe per copy,
> > and p is the number of parity devices in a stripe.
> >
> >    The current kernel implementation uses as many devices as it can in the
> > striped modes (RAID-0, -10, -5, -6), and in this implementation, that is
> > written as mS (with a literal m). The mS and pP sections are omitted
> > if the value is 1S or 0P.
> >
> >    The magic look-up table for old-style / new-style is:
> >
> > single   1C (or omitted, in btrfs fi df output)
> > RAID-0   1CmS
> > RAID-1   2C
> > DUP      2CD
> > RAID-10  2CmS
> > RAID-5   1CmS1P
> > RAID-6   1CmS2P
>
> Are these the only valid options?

   Currently, yes.
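
To make the look-up table concrete, here is a minimal C sketch of it as a
static mapping. It is only an illustration, not code from the patch series:
the names raid_names and raid_new_style are made up here, and the strings
are simply copied from the table above.

```c
#include <stddef.h>
#include <string.h>

/* One row of the old-style / new-style look-up table (hypothetical type). */
struct raid_name {
	const char *old_style;
	const char *new_style;
};

/* Rows copied from the table in the message; nothing else is assumed. */
static const struct raid_name raid_names[] = {
	{ "single",  "1C" },
	{ "RAID-0",  "1CmS" },
	{ "RAID-1",  "2C" },
	{ "DUP",     "2CD" },
	{ "RAID-10", "2CmS" },
	{ "RAID-5",  "1CmS1P" },
	{ "RAID-6",  "1CmS2P" },
};

/* Return the new-style name for an old-style one, or NULL if unknown. */
const char *raid_new_style(const char *old_style)
{
	size_t i;

	for (i = 0; i < sizeof(raid_names) / sizeof(raid_names[0]); i++)
		if (strcmp(raid_names[i].old_style, old_style) == 0)
			return raid_names[i].new_style;
	return NULL;
}
```

A level that is not in the table, such as a hypothetical "RAID-7", yields
NULL, which matches the "Currently, yes" answer: only the listed
combinations are valid.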

> Are 'sensible' new levels (e.g. 3C, mirrored to 3 disks, or 1CmS3P, like
> RAID-6 but with 3 parity blocks) allowed?

   Not right now, but:

 - I don't know if the forthcoming 3c code will allow arbitrary values
   or not, but Chris has promised 3c.

 - Fixed S will definitely happen for the parity-RAID levels. I'm not
   sure about the stripe-RAID levels.

 - Higher P are mathematically possible, but (AIUI) awkward to
   construct efficient and effective ones (and it's a manual process
   to do so). I suspect that 3p may happen, but 4p may not for a long
   time.

> Are any arbitrary levels allowed (some other comments in the thread
> suggest no)?

   Currently, no, and I don't think there are immediate plans to
generalise it, but I'd like to see that happen eventually.

> Will there be a recommended (or supported) set?

   Quite likely, even with the limited (forthcoming) set of
parameters. Using mSpP on an array larger than some particular size
is probably not going to be particularly good for performance, for
example.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You stay in the theatre because you're afraid of having no ---
 money? There's irony... 


[PATCH] Btrfs: get better concurrency for snapshot-aware defrag work

2013-03-11 Thread Liu Bo
Using spinning case instead of blocking will result in better concurrency
overall.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/inode.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 13ab4de..1f26 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2312,6 +2312,7 @@ again:
 	key.type = BTRFS_EXTENT_DATA_KEY;
 	key.offset = start;
 
+	path->leave_spinning = 1;
 	if (merge) {
 		struct btrfs_file_extent_item *fi;
 		u64 extent_len;
@@ -2368,6 +2369,7 @@ again:
 
 	btrfs_mark_buffer_dirty(leaf);
 	inode_add_bytes(inode, len);
+	btrfs_release_path(path);
 
 	ret = btrfs_inc_extent_ref(trans, root, new->bytenr,
 			new->disk_len, 0,
@@ -2381,6 +2383,7 @@ again:
 	ret = 1;
 out_free_path:
 	btrfs_release_path(path);
+	path->leave_spinning = 0;
 	btrfs_end_transaction(trans, root);
 out_unlock:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, lock_end,
-- 
1.7.7.6


[PATCH] Btrfs: remove btrfs_try_spin_lock

2013-03-11 Thread Liu Bo
Remove a useless function declaration.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/locking.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index ca52681..b81e0e9 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -26,7 +26,6 @@
 
 void btrfs_tree_lock(struct extent_buffer *eb);
 void btrfs_tree_unlock(struct extent_buffer *eb);
-int btrfs_try_spin_lock(struct extent_buffer *eb);
 
 void btrfs_tree_read_lock(struct extent_buffer *eb);
 void btrfs_tree_read_unlock(struct extent_buffer *eb);
-- 
1.7.7.6


[PATCH] btrfs-progs: add options for changing size representations

2013-03-11 Thread Audrius Butkevicius
Add '--si', '-h'/'--human-readable' and '--block-size' global options,
which allow users to customize the way sizes are displayed.

The options and their format try to mimic the GNU ls utility.

Signed-off-by: Audrius Butkevicius audrius.butkevic...@elastichosts.com
---
 btrfs.c |3 ++
 utils.c |  146 +++
 utils.h |6 +++
 3 files changed, 138 insertions(+), 17 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 691adef..6a8fc30 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -22,6 +22,8 @@
 #include "crc32c.h"
 #include "commands.h"
 #include "version.h"
+#include "ctree.h"
+#include "utils.h"
 
 static const char * const btrfs_cmd_group_usage[] = {
 	"btrfs [--help] [--version] <group> [<group>...] <command> [<args>]",
@@ -291,6 +293,7 @@ int main(int argc, char **argv)
 
 	crc32c_optimization_init();
 
+	handle_size_unit_args(&argc, &argv);
 	fixup_argv0(argv, cmd->token);
 	exit(cmd->fn(argc, argv));
 }
diff --git a/utils.c b/utils.c
index d660507..58c1919 100644
--- a/utils.c
+++ b/utils.c
@@ -16,6 +16,7 @@
  * Boston, MA 021110-1307, USA.
  */
 
+#define _GNU_SOURCE
 #define _XOPEN_SOURCE 700
 #define __USE_XOPEN2K8
 #define __XOPEN2K8 /* due to an error in dirent.h, to get dirfd() */
@@ -1095,33 +1096,144 @@ out:
 	return ret;
 }
 
-static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
-			    "PB", "EB", "ZB", "YB"};
+static int sizes_format = SIZES_FORMAT_BYTES;
+static u64 sizes_divisor = 1;
+
+void remove_arg(int i, int *argc, char ***argv)
+{
+	while (i++ < *argc)
+		(*argv)[i - 1] = (*argv)[i];
+	(*argc)--;
+}
+
+void handle_size_unit_args(int *argc, char ***argv)
+{
+	int k;
+	int base = 1024;
+	char *suffix;
+	char *block_size;
+	u64 value;
+
+	for (k = *argc - 1; k >= 0; k--) {
+		if (!strcmp((*argv)[k], "-h") ||
+		    !strcmp((*argv)[k], "--human-readable")) {
+			sizes_format = SIZES_FORMAT_HUMAN;
+			remove_arg(k, argc, argv);
+		} else if (!strcmp((*argv)[k], "--si")) {
+			sizes_format = SIZES_FORMAT_SI;
+			remove_arg(k, argc, argv);
+		} else if (!strncmp((*argv)[k], "--block-size", 12)) {
+			if (strlen((*argv)[k]) < 14 || (*argv)[k][12] != '=') {
+				fprintf(stderr,
+					"--block-size requires an argument\n");
+				exit(1);
+			}
+
+			sizes_format = SIZES_FORMAT_BLOCK;
+			block_size = strchr((*argv)[k], '=');
+
+			errno = 0;
+			value = strtoull(++block_size, &suffix, 10);
+			if (errno == ERANGE && value == ULLONG_MAX) {
+				fprintf(stderr,
+					"--block-size argument '%s' too large\n",
+					block_size);
+				exit(1);
+			}
+			if (suffix == block_size)
+				value = 1;
+
+			if (strlen(suffix) == 1 && value > 0) {
+				base = 1024;
+			} else if (strlen(suffix) == 2 && suffix[1] == 'B' &&
+				   value > 0) {
+				base = 1000;
+			/* Allow non-zero values without a suffix */
+			} else if (strlen(suffix) != 0 || value == 0) {
+				fprintf(stderr,
+					"invalid --block-size argument '%s'\n",
+					block_size);
+				exit(1);
+			}
+
+			if (strlen(suffix) > 0) {
+				switch(suffix[0]) {
+				case 'E':
+					sizes_divisor *= base;
+				case 'P':
+					sizes_divisor *= base;
+				case 'T':
+					sizes_divisor *= base;
+				case 'G':
+					sizes_divisor *= base;
+				case 'M':
+					sizes_divisor *= base;
+				case 'K':
+					sizes_divisor *= base;
+					break;
+				default:
+					fprintf(stderr,
+						"invalid --block-size \
+argument '%s'\n",
+

Re: snapshot deletion / unmount slowness

2013-03-11 Thread Swâmi Petaramesh
On 11/03/2013 07:47, Liu Bo wrote:
> A recent commit (fa6ac8765c48a06dfed914e8c8c3a903f9d313a0,
> "Btrfs: fix cleaner thread not working with inode cache option")
> may improve the situation.

Hi Liu,

I have never seen this issue with btrfs-cleaner not working: when I
delete snapshots it typically kicks in a few seconds later and works
until done.

Does the bug you mention affect only specific kernel versions?

AFAIK I use inode_cache (it's not in my fstab, but I mounted my
filesystems using it manually, and I believe it's a persistent option?
I may possibly be wrong...)

TIA

Kind regards.

-- 
Swâmi Petaramesh sw...@petaramesh.org http://petaramesh.org PGP 9076E32E
Don't bother looking: I am not on Facebook.


BTRFS and ionice

2013-03-11 Thread Swâmi Petaramesh
Hi,

I use "ionice -c 3 command" to run some low-priority background tasks
(e.g. tar, big file copies, or checksumming very big files) so that
they use the disk only when it is idle, which is supposed to have very
little impact on my system's performance in the meantime.

I can be pretty sure that those tasks are I/O-bound and use very little
CPU (and they are niced as well, anyway).
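
For reference, "ionice -c 3" puts a process into the idle I/O scheduling
class through the ioprio_set(2) system call. The sketch below shows the
equivalent call from C, assuming a Linux system. The constants mirror the
kernel's ioprio ABI and are defined locally because glibc offers no wrapper
for this call; the helper names ioprio_value and set_idle_io_class are made
up for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for the syscall() declaration */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the kernel's ioprio ABI, defined locally (assumption: Linux). */
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_CLASS_IDLE	3
#define IOPRIO_WHO_PROCESS	1

/* Pack an I/O class and its per-class data into one priority value. */
static int ioprio_value(int ioclass, int data)
{
	return (ioclass << IOPRIO_CLASS_SHIFT) | data;
}

/*
 * Put the calling process into the idle I/O class, as "ionice -c 3" does.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_idle_io_class(void)
{
	return (int)syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
			    ioprio_value(IOPRIO_CLASS_IDLE, 0));
}
```

Setting the class only records a priority on the process; whether it is
honoured is up to the I/O scheduler handling the disk.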

However, when such tasks are running my BTRFS system slows down to a
crawl and becomes very unresponsive, and it seems to me that disk I/O
is completely saturated (the disk LED stays lit...)

So I wonder whether BTRFS correctly supports ionice, or whether it is
plain useless there?

TIA

Kind regards.

-- 
Swâmi Petaramesh sw...@petaramesh.org http://petaramesh.org PGP 9076E32E
Don't bother looking: I am not on Facebook.


Re: about btrfs quota issues

2013-03-11 Thread Wang Shilong
ping..

> Hello, Arne
>
>   Steps to reproduce:
>
>   mkfs.btrfs disk
>   mount disk mnt
>   btrfs quota enable mnt
>
>   btrfs sub create mnt/sub
>   btrfs qgroup create 1/1 mnt
>   btrfs qgroup assign sub_qgroupid 1/1 mnt
>
>   dd if=/dev/zero of=mnt/sub/data bs=1M count=1
>   sync
>   btrfs qgroup show mnt
>   # Up to this point everything goes well; however, once a snapshot
>   # is taken, the quota accounting goes wrong.
>
>   btrfs sub snapshot mnt/sub mnt/snap
>   sync
>   btrfs qgroup show mnt
>   # The accounting information of group (1/1) is not as expected:
>   # the exclusive value of group (1/1) does not change as expected.
>
> So I took a close look at the quota accounting algorithm; its 3 steps
> don't consider cases like the example above.
>
> In fact, I think you are trying to put some of the work on users,
> especially when a snapshot is taken. It is complex to track the
> accounting of all the groups when there are snapshots. See the
> following commands:
>
>   btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
>   btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt
>
> Are these commands designed for particular snapshot/subvolume cases?
> If so, I think it is really confusing and too complex for users to do
> such work, isn't it?
>
> BTW, I have a question about the function btrfs_qgroup_inherit().
> When copying the exclusive value from src_qgroup to dst_qgroup:
>
>   dst_qgroup->exclusive = src_qgroup->exclusive + level_size
>
> while copying the referenced value from src_qgroup to dst_qgroup:
>
>   dst_qgroup->referenced = src_qgroup->referenced - level_size
>
> I can't really figure it out... ~_~
>
> Thanks,
> Wang


Re: [PATCH] btrfs-progs: add options for changing size representations

2013-03-11 Thread Wang Shilong
Hello,

> Add '--si', '-h'/'--human-readable' and '--block-size' global options,
> which allow users to customize the way sizes are displayed.

Why not use getopt_long() to do the parsing?
There is no need to reinvent the wheel.

As discussed before, it is better not to call exit(1) during parsing.
I think it would be better to implement the parsing function like this:

int  parse_str(char *str,  u64 *size)
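
A minimal sketch of what a function of that shape could look like, assuming
it should turn a --block-size argument such as "4K" or "1MB" into a byte
count and report problems through its return value instead of calling
exit(1). The name parse_block_size, the use of uint64_t in place of u64, and
the exact error codes are assumptions made for this illustration; the suffix
rules (one letter means powers of 1024, a letter followed by 'B' means
powers of 1000, zero is rejected) follow the posted patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "<number>[K|M|G|T|P|E][B]" into a byte count.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
int parse_block_size(const char *str, uint64_t *size)
{
	static const char units[] = "KMGTPE";
	const char *unit;
	char *suffix;
	unsigned long long value;
	uint64_t base, mult = 1;
	size_t i, len;

	if (!str || !size || str[0] == '\0')
		return -EINVAL;
	/* Only a digit or a unit letter may start the string. */
	if ((str[0] < '0' || str[0] > '9') && !strchr(units, str[0]))
		return -EINVAL;

	errno = 0;
	value = strtoull(str, &suffix, 10);
	if (errno == ERANGE)
		return -ERANGE;
	if (suffix == str)
		value = 1;		/* a bare suffix such as "K" means 1K */
	if (value == 0)
		return -EINVAL;

	len = strlen(suffix);
	if (len == 0) {
		*size = value;		/* plain number without a suffix */
		return 0;
	}
	if (len == 1)
		base = 1024;
	else if (len == 2 && suffix[1] == 'B')
		base = 1000;
	else
		return -EINVAL;

	unit = strchr(units, suffix[0]);
	if (!unit)
		return -EINVAL;

	/* K is base^1, M is base^2, ...; base^6 still fits in 64 bits. */
	for (i = 0; i <= (size_t)(unit - units); i++)
		mult *= base;
	if (value > UINT64_MAX / mult)
		return -ERANGE;

	*size = value * mult;
	return 0;
}
```

With this shape the caller decides how to report the error, so the same
routine can be shared by every command that accepts a size.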

Thanks,
Wang

 
Re: BTRFS and ionice

2013-03-11 Thread Dan van der Ster
Hi,
Which IO scheduler do you use? I used to have terrible read
performance during a btrfs scrub until I switched the disk scheduler
from deadline to cfq.
Cheers, Dan
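
To check which scheduler a disk uses, the kernel exposes
/sys/block/<dev>/queue/scheduler, where the active scheduler is the name in
square brackets (for example "noop deadline [cfq]"). The following is a
small C sketch that extracts it; the function names are made up for this
illustration, and the device name passed in (such as "sda") is an
assumption about the system at hand. As far as the ionice(1) documentation
goes, I/O classes and priorities are a feature of the CFQ scheduler.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name from a line such as
 * "noop deadline [cfq]" into out. Returns 0 on success, -1 if no
 * bracketed name is found or it does not fit into out.
 */
int active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Read the active scheduler of a block device, e.g. dev = "sda". */
int read_active_scheduler(const char *dev, char *out, size_t outlen)
{
	char path[256], line[256];
	FILE *f;
	int n, ret = -1;

	n = snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = active_scheduler(line, out, outlen);
	fclose(f);
	return ret;
}
```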


Re: about btrfs quota issues

2013-03-11 Thread Arne Jansen
On 10.03.2013 05:21, Shilong Wang wrote:
> In fact, I think you are trying to put some of the work on users,
> especially when a snapshot is taken. It is complex to track the
> accounting of all the groups when there are snapshots. See the
> following commands:
>
>   btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
>   btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt
>
> Are these commands designed for particular snapshot/subvolume cases?

Yes, these commands would have helped you in the above case. You need to
create an empty qgroup and copy the exclusive from there on snapshot
creation.

> If so, I think it is really confusing and too complex for users to do
> such work, isn't it?

It is complex. That is why I always point anyone asking to do some work
on btrfs or qgroups to writing an enhanced interface to simplify this
task for the user. I don't think the kernel should handle this.
And that's why I took the effort to write a pdf to explain the
concepts :)
But the current interface is not only complex, it also is very powerful.
You can solve problems with it that no other quota system I know of can
solve.

 
> BTW, I have a question about the function btrfs_qgroup_inherit().
> When copying the exclusive value from src_qgroup to dst_qgroup:
>
>   dst_qgroup->exclusive = src_qgroup->exclusive + level_size
>
> while copying the referenced value from src_qgroup to dst_qgroup:
>
>   dst_qgroup->referenced = src_qgroup->referenced - level_size
>
> I can't really figure it out... ~_~

level_size is just a small correction for the space the tree root
occupies. The tree root is never shared between subvolumes.

-Arne

 
Re: about btrfs quota issues

2013-03-11 Thread Wang Shilong

Hello,

> On 10.03.2013 05:21, Shilong Wang wrote:
> > In fact, I think you are trying to put some of the work on users,
> > especially when a snapshot is taken. It is complex to track the
> > accounting of all the groups when there are snapshots. See the
> > following commands:
> >
> >   btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
> >   btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt
> >
> > Are these commands designed for particular snapshot/subvolume cases?
>
> Yes, these commands would have helped you in the above case. You need to
> create an empty qgroup and copy the exclusive from there on snapshot
> creation.

I am wondering why we need the concept of "exclusive" at all.
Maybe it helps to some extent.

How about just dropping it, since the concept of exclusive
adds to the complexity of btrfs quota?

The worst thing is that I don't think users can master this magic
concept very well.

 
> > If so, I think it is really confusing and too complex for users to do
> > such work, isn't it?
>
> It is complex. That is why I always point anyone asking to do some work
> on btrfs or qgroups to writing an enhanced interface to simplify this
> task for the user. I don't think the kernel should handle this.
> And that's why I took the effort to write a pdf to explain the
> concepts :)

I don't have any good ideas about this yet.

> But the current interface is not only complex, it also is very powerful.
> You can solve problems with it that no other quota system I know of can
> solve.
>
> > BTW, I have a question about the function btrfs_qgroup_inherit().
> > When copying the exclusive value from src_qgroup to dst_qgroup:
> >
> >   dst_qgroup->exclusive = src_qgroup->exclusive + level_size
> >
> > while copying the referenced value from src_qgroup to dst_qgroup:
> >
> >   dst_qgroup->referenced = src_qgroup->referenced - level_size
> >
> > I can't really figure it out... ~_~
>
> level_size is just a small correction for the space the tree root
> occupies. The tree root is never shared between subvolumes.

OK, I got it.

Thanks,
Wang

 
Re: about btrfs quota issues

2013-03-11 Thread Arne Jansen
On 11.03.2013 14:31, Wang Shilong wrote:
 
> Hello,

<snip>

> > > In fact, I think you are trying to put some of the work on users,
> > > especially when a snapshot is taken. It is complex to track the
> > > accounting of all the groups when there are snapshots. See the
> > > following commands:
> > >
> > >   btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
> > >   btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt
> > >
> > > Are these commands designed for particular snapshot/subvolume cases?
> >
> > Yes, these commands would have helped you in the above case. You need to
> > create an empty qgroup and copy the exclusive from there on snapshot
> > creation.
>
> I am wondering why we need the concept of "exclusive" at all.
> Maybe it helps to some extent.
 

It is needed to answer the question 'how much space can I gain by
deleting this subvol or this set of subvolumes?'
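
As a rough illustration of that question, consider a toy model in which
every extent records which subvolumes reference it. This is not btrfs's
actual accounting code; the struct, the bitmask representation, and both
function names are assumptions made for this sketch. "Referenced" counts
every extent reachable from a set of subvolumes, while "exclusive" counts
only the extents that nothing outside the set references, which is the
space freed by deleting the whole set.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: one extent and the subvolumes that reference it. */
struct extent {
	uint64_t bytes;		/* size of the extent */
	uint32_t owners;	/* bit i set: subvolume i references it */
};

/* Bytes referenced by at least one subvolume in 'set'. */
uint64_t referenced_bytes(const struct extent *e, size_t n, uint32_t set)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (e[i].owners & set)
			sum += e[i].bytes;
	return sum;
}

/* Bytes referenced only from inside 'set': freed if the set is deleted. */
uint64_t exclusive_bytes(const struct extent *e, size_t n, uint32_t set)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (e[i].owners && !(e[i].owners & ~set))
			sum += e[i].bytes;
	return sum;
}
```

For example, with a 1 MiB extent shared by a subvolume (bit 0) and its
snapshot (bit 1), plus one small private extent each, the exclusive value
of the subvolume alone covers only its private extent, whereas a group
containing both members has everything as exclusive.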

> How about just dropping it, since the concept of exclusive
> adds to the complexity of btrfs quota?

If you don't need that value, just ignore the tracking error.

 
> The worst thing is that I don't think users can master this magic
> concept very well.

Normally users don't need very sophisticated scenarios. In fact, they
don't even need higher level quota groups, the basic tracking is
enough. In this case, everything just works as expected for the user.
If you start creating and assigning qgroups manually, prepare to handle
the complexity.

-Arne

 

Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-11 Thread David Sterba
On Sun, Mar 10, 2013 at 11:49:53PM +, Hugo Mills wrote:
> > Using an asterisk '*' in something that will be used as a command line
> > argument risks having the shell expand it. Sticking to pure alphanumeric
> > names would be better.
>
>    Yeah, David's just pointed this out on IRC. After a bit of fiddling
> around with various options, I like using X.

I'd like to see something that can exist as an identifier or can be
copy-pasted in one click, but '*' being a shell meta-character is IMO
a stronger argument against using it.

>    I'm also going to use lowercase c,s,p, because it seems to be
> easier to read with the different-height characters. So we end up
> with, e.g.
>
> 1c      (single)
> 2cXs    (RAID-10)
> 1cXs2p  (RAID-6)

This form looks ok to me.

david


Re: about btrfs quota issues

2013-03-11 Thread Wang Shilong

<snip>

> > The worst thing is that I don't think users can master this magic
> > concept very well.
>
> Normally users don't need very sophisticated scenarios. In fact, they
> don't even need higher level quota groups, the basic tracking is
> enough. In this case, everything just works as expected for the user.
> If you start creating and assigning qgroups manually, prepare to handle
> the complexity.
 
Consider this case:

Each subvolume belongs to a user, and we limit each user's space by
limiting that subvolume's qgroup, but we also want to limit the total
space all the users can use. So we create a parent qgroup (1/1, for
example) and assign all the subvolume qgroups to this parent group.

I think the above case is in regular use. What's more, many snapshots
may be taken. So I think what I am concerned about is not a corner case.
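
For readers unfamiliar with the "1/1" notation used here: a qgroup ID is
written as level/id, and subvolume qgroups live at level 0. The sketch
below parses that notation into a single 64-bit value, under the assumption
(not stated in this thread) that the level is stored in the top 16 bits and
the ID in the low 48 bits. The macro and function names are made up for
this illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: the qgroup level occupies the top 16 bits of the ID. */
#define QGROUP_LEVEL_SHIFT	48

/*
 * Parse "level/id" (e.g. "1/1") or a bare id (e.g. "257", meaning
 * level 0) into a 64-bit qgroup ID. Returns 0 or -EINVAL.
 */
int parse_qgroupid_sketch(const char *s, uint64_t *out)
{
	unsigned long long level = 0, id;
	const char *p;
	char *end;

	errno = 0;
	id = strtoull(s, &end, 10);
	if (end == s || errno)
		return -EINVAL;
	if (*end == '/') {
		/* What was parsed so far is the level; the id follows. */
		level = id;
		p = end + 1;
		id = strtoull(p, &end, 10);
		if (end == p || errno)
			return -EINVAL;
	}
	if (*end != '\0' || level > 0xffff ||
	    id >= (1ULL << QGROUP_LEVEL_SHIFT))
		return -EINVAL;

	*out = ((uint64_t)level << QGROUP_LEVEL_SHIFT) | id;
	return 0;
}
```

In the scenario above, each per-user subvolume qgroup would be a level 0
ID, and the shared parent "1/1" is the level 1 group that all of them are
assigned to.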

Thanks,
Wang
 
 
 
Re: about btrfs quota issues

2013-03-11 Thread Arne Jansen
On 11.03.2013 15:15, Wang Shilong wrote:
 
> <snip>
>
> > > The worst thing is that I don't think users can master this magic
> > > concept very well.
> >
> > Normally users don't need very sophisticated scenarios. In fact, they
> > don't even need higher level quota groups, the basic tracking is
> > enough. In this case, everything just works as expected for the user.
> > If you start creating and assigning qgroups manually, prepare to handle
> > the complexity.
>
> Consider this case:
>
> Each subvolume belongs to a user, and we limit each user's space by
> limiting that subvolume's qgroup, but we also want to limit the total
> space all the users can use. So we create a parent qgroup (1/1, for
> example) and assign all the subvolume qgroups to this parent group.
>
> I think the above case is in regular use. What's more, many snapshots
> may be taken. So I think what I am concerned about is not a corner case.

So you just forgot to assign the new subvolume to 1/1 by using -i on
snapshot creation.

-Arne

 
Re: about btrfs quota issues

2013-03-11 Thread Wang Shilong

 On 11.03.2013 15:15, Wang Shilong wrote:
 
 snip
 
 The worst thing is that i don't think users can master this magic
 concept very well.
 
 Normally users don't need very sophisticated scenarios. In fact, they
 don't even need higher level quota groups, the basic tracking is
 enough. In this case, everything just works as expected for the user.
 If you start creating and assigning qgroups manually, prepare to handle
 the complexity.
 
 Considering this case:
 
 a subvolume related to a user, we limit the space by limiting every subvolume
 qgroup, but  we also want to limit  the total space all the users can use. 
 So we create
 a parent qgroup(1/1 for example) and assign all subvolume group to this 
 parent group.
 
 The above case is regularly used i think, What's more, many snapshots may be 
 done.
 So  i think what i am concerning is not a corner case..
 
 So you just missed to assign the new subvolume to 1/1 by using -i on
 snapshot creation.
 

When snapshot happens,  the exclusive of 1/1 will go wrong even with  this 
simple case..

However, thanks very much for your patience and kindly reply ^_^

Thanks, 
Wang

 -Arne
 
 
 Thanks,
 Wang
 
 
 
 If so, i think it really confusing and too complex for users to do
 such work, is't it?...
 
 It is complex. That is why I always point anyone asking to do some work
 on btrfs or qgroups to writing an enhanced interface to simplify this
 task for the user. I don't think the kernel should handle this.
 And that's why I took the effort to write a pdf to explain the
 concepts :)
 
 I don't have any  good ideas about this yet..
 
 But the current interface is not only complex, it also is very powerful.
 You can solve problems with it that no other quota system I know of can
 solve.
 
 
 BTW, i have a question about the function btrfs_qgroup_inherit(),
 when copying exclusive value from src_qgroup to dst_qgroup:
 
 dst_qgroup->exclusive = src_qgroup->exclusive + level_size
 
 while copying referenced value from src_qgroup to dst_qgroup:
 
 dst_qgroup->referenced = src_qgroup->referenced - level_size
 
 I can't really figure out...~_~
 
 level_size is just a small correction for the space the tree root
 occupies. The tree root is never shared between sub volumes.
 
 O.K. I  got it..
 
 Thanks,
 Wang
 
 
 -Arne
 
 
 Thanks,
 Wang
 
 
 
 
 



Re: about btrfs quota issues

2013-03-11 Thread Arne Jansen
On 11.03.2013 15:35, Wang Shilong wrote:
 
 On 11.03.2013 15:15, Wang Shilong wrote:

 snip

 The worst thing is that i don't think users can master this magic
 concept very well.

 Normally users don't need very sophisticated scenarios. In fact, they
 don't even need higher level quota groups, the basic tracking is
 enough. In this case, everything just works as expected for the user.
 If you start creating and assigning qgroups manually, prepare to handle
 the complexity.

 Considering this case:

 a subvolume related to a user, we limit the space by limiting every 
 subvolume
 qgroup, but  we also want to limit  the total space all the users can use. 
 So we create
 a parent qgroup(1/1 for example) and assign all subvolume group to this 
 parent group.

 The above case is regularly used i think, What's more, many snapshots may 
 be done.
 So  i think what i am concerning is not a corner case..

 So you just missed to assign the new subvolume to 1/1 by using -i on
 snapshot creation.

 
 When snapshot happens,  the exclusive of 1/1 will go wrong even with  this 
 simple case..

Your example does not describe your use case. If you want to account the
snapshot to the user, you also have to assign the snapshot to 1/1. If you
do so, the exclusive will be correct.
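
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's qgroup code: the struct extent type, the bitmask representation of subvolumes and the function names are all hypothetical. It assumes that a qgroup's exclusive count covers only extents referenced solely by members of that group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical): each extent records its size and a bitmask
 * of the subvolumes referencing it (bit i set => subvolume i). */
struct extent {
	uint64_t bytes;
	uint32_t ref_mask;
};

/* Exclusive usage of a qgroup: bytes of extents that are referenced
 * only by members of the group, never by a subvolume outside it. */
static uint64_t qgroup_exclusive(const struct extent *ext, size_t n,
				 uint32_t member_mask)
{
	uint64_t total = 0;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ext[i].ref_mask != 0 &&
		    (ext[i].ref_mask & ~member_mask) == 0)
			total += ext[i].bytes;
	}
	return total;
}

/* Referenced usage: bytes of extents referenced by at least one member. */
static uint64_t qgroup_referenced(const struct extent *ext, size_t n,
				  uint32_t member_mask)
{
	uint64_t total = 0;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ext[i].ref_mask & member_mask)
			total += ext[i].bytes;
	}
	return total;
}
```

With subvolume 0 in 1/1 and a snapshot (subvolume 1) sharing its extents, the model counts the shared extents as exclusive to 1/1 only when the member mask includes the snapshot as well, which is what assigning the snapshot with -i achieves.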

-Arne

 
 However, thanks very much for your patience and kindly reply ^_^
 
 Thanks, 
 Wang
 
 -Arne


 Thanks,
 Wang



 If so, i think it really confusing and too complex for users to do
 such work, is't it?...

 It is complex. That is why I always point anyone asking to do some work
 on btrfs or qgroups to writing an enhanced interface to simplify this
 task for the user. I don't think the kernel should handle this.
 And that's why I took the effort to write a pdf to explain the
 concepts :)

 I don't have any  good ideas about this yet..

 But the current interface is not only complex, it also is very powerful.
 You can solve problems with it that no other quota system I know of can
 solve.


 BTW, i have a question about the function btrfs_qgroup_inherit(),
 when copying exclusive value from src_qgroup to dst_qgroup:

 dst_qgroup->exclusive = src_qgroup->exclusive + level_size

 while copying referenced value from src_qgroup to dst_qgroup:

 dst_qgroup->referenced = src_qgroup->referenced - level_size

 I can't really figure out...~_~

 level_size is just a small correction for the space the tree root
 occupies. The tree root is never shared between sub volumes.

 O.K. I  got it..

 Thanks,
 Wang


 -Arne


 Thanks,
 Wang





 



Integration branch of btrfs-progs 2013-03-11

2013-03-11 Thread David Sterba
Hi,

this set contains help text updates, the series from Eric and my fix to mkfs
superblock checksum (now needed to mount a new filesystem when the kernel-side
check is in place -- applies to current btrfs-next).

git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130311

david

--
Anand Jain (3):
  btrfs-progs: from troubleshooting point of view messages must be unique
  btrfs-progs: usage should match what is coded
  btrfs-progs: update the .gitignore file

David Sterba (1):
  btrfs-progs: separate super_copy out of fs_info

Eric Sandeen (14):
  btrfs-progs: close fd on cmd_subvol_list return
  btrfs-progs: close fd on do_convert error returns
  btrfs-progs: free resources on do_rollback error returns
  btrfs-progs: free allocated metadump structure on restore failure
  btrfs-progs: check for null string in parse_size
  btrfs-progs: tidy up cmd_snapshot() whitespace & returns
  btrfs-progs: Free resources when returning error from cmd_snapshot()
  btrfs-progs: tidy up cmd_subvol_create() whitespace & returns
  btrfs-progs: Free resources when returning error from cmd_subvol_create()
  btrfs-progs: check return of posix_fadvise
  btrfs-progs: Issue warnings if ioctls fail in sigint handlers
  btrfs-progs: better option/error handling for btrfs-vol
  btrfs-progs: Error handling in scrub_progress_cycle() thread
  btrfs-progs: fix scrub error return from pthread_mutex_lock

Zhi Yong Wu (1):
  btrfs-progs: update mkfs.btrfs help info for raid5/6


Re: snapshot deletion / unmount slowness

2013-03-11 Thread Liu Bo
On Sun, Mar 10, 2013 at 10:31:08PM -0700, Michael Johnson - MJ wrote:
 I currently have a btrfs filesystem that I am unmounting and it has
 been has been unmounting for the last 20 minutes.
 
 I'm pretty sure I know exactly what is going on and in my current
 situation it's not a huge issues, but it would be a problem if this
 was a production system and I was trying to do a maintenance.
 
 Here is how I got into this situation:
 
 I am migrating my data from one pair of disks (mirrored with btrfs) to
 another pair of disks.  I rsync'd my data from the original btrfs file
 system to the other.  When it completed, my new filesystem showed
 165GB used. The original show 1.8TB used.  I came to the conclusion
 that it must be the daily snapshots I have that were using the
 majority of the space and because I was going to destroy the
 filesystem, I decided, what the heck, let me destroy the snapshots and
 see what it looks like.
 
 To my surprise, removing all the snapshots resulted in the usage
 dropping from 1.8TB to 1.7TB.  I re-ran my rsync, it complete without
 transferring any new data.  I then did a du -s in the mountpoint for
 the original filesystem and is reported back 165GB which agrees with
 what rsync and df on the new filesystem reports.
 
 My first thought was that I must have some sort of bizarre corruption
 on the original filesystem.  And then I went to unmount it and it
 still has not returned.
 
 What I now suspect is going on is that while deleting the snapshots
 was quick, that probably kicks of a background thread which actually
 does the heavy lifting.  I noticed a btrfs-cleaner process that was in
 an io wait state, which I presumed was the process in question.
 However, now 40 minutes later, my unmount is still hung and the
 btrfs-cleaner process is sleeping, so perhaps I am wrong.

You're right: umount wakes up the cleaner kthread to do the 'real work' of
cleaning up the snapshots/subvolumes that are marked as deleted.

But since btrfs-cleaner is sleeping, could you please show what umount is
waiting for?

Maybe 'cat /proc//stack' will help figure out why.

thanks,
liubo

 
 At this point I am going to powercycle my system, but I figured I
 would check and see if anyone else knew for certain it this was the
 type of behavior one would expect to see when removing large snapshots
 and then immediately trying to unmount the filesystem.  If so, it
 seems like this is something that would need to change before someone
 would want to seriously consider using btrfs w/ snapshots in a
 production environment.  I know btrfs is not considered production
 ready yet (well, at least not by the developers, regardless of what
 Oracle and Suse say).  At the same time, I've not been able to find
 any mention of similar problems, so I figured it was worth mentioning.
 
 --
 Michael Johnson - MJ


Re: [PATCH 3/3] btrfs-progs: use BTRFS_SCAN_BACKUP_SB flag in btrfs_scan_one_device

2013-03-11 Thread Eric Sandeen
On 3/8/13 9:25 AM, Anand Jain wrote:
 bug:
 ---
 mkfs.btrfs /dev/sdb -f && yes| mkfs.ext4 /dev/sdb && mount /dev/sdb /ext4
 mkfs.btrfs -f /dev/sdc /dev/sdd (run twice)
 mkfs.btrfs -f /dev/sdc /dev/sdd
 ::
 ERROR: unable to scan the device '/dev/sdb' - Device or resource busy
 ERROR: unable to scan the device '/dev/sdb' - Device or resource busy
 adding device /dev/sdd id 2
 fs created label (null) on /dev/sdc
   nodesize 4096 leafsize 4096 sectorsize 4096 size 3.11GB
 
 
 Since we run mkfs.btrfs twice above, there is already a stale
 btrfs when mkfs.btrfs is run for the 2nd time. which kicks in
 btrfs_scan_for_fsid() to perform a system-wide scan to find the
 stale btrfs's partner (to check if that by any chance is mounted)
 which in process comes across /dev/sdb. Now when it finds
 /dev/sdb it finds that primary SB is not present and we need
 to stop him there.
 This is done by NOT setting BTRFS_SCAN_BACKUP_SB for the function
 btrfs_scan_for_fsid(). To ensure rest of the logic is unaffected,
 this patch will ensure BTRFS_SCAN_BACKUP_SB is set for all other
 places except at check_mounted_where().

Thanks, this seems like progress in the right direction.

But that means that many other paths will still scan backups, right?

In the following case sdb1 is an ext4-mounted partition w/ a stale btrfs
backup superblock present in the middle.

# mount /dev/sdb1 /mnt/test
# mount | grep sdb1
/dev/sdb1 on /mnt/test type ext4 (rw)

# btrfs device scan /dev/sdb1
Scanning for Btrfs filesystems in '/dev/sdb1'
ERROR: unable to scan the device '/dev/sdb1' - Device or resource busy

Perhaps this is ok since we explicitly told it to scan an
ext4-mounted device.

[[But, then if I unmount it:

# btrfs device scan /dev/sdb1
Scanning for Btrfs filesystems in '/dev/sdb1'
ERROR: unable to scan the device '/dev/sdb1' - Invalid argument

weird, not sure where that came from.  :(  Unrelated to this question
though.]]

Also:

# btrfs filesystem show /dev/sdb1
Label: none  uuid: a96ea6e6-d3d5-444d-9aaf-057ec579dffe
Total devices 1 FS bytes used 28.00KB
devid    1 size 4.00GB used 445.50MB path /dev/sdb1

whoa, ok, so it's a currently mounted ext4 device, but filesystem
show tells me it's btrfs?

How about this one:
# mount /dev/sdb1 /mnt/test
# mount | grep sdb1
/dev/sdb1 on /mnt/test type ext4 (rw)

# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: a96ea6e6-d3d5-444d-9aaf-057ec579dffe
checking extents
checking fs roots
checking root refs
found 28672 bytes used err is 0
total csum bytes: 0
total tree bytes: 28672
total fs tree bytes: 8192
btree space waste bytes: 22875
file data blocks allocated: 0
 referenced 0
Btrfs v0.20-rc1-194-g3deeb4c

So my mounted ext4 fs is also a perfectly consistent btrfs fs?

and I think the list goes on.

IMHO, nothing should be checking the backup superblocks unless explicitly
told to.  i.e. in e2fsprogs, e2fsck has:

   -b superblock
  Instead  of using the normal superblock, use an alternative
  superblock specified by superblock.

and debugfs has:

   -s superblock
  Causes the file system superblock to be read from the given
  block  number,  instead  of  using  the  primary superblock

I think the backups need to be used for explicit recovery (and maybe to
be checked once the primary has been confirmed) and never used during any
normal operation, if the first one is found to be missing.
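
The policy argued for above can be sketched as follows. This is only an illustration, not btrfs-progs code: sb_offset() and choose_sb_mirror() are hypothetical names, and the sketch assumes the usual btrfs superblock locations (primary at 64 KiB, copies at 64 MiB and 256 GiB).

```c
#include <assert.h>
#include <stdint.h>

#define SB_PRIMARY_OFFSET	(64ULL * 1024)	/* primary copy at 64 KiB */
#define SB_MIRROR_MAX		3		/* primary plus two backups */

/* Byte offset of superblock copy `mirror` (0 = primary). */
static uint64_t sb_offset(int mirror)
{
	assert(mirror >= 0 && mirror < SB_MIRROR_MAX);
	if (mirror == 0)
		return SB_PRIMARY_OFFSET;
	return (16ULL * 1024) << (12 * mirror);
}

/*
 * Hypothetical policy helper: requested_mirror < 0 means the user gave
 * no explicit option, so only the primary copy is ever consulted. A
 * backup is read only when it was asked for by number; there is no
 * silent fallback when the primary is missing.
 */
static int choose_sb_mirror(int requested_mirror)
{
	if (requested_mirror < 0)
		return 0;
	assert(requested_mirror < SB_MIRROR_MAX);
	return requested_mirror;
}
```

A tool following this scheme would call sb_offset(choose_sb_mirror(-1)) by default, much like e2fsck reads the primary superblock unless -b is given.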

-Eric



 Signed-off-by: Anand Jain anand.j...@oracle.com
 ---
  cmds-device.c | 3 ++-
  cmds-filesystem.c | 2 +-
  cmds-replace.c| 3 ++-
  disk-io.c | 7 ---
  find-root.c   | 5 +++--
  utils.c   | 9 ++---
  volumes.c | 4 ++--
  volumes.h | 2 +-
  8 files changed, 21 insertions(+), 14 deletions(-)
 
 diff --git a/cmds-device.c b/cmds-device.c
 index 1b8f378..9447e7f 100644
 --- a/cmds-device.c
 +++ b/cmds-device.c
 @@ -203,7 +203,8 @@ static int cmd_scan_dev(int argc, char **argv)
  
   printf("Scanning for Btrfs filesystems\n");
   if(checklist)
 - ret = btrfs_scan_block_devices(BTRFS_SCAN_REGISTER);
 + ret = btrfs_scan_block_devices(BTRFS_SCAN_REGISTER|
 +BTRFS_SCAN_BACKUP_SB);
   else
   ret = btrfs_scan_one_dir("/dev", BTRFS_SCAN_REGISTER);
   if (ret){
 diff --git a/cmds-filesystem.c b/cmds-filesystem.c
 index 2210020..d2e708d 100644
 --- a/cmds-filesystem.c
 +++ b/cmds-filesystem.c
 @@ -257,7 +257,7 @@ static int cmd_show(int argc, char **argv)
   usage(cmd_show_usage);
  
   if(checklist)
 - ret = btrfs_scan_block_devices(0);
 + ret = btrfs_scan_block_devices(BTRFS_SCAN_BACKUP_SB);
   else
   ret = btrfs_scan_one_dir("/dev", 0);
  
 diff --git a/cmds-replace.c b/cmds-replace.c
 index 4cc32df..f6e1619 100644
 --- a/cmds-replace.c
 +++ b/cmds-replace.c
 

Re: snapshot deletion / unmount slowness

2013-03-11 Thread Liu Bo
On Mon, Mar 11, 2013 at 11:20:15AM +0100, Swâmi Petaramesh wrote:
 Le 11/03/2013 07:47, Liu Bo a écrit :
  A recent commit(commit fa6ac8765c48a06dfed914e8c8c3a903f9d313a0
  Btrfs: fix cleaner thread not working with inode cache option)
  may improve the situation.
 
 Hi Liu,
 
 I have never seen this issue with btrfs-cleaner not working, when I
 delete snapshots it typically kicks in a few seconds later and works
 until done.
 

The 'not working' is a little confusing, sorry.

It means that the cleaner thread does not do its work in time.  When we delete
a snapshot/subvolume, we a) invalidate all of the inodes that belong to it and
then b) add it to a list for the cleaner thread to do the real work once the
last inode is destroyed from memory.

What the commit fixes is that the inode cache inode would remain in memory,
which kept the snapshot/subvolume from being added to the cleanup list.
The result is that space is not freed as we expect.

So back to the thread: if you notice that even the cleaner thread does not help
you get free space back after you've deleted the snapshot/subvolume, there are
probably some inodes of the snapshot/subvolume remaining in memory.
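
The two steps a) and b) can be pictured with a small toy model. It is a sketch only and does not mirror the kernel's data structures: struct subvol_model and all function names are hypothetical, and a "pinned" inode stands in for the inode cache inode that stayed in memory before the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical) of a snapshot/subvolume awaiting cleanup. */
struct subvol_model {
	bool deleted;		/* marked for deletion */
	int inodes_in_memory;	/* inodes of this subvolume still cached */
	bool on_cleaner_list;	/* queued for the cleaner thread */
};

/* Step b): queue for the cleaner once deleted and no inode remains. */
static void maybe_queue(struct subvol_model *s)
{
	if (s->deleted && s->inodes_in_memory == 0)
		s->on_cleaner_list = true;
}

/* Destroy one in-memory inode; the last one triggers the queueing. */
static void drop_inode(struct subvol_model *s)
{
	assert(s->inodes_in_memory > 0);
	s->inodes_in_memory--;
	maybe_queue(s);
}

/* Step a): mark as deleted and invalidate all inodes except `pinned`
 * ones, which model inodes that are kept in memory for other reasons. */
static void delete_subvol(struct subvol_model *s, int pinned)
{
	assert(pinned >= 0 && pinned <= s->inodes_in_memory);
	s->deleted = true;
	while (s->inodes_in_memory > pinned)
		drop_inode(s);
	maybe_queue(s);
}
```

In this model a single pinned inode is enough to keep the subvolume off the cleaner list, so its space is never reclaimed until that inode is dropped.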

 Does the bug you mention affect only specific kernel versions ?

Any kernel since the inode cache was introduced.

 
 AFAIK I use inode_cache (it's not in my fstab but I mounted my FSes
 using it manually, and I believe it's a persistent option ? - I may
 possibly be wrong...)

It only takes effect when you mount with it; it helps you reuse inode ids.

thanks,
liubo


[PATCH 1/2] btrfs: merge save_error_info helpers into one

2013-03-11 Thread David Sterba
Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/super.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 68a29a1..eed1464 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -91,7 +91,7 @@ static const char *btrfs_decode_error(int errno, char nbuf[16])
 	return errstr;
 }
 
-static void __save_error_info(struct btrfs_fs_info *fs_info)
+static void save_error_info(struct btrfs_fs_info *fs_info)
 {
 	/*
 	 * today we only save the error info into ram.  Long term we'll
@@ -100,11 +100,6 @@ static void __save_error_info(struct btrfs_fs_info *fs_info)
 	set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
 }
 
-static void save_error_info(struct btrfs_fs_info *fs_info)
-{
-	__save_error_info(fs_info);
-}
-
 /* btrfs handle error by forcing the filesystem readonly */
 static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
 {
-- 
1.7.9



[PATCH 2/2] btrfs: clean up transaction abort messages

2013-03-11 Thread David Sterba
The transaction abort stacktrace is printed only once per module
lifetime, but we'd like to see it each time it happens per filesystem.
Introduce a fs_state flag that records the state.

Tweak the messages around abort:
* add error number to the first abort
* print the exact negative errno from btrfs_decode_error and
  don't expect a simple snprintf to fail
* no dots at the end of the messages

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/super.c   |   19 +++
 fs/btrfs/transaction.c |5 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e391d6b..14d8f8d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -340,6 +340,7 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
  */
 #define BTRFS_FS_STATE_ERROR		0
 #define BTRFS_FS_STATE_REMOUNTING	1
+#define BTRFS_FS_STATE_TRANS_ABORTED	2
 
 /* Super block flags */
 /* Errors detected */
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index eed1464..fe0d6ce 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -65,7 +65,7 @@ static struct file_system_type btrfs_fs_type;
 
 static const char *btrfs_decode_error(int errno, char nbuf[16])
 {
-	char *errstr = NULL;
+	char *errstr = nbuf;
 
 	switch (errno) {
 	case -EIO:
@@ -81,10 +81,7 @@ static const char *btrfs_decode_error(int errno, char nbuf[16])
 		errstr = "Object already exists";
 		break;
 	default:
-		if (nbuf) {
-			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
-				errstr = nbuf;
-		}
+		snprintf(nbuf, 16, "error %d", errno);
 		break;
 	}
 
@@ -121,7 +118,6 @@ static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
 		 * mounted writeable again, the device replace
 		 * operation continues.
 		 */
-//		WARN_ON(1);
 	}
 }
 
@@ -247,7 +243,14 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *root, const char *function,
 			       unsigned int line, int errno)
 {
-	WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted\n");
+	/*
+	 * Report first abort since mount
+	 */
+	if (!test_and_set_bit(BTRFS_FS_STATE_TRANS_ABORTED,
+				&root->fs_info->fs_state)) {
+		WARN(1, KERN_DEBUG "btrfs: Transaction aborted (error %d)\n",
+				errno);
+	}
 	trans->aborted = errno;
 	/* Nothing used. The other threads that have joined this
 	 * transaction may be able to continue. */
@@ -257,7 +260,7 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 
 		errstr = btrfs_decode_error(errno, nbuf);
 		btrfs_printk(root->fs_info,
-			     "%s:%d: Aborting unused transaction(%s).\n",
+			     "%s:%d: Aborting unused transaction (%s)\n",
 			     function, line, errstr);
 		return;
 	}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a0467eb..a5bbda1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1810,7 +1810,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	ret = btrfs_write_and_wait_transaction(trans, root);
 	if (ret) {
 		btrfs_error(root->fs_info, ret,
-			    "Error while writing out transaction.");
+			    "Error while writing out transaction");
 		mutex_unlock(&root->fs_info->tree_log_mutex);
 		goto cleanup_transaction;
 	}
@@ -1866,8 +1866,7 @@ cleanup_transaction:
 		btrfs_qgroup_free(root, trans->qgroup_reserved);
 		trans->qgroup_reserved = 0;
 	}
-	btrfs_printk(root->fs_info, "Skipping commit of aborted transaction.\n");
-//	WARN_ON(1);
+	btrfs_printk(root->fs_info, "Skipping commit of aborted transaction\n");
 	if (current->journal_info == trans)
 		current->journal_info = NULL;
 	cleanup_transaction(trans, root, ret);
-- 
1.7.9



Re: Creating zero-filled file aborts after 20GB in a 4GB volume with compress=lzo

2013-03-11 Thread David Sterba
On Mon, Mar 11, 2013 at 04:16:34PM +0100, Clemens Eisserer wrote:
 When running ...
 
  dd if=/dev/zero of=testfile bs=1M
 
 on a compressed btrfs volume of 4GB mounted with compress=lzo, dd
 aborts after about 20GB written.

# mkfs 4g
# dd if=/dev/zero of=testfile bs=1M
dd: writing `testfile': No space left on device
58623787008 bytes (59 GB) copied, 154.548 s, 379 MB/s
# btrfs fi df .
Data: total=2.04GB, used=1.71GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=204.75MB, used=77.56MB
Metadata: total=8.00MB, used=0.00
# pretty $(filesize testfile)
58,623,787,008 bytes (54.6 GiB)

I'm not sure why the enospc came so early (never seen that before *cough*),
maybe because of the other tests running in parallel, so a usual

# cat /dev/zero > zerofill
cat: write error: No space left on device
# pretty $(filesize zerofill)
47648243712 bytes (44.4 GiB)

# btrfs fi df .
Data: total=3.32GB, used=3.10GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=332.75MB, used=144.19MB
Metadata: total=8.00MB, used=0.00

Now that looks like a full fs, with ~100GB worth of compressed zeros.

 As I don't think this can be attributed to metadata consumed ... Any
 ideas why lzo achieves such a poor ratio for even highly compressible
 data?

What does your 'fi df' report before and after the test?

david


Re: [PATCH] Btrfs: get better concurrency for snapshot-aware defrag work

2013-03-11 Thread David Sterba
On Mon, Mar 11, 2013 at 05:20:58PM +0800, Liu Bo wrote:
 Using spinning case instead of blocking will result in better concurrency
 overall.

Do you have numbers to support that?

david


Re: snapshot deletion / unmount slowness

2013-03-11 Thread David Sterba
On Sun, Mar 10, 2013 at 10:31:08PM -0700, Michael Johnson - MJ wrote:
 I currently have a btrfs filesystem that I am unmounting and it has
 been has been unmounting for the last 20 minutes.
 
 I'm pretty sure I know exactly what is going on and in my current
 situation it's not a huge issues, but it would be a problem if this
 was a production system and I was trying to do a maintenance.
 
 Here is how I got into this situation:
 
 What I now suspect is going on is that while deleting the snapshots
 was quick, that probably kicks of a background thread which actually
 does the heavy lifting.  I noticed a btrfs-cleaner process that was in
 an io wait state, which I presumed was the process in question.
 However, now 40 minutes later, my unmount is still hung and the
 btrfs-cleaner process is sleeping, so perhaps I am wrong.

Umount being blocked by the cleaner is a known problem, and I now have a patch
ready to improve that:

http://thread.gmane.org/gmane.comp.file-systems.btrfs/23212

With it, the cleaner does not wait to finish the background work for all
deleted snapshots and is able to return in the middle of processing the
current one when the fs is going down.

There's another umount blocker, when a huge orphan file is being cleaned
up, but at first look it also seems possible to exit early if umount is
detected.


david


Re: [PATCH 3/3] btrfs-progs: use BTRFS_SCAN_BACKUP_SB flag in btrfs_scan_one_device

2013-03-11 Thread David Sterba
On Mon, Mar 11, 2013 at 10:03:46AM -0500, Eric Sandeen wrote:
 IMHO, nothing should be checking the backup superblocks unless explicitly
 told to.

That's the whole point I believe.

Update the infrastructure so that every SB access looks at the first copy
unless told otherwise by command line options.


david


Re: [PATCH 2/2] btrfs: clean up transaction abort messages

2013-03-11 Thread Zach Brown
 * print the exact negative errno from btrfs_decode_error and
   don't expect a simple snprintf to fail

What an.. odd function.  Looks like it was inherited from ext*.  And the
callers over in that neck of the woods also don't check for the
implemented-but-basically-impossible snprintf failure that leads to
returning null.

 + snprintf(nbuf, 16, "error %d", errno);

The buffer is only used to print the error number for unknown errors?

If changing this function anyway, maybe you can find a few minutes to:

 - drop the nbuf argument
 - just return static strings for known errnos
 - return "unknown" for unknown errors
 - and have the callers always print the string and error
   : %s (errno %d)

No worries if you're not keen to fix it up, but it'd be nice.  One less
wart to be distracted by when stumbling through the code.
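
The cleanup suggested in the list above might look roughly like this. It is a sketch, not a proposed patch: decode_error() and format_error() are hypothetical names, only two errnos are handled, and the "IO failure" string and the pairing of "Object already exists" with -EEXIST are assumptions, since the quoted hunks do not show them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical replacement for btrfs_decode_error(): no nbuf argument,
 * static strings for known errnos, "unknown" for everything else.
 */
static const char *decode_error(int err)
{
	switch (err) {
	case -EIO:
		return "IO failure";		/* assumed wording */
	case -EEXIST:
		return "Object already exists";	/* assumed errno pairing */
	default:
		return "unknown";
	}
}

/* Caller side: always print both the string and the error number. */
static int format_error(char *buf, size_t len, int err)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s (errno %d)", decode_error(err), err);
}
```

Because the number is always printed by the caller, an unknown errno loses no information even though the decoder no longer needs a scratch buffer.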

- z


converting to raid5

2013-03-11 Thread Remco Hosman
Hi,

Just installed 3.9.0-rc2 and the latest btrfs-progs. 

The filesystem is a 4-disk raid1 array.

First, I ran `btrfs bal start -dconvert=raid5,usage=1` to convert the mostly
empty chunks. This resulted in a lot of allocated space (tens of gigs) with
only a few hundred megs used. I then ran `btrfs bal start -dusage=75` to clean
things up.

Then I ran `btrfs bal start -dconvert=raid5,soft`.
I noticed that the difference between total and used for raid5 kept growing.
My guess is that it's taking one raid1 chunk (2x1 gig of disk space, 1 gig of
data) and moving it to one raid5 chunk (4 gigs of disk space, room for 3 gigs
of data), leaving all chunks 33% used.
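
The arithmetic behind that guess can be written out as a small sketch. The helper names are hypothetical, and the 1 gig per device stripe size is the assumption made in the paragraph above, not something read from the filesystem.

```c
#include <assert.h>

/* Data capacity (in gigs) of one RAID5 chunk striped over ndevs
 * devices at 1 gig per device: one stripe's worth holds parity. */
static unsigned raid5_chunk_data_gib(unsigned ndevs)
{
	assert(ndevs >= 2);
	return ndevs - 1;
}

/* Percentage of a chunk's data capacity in use, rounded down. */
static unsigned fill_percent(unsigned used_gib, unsigned capacity_gib)
{
	assert(capacity_gib > 0);
	return used_gib * 100 / capacity_gib;
}
```

For four devices, fill_percent(1, raid5_chunk_data_gib(4)) gives 33. The RAID5 used/total ratios in the three samples below (192.81/543, 199.30/564, 204.81/579) all come out at about 35%, which is close to that figure.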

This is what three calls of `btrfs file df /` look like, a few minutes apart,
with the balance still running:

Data, RAID1: total=807.00GB, used=805.70GB
Data, RAID5: total=543.00GB, used=192.81GB
System, RAID1: total=32.00MB, used=192.00KB
Metadata, RAID1: total=6.00GB, used=3.54GB
--
Data, RAID1: total=800.00GB, used=798.70GB
Data, RAID5: total=564.00GB, used=199.30GB
System, RAID1: total=32.00MB, used=192.00KB
Metadata, RAID1: total=6.00GB, used=3.53GB
--
Data, RAID1: total=795.00GB, used=793.70GB
Data, RAID5: total=579.00GB, used=204.81GB
System, RAID1: total=32.00MB, used=192.00KB
Metadata, RAID1: total=6.00GB, used=3.54GB


Remco


[PATCH v2 5/5] Add man page description for NcMsPp replication levels

2013-03-11 Thread Hugo Mills
Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 man/btrfs.8.in  |   16 
 man/mkfs.btrfs.8.in |   24 +++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 94f4ffe..4072510 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -25,6 +25,8 @@ btrfs \- control a btrfs filesystem
 [-s \fIstart\fR] [-t \fIsize\fR] -[vf] \fIfile\fR|\fIdir\fR \
 [\fIfile\fR|\fIdir\fR...]
 .PP
+\fBbtrfs\fP \fBfilesystem df\fP [-r|-e]\fI path \fP
+.PP
 \fBbtrfs\fP \fBfilesystem sync\fP\fI path \fP
 .PP
 \fBbtrfs\fP \fBfilesystem resize\fP\fI [devid:][+/\-]size[gkm]|[devid:]max filesystem\fP
@@ -217,6 +219,20 @@ don't use it if you use snapshots, have de-duplicated your data or made
 copies with \fBcp --reflink\fP.
 .TP
 
+\fBfilesystem df\fR [-r|-e] \fIpath\fR
+Show usage information for the filesystem identified by \fIpath\fR.
+
+\fB-r, --raid\fP Use old-style RAID-n terminology to show replication types
+
+\fB-e, --explain\fP Explain the new-style NcMsPp terminology in more
+detail: Nc shows the number of copies of data; a trailing d
+indicates reduced device redundancy (e.g. more than one of the copies
+may live on a single device), Ms shows the number of data stripes per
+copy (with Xs indicating as many as will fit across the available
+devices), and Pp shows the number of parity stripes.
+
+.TP
+
 \fBfilesystem sync\fR\fI path \fR
 Force a sync for the filesystem identified by \fIpath\fR.
 .TP
diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in
index 41163e0..6d1f5d0 100644
--- a/man/mkfs.btrfs.8.in
+++ b/man/mkfs.btrfs.8.in
@@ -37,7 +37,29 @@ mkfs.btrfs uses all the available storage for the filesystem.
 .TP
 \fB\-d\fR, \fB\-\-data \fItype\fR
 Specify how the data must be spanned across the devices specified. Valid
-values are raid0, raid1, raid10 or single.
+values are of the form nc[d][ms[pp]], where n is the number of copies
+of data, m is the number of stripes per copy, and p is the number of parity
+stripes. The m parameter must (currently) be a literal X, indicating that
+as many stripes as possible will be used. The letter d may be added to the
+number of copies, to indicate non-redundant copies (e.g. on the same device).
+
+The following deprecated values may also be used:
+.RS 16
+.P
+single 1c
+.P
+raid0  1cXs
+.P
+raid1  2c
+.P
+dup    2cd
+.P
+raid10 2cXsS
+.P
+raid5  1cXs1p
+.P
+raid6  1cXs2p
+.RS -16
 .TP
 \fB\-f\fR
 Force overwrite when an existing filesystem is detected on the device.
-- 
1.7.10.4



[PATCH v2 3/5] Convert balance filter parser to use common NcMsPp replication-level parser

2013-03-11 Thread Hugo Mills
Balance filters are the second location which takes user input of
replication levels. Update this to use the common parser so that we can
provide nCmSpP-style names.
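
One detail worth spelling out: the shared parser returns 0 for "single",
and 0 cannot be OR-ed into a filter mask, so the balance code has to map
it to a dedicated bit. A minimal sketch of that idea follows. Everything
prefixed sketch_/SKETCH_ is a stand-in invented for illustration (the bit
values in particular are placeholders, not copied from the btrfs headers),
and the toy parser only knows three profiles.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder bit values for this sketch only. */
#define SKETCH_GROUP_RAID0   (1ULL << 3)
#define SKETCH_GROUP_RAID1   (1ULL << 4)
#define SKETCH_ALLOC_SINGLE  (1ULL << 48)
#define SKETCH_PARSE_ERROR   ((uint64_t)-1)

/*
 * Hypothetical stand-in for the shared parser: one bit per profile,
 * 0 for "single", all-ones for an unknown name.
 */
static uint64_t sketch_parse_profile(const char *s)
{
	if (strcmp(s, "single") == 0 || strcmp(s, "1c") == 0)
		return 0;
	if (strcmp(s, "raid0") == 0 || strcmp(s, "1cXs") == 0)
		return SKETCH_GROUP_RAID0;
	if (strcmp(s, "raid1") == 0 || strcmp(s, "2c") == 0)
		return SKETCH_GROUP_RAID1;
	return SKETCH_PARSE_ERROR;
}

/* OR one profile into *mask; return 0 on success, 1 on an unknown name. */
static int sketch_add_profile(const char *s, uint64_t *mask)
{
	uint64_t result = sketch_parse_profile(s);

	if (result == SKETCH_PARSE_ERROR)
		return 1;
	/* "single" parses to 0, so represent it with its own bit. */
	*mask |= result ? result : SKETCH_ALLOC_SINGLE;
	return 0;
}
```

On an unknown name the mask is left untouched, so a caller can report the
error without having to undo anything.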

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 cmds-balance.c |   23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/cmds-balance.c b/cmds-balance.c
index f5dc317..6186963 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -42,23 +42,16 @@ static const char balance_cmd_group_info[] =
 
 static int parse_one_profile(const char *profile, u64 *flags)
 {
-	if (!strcmp(profile, "raid0")) {
-		*flags |= BTRFS_BLOCK_GROUP_RAID0;
-	} else if (!strcmp(profile, "raid1")) {
-		*flags |= BTRFS_BLOCK_GROUP_RAID1;
-	} else if (!strcmp(profile, "raid10")) {
-		*flags |= BTRFS_BLOCK_GROUP_RAID10;
-	} else if (!strcmp(profile, "raid5")) {
-		*flags |= BTRFS_BLOCK_GROUP_RAID5;
-	} else if (!strcmp(profile, "raid6")) {
-		*flags |= BTRFS_BLOCK_GROUP_RAID6;
-	} else if (!strcmp(profile, "dup")) {
-		*flags |= BTRFS_BLOCK_GROUP_DUP;
-	} else if (!strcmp(profile, "single")) {
-		*flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE;
-	} else {
+	u64 result;
+
+	result = parse_profile(profile);
+	if (result == (u64)-1) {
 		fprintf(stderr, "Unknown profile '%s'\n", profile);
 		return 1;
+	} else if (result == 0) {
+		*flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+	} else {
+		*flags |= result;
 	}
 
 	return 0;
-- 
1.7.10.4



[PATCH v2 2/5] Move parse_profile to utils.c

2013-03-11 Thread Hugo Mills
Make parse_profile a shared function so it can be used across the
code-base.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 mkfs.c  |   94 ---
 utils.c |   94 +++
 utils.h |1 +
 3 files changed, 95 insertions(+), 94 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 70df5db..0facf13 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -348,100 +348,6 @@ static void print_version(void)
exit(0);
 }
 
-static u64 make_profile(int copies, int dup, int stripes, int parity)
-{
-	if(copies == 1 && !dup && stripes == 0 && parity == 0)
-		return 0;
-	else if(copies == 2 && dup && stripes == 0 && parity == 0)
-		return BTRFS_BLOCK_GROUP_DUP;
-	else if(copies == 2 && !dup && stripes == 0 && parity == 0)
-		return BTRFS_BLOCK_GROUP_RAID1;
-	else if(copies == 2 && !dup && stripes == -1 && parity == 0)
-		return BTRFS_BLOCK_GROUP_RAID10;
-	else if(copies == 1 && !dup && stripes == -1 && parity == 0)
-		return BTRFS_BLOCK_GROUP_RAID0;
-	else if(copies == 1 && !dup && stripes == -1 && parity == 1)
-		return BTRFS_BLOCK_GROUP_RAID5;
-	else if(copies == 1 && !dup && stripes == -1 && parity == 2)
-		return BTRFS_BLOCK_GROUP_RAID6;
-
-	return (u64)-1;
-}
-
-static u64 parse_profile(const char *s)
-{
-	char *pos, *parse_end;
-	int copies = 1;
-	int stripes = 0;
-	int parity = 0;
-	int dup = 0;
-	u64 profile = (u64)-1;
-
-	/* Look for exact match with historical forms first */
-	if (strcmp(s, "raid0") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID0;
-	} else if (strcmp(s, "raid1") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID1;
-	} else if (strcmp(s, "raid5") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID5;
-	} else if (strcmp(s, "raid6") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID6;
-	} else if (strcmp(s, "raid10") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID10;
-	} else if (strcmp(s, "dup") == 0) {
-		return BTRFS_BLOCK_GROUP_DUP;
-	} else if (strcmp(s, "single") == 0) {
-		return 0;
-	}
-
-	/* Attempt to parse new ncmspp form */
-	/* nc is required and n must be an unsigned decimal number */
-	copies = strtoul(s, &parse_end, 10);
-	if(parse_end == s || (*parse_end != 'c' && *parse_end != 'C'))
-		goto unknown;
-
-	/* c may be followed by d to indicate non-redundant/DUP */
-	pos = parse_end + 1;
-	if(*pos == 'd' || *pos == 'D') {
-		dup = 1;
-		pos++;
-	}
-	if(*pos == 0)
-		goto done;
-
-	/* ms is optional, and m may be an integer, or a literal x */
-	if(*pos == 'x' || *pos == 'X') {
-		stripes = -1;
-		parse_end = pos+1;
-	} else {
-		stripes = strtoul(pos, &parse_end, 10);
-	}
-	if(parse_end == pos || (*parse_end != 's' && *parse_end != 'S'))
-		goto unknown;
-
-	pos = parse_end + 1;
-	if(*pos == 0)
-		goto done;
-
-	/* pp is optional, and p must be an integer */
-	parity = strtoul(pos, &parse_end, 10);
-	if(parse_end == pos || (*parse_end != 'p' && *parse_end != 'P'))
-		goto unknown;
-	pos = parse_end + 1;
-	if(*pos != 0)
-		goto unknown;
-
-done:
-	profile = make_profile(copies, dup, stripes, parity);
-	if(profile == (u64)-1)
-		fprintf(stderr, "Unknown or unavailable profile '%s'\n", s);
-	return profile;
-
-unknown:
-	fprintf(stderr, "Unparseable profile '%s'\n", s);
-	return (u64)-1;
-}
-
 static char *parse_label(char *input)
 {
int len = strlen(input);
diff --git a/utils.c b/utils.c
index f68436d..f1d2432 100644
--- a/utils.c
+++ b/utils.c
@@ -1420,6 +1420,100 @@ u64 parse_size(char *s)
return strtoull(s, NULL, 10) * mult;
 }
 
+static u64 make_profile(int copies, int dup, int stripes, int parity)
+{
+	if(copies == 1 && !dup && stripes == 0 && parity == 0)
+		return 0;
+	else if(copies == 2 && dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_DUP;
+	else if(copies == 2 && !dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID1;
+	else if(copies == 2 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID10;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID0;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 1)
+		return BTRFS_BLOCK_GROUP_RAID5;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 2)
+		return BTRFS_BLOCK_GROUP_RAID6;
+
+	return (u64)-1;
+}
+
+u64 parse_profile(const char *s)
+{
+	char *pos, *parse_end;
+	int copies = 1;
+

[PATCH v2 4/5] Change output of btrfs fi df to report new (or old) RAID names

2013-03-11 Thread Hugo Mills
Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 cmds-filesystem.c |  173 ++---
 1 file changed, 152 insertions(+), 21 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 2210020..3150ff7 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -18,6 +18,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <getopt.h>
 #include <sys/ioctl.h>
 #include <errno.h>
 #include <uuid/uuid.h>
@@ -39,11 +40,129 @@ static const char * const filesystem_cmd_group_usage[] = {
 };
 
 static const char * const cmd_df_usage[] = {
-	"btrfs filesystem df <path>",
+	"btrfs filesystem df [options] <path>",
 	"Show space usage information for a mount point",
+	"",
+	"-r  Use old-style RAID-n terminology",
+	"-e  Explain new-style NcMsPp terminology",
 	NULL
 };
 
+static const char *cmd_df_short_options = "re";
+static const struct option cmd_df_options[] = {
+	{ "raid",    no_argument, NULL, 'r' },
+	{ "explain", no_argument, NULL, 'e' },
+	{ NULL, 0, NULL, 0 }
+};
+
+#define RAID_NAMES_NEW 0
+#define RAID_NAMES_OLD 1
+#define RAID_NAMES_LONG 2
+
+static int write_raid_name(char* buffer, int size, u64 flags, int raid_format)
+{
+	int copies, stripes, parity;
+	int out;
+	int written = 0;
+
+	if (raid_format == RAID_NAMES_OLD) {
+		if (flags & BTRFS_BLOCK_GROUP_RAID0) {
+			return snprintf(buffer, size, "%s", "RAID0");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID1) {
+			return snprintf(buffer, size, "%s", "RAID1");
+		} else if (flags & BTRFS_BLOCK_GROUP_DUP) {
+			return snprintf(buffer, size, "%s", "DUP");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID10) {
+			return snprintf(buffer, size, "%s", "RAID10");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID5) {
+			return snprintf(buffer, size, "%s", "RAID5");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
+			return snprintf(buffer, size, "%s", "RAID6");
+		}
+		return 0;
+	}
+
+	if (flags & (BTRFS_BLOCK_GROUP_RAID1
+		     | BTRFS_BLOCK_GROUP_RAID10
+		     | BTRFS_BLOCK_GROUP_DUP)) {
+		copies = 2;
+	} else {
+		copies = 1;
+	}
+
+	if (raid_format == RAID_NAMES_LONG)
+		out = snprintf(buffer, size, "%d copies", copies);
+	else
+		out = snprintf(buffer, size, "%dc", copies);
+	if (size < out)
+		return written + size;
+	written += out;
+	size -= out;
+
+	if (flags & BTRFS_BLOCK_GROUP_DUP) {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, " low redundancy");
+		else
+			out = snprintf(buffer+written, size, "d");
+		if (size < out)
+			return written + size;
+		written += out;
+		size -= out;
+	}
+
+	if (flags & (BTRFS_BLOCK_GROUP_RAID0
+		     | BTRFS_BLOCK_GROUP_RAID10
+		     | BTRFS_BLOCK_GROUP_RAID5
+		     | BTRFS_BLOCK_GROUP_RAID6)) {
+		stripes = -1;
+	} else {
+		stripes = 0;
+	}
+
+	if (stripes == -1) {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, ", fit stripes");
+		else
+			out = snprintf(buffer+written, size, "Xs");
+	} else if (stripes == 0) {
+		out = 0;
+	} else {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, ", %d stripes", stripes);
+		else
+			out = snprintf(buffer+written, size, "%ds", stripes);
+	}
+
+	if (size < out)
+		return written + size;
+	written += out;
+	size -= out;
+
+	if (flags & BTRFS_BLOCK_GROUP_RAID5) {
+		parity = 1;
+	} else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
+		parity = 2;
+	} else {
+		parity = 0;
+	}
+
+	if (parity == 0) {
+		out = 0;
+	} else {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, ", %d parity", parity);
+		else
+			out = snprintf(buffer+written, size, "%dp", parity);
+	}
+
+	if (size < out)
+		return written + size;
+	written += out;
+	size -= out;
+
+	return written;
+}
+
 static int cmd_df(int argc, char **argv)
 {
struct btrfs_ioctl_space_args *sargs, *sargs_orig;
@@ -52,11 +171,32 @@ static int cmd_df(int argc, char **argv)
int fd;
int e;

[PATCH v2 0/5] RAID-level terminology change

2013-03-11 Thread Hugo Mills
   Some time ago, and occasionally since, we've discussed altering the
RAID-n terminology to change it to an NcMsPp format, where N is the
number of copies, M is the number of (data) devices in a stripe per copy,
and P is the number of parity devices in a stripe.

   The current kernel implementation uses as many devices as it can in the
striped modes (RAID-0, -10, -5, -6), and in this implementation, that is
written as Xs (with a literal X). The Ms and Pp sections are omitted
if the value is 1s or 0p.

   The magic look-up table for old-style / new-style is:

single   1c (or omitted, in btrfs fi df output)
RAID-0   1cXs
RAID-1   2c
DUP  2cd
RAID-10  2cXs
RAID-5   1cXs1p
RAID-6   1cXs2p
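
For anyone who prefers to read the table as code, here is a minimal
sketch of it as a static lookup. The struct, the array and the function
name are invented for this illustration and do not appear in the patch
set; the old-style names are spelled the way the tools accept them on
input (raid0 rather than RAID-0).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type, for illustration only. */
struct raid_name_pair {
	const char *old_name;	/* traditional RAID-n style name */
	const char *new_name;	/* NcMsPp form from the table above */
};

static const struct raid_name_pair raid_name_table[] = {
	{ "single", "1c"     },
	{ "raid0",  "1cXs"   },
	{ "raid1",  "2c"     },
	{ "dup",    "2cd"    },
	{ "raid10", "2cXs"   },
	{ "raid5",  "1cXs1p" },
	{ "raid6",  "1cXs2p" },
};

/* Return the new-style name for an old-style one, or NULL if unknown. */
static const char *old_to_new_raid_name(const char *old_name)
{
	size_t i;
	size_t n = sizeof(raid_name_table) / sizeof(raid_name_table[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(raid_name_table[i].old_name, old_name) == 0)
			return raid_name_table[i].new_name;
	}
	return NULL;
}
```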

   The following patch set modifies userspace tools to accept c/s/p formats
in input (mkfs and the restriper). The older formats are also accepted. It
also prints the newer formats by default in btrfs fi df, with an option to
show the older format for the traditionalists, and to expand the abbreviation
verbosely for those unfamiliar with it.

v1 -> v2: Changed to use lower-case letters for c/s/p, for readability
  Changed mS to Xs for readability
  Added explain option to df
  Switched option parsing for df to getopt_long

   Hugo.

Hugo Mills (5):
  Use NcMsPp format for mkfs
  Move parse_profile to utils.c
  Convert balance filter parser to use common NcMsPp replication-level
parser
  Change output of btrfs fi df to report new (or old) RAID names
  Add man page description for NcMsPp replication levels

 cmds-balance.c  |   23 +++
 cmds-filesystem.c   |  173 ---
 man/btrfs.8.in  |   16 +
 man/mkfs.btrfs.8.in |   24 ++-
 mkfs.c  |   35 +++
 utils.c |   94 
 utils.h |1 +
 7 files changed, 303 insertions(+), 63 deletions(-)

-- 
1.7.10.4



[PATCH v2 1/5] Use NcMsPp format for mkfs

2013-03-11 Thread Hugo Mills
Teach mkfs.btrfs about ncmspp format for replication levels, which avoids
the semantic uncertainty over the RAID-XYZ naming.
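
To make the grammar concrete, the following is a simplified, standalone
sketch of a parser for nc[d][ms[pp]]. It is not the code from the patch
below: struct repl_level, parse_count() and parse_repl_level() are names
made up for this example, it fills in a plain struct instead of returning
block group flags, and it rejects a sign or leading whitespace in front of
a number, which is an extra check added for the sketch.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical result type; not part of btrfs-progs. */
struct repl_level {
	int copies;	/* n in "nc" */
	int dup;	/* 1 if "d" follows the "c" */
	int stripes;	/* m in "ms"; -1 for a literal X, 0 if omitted */
	int parity;	/* p in "pp"; 0 if omitted */
};

/* Parse an unsigned decimal count at pos; -1 if pos is not a digit. */
static int parse_count(const char *pos, char **end, int *value)
{
	if (!isdigit((unsigned char)*pos))
		return -1;
	*value = (int)strtoul(pos, end, 10);
	return 0;
}

/* Parse nc[d][ms[pp]]; return 0 on success, -1 on a malformed string. */
static int parse_repl_level(const char *s, struct repl_level *out)
{
	const char *pos = s;
	char *end;

	out->dup = 0;
	out->stripes = 0;
	out->parity = 0;

	/* "nc" is required */
	if (parse_count(pos, &end, &out->copies) < 0)
		return -1;
	if (*end != 'c' && *end != 'C')
		return -1;
	pos = end + 1;

	/* optional "d" marks copies that may share a device */
	if (*pos == 'd' || *pos == 'D') {
		out->dup = 1;
		pos++;
	}
	if (*pos == '\0')
		return 0;

	/* "ms" is optional; m is a number or a literal X */
	if (*pos == 'x' || *pos == 'X') {
		out->stripes = -1;
		end = (char *)pos + 1;
	} else if (parse_count(pos, &end, &out->stripes) < 0) {
		return -1;
	}
	if (*end != 's' && *end != 'S')
		return -1;
	pos = end + 1;
	if (*pos == '\0')
		return 0;

	/* "pp" is optional and must end the string */
	if (parse_count(pos, &end, &out->parity) < 0)
		return -1;
	if (*end != 'p' && *end != 'P')
		return -1;
	return end[1] == '\0' ? 0 : -1;
}
```

Whether a successfully parsed combination is actually available is a
separate question; in the patch that decision is made by make_profile().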

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 mkfs.c |   91 +++-
 1 file changed, 84 insertions(+), 7 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index b2520ce..70df5db 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -326,7 +326,9 @@ static void print_usage(void)
 	fprintf(stderr, "options:\n");
 	fprintf(stderr, "\t -A --alloc-start the offset to start the FS\n");
 	fprintf(stderr, "\t -b --byte-count total number of bytes in the FS\n");
-	fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
+	fprintf(stderr, "\t -d --data data profile: nc[d][ms[pp]]\n");
+	fprintf(stderr, "\t\tfor n copies (d=reduced dev redundancy), m stripes, p parity stripes\n");
+	fprintf(stderr, "\t\tor raid0, raid1, raid10, dup or single (deprecated)\n");
 	fprintf(stderr, "\t -l --leafsize size of btree leaves\n");
 	fprintf(stderr, "\t -L --label set a label\n");
 	fprintf(stderr, "\t -m --metadata metadata profile, values like data profile\n");
@@ -346,8 +348,36 @@ static void print_version(void)
exit(0);
 }
 
-static u64 parse_profile(char *s)
+static u64 make_profile(int copies, int dup, int stripes, int parity)
 {
+	if(copies == 1 && !dup && stripes == 0 && parity == 0)
+		return 0;
+	else if(copies == 2 && dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_DUP;
+	else if(copies == 2 && !dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID1;
+	else if(copies == 2 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID10;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID0;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 1)
+		return BTRFS_BLOCK_GROUP_RAID5;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 2)
+		return BTRFS_BLOCK_GROUP_RAID6;
+
+	return (u64)-1;
+}
+
+static u64 parse_profile(const char *s)
+{
+	char *pos, *parse_end;
+	int copies = 1;
+	int stripes = 0;
+	int parity = 0;
+	int dup = 0;
+	u64 profile = (u64)-1;
+
+	/* Look for exact match with historical forms first */
 	if (strcmp(s, "raid0") == 0) {
 		return BTRFS_BLOCK_GROUP_RAID0;
 	} else if (strcmp(s, "raid1") == 0) {
@@ -362,12 +392,54 @@ static u64 parse_profile(char *s)
 		return BTRFS_BLOCK_GROUP_DUP;
 	} else if (strcmp(s, "single") == 0) {
 		return 0;
+	}
+
+	/* Attempt to parse new ncmspp form */
+	/* nc is required and n must be an unsigned decimal number */
+	copies = strtoul(s, &parse_end, 10);
+	if(parse_end == s || (*parse_end != 'c' && *parse_end != 'C'))
+		goto unknown;
+
+	/* c may be followed by d to indicate non-redundant/DUP */
+	pos = parse_end + 1;
+	if(*pos == 'd' || *pos == 'D') {
+		dup = 1;
+		pos++;
+	}
+	if(*pos == 0)
+		goto done;
+
+	/* ms is optional, and m may be an integer, or a literal x */
+	if(*pos == 'x' || *pos == 'X') {
+		stripes = -1;
+		parse_end = pos+1;
 	} else {
-		fprintf(stderr, "Unknown profile %s\n", s);
-		print_usage();
+		stripes = strtoul(pos, &parse_end, 10);
 	}
-	/* not reached */
-	return 0;
+	if(parse_end == pos || (*parse_end != 's' && *parse_end != 'S'))
+		goto unknown;
+
+	pos = parse_end + 1;
+	if(*pos == 0)
+		goto done;
+
+	/* pp is optional, and p must be an integer */
+	parity = strtoul(pos, &parse_end, 10);
+	if(parse_end == pos || (*parse_end != 'p' && *parse_end != 'P'))
+		goto unknown;
+	pos = parse_end + 1;
+	if(*pos != 0)
+		goto unknown;
+
+done:
+	profile = make_profile(copies, dup, stripes, parity);
+	if(profile == (u64)-1)
+		fprintf(stderr, "Unknown or unavailable profile '%s'\n", s);
+	return profile;
+
+unknown:
+	fprintf(stderr, "Unparseable profile '%s'\n", s);
+	return (u64)-1;
 }
 
 static char *parse_label(char *input)
@@ -1447,6 +1519,11 @@ int main(int ac, char **av)
 	printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION);
 	printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
 
+	if (data_profile == (u64)-1 || metadata_profile == (u64)-1) {
+		fprintf(stderr, "Cannot handle requested replication profile. Aborting\n");
+		exit(1);
+	}
+
 	if (source_dir == 0) {
 		file = av[optind++];
 		ret = is_swap_device(file);
@@ -1666,7 +1743,7 @@ raid_groups:
 
 

Re: [PATCH] btrfs-progs: add options for changing size representations

2013-03-11 Thread Mike Fleetwood
On 11 March 2013 10:12, Audrius Butkevicius
audrius.butkevic...@elastichosts.com wrote:
 Add '--si', '-h'/'--human-readable' and '--block-size' global options,
 which allow users to customize the way sizes are displayed.

 Options and their format tries to mimic GNU ls utility.
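
For readers unfamiliar with the GNU convention being mimicked: a
one-letter suffix such as K or M is a power of 1024, while KB, MB and so
on are powers of 1000. A minimal standalone sketch of turning a suffix
into a divisor, using the same switch fall-through trick as the quoted
patch, could look like this. suffix_to_divisor() is a name made up for
the sketch and is not a function from the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Map a size suffix ("K", "MB", "G", ...) to a divisor.
 * Returns 0 for a suffix this sketch does not recognise.
 */
static uint64_t suffix_to_divisor(const char *suffix)
{
	size_t len = strlen(suffix);
	uint64_t base;
	uint64_t divisor = 1;

	if (len == 1)
		base = 1024;		/* "K", "M", ...: powers of 1024 */
	else if (len == 2 && suffix[1] == 'B')
		base = 1000;		/* "KB", "MB", ...: powers of 1000 */
	else
		return 0;

	/* Each case multiplies once, then falls into the next smaller unit. */
	switch (suffix[0]) {
	case 'E':
		divisor *= base;
		/* fall through */
	case 'P':
		divisor *= base;
		/* fall through */
	case 'T':
		divisor *= base;
		/* fall through */
	case 'G':
		divisor *= base;
		/* fall through */
	case 'M':
		divisor *= base;
		/* fall through */
	case 'K':
		divisor *= base;
		break;
	default:
		return 0;
	}
	return divisor;
}
```

Starting from a local divisor of 1 on every call keeps the result
independent of how often the option is parsed.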

 Signed-off-by: Audrius Butkevicius audrius.butkevic...@elastichosts.com
 ---
  btrfs.c |3 ++
  utils.c |  146 
 +++
  utils.h |6 +++
  3 files changed, 138 insertions(+), 17 deletions(-)

 diff --git a/btrfs.c b/btrfs.c
 index 691adef..6a8fc30 100644
 --- a/btrfs.c
 +++ b/btrfs.c
 @@ -22,6 +22,8 @@
  #include crc32c.h
  #include commands.h
  #include version.h
 +#include ctree.h
 +#include utils.h

  static const char * const btrfs_cmd_group_usage[] = {
 btrfs [--help] [--version] group [group...] command [args],
 @@ -291,6 +293,7 @@ int main(int argc, char **argv)

 crc32c_optimization_init();

 +	handle_size_unit_args(&argc, &argv);
  	fixup_argv0(argv, cmd->token);
  	exit(cmd->fn(argc, argv));
  }
 diff --git a/utils.c b/utils.c
 index d660507..58c1919 100644
 --- a/utils.c
 +++ b/utils.c
 @@ -16,6 +16,7 @@
   * Boston, MA 021110-1307, USA.
   */

 +#define _GNU_SOURCE
  #define _XOPEN_SOURCE 700
  #define __USE_XOPEN2K8
  #define __XOPEN2K8 /* due to an error in dirent.h, to get dirfd() */
 @@ -1095,33 +1096,144 @@ out:
 return ret;
  }

 -static char *size_strs[] = { , KB, MB, GB, TB,
 -   PB, EB, ZB, YB};
 +static int sizes_format = SIZES_FORMAT_BYTES;
 +static u64 sizes_divisor = 1;
 +
 +void remove_arg(int i, int *argc, char ***argv)
 +{
 +	while (i++ < *argc)
 +		(*argv)[i - 1] = (*argv)[i];
 +	(*argc)--;
 +}
 +
 +void handle_size_unit_args(int *argc, char ***argv)
 +{
 +   int k;
 +   int base = 1024;
 +   char *suffix;
 +   char *block_size;
 +   u64 value;
 +
 +	for (k = *argc - 1; k >= 0; k--) {
 +		if (!strcmp((*argv)[k], "-h") ||
 +		    !strcmp((*argv)[k], "--human-readable")) {
 +			sizes_format = SIZES_FORMAT_HUMAN;
 +			remove_arg(k, argc, argv);
 +		} else if (!strcmp((*argv)[k], "--si")) {
 +			sizes_format = SIZES_FORMAT_SI;
 +			remove_arg(k, argc, argv);
 +		} else if (!strncmp((*argv)[k], "--block-size", 12)) {
 +			if (strlen((*argv)[k]) < 14 || (*argv)[k][12] != '=') {
 +				fprintf(stderr,
 +					"--block-size requires an argument\n");
 +				exit(1);
 +			}
 +
 +			sizes_format = SIZES_FORMAT_BLOCK;
 +			block_size = strchr((*argv)[k], '=');
 +
 +			errno = 0;
 +			value = strtoull(++block_size, &suffix, 10);
 +			if (errno == ERANGE && value == ULLONG_MAX) {
 +				fprintf(stderr,
 +					"--block-size argument '%s' too large\n",
 +					block_size);
 +				exit(1);
 +			}
 +			if (suffix == block_size)
 +				value = 1;
 +
 +			if (strlen(suffix) == 1 && value > 0) {
 +				base = 1024;
 +			} else if (strlen(suffix) == 2 && suffix[1] == 'B'
 +				   && value > 0) {
 +				base = 1000;
 +			/* Allow non-zero values without a suffix */
 +			} else if (strlen(suffix) != 0 || value == 0) {
 +				fprintf(stderr,
 +					"invalid --block-size argument '%s'\n",
 +					block_size);
 +				exit(1);
 +			}
 +
 +			if (strlen(suffix) > 0) {
 +				switch(suffix[0]) {
 +   case 'E':
 +   sizes_divisor *= base;
 +   case 'P':
 +   sizes_divisor *= base;
 +   case 'T':
 +   sizes_divisor *= base;
 +   case 'G':
 +   sizes_divisor *= base;
 +   case 'M':
 +   sizes_divisor *= base;
 +   case 'K':
 +   sizes_divisor *= base;
 +   break;
 +

[PATCH] btrfs-progs: Add a rule to build a static mkfs.btrfs

2013-03-11 Thread Antoine Sirinelli
Static mkfs.btrfs can be used to bootstrap a system from a live CD
which does not provide mkfs.btrfs.

The executable produced is named mkfs.btrfs.static and built by invoking
the static make rule.

Signed-off-by: Antoine Sirinelli anto...@monte-stello.com
---
 Makefile |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index bea8ae9..e986e51 100644
--- a/Makefile
+++ b/Makefile
@@ -72,7 +72,7 @@ all: version.h $(progs) manpages
 # NOTE: For static compiles, you need to have all the required libs
 #  static equivalent available
 #
-static: version.h btrfs.static
+static: version.h btrfs.static mkfs.btrfs.static
 
 version.h:
$(Q)bash version.sh
@@ -116,6 +116,11 @@ mkfs.btrfs: $(objects) mkfs.o
@echo [LD] $@
$(Q)$(CC) $(CFLAGS) -o mkfs.btrfs $(objects) mkfs.o $(LDFLAGS) $(LIBS) 
-lblkid
 
+mkfs.btrfs.static: $(static_objects) mkfs.static.o
+   @echo [LD] $@
+   $(Q)$(CC) $(STATIC_CFLAGS) -o mkfs.btrfs.static mkfs.static.o \
+   $(static_objects) $(STATIC_LDFLAGS) $(STATIC_LIBS)
+
 btrfs-debug-tree: $(objects) debug-tree.o
@echo [LD] $@
$(Q)$(CC) $(CFLAGS) -o btrfs-debug-tree $(objects) debug-tree.o 
$(LDFLAGS) $(LIBS)
@@ -178,7 +183,7 @@ clean :
@echo Cleaning
$(Q)rm -f $(progs) cscope.out *.o .*.d btrfs-convert btrfs-image 
btrfs-select-super \
  btrfs-zero-log btrfstune dir-test ioctl-test quick-test send-test 
btrfs.static btrfsck \
- version.h
+ version.h mkfs.btrfs.static
$(Q)$(MAKE) $(MAKEOPTS) -C man $@
 
 install: $(progs) install-man
-- 
1.7.10.4





WARNING: at fs/btrfs/extent_map.c:77 free_extent_map

2013-03-11 Thread Johannes Hirte
Since the updates for linux-3.9, my system has frozen three or four
times, and only a reset (Magic SysRq) helped. After the reboot I found
a bunch of these in syslog:

Mar 11 21:56:09 localhost kernel: [ cut here ]
Mar 11 21:56:09 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 
free_extent_map+0x64/0x76()
Mar 11 21:56:09 localhost kernel: Hardware name: EasyNote TK81
Mar 11 21:56:09 localhost kernel: Modules linked in: nfsv4 nfsd exportfs 
auth_rpcgss nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec ath9k_common ath9k_hw 
acer_wmi snd_hwdep snd_pcm ath sr_mod wmi broadcom snd_page_alloc snd_timer 
cdrom tg3 k10temp snd acpi_cpufreq ohci_hcd soundcore i2c_piix4 mperf
Mar 11 21:56:09 localhost kernel: Pid: 11260, comm: bogofilter Tainted: G   
 W3.9.0-rc2 #293
Mar 11 21:56:09 localhost kernel: Call Trace:
Mar 11 21:56:09 localhost kernel: [8102abc2] ? 
warn_slowpath_common+0x76/0x8c
Mar 11 21:56:09 localhost kernel: [8115dcff] ? 
free_extent_map+0x64/0x76
Mar 11 21:56:09 localhost kernel: [8115bc57] ? 
btrfs_drop_extent_cache+0x363/0x39f
Mar 11 21:56:09 localhost kernel: [81152db4] ? 
__cow_file_range+0x175/0x3c1
Mar 11 21:56:09 localhost kernel: [8114bb02] ? 
join_transaction.isra.34+0x30f/0x31a
Mar 11 21:56:09 localhost kernel: [8114d9f7] ? 
start_transaction+0x2d8/0x3e8
Mar 11 21:56:09 localhost kernel: [8115383e] ? 
cow_file_range+0xa9/0xc5
Mar 11 21:56:09 localhost kernel: [811538f7] ? 
run_delalloc_range+0x9d/0x33b
Mar 11 21:56:09 localhost kernel: [8116139b] ? 
free_extent_state+0x12/0x21
Mar 11 21:56:09 localhost kernel: [81163fa3] ? 
__extent_writepage+0x1a8/0x5d8
Mar 11 21:56:09 localhost kernel: [811635ae] ? 
end_extent_writepage+0x5d/0x5d
Mar 11 21:56:09 localhost kernel: [8116451d] ? 
extent_write_cache_pages.isra.29.constprop.47+0x14a/0x255
Mar 11 21:56:09 localhost kernel: [81164836] ? 
extent_writepages+0x49/0x60
Mar 11 21:56:09 localhost kernel: [81150146] ? 
btrfs_update_inode_item+0xde/0xde
Mar 11 21:56:09 localhost kernel: [8108fc58] ? 
__filemap_fdatawrite_range+0x4d/0x52
Mar 11 21:56:09 localhost kernel: [8115a192] ? 
btrfs_sync_file+0x48/0x203
Mar 11 21:56:09 localhost kernel: [810c85ff] ? vfs_write+0xaf/0xf8
Mar 11 21:56:09 localhost kernel: [810e783b] ? do_fsync+0x2b/0x50
Mar 11 21:56:09 localhost kernel: [810e7a42] ? sys_fdatasync+0xb/0xf
Mar 11 21:56:09 localhost kernel: [814877d2] ? 
system_call_fastpath+0x16/0x1b
Mar 11 21:56:09 localhost kernel: ---[ end trace 3eaea449d8d56f92 ]---

As far as I remember, it happened when fetching emails with claws, but
that's not a reliable test case.

Here is another trace, from the first time I found it in the logs. That
time the system didn't hang:

Mar  4 14:28:35 localhost kernel: [ cut here ]
Mar  4 14:28:35 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 
free_extent_map+0x64/0x76()
Mar  4 14:28:35 localhost kernel: Hardware name: EasyNote TK81
Mar  4 14:28:35 localhost kernel: Modules linked in: nfsd exportfs auth_rpcgss 
nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_intel ath9k snd_hda_codec ath9k_common snd_hwdep snd_pcm broadcom 
ath9k_hw snd_page_alloc ath sr_mod snd_timer acer_wmi snd cdrom wmi tg3 
ohci_hcd soundcore k10temp edac_core acpi_cpufreq i2c_piix4 mperf
Mar  4 14:28:35 localhost kernel: Pid: 1574, comm: flush-btrfs-1 Not tainted 
3.9.0-rc1 #289
Mar  4 14:28:35 localhost kernel: Call Trace:
Mar  4 14:28:35 localhost kernel: [8102ab92] ? 
warn_slowpath_common+0x76/0x8c
Mar  4 14:28:35 localhost kernel: [8115dc7b] ? 
free_extent_map+0x64/0x76
Mar  4 14:28:35 localhost kernel: [8115bbd3] ? 
btrfs_drop_extent_cache+0x363/0x39f
Mar  4 14:28:35 localhost kernel: [81152d2d] ? 
__cow_file_range+0x175/0x3c1
Mar  4 14:28:36 localhost kernel: [81487830] ? 
_raw_spin_unlock+0x1c/0x28
Mar  4 14:28:36 localhost kernel: [81160de3] ? 
release_extent_buffer.isra.25+0x90/0x97
Mar  4 14:28:36 localhost kernel: [81153673] ? 
run_delalloc_nocow+0x6fa/0x795
Mar  4 14:28:36 localhost kernel: [81153837] ? 
run_delalloc_range+0x64/0x33b
Mar  4 14:28:36 localhost kernel: [81161317] ? 
free_extent_state+0x12/0x21
Mar  4 14:28:36 localhost kernel: [81163f1f] ? 
__extent_writepage+0x1a8/0x5d8
Mar  4 14:28:36 localhost kernel: [8116352a] ? 
end_extent_writepage+0x5d/0x5d
Mar  4 14:28:36 localhost kernel: [811d4b69] ? 
cpumask_any_but+0x25/0x34
Mar  4 14:28:36 localhost kernel: [810a5259] ? 
vma_interval_tree_subtree_search+0x33/0x55
Mar  4 14:28:36 localhost kernel: [810b07b8] ? 
page_mkclean+0x107/0x119
Mar  4 14:28:36 localhost kernel: [81164499] ? 
extent_write_cache_pages.isra.29.constprop.47+0x14a/0x255
Mar  4 14:28:36 

Re: [PATCH 2/2] btrfs: clean up transaction abort messages

2013-03-11 Thread David Sterba
On Mon, Mar 11, 2013 at 12:02:09PM -0700, Zach Brown wrote:
 No worries if you're not keen to fix it up, but it'd be nice.  One less
 wart to be distracted by when stumbling through the code.

I'll gladly update the code, thanks for the hints and comments.

david


[PATCH 2/4] btrfs-progs: three new device/path helpers

2013-03-11 Thread Eric Sandeen
Add 3 new helpers:

* is_block_device(), to test if a path is a block device.
* get_btrfs_mount(), to get the mountpoint of a device,
  if mounted.
* open_path_or_dev_mnt(path), to open either the pathname
  or, if it's a mounted btrfs dev, the mountpoint.  Useful
  for some commands which can take either type of arg.
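
The return convention of the first helper (negative errno, 0, or 1) is
easy to misuse, so here is a minimal standalone sketch of it.
sketch_is_block_device() is a renamed copy written for illustration and
is not the function added by this patch.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Returns 1 if path is a block device node, 0 if it is something else,
 * or a negative errno value if stat() fails.
 */
static int sketch_is_block_device(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -errno;
	return S_ISBLK(st.st_mode) ? 1 : 0;
}
```

Because a failure is reported as a negative number, a bare
"if (sketch_is_block_device(path))" treats a stat() error the same as
"yes, a block device"; a caller that cares about the difference needs to
compare against 0 explicitly.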

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 utils.c |   84 +++
 utils.h |3 ++
 2 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/utils.c b/utils.c
index 1c73d67..4bf457f 100644
--- a/utils.c
+++ b/utils.c
@@ -640,6 +640,90 @@ error:
return ret;
 }
 
+/*
+ * checks if a path is a block device node
+ * Returns negative errno on failure, otherwise
+ * returns 1 for blockdev, 0 for not-blockdev
+ */
+int is_block_device (const char *path) {
+	struct stat statbuf;
+
+	if (stat(path, &statbuf) < 0)
+		return -errno;
+
+	return (S_ISBLK(statbuf.st_mode));
+}
+
+/*
+ * Find the mount point for a mounted device.
+ * On success, returns 0 with mountpoint in *mp.
+ * On failure, returns -errno (not mounted yields -EINVAL)
+ * Is noisy on failures, expects to be given a mounted device.
+ */
+int get_btrfs_mount(const char *dev, char *mp, size_t mp_size) {
+	int ret;
+	int fd = -1;
+
+	ret = is_block_device(dev);
+	if (ret <= 0) {
+		if (!ret) {
+			fprintf(stderr, "%s is not a block device\n", dev);
+			ret = -EINVAL;
+		} else
+			fprintf(stderr, "Could not check %s: %s\n",
+				dev, strerror(-ret));
+		goto out;
+	}
+
+	fd = open(dev, O_RDONLY);
+	if (fd < 0) {
+		ret = -errno;
+		fprintf(stderr, "Could not open %s: %s\n", dev, strerror(errno));
+		goto out;
+	}
+
+	ret = check_mounted_where(fd, dev, mp, mp_size, NULL);
+	if (!ret) {
+		fprintf(stderr, "%s is not a mounted btrfs device\n", dev);
+		ret = -EINVAL;
+	} else /* mounted, all good */
+		ret = 0;
+out:
+	if (fd != -1)
+		close(fd);
+	if (ret)
+		fprintf(stderr, "Could not get mountpoint for %s\n", dev);
+	return ret;
+}
+
+/*
+ * Given a pathname, return a filehandle to:
+ * the original pathname or,
+ * if the pathname is a mounted btrfs device, to its mountpoint.
+ *
+ * On error, return -1, errno should be set.
+ */
+int open_path_or_dev_mnt(const char *path)
+{
+	char mp[BTRFS_PATH_NAME_MAX + 1];
+	int fdmnt;
+
+	if (is_block_device(path)) {
+		int ret;
+
+		ret = get_btrfs_mount(path, mp, sizeof(mp));
+		if (ret < 0) {
+			/* not a mounted btrfs dev */
+			errno = EINVAL;
+			return -1;
+		}
+		fdmnt = open(mp, O_RDWR);
+	} else
+		fdmnt = open_file_or_dir(path);
+
+	return fdmnt;
+}
+
 /* checks if a device is a loop device */
 int is_loop_device (const char* device) {
struct stat statbuf;
diff --git a/utils.h b/utils.h
index 0b681ed..8e0252b 100644
--- a/utils.h
+++ b/utils.h
@@ -56,6 +56,9 @@ int get_label(const char *btrfs_dev);
 int set_label(const char *btrfs_dev, const char *label);
 
 char *__strncpy__null(char *dest, const char *src, size_t n);
+int is_block_device(const char *file);
+int get_btrfs_mount(const char *path, char *mp, size_t mp_size);
+int open_path_or_dev_mnt(const char *path);
 int is_swap_device(const char *file);
 /* Helper to always get proper size of the destination string */
 #define strncpy_null(dest, src) __strncpy__null(dest, src, sizeof(dest))
-- 
1.7.1



[PATCH 4/4] btrfs-progs: rework get_fs_info to remove side effects

2013-03-11 Thread Eric Sandeen
get_fs_info() has been silently switching from a device to a mounted
path as needed; the caller's filehandle was unexpectedly closed and
reopened outside the caller's scope.  Not so great.

The callers do want fdmnt to be the filehandle for the mount point
in all cases, though - the various ioctls act on this (not on an fd
for the device).  But switching it in the local scope of get_fs_info
is incorrect; it just so happens that *usually* the fd number is
unchanged.

So - use the new helpers to detect when an argument is a block
device, and open the mounted path more obviously / explicitly
for ioctl use, storing the filehandle in fdmnt.

Then, in get_fs_info, ignore the fd completely, and use the path on
the argument to determine if the caller wanted to act on just that
device, or on all devices for the filesystem.
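
As background on why the old behaviour usually appeared to work: open()
returns the lowest-numbered descriptor that is not currently open, so a
close() followed directly by an open() in a single-threaded program
hands back the same number. A minimal sketch of that effect follows.
sketch_reopen() is a made-up helper, not code from this patch, and the
paths used to exercise it are arbitrary.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Close fd and open other_path instead. Returns the new descriptor
 * number, or -1 if open() fails. The caller's copy of fd is NOT
 * updated, which is exactly the hazard described above.
 */
static int sketch_reopen(int fd, const char *other_path)
{
	close(fd);
	return open(other_path, O_RDONLY);
}
```

A caller that opened a device and kept using its old fd variable would,
after such a call, be talking to the mount point through the same
descriptor number purely by coincidence; any descriptor opened or closed
in between (another thread, a library) breaks that coincidence.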

Affects those commands which are documented to accept either
a block device or a path:

* btrfs device stats
* btrfs replace start
* btrfs scrub start
* btrfs scrub status

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 cmds-device.c  |5 ++-
 cmds-replace.c |6 +++-
 cmds-scrub.c   |   10 ---
 utils.c|   73 +++
 utils.h|2 +-
 5 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index 58df6da..41e79d3 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -321,13 +321,14 @@ static int cmd_dev_stats(int argc, char **argv)
 
 	path = argv[optind];
 
-	fdmnt = open_file_or_dir(path);
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		fprintf(stderr, "ERROR: can't access '%s'\n", path);
 		return 12;
 	}
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for devstats failed: "
 				"%s\n", strerror(-ret));
diff --git a/cmds-replace.c b/cmds-replace.c
index 10030f6..6397bb5 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -168,7 +168,9 @@ static int cmd_start_replace(int argc, char **argv)
if (check_argc_exact(argc - optind, 3))
usage(cmd_start_replace_usage);
path = argv[optind + 2];
-   fdmnt = open_file_or_dir(path);
+
+   fdmnt = open_path_or_dev_mnt(path);
+
if (fdmnt < 0) {
fprintf(stderr, "ERROR: can't access \"%s\": %s\n",
path, strerror(errno));
@@ -215,7 +217,7 @@ static int cmd_start_replace(int argc, char **argv)
}
start_args.start.srcdevid = (__u64)atoi(srcdev);
 
-   ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+   ret = get_fs_info(path, &fi_args, &di_args);
if (ret) {
fprintf(stderr, "ERROR: getting dev info for devstats failed: "
"%s\n", strerror(-ret));
diff --git a/cmds-scrub.c b/cmds-scrub.c
index e5fccc7..52264f1 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1101,13 +1101,14 @@ static int scrub_start(int argc, char **argv, int resume)
 
path = argv[optind];
 
-   fdmnt = open_file_or_dir(path);
+   fdmnt = open_path_or_dev_mnt(path);
+
if (fdmnt < 0) {
ERR(!do_quiet, "ERROR: can't access '%s'\n", path);
return 12;
}
 
-   ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+   ret = get_fs_info(path, &fi_args, &di_args);
if (ret) {
ERR(!do_quiet, "ERROR: getting dev info for scrub failed: "
"%s\n", strerror(-ret));
@@ -1558,13 +1559,14 @@ static int cmd_scrub_status(int argc, char **argv)
 
path = argv[optind];
 
-   fdmnt = open_file_or_dir(path);
+   fdmnt = open_path_or_dev_mnt(path);
+
if (fdmnt < 0) {
fprintf(stderr, "ERROR: can't access to '%s'\n", path);
return 12;
}
 
-   ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+   ret = get_fs_info(path, &fi_args, &di_args);
if (ret) {
fprintf(stderr, "ERROR: getting dev info for scrub failed: "
"%s\n", strerror(-ret));
diff --git a/utils.c b/utils.c
index 4bf457f..27cec56 100644
--- a/utils.c
+++ b/utils.c
@@ -717,7 +717,7 @@ int open_path_or_dev_mnt(const char *path)
errno = EINVAL;
return -1;
}
-   fdmnt = open(mp, O_RDWR);
+   fdmnt = open_file_or_dir(mp);
} else
fdmnt = open_file_or_dir(path);
 
@@ -1544,9 +1544,20 @@ int get_device_info(int fd, u64 devid,
return ret ? -errno : 0;
 }
 
-int get_fs_info(int fd, char *path, struct btrfs_ioctl_fs_info_args *fi_args,
+/*
+ * For a given path, fill in the ioctl fs_ and info_ args.
+ * If the path is a btrfs mountpoint, fill info for all devices.
+ * If the path is a btrfs 

[PATCH 1/4] btrfs-progs: close fd on return from label get/set functions

2013-03-11 Thread Eric Sandeen
Somehow missed these 2 in the last round.

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 utils.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/utils.c b/utils.c
index f68436d..1c73d67 100644
--- a/utils.c
+++ b/utils.c
@@ -1217,6 +1217,7 @@ static int set_label_mounted(const char *mount_path, const char *label)
return -1;
}
 
+   close(fd);
return 0;
 }
 
@@ -1274,6 +1275,7 @@ static int get_label_mounted(const char *mount_path)
}
 
fprintf(stdout, "%s\n", label);
+   close(fd);
return 0;
 }
 
-- 
1.7.1
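
For anyone following along, the class of leak this patch closes can be
sketched outside btrfs-progs.  This is only an illustration, not code from
the tree: probe_readable(), next_free_fd() and call_leaks_fd() are made-up
names.  It leans on the POSIX rule that open() returns the lowest unused
descriptor number, and it assumes a single-threaded program where nothing
else opens or closes descriptors in between.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not btrfs-progs code: report whether `path` can be
 * opened and read, closing the descriptor on every return path.
 */
static int probe_readable(const char *path)
{
	char byte;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;		/* open() failed, nothing to close */

	if (read(fd, &byte, 1) < 0) {
		close(fd);		/* the error path must not leak fd */
		return -1;
	}

	close(fd);			/* and the success path needs it too */
	return 0;
}

/* Open and immediately close a descriptor to learn the next free number. */
static int next_free_fd(void)
{
	int fd = open("/dev/null", O_RDONLY);

	if (fd >= 0)
		close(fd);
	return fd;
}

/*
 * Returns nonzero if calling probe_readable() changed the next free
 * descriptor number, which would mean a descriptor was left open.
 */
static int call_leaks_fd(const char *path)
{
	int before = next_free_fd();

	(void)probe_readable(path);
	return next_free_fd() != before;
}
```

Dropping either close() in probe_readable() should make call_leaks_fd()
report a leak on the matching path, which is roughly the situation the two
added close(fd) lines in this patch address.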



[PATCH 0/4] smalle cleanup + get_fs_info rework

2013-03-11 Thread Eric Sandeen
The first patch is a trivial close of fd on function returns, somehow
missed that last go-round.

The next 3 are a little more substantial, working to avoid the nasty
behavior of get_fs_info, closing and re-opening the callers' filehandle
out of scope, if it needs to switch from device node to mountpoint.

(I suppose we could pass in *fd by reference, but this behavior just
seems like a wrong, magical side effect for get_fs_info).

So instead, the callers use a helper to *always* wind up with the
mountpoint opened, and get_fs_info() now *only* - well - only gets
fs info.  The previous behavior of "if given a device, act only on
that device; if given a mountpoint, act on all devices" should persist;
I guess that's the original intent.
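
The difference between the two shapes can be sketched in isolation.  This
is a reduced illustration under stated assumptions, not the btrfs-progs
code: info_with_side_effect(), open_target() and info_without_side_effect()
are hypothetical names, and plain open()/access() stand in for the real
mountpoint lookup and ioctls.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Old shape (hypothetical reduction): the callee closes the descriptor it
 * was handed and opens something else.  The new descriptor is known only
 * to this function's local copy; the caller's variable still holds the
 * old number, which stays usable only if open() happens to hand the same
 * number back.
 */
static int info_with_side_effect(int fd, const char *other_path)
{
	assert(other_path != NULL);

	close(fd);
	fd = open(other_path, O_RDONLY);	/* changes the local copy only */
	return fd < 0 ? -1 : 0;
}

/*
 * New shape: the caller opens the right path itself and owns the
 * descriptor for its whole lifetime.
 */
static int open_target(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY);
}

/* The info routine takes only the path and never touches a descriptor. */
static int info_without_side_effect(const char *path)
{
	assert(path != NULL);
	return access(path, R_OK) == 0 ? 0 : -1;
}
```

With the second shape, whoever calls open_target() is also the one who
calls close(), so there is no point at which the descriptor number and the
file it refers to can drift apart behind the caller's back.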

It's really only lightly tested; it should mostly affect:

* btrfs device stats
* btrfs replace start
* btrfs scrub start
* btrfs scrub status

so any independent sanity testing of that would be great.  It'll be
nice if/when we get xfstests coverage of some of this to make it easier. :)

Thanks,
-Eric



[PATCH 3/4] btrfs-progs: don't open-code mountpoint discovery in scrub cancel

2013-03-11 Thread Eric Sandeen
cmd_scrub_cancel had its own mountpoint discovery routine;
just use open_path_or_dev_mnt() for that now.

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 cmds-scrub.c |   53 +
 1 files changed, 17 insertions(+), 36 deletions(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index b0fcde6..e5fccc7 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1459,56 +1459,37 @@ static int cmd_scrub_cancel(int argc, char **argv)
 {
char *path;
int ret;
-   int fdmnt;
-   int err;
-   char mp[BTRFS_PATH_NAME_MAX + 1];
-   struct btrfs_fs_devices *fs_devices_mnt = NULL;
+   int fdmnt = -1;
 
if (check_argc_exact(argc, 2))
usage(cmd_scrub_cancel_usage);
 
path = argv[1];
 
-again:
-   fdmnt = open_file_or_dir(path);
-   if (fdmnt < 0) {
-   perror("ERROR: scrub cancel failed:");
-   return 1;
-   }
+   fdmnt = open_path_or_dev_mnt(path);
+   if (fdmnt < 0) {
+   fprintf(stderr, "ERROR: could not open %s: %s\n",
+   path, strerror(errno));
+   ret = 1;
+   goto out;
+   }
 
ret = ioctl(fdmnt, BTRFS_IOC_SCRUB_CANCEL, NULL);
-   err = errno;
-
-   if (ret && err == EINVAL) {
-   /* path is not a btrfs mount point.  See if it's a device. */
-   ret = check_mounted_where(fdmnt, path, mp, sizeof(mp),
- &fs_devices_mnt);
-   if (ret > 0) {
-   /* It's a mounted btrfs device; retry w/ mountpoint. */
-   close(fdmnt);
-   path = mp;
-   goto again;
-   } else {
-   /* It's not a mounted btrfs device either */
-   fprintf(stderr,
-   "ERROR: %s is not a mounted btrfs device\n",
-   path);
-   ret = 1;
-   err = EINVAL;
-   }
-   }
 
-   close(fdmnt);
-
-   if (ret) {
+   if (ret < 0) {
fprintf(stderr, "ERROR: scrub cancel failed on %s: %s\n", path,
-   err == ENOTCONN ? "not running" : strerror(err));
-   return 1;
+   errno == ENOTCONN ? "not running" : strerror(errno));
+   ret = 1;
+   goto out;
}
 
+   ret = 0;
printf("scrub cancelled\n");
 
-   return 0;
+out:
+   if (fdmnt != -1)
+   close(fdmnt);
+   return ret;
 }
 
 static const char * const cmd_scrub_resume_usage[] = {
-- 
1.7.1
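
The single-exit shape this patch moves to can be tried outside the tree.
The sketch below is an assumption-laden reduction, not the real command:
run_with_fd(), scrub_cancel_reason(), op_ok() and op_not_running() are
hypothetical names, and a callback stands in for the
BTRFS_IOC_SCRUB_CANCEL ioctl.  The point it illustrates is that errno is
reported before the cleanup at `out:` runs, so close() cannot clobber it,
which is why the separate `err` variable is no longer needed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Same message choice as in the patch: ENOTCONN means no scrub is running. */
static const char *scrub_cancel_reason(int err)
{
	return err == ENOTCONN ? "not running" : strerror(err);
}

/* Stand-ins for the ioctl: one succeeds, one fails with ENOTCONN. */
static int op_ok(int fd)
{
	(void)fd;
	return 0;
}

static int op_not_running(int fd)
{
	(void)fd;
	errno = ENOTCONN;
	return -1;
}

/* Open `path`, run `op` on it, and close the descriptor on every path. */
static int run_with_fd(const char *path, int (*op)(int fd))
{
	int ret;
	int fd = -1;

	assert(path != NULL && op != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "ERROR: could not open %s: %s\n",
			path, strerror(errno));
		ret = 1;
		goto out;
	}

	if (op(fd) < 0) {
		/* Report first; the close() below may change errno. */
		fprintf(stderr, "ERROR: operation failed on %s: %s\n",
			path, scrub_cancel_reason(errno));
		ret = 1;
		goto out;
	}

	ret = 0;
out:
	if (fd != -1)
		close(fd);
	return ret;
}
```

Initialising fd to -1 is what lets the `out:` label serve both the
"open failed" and the "operation failed" cases with one cleanup block.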



Re: [PATCH 4/4] btrfs-progs: rework get_fs_info to remove side effects

2013-03-11 Thread Eric Sandeen
On 3/11/13 6:13 PM, Eric Sandeen wrote:
 get_fs_info() has been silently switching from a device to a mounted
 path as needed; the caller's filehandle was unexpectedly closed 
 reopened outside the caller's scope.  Not so great.
 
 The callers do want fdmnt to be the filehandle for the mount point
 in all cases, though - the various ioctls act on this (not on an fd
 for the device).  But switching it in the local scope of get_fs_info
 is incorrect; it just so happens that *usually* the fd number is
 unchanged.
 
 So - use the new helpers to detect when an argument is a block
 device, and open the the mounted path more obviously / explicitly
 for ioctl use, storing the filehandle in fdmnt.
 
 Then, in get_fs_info, ignore the fd completely, and use the path on
 the argument to determine if the caller wanted to act on just that
 device, or on all devices for the filesystem.
 
 Affects those commands which are documented to accept either
 a block device or a path:

Following my tradition I'll (immediately) self-nak this one for now.

After I sent this I thought to test:

# mkfs.btrfs /dev/sdb1 /dev/sdb2; mount /dev/sdb1 /mnt/test; btrfs stats /dev/sdb2

after I tested it, and that fails where it used to work.  So

a) we could use a test for this, and 
b) I broke something

If the overall idea of the change seems decent, I'll get it fixed up after I
sort out what I broke.  :/

-Eric


Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Matthew Booth
My laptop crashed hard earlier today. It reset immediately to a black
screen followed by the BIOS. I have no idea why.

However, it now fails to boot. I took a picture of the kernel panic
that results from trying to mount the root filesystem:
https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

To make things worse, btrfsck aborts with a double free, without
fixing it. I took a picture of that, too:
https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

As the kernel panic mentions btrfs_remove_free_space, I also tried
mounting with clear_cache. Unfortunately it didn't dislodge anything.

This is on a fully updated Fedora 18 system. I would really like to
get this data back. If anybody could offer a suggestion I'd be very
grateful.

Thanks,

Matt


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Harald Glatt
On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.

 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.

 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.

 Thanks,

 Matt

If you can, make a complete image backup of the drive before trying
anything to bring it back.
Try mounting with -o nospace_cache, also try -o ro and -o recovery as
well as -o recovery,ro.

If you can bring it back in ro mode you can at least copy your data
out of it if all else fails...
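
For what it's worth, the same attempts can be scripted through mount(2).
This is a rough sketch under assumptions, not a tested recovery tool:
try_readonly_mounts() is a made-up name, the option strings are the ones
mentioned above, and unlike the list above it forces every attempt to be
read-only so a failed experiment cannot write to the damaged filesystem.
It is Linux-specific and needs root to do anything useful.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Try each option string in turn, always with MS_RDONLY.  Returns the
 * index of the option string that mounted successfully, or -1 if none
 * did.  The device and mountpoint paths are supplied by the caller.
 */
static int try_readonly_mounts(const char *dev, const char *mnt)
{
	/* "" is a plain read-only mount with no extra btrfs options. */
	static const char *const opts[] = { "nospace_cache", "", "recovery" };
	size_t i;

	assert(dev != NULL && mnt != NULL);

	for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
		if (mount(dev, mnt, "btrfs", MS_RDONLY, opts[i]) == 0)
			return (int)i;
	}
	return -1;
}
```

If one of the attempts succeeds, the data can be copied off and the
mountpoint released with umount before trying anything more invasive.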

I'm not a dev, just a random guy having an interest in btrfs, so if
you don't have a backup and aren't able to create a dd copy of it
right now you might wanna wait for a reply of someone who actually
knows the code...

Good luck


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Jan Steffens
On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:
 On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.

 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.

 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.
 If you can make a complete image backup of the drive before trying any
 things to bring it back.
 Try mounting with -o nospace_cache, also try -o ro and -o recovery as
 well as -o recovery,ro.

I think the bug happens during log recovery, so btrfs-zero-log might
get it mountable again, with the caveat of losing the most recently
fsynced changes.


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Harald Glatt
If you are going to use btrfs-zero-log please create a btrfs-image
first that you can then upload to a bug report so that this can be
fixed.

# btrfs-image -c 9 -t 8 /dev/yourbtrfs /tmp/fs_image

On Mon, Mar 11, 2013 at 11:53 PM, Jan Steffens jan.steff...@gmail.com wrote:
 On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:
 On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com 
 wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.

 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.

 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.
 If you can make a complete image backup of the drive before trying any
 things to bring it back.
 Try mounting with -o nospace_cache, also try -o ro and -o recovery as
 well as -o recovery,ro.

 I think the bug happens during log recovery, so btrfs-zero-log might
 get it mountable again, with the caveat of losing the most recently
 fsynced changes.


[PULL] Re: Integration branch of btrfs-progs 2013-02-27

2013-03-11 Thread David Sterba
Hi Chris,

please pull this integration branch

git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130227

so far no problems reported (which may also mean that nobody is using
it), worked in my test setups and I've tested the label get/set patches
specifically.

thanks,
david

On Wed, Feb 27, 2013 at 06:14:45PM +0100, David Sterba wrote:
 Anand Jain (1):
   Btrfs-progs: add correct indentation
 
 David Sterba (1):
   btrfs-progs: don't link binaries to a dynamic library
 
 Eric Sandeen (16):
   btrfs-progs: fix btrfs_get_subvol cut/paste error
   btrfs-progs: Remove write-only var fdres in cmd_dev_stats()
   btrfs-progs: btrfs_list_get_path_rootid error handling
   btrfs-progs: avoid double-free in __btrfs_map_block
   btrfs-progs: fix open error test in cmd_start_replace
   btrfs-progs: fix close of error fd in scrub cancel
   btrfs-progs: more scrub cancel error handling
   btrfs-progs: free memory before error exit in read_whole_eb
   btrfs-progs: don't call close on error fd
   btrfs-progs: provide positive errno to strerror in cmd_restore
   btrfs-progs: free allocated di_args in cmd_start_replace
   btrfs-progs: close fd on cmd_subvol_get_default return
   btrfs-progs: fix mem leak in resolve_root
   btrfs-progs: Tidy up resolve_root
   btrfs-progs: fix fd leak in cmd_subvol_set_default
   btrfs-progs: initialize save_ptr prior to strtok_r
 
 Jeff Liu (5):
   Btrfs-progs: Change the label of a mounted file system
   Btrfs-progs: Fix set_label_unmounted() with label length validation
   Btrfs-progs: fix cmd_label_usage to reflect this change.
   btrfs-progs: refactor check_label()
   btrfs-progs: move btrfslabel.[c|h] stuff to utils.[c|h]
 
 Mark Fasheh (1):
   btrfs-progs: libify some parts of btrfs-progs
 
 Tsutomu Itoh (1):
   Btrfs-progs: fix segmentation fault of btrfs check
 
 Wang Shilong (2):
   Btrfs-progs: let the error message outputed only once
   Btrfs-progs: output the error reason when qgroup_show fails


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Josef Bacik
On Mon, Mar 11, 2013 at 04:44:58PM -0600, Matthew Booth wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.
 
 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
 
 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
 
 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.
 
 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.


This is fixed in 3.9, I'll send those patches back to -stable, sorry I should
have done that before now.  If you can't get a 3.9 kernel to boot then just use
btrfs-zero-log and you'll be good to go.  Thanks,

Josef 


Re: WARNING: at fs/btrfs/extent_map.c:77 free_extent_map

2013-03-11 Thread Liu Bo
On Mon, Mar 11, 2013 at 10:34:04PM +0100, Johannes Hirte wrote:
 Since the updates for linux-3.9 I've had three or four times a system
 freeze and only a reset (Magic SysRq) helped. After the reboot I found
 a bunch of this in syslog:
 
 Mar 11 21:56:09 localhost kernel: [ cut here ]
 Mar 11 21:56:09 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 
 free_extent_map+0x64/0x76()
 Mar 11 21:56:09 localhost kernel: Hardware name: EasyNote TK81
 Mar 11 21:56:09 localhost kernel: Modules linked in: nfsv4 nfsd exportfs 
 auth_rpcgss nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi 
 snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec ath9k_common ath9k_hw 
 acer_wmi snd_hwdep snd_pcm ath sr_mod wmi broadcom snd_page_alloc snd_timer 
 cdrom tg3 k10temp snd acpi_cpufreq ohci_hcd soundcore i2c_piix4 mperf
 Mar 11 21:56:09 localhost kernel: Pid: 11260, comm: bogofilter Tainted: G 
W3.9.0-rc2 #293
 Mar 11 21:56:09 localhost kernel: Call Trace:
 Mar 11 21:56:09 localhost kernel: [8102abc2] ? 
 warn_slowpath_common+0x76/0x8c
 Mar 11 21:56:09 localhost kernel: [8115dcff] ? 
 free_extent_map+0x64/0x76
 Mar 11 21:56:09 localhost kernel: [8115bc57] ? 
 btrfs_drop_extent_cache+0x363/0x39f
 Mar 11 21:56:09 localhost kernel: [81152db4] ? 
 __cow_file_range+0x175/0x3c1
 Mar 11 21:56:09 localhost kernel: [8114bb02] ? 
 join_transaction.isra.34+0x30f/0x31a
 Mar 11 21:56:09 localhost kernel: [8114d9f7] ? 
 start_transaction+0x2d8/0x3e8
 Mar 11 21:56:09 localhost kernel: [8115383e] ? 
 cow_file_range+0xa9/0xc5
 Mar 11 21:56:09 localhost kernel: [811538f7] ? 
 run_delalloc_range+0x9d/0x33b
 Mar 11 21:56:09 localhost kernel: [8116139b] ? 
 free_extent_state+0x12/0x21
 Mar 11 21:56:09 localhost kernel: [81163fa3] ? 
 __extent_writepage+0x1a8/0x5d8
 Mar 11 21:56:09 localhost kernel: [811635ae] ? 
 end_extent_writepage+0x5d/0x5d
 Mar 11 21:56:09 localhost kernel: [8116451d] ? 
 extent_write_cache_pages.isra.29.constprop.47+0x14a/0x255
 Mar 11 21:56:09 localhost kernel: [81164836] ? 
 extent_writepages+0x49/0x60
 Mar 11 21:56:09 localhost kernel: [81150146] ? 
 btrfs_update_inode_item+0xde/0xde
 Mar 11 21:56:09 localhost kernel: [8108fc58] ? 
 __filemap_fdatawrite_range+0x4d/0x52
 Mar 11 21:56:09 localhost kernel: [8115a192] ? 
 btrfs_sync_file+0x48/0x203
 Mar 11 21:56:09 localhost kernel: [810c85ff] ? vfs_write+0xaf/0xf8
 Mar 11 21:56:09 localhost kernel: [810e783b] ? do_fsync+0x2b/0x50
 Mar 11 21:56:09 localhost kernel: [810e7a42] ? sys_fdatasync+0xb/0xf
 Mar 11 21:56:09 localhost kernel: [814877d2] ? 
 system_call_fastpath+0x16/0x1b
 Mar 11 21:56:09 localhost kernel: ---[ end trace 3eaea449d8d56f92 ]---
 
 As far as I remeber, it happend when fetching emails with claws. But
 it's not a reliable testcase. 

Hi Johannes,

Could you please tell us what mount options you're using?

thanks,
liubo

 
 Another trace from the first time I found it in the logs. But here the
 system didn't hang:
 
 Mar  4 14:28:35 localhost kernel: [ cut here ]
 Mar  4 14:28:35 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 
 free_extent_map+0x64/0x76()
 Mar  4 14:28:35 localhost kernel: Hardware name: EasyNote TK81
 Mar  4 14:28:35 localhost kernel: Modules linked in: nfsd exportfs 
 auth_rpcgss nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi 
 snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec ath9k_common 
 snd_hwdep snd_pcm broadcom ath9k_hw snd_page_alloc ath sr_mod snd_timer 
 acer_wmi snd cdrom wmi tg3 ohci_hcd soundcore k10temp edac_core acpi_cpufreq 
 i2c_piix4 mperf
 Mar  4 14:28:35 localhost kernel: Pid: 1574, comm: flush-btrfs-1 Not tainted 
 3.9.0-rc1 #289
 Mar  4 14:28:35 localhost kernel: Call Trace:
 Mar  4 14:28:35 localhost kernel: [8102ab92] ? 
 warn_slowpath_common+0x76/0x8c
 Mar  4 14:28:35 localhost kernel: [8115dc7b] ? 
 free_extent_map+0x64/0x76
 Mar  4 14:28:35 localhost kernel: [8115bbd3] ? 
 btrfs_drop_extent_cache+0x363/0x39f
 Mar  4 14:28:35 localhost kernel: [81152d2d] ? 
 __cow_file_range+0x175/0x3c1
 Mar  4 14:28:36 localhost kernel: [81487830] ? 
 _raw_spin_unlock+0x1c/0x28
 Mar  4 14:28:36 localhost kernel: [81160de3] ? 
 release_extent_buffer.isra.25+0x90/0x97
 Mar  4 14:28:36 localhost kernel: [81153673] ? 
 run_delalloc_nocow+0x6fa/0x795
 Mar  4 14:28:36 localhost kernel: [81153837] ? 
 run_delalloc_range+0x64/0x33b
 Mar  4 14:28:36 localhost kernel: [81161317] ? 
 free_extent_state+0x12/0x21
 Mar  4 14:28:36 localhost kernel: [81163f1f] ? 
 __extent_writepage+0x1a8/0x5d8
 Mar  4 14:28:36 localhost kernel: [8116352a] ? 
 end_extent_writepage+0x5d/0x5d
 Mar  4 14:28:36 localhost kernel: [811d4b69] ? 
 cpumask_any_but+0x25/0x34
 Mar  4 14:28:36 localhost kernel: [810a5259] ? 
 

Re: [PATCH] Btrfs: get better concurrency for snapshot-aware defrag work

2013-03-11 Thread Liu Bo
On Mon, Mar 11, 2013 at 06:26:40PM +0100, David Sterba wrote:
 On Mon, Mar 11, 2013 at 05:20:58PM +0800, Liu Bo wrote:
  Using spinning case instead of blocking will result in better concurrency
  overall.
 
 Do you have numbers to support that?
 

Sorry, I don't; I'm just judging from what leave_spinning is designed for and from
similar use cases, like insert_reserved_file_extents(), which is also involved in
the endio write worker.

thanks,
liubo


[PATCH 4/4 V2] btrfs-progs: rework get_fs_info to remove side effects

2013-03-11 Thread Eric Sandeen
get_fs_info() has been silently switching from a device to a mounted
path as needed; the caller's filehandle was unexpectedly closed and
reopened outside the caller's scope.  Not so great.

The callers do want fdmnt to be the filehandle for the mount point
in all cases, though - the various ioctls act on this (not on an fd
for the device).  But switching it in the local scope of get_fs_info
is incorrect; it just so happens that *usually* the fd number is
unchanged.

So - use the new helpers to detect when an argument is a block
device, and open the mounted path more obviously / explicitly
for ioctl use, storing the filehandle in fdmnt.

Then, in get_fs_info, ignore the fd completely, and use the path on
the argument to determine if the caller wanted to act on just that
device, or on all devices for the filesystem.

Affects those commands which are documented to accept either
a block device or a path:

* btrfs device stats
* btrfs replace start
* btrfs scrub start
* btrfs scrub status

Signed-off-by: Eric Sandeen sand...@redhat.com
---

V2: don't call BTRFS_IOC_FS_INFO in the single device case
after we change path/fd to be for the fs mount point.

In the single device case we manually filled in fi_args;
calling this ioctl after switching fd/path to the mount point
overwrites that setup.
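
The V2 change described above can be pictured with a toy model.  Everything
here is hypothetical and only meant to show the control flow: struct
fs_info_sketch is not the real ioctl argument struct, query_whole_fs()
stands in for BTRFS_IOC_FS_INFO on a made-up two-device filesystem, and
fill_fs_info() and devices_reported() are invented names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of the fs-info result; not the real ioctl struct. */
struct fs_info_sketch {
	unsigned long long num_devices;
	unsigned long long max_id;
};

/* Stand-in for the filesystem-wide query, which reports every device. */
static void query_whole_fs(struct fs_info_sketch *fi)
{
	fi->num_devices = 2;
	fi->max_id = 2;
}

/*
 * If the caller named one device (single_devid > 0), fill the result by
 * hand and return before the filesystem-wide query can overwrite it.
 * Otherwise ask for the whole filesystem.
 */
static void fill_fs_info(int single_devid, struct fs_info_sketch *fi)
{
	assert(fi != NULL);
	memset(fi, 0, sizeof(*fi));

	if (single_devid > 0) {
		fi->num_devices = 1;
		fi->max_id = (unsigned long long)single_devid;
		return;		/* skip the query, as the V2 note describes */
	}

	query_whole_fs(fi);
}

/* Convenience wrapper: how many devices does the caller end up seeing? */
static unsigned long long devices_reported(int single_devid)
{
	struct fs_info_sketch fi;

	fill_fs_info(single_devid, &fi);
	return fi.num_devices;
}
```

Removing the early return reproduces the kind of breakage the V2 note
describes: the single-device request would come back reporting every
device in the filesystem.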

diff --git a/cmds-device.c b/cmds-device.c
index 58df6da..41e79d3 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -321,13 +321,14 @@ static int cmd_dev_stats(int argc, char **argv)
 
path = argv[optind];
 
-   fdmnt = open_file_or_dir(path);
+   fdmnt = open_path_or_dev_mnt(path);
+
if (fdmnt < 0) {
fprintf(stderr, "ERROR: can't access '%s'\n", path);
return 12;
}
 
-   ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+   ret = get_fs_info(path, &fi_args, &di_args);
if (ret) {
fprintf(stderr, "ERROR: getting dev info for devstats failed: "
"%s\n", strerror(-ret));
diff --git a/cmds-replace.c b/cmds-replace.c
index 10030f6..6397bb5 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -168,7 +168,9 @@ static int cmd_start_replace(int argc, char **argv)
if (check_argc_exact(argc - optind, 3))
usage(cmd_start_replace_usage);
path = argv[optind + 2];
-   fdmnt = open_file_or_dir(path);
+
+   fdmnt = open_path_or_dev_mnt(path);
+
if (fdmnt < 0) {
fprintf(stderr, "ERROR: can't access \"%s\": %s\n",
path, strerror(errno));
@@ -215,7 +217,7 @@ static int cmd_start_replace(int argc, char **argv)
}
start_args.start.srcdevid = (__u64)atoi(srcdev);
 
-   ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+   ret = get_fs_info(path, &fi_args, &di_args);
if (ret) {
fprintf(stderr, "ERROR: getting dev info for devstats failed: "
"%s\n", strerror(-ret));
diff --git a/cmds-scrub.c b/cmds-scrub.c
index e5fccc7..52264f1 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1101,13 +1101,14 @@ static int scrub_start(int argc, char **argv, int resume)
 
path = argv[optind];
 
-   fdmnt = open_file_or_dir(path);
+   fdmnt = open_path_or_dev_mnt(path);
+
if (fdmnt < 0) {
ERR(!do_quiet, "ERROR: can't access '%s'\n", path);
return 12;
}
 
-   ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+   ret = get_fs_info(path, &fi_args, &di_args);
if (ret) {
ERR(!do_quiet, "ERROR: getting dev info for scrub failed: "
"%s\n", strerror(-ret));
@@ -1558,13 +1559,14 @@ static int cmd_scrub_status(int argc, char **argv)
 
path = argv[optind];
 
-   fdmnt = open_file_or_dir(path);
+   fdmnt = open_path_or_dev_mnt(path);
+
if (fdmnt < 0) {
fprintf(stderr, "ERROR: can't access to '%s'\n", path);
return 12;
}
 
-   ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+   ret = get_fs_info(path, &fi_args, &di_args);
if (ret) {
fprintf(stderr, "ERROR: getting dev info for scrub failed: "
"%s\n", strerror(-ret));
diff --git a/utils.c b/utils.c
index 4bf457f..c756e23 100644
--- a/utils.c
+++ b/utils.c
@@ -717,7 +717,7 @@ int open_path_or_dev_mnt(const char *path)
errno = EINVAL;
return -1;
}
-   fdmnt = open(mp, O_RDWR);
+   fdmnt = open_file_or_dir(mp);
} else
fdmnt = open_file_or_dir(path);
 
@@ -1544,9 +1544,20 @@ int get_device_info(int fd, u64 devid,
return ret ? -errno : 0;
 }
 
-int get_fs_info(int fd, char *path, struct btrfs_ioctl_fs_info_args *fi_args,
+/*
+ * For a given path, fill in the ioctl fs_ and info_ args.
+ * If the path is a btrfs mountpoint, fill info for all devices.
+ * If the path