Re: snapshot deletion / unmount slowness
On Sun, 10 Mar 2013 22:31:08 -0700
Michael Johnson - MJ <m...@revmj.com> wrote:

> What I now suspect is going on is that while deleting the snapshots was
> quick, that probably kicks off a background thread which actually does
> the heavy lifting.

Exactly that: the snapshot deletion only syncs on unmount; there is no
other way to ensure it is complete. If you have some patience and let it
unmount properly and then remount it, you may find that you have gained
much more free space, due to all the snapshots being actually deleted and
the space they were occupying freed only just now.

--
With respect,
Roman
Re: snapshot deletion / unmount slowness
On Mon, Mar 11, 2013 at 12:11:43PM +0600, Roman Mamedov wrote:
> On Sun, 10 Mar 2013 22:31:08 -0700
> Michael Johnson - MJ <m...@revmj.com> wrote:
>
>> What I now suspect is going on is that while deleting the snapshots was
>> quick, that probably kicks off a background thread which actually does
>> the heavy lifting.
>
> Exactly that: the snapshot deletion only syncs on unmount; there is no
> other way to ensure it is complete. If you have some patience and let it
> unmount properly and then remount it, you may find that you have gained
> much more free space, due to all the snapshots being actually deleted and
> the space they were occupying freed only just now.

A recent commit (fa6ac8765c48a06dfed914e8c8c3a903f9d313a0, "Btrfs: fix
cleaner thread not working with inode cache option") may improve the
situation. You may want to try it.

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] [RFC] RAID-level terminology change
On Sun, Mar 10, 2013 at 11:55:10PM +0000, sam tygier wrote:
> On 09/03/13 20:31, Hugo Mills wrote:
>> Some time ago, and occasionally since, we've discussed altering the
>> RAID-n terminology to change it to an nCmSpP format, where n is the
>> number of copies, m is the number of (data) devices in a stripe per
>> copy, and p is the number of parity devices in a stripe.
>>
>> The current kernel implementation uses as many devices as it can in the
>> striped modes (RAID-0, -10, -5, -6), and in this implementation, that is
>> written as mS (with a literal m). The mS and pP sections are omitted if
>> the value is 1S or 0P.
>>
>> The magic look-up table for old-style / new-style is:
>>
>>     single    1C (or omitted, in btrfs fi df output)
>>     RAID-0    1CmS
>>     RAID-1    2C
>>     DUP       2CD
>>     RAID-10   2CmS
>>     RAID-5    1CmS1P
>>     RAID-6    1CmS2P
>
> Are these the only valid options?

Currently, yes.

> Are 'sensible' new levels (eg 3C, mirrored to 3 disks, or 1CmS3P, like
> raid6 but with 3 parity blocks) allowed?

Not right now, but:

 - I don't know if the forthcoming 3c code will allow arbitrary values or
   not, but Chris has promised 3c.

 - Fixed S will definitely happen for the parity-RAID levels. I'm not
   sure about the stripe-RAID levels.

 - Higher P are mathematically possible, but (AIUI) it is awkward to
   construct efficient and effective ones (and it's a manual process to
   do so). I suspect that 3p may happen, but 4p may not for a long time.

> Are any arbitrary levels allowed (some other comments in the thread
> suggest no)?

Currently, no, and I don't think there are immediate plans to generalise
it, but I'd like to see that happen eventually.

> Will there be a recommended (or supported) set?

Quite likely, even with the limited (forthcoming) set of parameters.
Using mSpP on an array larger than some particular size is probably not
going to be particularly good for performance, for example.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You stay in the theatre because you're afraid of having no ---
money? There's irony...
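The look-up table above can be read as a small formatting rule. The following is only a sketch of that rule as described in the message, not code from btrfs-progs or the patch series; the struct `raid_profile` and the function `raid_profile_name()` are hypothetical names introduced for illustration, and the "D" marker for DUP is taken directly from the table.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical description of a replication profile (illustration only). */
struct raid_profile {
	int copies;	/* n: number of copies */
	int striped;	/* nonzero: stripe over as many devices as possible ("mS") */
	int parity;	/* p: number of parity devices in a stripe */
	int dup;	/* nonzero: copies live on the same device (DUP, "D") */
};

/*
 * Write the nCmSpP name of a profile into buf.
 * The mS part is omitted when there is no striping (1S) and the pP part
 * is omitted when there is no parity (0P), as stated in the message.
 * Returns 0 on success, -1 if buf is too small.
 */
int raid_profile_name(const struct raid_profile *p, char *buf, size_t len)
{
	char parity[16] = "";
	int n;

	if (p->parity > 0)
		snprintf(parity, sizeof(parity), "%dP", p->parity);

	n = snprintf(buf, len, "%dC%s%s%s", p->copies,
		     p->dup ? "D" : "",
		     p->striped ? "mS" : "",
		     parity);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this sketch, a profile of { copies = 1, striped = 1, parity = 2 } is named 1CmS2P, which is the RAID-6 row of the table.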
[PATCH] Btrfs: get better concurrency for snapshot-aware defrag work
Using spinning case instead of blocking will result in better concurrency
overall.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/inode.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 13ab4de..1f26 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2312,6 +2312,7 @@ again:
 	key.type = BTRFS_EXTENT_DATA_KEY;
 	key.offset = start;

+	path->leave_spinning = 1;
 	if (merge) {
 		struct btrfs_file_extent_item *fi;
 		u64 extent_len;
@@ -2368,6 +2369,7 @@ again:
 	btrfs_mark_buffer_dirty(leaf);
 	inode_add_bytes(inode, len);
+	btrfs_release_path(path);

 	ret = btrfs_inc_extent_ref(trans, root, new->bytenr,
 			new->disk_len, 0,
@@ -2381,6 +2383,7 @@ again:
 	ret = 1;
 out_free_path:
 	btrfs_release_path(path);
+	path->leave_spinning = 0;
 	btrfs_end_transaction(trans, root);
 out_unlock:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, lock_end,
--
1.7.7.6
[PATCH] Btrfs: remove btrfs_try_spin_lock
Remove a useless function declaration.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/locking.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index ca52681..b81e0e9 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -26,7 +26,6 @@
 void btrfs_tree_lock(struct extent_buffer *eb);
 void btrfs_tree_unlock(struct extent_buffer *eb);
-int btrfs_try_spin_lock(struct extent_buffer *eb);

 void btrfs_tree_read_lock(struct extent_buffer *eb);
 void btrfs_tree_read_unlock(struct extent_buffer *eb);
--
1.7.7.6
[PATCH] btrfs-progs: add options for changing size representations
Add '--si', '-h'/'--human-readable' and '--block-size' global options,
which allow users to customize the way sizes are displayed.

Options and their format try to mimic the GNU ls utility.

Signed-off-by: Audrius Butkevicius <audrius.butkevic...@elastichosts.com>
---
 btrfs.c |    3 ++
 utils.c |  146 +++
 utils.h |    6 +++
 3 files changed, 138 insertions(+), 17 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 691adef..6a8fc30 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -22,6 +22,8 @@
 #include "crc32c.h"
 #include "commands.h"
 #include "version.h"
+#include "ctree.h"
+#include "utils.h"

 static const char * const btrfs_cmd_group_usage[] = {
 	"btrfs [--help] [--version] <group> [<group>...] <command> [<args>]",
@@ -291,6 +293,7 @@ int main(int argc, char **argv)
 	crc32c_optimization_init();

+	handle_size_unit_args(&argc, &argv);
 	fixup_argv0(argv, cmd->token);
 	exit(cmd->fn(argc, argv));
 }
diff --git a/utils.c b/utils.c
index d660507..58c1919 100644
--- a/utils.c
+++ b/utils.c
@@ -16,6 +16,7 @@
  * Boston, MA 021110-1307, USA.
  */

+#define _GNU_SOURCE
 #define _XOPEN_SOURCE 700
 #define __USE_XOPEN2K8
 #define __XOPEN2K8 /* due to an error in dirent.h, to get dirfd() */
@@ -1095,33 +1096,144 @@ out:
 	return ret;
 }

-static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
-			    "PB", "EB", "ZB", "YB"};
+static int sizes_format = SIZES_FORMAT_BYTES;
+static u64 sizes_divisor = 1;
+
+void remove_arg(int i, int *argc, char ***argv)
+{
+	while (i++ < *argc)
+		(*argv)[i - 1] = (*argv)[i];
+	(*argc)--;
+}
+
+void handle_size_unit_args(int *argc, char ***argv)
+{
+	int k;
+	int base = 1024;
+	char *suffix;
+	char *block_size;
+	u64 value;
+
+	for (k = *argc - 1; k >= 0; k--) {
+		if (!strcmp((*argv)[k], "-h") ||
+		    !strcmp((*argv)[k], "--human-readable")) {
+			sizes_format = SIZES_FORMAT_HUMAN;
+			remove_arg(k, argc, argv);
+		} else if (!strcmp((*argv)[k], "--si")) {
+			sizes_format = SIZES_FORMAT_SI;
+			remove_arg(k, argc, argv);
+		} else if (!strncmp((*argv)[k], "--block-size", 12)) {
+			if (strlen((*argv)[k]) < 14 || (*argv)[k][12] != '=') {
+				fprintf(stderr,
+					"--block-size requires an argument\n");
+				exit(1);
+			}
+
+			sizes_format = SIZES_FORMAT_BLOCK;
+			block_size = strchr((*argv)[k], '=');
+
+			errno = 0;
+			value = strtoull(++block_size, &suffix, 10);
+			if (errno == ERANGE && value == ULLONG_MAX) {
+				fprintf(stderr,
+					"--block-size argument '%s' too large\n",
+					block_size);
+				exit(1);
+			}
+			if (suffix == block_size)
+				value = 1;
+
+			if (strlen(suffix) == 1 && value > 0) {
+				base = 1024;
+			} else if (strlen(suffix) == 2 && suffix[1] == 'B' &&
+				   value > 0) {
+				base = 1000;
+			/* Allow non-zero values without a suffix */
+			} else if (strlen(suffix) != 0 || value == 0) {
+				fprintf(stderr,
+					"invalid --block-size argument '%s'\n",
+					block_size);
+				exit(1);
+			}
+
+			if (strlen(suffix) > 0) {
+				switch(suffix[0]) {
+				case 'E':
+					sizes_divisor *= base;
+				case 'P':
+					sizes_divisor *= base;
+				case 'T':
+					sizes_divisor *= base;
+				case 'G':
+					sizes_divisor *= base;
+				case 'M':
+					sizes_divisor *= base;
+				case 'K':
+					sizes_divisor *= base;
+					break;
+				default:
+					fprintf(stderr,
+						"invalid --block-size \
+argument '%s'\n",
Re: snapshot deletion / unmount slowness
On 11/03/2013 07:47, Liu Bo wrote:
> A recent commit (fa6ac8765c48a06dfed914e8c8c3a903f9d313a0, "Btrfs: fix
> cleaner thread not working with inode cache option") may improve the
> situation.

Hi Liu,

I have never seen this issue with btrfs-cleaner not working; when I
delete snapshots it typically kicks in a few seconds later and works
until done.

Does the bug you mention affect only specific kernel versions?

AFAIK I use inode_cache (it's not in my fstab, but I mounted my FSes
using it manually, and I believe it's a persistent option? I may possibly
be wrong...)

TIA
Kind regards.

--
Swâmi Petaramesh <sw...@petaramesh.org> http://petaramesh.org PGP 9076E32E
Don't bother looking: I am not on Facebook.
BTRFS and ionice
Hi,

I use the ionice -c 3 command to run some low-priority background tasks
(e.g. tar, big file copies, or checksumming very big files) so that they
use the disk only when it is idle, which is supposed to have very little
impact on my system's performance in the meantime.

I can be pretty sure that those tasks are I/O-bound and use very little
CPU (and they are niced as well, anyway).

However, when such tasks are running, my BTRFS system slows down to a
crawl and becomes very unresponsive, and it seems to me that disk I/O is
completely saturated (the LED stays lit...)

So I wonder if BTRFS correctly supports ionice, or if it's plain useless?

TIA
Kind regards.

--
Swâmi Petaramesh <sw...@petaramesh.org> http://petaramesh.org PGP 9076E32E
Don't bother looking: I am not on Facebook.
Re: about btrfs quota issues
ping..

Hello, Arne

Steps to reproduce:

    mkfs.btrfs disk
    mount disk mnt
    btrfs quota enable mnt
    btrfs sub create mnt/sub
    btrfs qgroup create 1/1 mnt
    btrfs qgroup assign sub_qgroupid 1/1 mnt
    dd if=/dev/zero of=mnt/sub/data bs=1M count=1
    sync
    btrfs qgroup show mnt
    # until now, everything goes well; however, if a snapshot happens
    # the quota accounting will go wrong
    btrfs sub snapshot mnt/sub mnt/snap
    sync
    btrfs qgroup show mnt
    # the accounting information of group (1/1) is not as expected
    # here the exclusive of group (1/1) does not change as expected.

So I took a close look at the algorithm of quota accounting; the 3 steps
of the algorithm don't consider some cases like the above example.

In fact, I think you try to put some work on users, especially when a
snapshot happens. It is complex to track all the groups' accounting when
having snapshots. See the following commands:

    btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
    btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt

Are these commands designed for some cases regarding
snapshots/subvolumes?

If so, I think it is really confusing and too complex for users to do
such work, isn't it?...

BTW, I have a question about the function btrfs_qgroup_inherit(). When
copying the exclusive value from src_qgroup to dst_qgroup:

    dst_qgroup->exclusive = src_qgroup->exclusive + level_size

while copying the referenced value from src_qgroup to dst_qgroup:

    dst_qgroup->referenced = src_qgroup->referenced - level_size

I can't really figure out...~_~

Thanks,
Wang
Re: [PATCH] btrfs-progs: add options for changing size representations
Hello,

> Add '--si', '-h'/'--human-readable' and '--block-size' global options,
> which allow users to customize the way sizes are displayed.

Why not use the function getopt_long() to do the parsing? Let's not
reinvent the wheel.

As discussed before, it is better not to use 'exit(1)' in the parsing
process. I think it is better to implement the parse function like this:

    int parse_str(char *str, u64 *size)

Thanks,
Wang

> Options and their format try to mimic the GNU ls utility.
>
> Signed-off-by: Audrius Butkevicius <audrius.butkevic...@elastichosts.com>
> ---
>  btrfs.c |    3 ++
>  utils.c |  146 +++
>  utils.h |    6 +++
>  3 files changed, 138 insertions(+), 17 deletions(-)
>
> diff --git a/btrfs.c b/btrfs.c
> index 691adef..6a8fc30 100644
> --- a/btrfs.c
> +++ b/btrfs.c
> @@ -22,6 +22,8 @@
>  #include "crc32c.h"
>  #include "commands.h"
>  #include "version.h"
> +#include "ctree.h"
> +#include "utils.h"
>
>  static const char * const btrfs_cmd_group_usage[] = {
>  	"btrfs [--help] [--version] <group> [<group>...] <command> [<args>]",
> @@ -291,6 +293,7 @@ int main(int argc, char **argv)
>  	crc32c_optimization_init();
>
> +	handle_size_unit_args(&argc, &argv);
>  	fixup_argv0(argv, cmd->token);
>  	exit(cmd->fn(argc, argv));
>  }
> diff --git a/utils.c b/utils.c
> index d660507..58c1919 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -16,6 +16,7 @@
>   * Boston, MA 021110-1307, USA.
>   */
>
> +#define _GNU_SOURCE
>  #define _XOPEN_SOURCE 700
>  #define __USE_XOPEN2K8
>  #define __XOPEN2K8 /* due to an error in dirent.h, to get dirfd() */
> @@ -1095,33 +1096,144 @@ out:
>  	return ret;
>  }
>
> -static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
> -			    "PB", "EB", "ZB", "YB"};
> +static int sizes_format = SIZES_FORMAT_BYTES;
> +static u64 sizes_divisor = 1;
> +
> +void remove_arg(int i, int *argc, char ***argv)
> +{
> +	while (i++ < *argc)
> +		(*argv)[i - 1] = (*argv)[i];
> +	(*argc)--;
> +}
> +
> +void handle_size_unit_args(int *argc, char ***argv)
> +{
> +	int k;
> +	int base = 1024;
> +	char *suffix;
> +	char *block_size;
> +	u64 value;
> +
> +	for (k = *argc - 1; k >= 0; k--) {
> +		if (!strcmp((*argv)[k], "-h") ||
> +		    !strcmp((*argv)[k], "--human-readable")) {
> +			sizes_format = SIZES_FORMAT_HUMAN;
> +			remove_arg(k, argc, argv);
> +		} else if (!strcmp((*argv)[k], "--si")) {
> +			sizes_format = SIZES_FORMAT_SI;
> +			remove_arg(k, argc, argv);
> +		} else if (!strncmp((*argv)[k], "--block-size", 12)) {
> +			if (strlen((*argv)[k]) < 14 || (*argv)[k][12] != '=') {
> +				fprintf(stderr,
> +					"--block-size requires an argument\n");
> +				exit(1);
> +			}
> +
> +			sizes_format = SIZES_FORMAT_BLOCK;
> +			block_size = strchr((*argv)[k], '=');
> +
> +			errno = 0;
> +			value = strtoull(++block_size, &suffix, 10);
> +			if (errno == ERANGE && value == ULLONG_MAX) {
> +				fprintf(stderr,
> +					"--block-size argument '%s' too large\n",
> +					block_size);
> +				exit(1);
> +			}
> +			if (suffix == block_size)
> +				value = 1;
> +
> +			if (strlen(suffix) == 1 && value > 0) {
> +				base = 1024;
> +			} else if (strlen(suffix) == 2 && suffix[1] == 'B' &&
> +				   value > 0) {
> +				base = 1000;
> +			/* Allow non-zero values without a suffix */
> +			} else if (strlen(suffix) != 0 || value == 0) {
> +				fprintf(stderr,
> +					"invalid --block-size argument '%s'\n",
> +					block_size);
> +				exit(1);
> +			}
> +
> +			if (strlen(suffix) > 0) {
> +				switch(suffix[0]) {
> +				case 'E':
> +					sizes_divisor *= base;
> +				case 'P':
> +					sizes_divisor *= base;
> +				case 'T':
> +					sizes_divisor *= base;
> +				case 'G':
> +					sizes_divisor *= base;
> +				case 'M':
> +					sizes_divisor *= base;
> +				case 'K':
> +					sizes_divisor *= base;
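The review suggestion above (a parse function that reports errors through its return value instead of calling exit(1)) might look roughly like the sketch below. This is not code from the patch or from btrfs-progs: the name `parse_block_size()`, the use of `uint64_t` in place of `u64`, and the negative errno return values are assumptions made for illustration. The suffix rules follow what the quoted patch does (a bare unit letter means powers of 1024, a unit letter followed by 'B' means powers of 1000, and a missing number counts as 1). Multiplying the number by the unit at the end is also an assumption, since the quoted patch is cut off before that point.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a block size such as "K", "KB", "4M" or "512" into *size.
 * Returns 0 on success, -EINVAL for a malformed argument and -ERANGE
 * when the result does not fit in 64 bits. The caller decides how to
 * report the error.
 */
int parse_block_size(const char *str, uint64_t *size)
{
	static const char units[] = "KMGTPE";
	unsigned long long value = 1;	/* a missing number counts as 1 */
	const char *suffix = str;
	const char *unit;
	uint64_t base, mult = 1;
	size_t slen;
	int power, i;

	if (str == NULL || size == NULL || *str == '\0')
		return -EINVAL;

	if (*str >= '0' && *str <= '9') {
		char *end;

		errno = 0;
		value = strtoull(str, &end, 10);
		if (errno == ERANGE)
			return -ERANGE;
		suffix = end;
	}
	if (value == 0)
		return -EINVAL;

	slen = strlen(suffix);
	if (slen == 0) {		/* plain number, no unit */
		*size = value;
		return 0;
	}
	if (slen == 1)
		base = 1024;		/* "K", "M", ... */
	else if (slen == 2 && suffix[1] == 'B')
		base = 1000;		/* "KB", "MB", ... */
	else
		return -EINVAL;

	unit = strchr(units, suffix[0]);
	if (unit == NULL)
		return -EINVAL;

	/* K is base^1, M is base^2, and so on up to E = base^6. */
	power = (int)(unit - units) + 1;
	for (i = 0; i < power; i++)
		mult *= base;

	if (value > UINT64_MAX / mult)
		return -ERANGE;

	*size = value * mult;
	return 0;
}
```

A caller such as handle_size_unit_args() could then print its own message and choose how to fail, which keeps the parsing logic reusable and testable on its own.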
Re: BTRFS and ionice
Hi,

Which IO scheduler do you use? I used to have terrible read performance
during a btrfs scrub until I switched the disk scheduler from deadline to
cfq.

Cheers,
Dan

On Mon, Mar 11, 2013 at 11:26 AM, Swâmi Petaramesh <sw...@petaramesh.org> wrote:
> Hi,
>
> I use the ionice -c 3 command to run some low-priority background tasks
> (e.g. tar, big file copies, or checksumming very big files) so that they
> use the disk only when it is idle, which is supposed to have very little
> impact on my system's performance in the meantime.
>
> I can be pretty sure that those tasks are I/O-bound and use very little
> CPU (and they are niced as well, anyway).
>
> However, when such tasks are running, my BTRFS system slows down to a
> crawl and becomes very unresponsive, and it seems to me that disk I/O is
> completely saturated (the LED stays lit...)
>
> So I wonder if BTRFS correctly supports ionice, or if it's plain useless?
>
> TIA
> Kind regards.
>
> --
> Swâmi Petaramesh <sw...@petaramesh.org> http://petaramesh.org PGP 9076E32E
> Don't bother looking: I am not on Facebook.
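To answer the scheduler question programmatically, one option is to read the sysfs file /sys/block/<dev>/queue/scheduler, where the kernel lists the available schedulers and marks the active one with square brackets (for example "noop deadline [cfq]"). The sketch below assumes that format; the function names `active_scheduler()` and `read_active_scheduler()` are hypothetical and introduced only for illustration. As far as the ionice(1) manual of that era states, I/O classes and priorities are implemented by the CFQ scheduler, which is why the answer matters here.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name out of a sysfs line such
 * as "noop deadline [cfq]". Returns 0 on success, -1 if no bracketed
 * name is found or if it does not fit into out.
 */
int active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '[');
	const char *end;
	size_t n;

	if (start == NULL)
		return -1;
	end = strchr(start, ']');
	if (end == NULL)
		return -1;

	n = (size_t)(end - start - 1);
	if (n == 0 || n >= outlen)
		return -1;

	memcpy(out, start + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the active scheduler of a block device, e.g. dev = "sda". */
int read_active_scheduler(const char *dev, char *out, size_t outlen)
{
	char path[256];
	char line[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(line, sizeof(line), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return active_scheduler(line, out, outlen);
}
```

If read_active_scheduler("sda", ...) reports something other than cfq, the idle class requested with ionice -c 3 may simply be ignored by the block layer, independently of the filesystem in use.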
Re: about btrfs quota issues
On 10.03.2013 05:21, Shilong Wang wrote:
> Hello, Arne
>
> Steps to reproduce:
>
>     mkfs.btrfs disk
>     mount disk mnt
>     btrfs quota enable mnt
>     btrfs sub create mnt/sub
>     btrfs qgroup create 1/1 mnt
>     btrfs qgroup assign sub_qgroupid 1/1 mnt
>     dd if=/dev/zero of=mnt/sub/data bs=1M count=1
>     sync
>     btrfs qgroup show mnt
>     # until now, everything goes well; however, if a snapshot happens
>     # the quota accounting will go wrong
>     btrfs sub snapshot mnt/sub mnt/snap
>     sync
>     btrfs qgroup show mnt
>     # the accounting information of group (1/1) is not as expected
>     # here the exclusive of group (1/1) does not change as expected.
>
> So I took a close look at the algorithm of quota accounting; the 3 steps
> of the algorithm don't consider some cases like the above example.
>
> In fact, I think you try to put some work on users, especially when a
> snapshot happens. It is complex to track all the groups' accounting when
> having snapshots. See the following commands:
>
>     btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
>     btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt
>
> Are these commands designed for some cases regarding
> snapshots/subvolumes?

Yes, these commands would have helped you in the above case. You need to
create an empty qgroup and copy the exclusive from there on snapshot
creation.

> If so, I think it is really confusing and too complex for users to do
> such work, isn't it?...

It is complex. That is why I always point anyone asking to do some work
on btrfs or qgroups to writing an enhanced interface to simplify this
task for the user. I don't think the kernel should handle this. And
that's why I took the effort to write a pdf to explain the concepts :)

But the current interface is not only complex, it also is very powerful.
You can solve problems with it that no other quota system I know of can
solve.

> BTW, I have a question about the function btrfs_qgroup_inherit(). When
> copying the exclusive value from src_qgroup to dst_qgroup:
>
>     dst_qgroup->exclusive = src_qgroup->exclusive + level_size
>
> while copying the referenced value from src_qgroup to dst_qgroup:
>
>     dst_qgroup->referenced = src_qgroup->referenced - level_size
>
> I can't really figure out...~_~

level_size is just a small correction for the space the tree root
occupies. The tree root is never shared between subvolumes.

-Arne

> Thanks,
> Wang
Re: about btrfs quota issues
Hello,

> On 10.03.2013 05:21, Shilong Wang wrote:
>> Hello, Arne
>>
>> Steps to reproduce:
>>
>>     mkfs.btrfs disk
>>     mount disk mnt
>>     btrfs quota enable mnt
>>     btrfs sub create mnt/sub
>>     btrfs qgroup create 1/1 mnt
>>     btrfs qgroup assign sub_qgroupid 1/1 mnt
>>     dd if=/dev/zero of=mnt/sub/data bs=1M count=1
>>     sync
>>     btrfs qgroup show mnt
>>     # until now, everything goes well; however, if a snapshot happens
>>     # the quota accounting will go wrong
>>     btrfs sub snapshot mnt/sub mnt/snap
>>     sync
>>     btrfs qgroup show mnt
>>     # the accounting information of group (1/1) is not as expected
>>     # here the exclusive of group (1/1) does not change as expected.
>>
>> So I took a close look at the algorithm of quota accounting; the 3 steps
>> of the algorithm don't consider some cases like the above example.
>>
>> In fact, I think you try to put some work on users, especially when a
>> snapshot happens. It is complex to track all the groups' accounting when
>> having snapshots. See the following commands:
>>
>>     btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
>>     btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt
>>
>> Are these commands designed for some cases regarding
>> snapshots/subvolumes?
>
> Yes, these commands would have helped you in the above case. You need to
> create an empty qgroup and copy the exclusive from there on snapshot
> creation.

I am wondering why we need the concept of exclusive. Maybe it helps to
some extent. How about just dropping it, since the concept of exclusive
adds to the complexity of btrfs quota? The worst thing is that I don't
think users can master this magic concept very well.

>> If so, I think it is really confusing and too complex for users to do
>> such work, isn't it?...
>
> It is complex. That is why I always point anyone asking to do some work
> on btrfs or qgroups to writing an enhanced interface to simplify this
> task for the user. I don't think the kernel should handle this. And
> that's why I took the effort to write a pdf to explain the concepts :)

I don't have any good ideas about this yet..

> But the current interface is not only complex, it also is very powerful.
> You can solve problems with it that no other quota system I know of can
> solve.
>
>> BTW, I have a question about the function btrfs_qgroup_inherit(). When
>> copying the exclusive value from src_qgroup to dst_qgroup:
>>
>>     dst_qgroup->exclusive = src_qgroup->exclusive + level_size
>>
>> while copying the referenced value from src_qgroup to dst_qgroup:
>>
>>     dst_qgroup->referenced = src_qgroup->referenced - level_size
>>
>> I can't really figure out...~_~
>
> level_size is just a small correction for the space the tree root
> occupies. The tree root is never shared between subvolumes.

O.K. I got it..

Thanks,
Wang

> -Arne
>
>> Thanks,
>> Wang
Re: about btrfs quota issues
On 11.03.2013 14:31, Wang Shilong wrote:
> Hello,
>
<snip>
>>> In fact, I think you try to put some work on users, especially when a
>>> snapshot happens. It is complex to track all the groups' accounting when
>>> having snapshots. See the following commands:
>>>
>>>     btrfs sub snapshot -c src_qgroupid:dst_qgroupid mnt
>>>     btrfs sub snapshot -x src_qgroupid:dst_qgroupid mnt
>>>
>>> Are these commands designed for some cases regarding
>>> snapshots/subvolumes?
>>
>> Yes, these commands would have helped you in the above case. You need to
>> create an empty qgroup and copy the exclusive from there on snapshot
>> creation.
>
> I am wondering why we need the concept of exclusive. Maybe it helps to
> some extent.

It is needed to answer the question 'how much space can I gain by
deleting this subvol or this set of subvolumes?'

> How about just dropping it, since the concept of exclusive adds to the
> complexity of btrfs quota?

If you don't need that value, just ignore the tracking error.

> The worst thing is that I don't think users can master this magic
> concept very well.

Normally users don't need very sophisticated scenarios. In fact, they
don't even need higher-level quota groups; the basic tracking is enough.
In this case, everything just works as expected for the user. If you
start creating and assigning qgroups manually, prepare to handle the
complexity.

-Arne

>>> If so, I think it is really confusing and too complex for users to do
>>> such work, isn't it?...
>>
>> It is complex. That is why I always point anyone asking to do some work
>> on btrfs or qgroups to writing an enhanced interface to simplify this
>> task for the user. I don't think the kernel should handle this. And
>> that's why I took the effort to write a pdf to explain the concepts :)
>
> I don't have any good ideas about this yet..
>
>> But the current interface is not only complex, it also is very powerful.
>> You can solve problems with it that no other quota system I know of can
>> solve.
>>
>>> BTW, I have a question about the function btrfs_qgroup_inherit(). When
>>> copying the exclusive value from src_qgroup to dst_qgroup:
>>>
>>>     dst_qgroup->exclusive = src_qgroup->exclusive + level_size
>>>
>>> while copying the referenced value from src_qgroup to dst_qgroup:
>>>
>>>     dst_qgroup->referenced = src_qgroup->referenced - level_size
>>>
>>> I can't really figure out...~_~
>>
>> level_size is just a small correction for the space the tree root
>> occupies. The tree root is never shared between subvolumes.
>
> O.K. I got it..
>
> Thanks,
> Wang
>
>> -Arne
>>
>>> Thanks,
>>> Wang
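The difference between the two numbers a qgroup reports can be illustrated with a toy model. The sketch below is not the kernel's accounting algorithm (which updates the counters incrementally and also applies the level_size correction mentioned above); it only computes what the values are meant to express. The struct `extent`, the owner bitmask and both function names are hypothetical. "Referenced" counts every extent that some member of the group points to; "exclusive" counts only extents that nobody outside the group points to, which is the space deleting the whole group would free.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy extent: a size plus a bitmask of the subvolumes that reference it. */
struct extent {
	uint64_t bytes;
	uint32_t owners;	/* bit i set: subvolume i references the extent */
};

/* Bytes referenced by at least one subvolume in the group. */
uint64_t qgroup_referenced(const struct extent *e, size_t n, uint32_t members)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (e[i].owners & members)
			sum += e[i].bytes;
	return sum;
}

/* Bytes referenced only by subvolumes inside the group. */
uint64_t qgroup_exclusive(const struct extent *e, size_t n, uint32_t members)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if ((e[i].owners & members) && !(e[i].owners & ~members))
			sum += e[i].bytes;
	return sum;
}
```

In the scenario from the thread, one 1 MiB extent is owned by a subvolume (bit 0) that is the only member of the group. Both values are 1 MiB. Once a snapshot (bit 1) shares the extent, the group's exclusive drops to 0 in this model, because deleting the subvolume alone frees nothing. If the snapshot is also made a member of the group, the exclusive is 1 MiB again.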
Re: [PATCH 0/5] [RFC] RAID-level terminology change
On Sun, Mar 10, 2013 at 11:49:53PM +0000, Hugo Mills wrote:
>> Using an asterisk '*' in something that will be used as a command line
>> argument risks having the shell expand it. Sticking to pure
>> alphanumeric names would be better.
>
> Yeah, David's just pointed this out on IRC. After a bit of fiddling
> around with various options, I like using X.

I'd like to see something that can exist as an identifier or can be
copy-pasted in one click, but '*' being a shell meta-character is IMO a
stronger argument against using it.

> I'm also going to use lowercase c,s,p, because it seems to be easier to
> read with the different-height characters. So we end up with, e.g.
>
>     1c      (single)
>     2cXs    (RAID-10)
>     1cXs2p  (RAID-6)

This form looks ok to me.

david
Re: about btrfs quota issues
<snip>

>> The worst thing is that I don't think users can master this magic
>> concept very well.
>
> Normally users don't need very sophisticated scenarios. In fact, they
> don't even need higher-level quota groups; the basic tracking is enough.
> In this case, everything just works as expected for the user. If you
> start creating and assigning qgroups manually, prepare to handle the
> complexity.

Consider this case: each subvolume belongs to a user, and we limit the
space by limiting every subvolume qgroup, but we also want to limit the
total space all the users can use. So we create a parent qgroup (1/1 for
example) and assign all subvolume qgroups to this parent group.

This setup is commonly used, I think. What's more, many snapshots may be
taken. So I think what I am concerned about is not a corner case..

Thanks,
Wang

>>>> If so, I think it is really confusing and too complex for users to do
>>>> such work, isn't it?...
>>>
>>> It is complex. That is why I always point anyone asking to do some work
>>> on btrfs or qgroups to writing an enhanced interface to simplify this
>>> task for the user. I don't think the kernel should handle this. And
>>> that's why I took the effort to write a pdf to explain the concepts :)
>>
>> I don't have any good ideas about this yet..
>>
>>> But the current interface is not only complex, it also is very powerful.
>>> You can solve problems with it that no other quota system I know of can
>>> solve.
>>>
>>>> BTW, I have a question about the function btrfs_qgroup_inherit(). When
>>>> copying the exclusive value from src_qgroup to dst_qgroup:
>>>>
>>>>     dst_qgroup->exclusive = src_qgroup->exclusive + level_size
>>>>
>>>> while copying the referenced value from src_qgroup to dst_qgroup:
>>>>
>>>>     dst_qgroup->referenced = src_qgroup->referenced - level_size
>>>>
>>>> I can't really figure out...~_~
>>>
>>> level_size is just a small correction for the space the tree root
>>> occupies. The tree root is never shared between subvolumes.
>>
>> O.K. I got it..
>>
>> Thanks,
>> Wang
>>
>>> -Arne
>>>
>>>> Thanks,
>>>> Wang
Re: about btrfs quota issues
On 11.03.2013 15:15, Wang Shilong wrote:
> <snip>
>
>>> The worst thing is that I don't think users can master this magic
>>> concept very well.
>>
>> Normally users don't need very sophisticated scenarios. In fact, they
>> don't even need higher-level quota groups; the basic tracking is enough.
>> In this case, everything just works as expected for the user. If you
>> start creating and assigning qgroups manually, prepare to handle the
>> complexity.
>
> Consider this case: each subvolume belongs to a user, and we limit the
> space by limiting every subvolume qgroup, but we also want to limit the
> total space all the users can use. So we create a parent qgroup (1/1 for
> example) and assign all subvolume qgroups to this parent group.
>
> This setup is commonly used, I think. What's more, many snapshots may be
> taken. So I think what I am concerned about is not a corner case..

So you just forgot to assign the new subvolume to 1/1 by using -i on
snapshot creation.

-Arne

> Thanks,
> Wang
>
>>>>> If so, I think it is really confusing and too complex for users to do
>>>>> such work, isn't it?...
>>>>
>>>> It is complex. That is why I always point anyone asking to do some work
>>>> on btrfs or qgroups to writing an enhanced interface to simplify this
>>>> task for the user. I don't think the kernel should handle this. And
>>>> that's why I took the effort to write a pdf to explain the concepts :)
>>>
>>> I don't have any good ideas about this yet..
>>>
>>>> But the current interface is not only complex, it also is very powerful.
>>>> You can solve problems with it that no other quota system I know of can
>>>> solve.
>>>>
>>>>> BTW, I have a question about the function btrfs_qgroup_inherit(). When
>>>>> copying the exclusive value from src_qgroup to dst_qgroup:
>>>>>
>>>>>     dst_qgroup->exclusive = src_qgroup->exclusive + level_size
>>>>>
>>>>> while copying the referenced value from src_qgroup to dst_qgroup:
>>>>>
>>>>>     dst_qgroup->referenced = src_qgroup->referenced - level_size
>>>>>
>>>>> I can't really figure out...~_~
>>>>
>>>> level_size is just a small correction for the space the tree root
>>>> occupies. The tree root is never shared between subvolumes.
>>>
>>> O.K. I got it..
>>>
>>> Thanks,
>>> Wang
>>>
>>>> -Arne
>>>>
>>>>> Thanks,
>>>>> Wang
Re: about btrfs quota issues
On 11.03.2013 15:15, Wang Shilong wrote: snip The worst thing is that i don't think users can master this magic concept very well. Normally users don't need very sophisticated scenarios. In fact, they don't even need higher level quota groups, the basic tracking is enough. In this case, everything just works as expected for the user. If you start creating and assigning qgroups manually, prepare to handle the complexity. Considering this case: a subvolume related to a user, we limit the space by limiting every subvolume qgroup, but we also want to limit the total space all the users can use. So we create a parent qgroup(1/1 for example) and assign all subvolume group to this parent group. The above case is regularly used i think, What's more, many snapshots may be done. So i think what i am concerning is not a corner case.. So you just missed to assign the new subvolume to 1/1 by using -i on snapshot creation. When snapshot happens, the exclusive of 1/1 will go wrong even with this simple case.. However, thanks very much for your patience and kindly reply ^_^ Thanks, Wang -Arne Thanks, Wang If so, i think it really confusing and too complex for users to do such work, is't it?... It is complex. That is why I always point anyone asking to do some work on btrfs or qgroups to writing an enhanced interface to simplify this task for the user. I don't think the kernel should handle this. And that's why I took the effort to write a pdf to explain the concepts :) I don't have any good ideas about this yet.. But the current interface is not only complex, it also is very powerful. You can solve problems with it that no other quota system I know of can solve. 
BTW, i have a question about the function btrfs_qgroup_inherit(). When copying the exclusive value from src_qgroup to dst_qgroup:
dst_qgroup->exclusive = src_qgroup->exclusive + level_size
while copying the referenced value from src_qgroup to dst_qgroup:
dst_qgroup->referenced = src_qgroup->referenced - level_size
I can't really figure out...~_~
level_size is just a small correction for the space the tree root occupies. The tree root is never shared between sub volumes.
O.K. I got it.. Thanks, Wang
-Arne
Thanks, Wang
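For readers following the "1/1" notation in this thread: a qgroup is named as level/id, and a bare subvolume id such as 257 is a level-0 qgroup. The sketch below shows one way to pack and unpack such a name. It assumes the 64-bit qgroupid keeps the level in its upper 16 bits and the id in the lower 48; the helper names are made up for illustration and are not taken from btrfs-progs.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: level lives in the top 16 bits, id in the low 48 bits. */
#define QGROUP_LEVEL_SHIFT 48
#define QGROUP_ID_MASK ((UINT64_C(1) << QGROUP_LEVEL_SHIFT) - 1)

/* Hypothetical helper: build a qgroupid from a level and an id.
 * Ids wider than 48 bits are silently truncated in this sketch. */
static uint64_t qgroupid_make(uint16_t level, uint64_t id)
{
	return ((uint64_t)level << QGROUP_LEVEL_SHIFT) | (id & QGROUP_ID_MASK);
}

static uint16_t qgroupid_level(uint64_t qgroupid)
{
	return (uint16_t)(qgroupid >> QGROUP_LEVEL_SHIFT);
}

static uint64_t qgroupid_id(uint64_t qgroupid)
{
	return qgroupid & QGROUP_ID_MASK;
}

/* Hypothetical helper: parse "1/1", or a bare id such as "257" (level 0).
 * Returns 0 on success and -1 on malformed input. */
static int qgroupid_parse(const char *s, uint64_t *out)
{
	unsigned long long level = 0, id = 0;
	char tail;

	/* Exactly two conversions means "level/id" with nothing trailing. */
	if (sscanf(s, "%llu/%llu%c", &level, &id, &tail) == 2) {
		if (level > UINT16_MAX)
			return -1;
		*out = qgroupid_make((uint16_t)level, id);
		return 0;
	}
	/* Exactly one conversion means a bare id with nothing trailing. */
	if (sscanf(s, "%llu%c", &id, &tail) == 1) {
		*out = qgroupid_make(0, id);
		return 0;
	}
	return -1;
}
```

With this layout the parent group 1/1 and the subvolume group 0/257 can never collide, which is what lets a subvolume qgroup be assigned to a higher-level group.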
Re: about btrfs quota issues
On 11.03.2013 15:35, Wang Shilong wrote: On 11.03.2013 15:15, Wang Shilong wrote: snip The worst thing is that i don't think users can master this magic concept very well. Normally users don't need very sophisticated scenarios. In fact, they don't even need higher level quota groups, the basic tracking is enough. In this case, everything just works as expected for the user. If you start creating and assigning qgroups manually, prepare to handle the complexity. Considering this case: a subvolume related to a user, we limit the space by limiting every subvolume qgroup, but we also want to limit the total space all the users can use. So we create a parent qgroup(1/1 for example) and assign all subvolume group to this parent group. The above case is regularly used i think, What's more, many snapshots may be done. So i think what i am concerning is not a corner case.. So you just missed to assign the new subvolume to 1/1 by using -i on snapshot creation. When snapshot happens, the exclusive of 1/1 will go wrong even with this simple case.. Your example does not describe your use case. If you want to account the snapshot to the user, you also have to assign the snapshot to 1/1. If you do so, the exclusive will be correct. -Arne However, thanks very much for your patience and kindly reply ^_^ Thanks, Wang -Arne Thanks, Wang If so, i think it really confusing and too complex for users to do such work, is't it?... It is complex. That is why I always point anyone asking to do some work on btrfs or qgroups to writing an enhanced interface to simplify this task for the user. I don't think the kernel should handle this. And that's why I took the effort to write a pdf to explain the concepts :) I don't have any good ideas about this yet.. But the current interface is not only complex, it also is very powerful. You can solve problems with it that no other quota system I know of can solve. 
BTW, i have a question about the function btrfs_qgroup_inherit(). When copying the exclusive value from src_qgroup to dst_qgroup:
dst_qgroup->exclusive = src_qgroup->exclusive + level_size
while copying the referenced value from src_qgroup to dst_qgroup:
dst_qgroup->referenced = src_qgroup->referenced - level_size
I can't really figure out...~_~
level_size is just a small correction for the space the tree root occupies. The tree root is never shared between sub volumes.
O.K. I got it.. Thanks, Wang
-Arne
Thanks, Wang
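The two formulas quoted in this exchange can be tried out with a toy model. This is only a sketch of the arithmetic as written in the thread, not the kernel code: the struct, its field names and both functions are invented here, and level_size is taken to be the size of one tree root block.

```c
#include <stdint.h>

/* Toy model of a qgroup's counters; the field names are illustrative. */
struct qgroup_counts {
	uint64_t referenced;	/* bytes reachable from this group */
	uint64_t exclusive;	/* bytes reachable only from this group */
};

/* First formula from the thread: the destination's exclusive count is the
 * source's plus one tree root, since a root is never shared. */
static void copy_exclusive(struct qgroup_counts *dst,
			   const struct qgroup_counts *src,
			   uint64_t level_size)
{
	dst->exclusive = src->exclusive + level_size;
}

/* Second formula from the thread: the destination's referenced count is the
 * source's minus one tree root. Assumes src->referenced >= level_size. */
static void copy_referenced(struct qgroup_counts *dst,
			    const struct qgroup_counts *src,
			    uint64_t level_size)
{
	dst->referenced = src->referenced - level_size;
}
```

For example, with a source of 1048576 referenced and 65536 exclusive bytes and a 4096-byte root, the copies yield 1044480 and 69632 bytes respectively.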
Integration branch of btrfs-progs 2013-03-11
Hi,

this set contains help text updates, the series from Eric and my fix to mkfs superblock checksum (now needed to mount a new filesystem when the kernel-side check is in place -- applies to current btrfs-next).

git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130311

david
--
Anand Jain (3):
      btrfs-progs: from troubleshooting point of view messages must be unique
      btrfs-progs: usage should match what is coded
      btrfs-progs: update the .gitignore file

David Sterba (1):
      btrfs-progs: separate super_copy out of fs_info

Eric Sandeen (14):
      btrfs-progs: close fd on cmd_subvol_list return
      btrfs-progs: close fd on do_convert error returns
      btrfs-progs: free resources on do_rollback error returns
      btrfs-progs: free allocated metadump structure on restore failure
      btrfs-progs: check for null string in parse_size
      btrfs-progs: tidy up cmd_snapshot() whitespace & returns
      btrfs-progs: Free resources when returning error from cmd_snapshot()
      btrfs-progs: tidy up cmd_subvol_create() whitespace & returns
      btrfs-progs: Free resources when returning error from cmd_subvol_create()
      btrfs-progs: check return of posix_fadvise
      btrfs-progs: Issue warnings if ioctls fail in sigint handlers
      btrfs-progs: better option/error handling for btrfs-vol
      btrfs-progs: Error handling in scrub_progress_cycle() thread
      btrfs-progs: fix scrub error return from pthread_mutex_lock

Zhi Yong Wu (1):
      btrfs-progs: update mkfs.btrfs help info for raid5/6
Re: snapshot deletion / unmount slowness
On Sun, Mar 10, 2013 at 10:31:08PM -0700, Michael Johnson - MJ wrote: I currently have a btrfs filesystem that I am unmounting and it has been unmounting for the last 20 minutes. I'm pretty sure I know exactly what is going on and in my current situation it's not a huge issue, but it would be a problem if this was a production system and I was trying to do a maintenance. Here is how I got into this situation: I am migrating my data from one pair of disks (mirrored with btrfs) to another pair of disks. I rsync'd my data from the original btrfs file system to the other. When it completed, my new filesystem showed 165GB used. The original showed 1.8TB used. I came to the conclusion that it must be the daily snapshots I have that were using the majority of the space and because I was going to destroy the filesystem, I decided, what the heck, let me destroy the snapshots and see what it looks like. To my surprise, removing all the snapshots resulted in the usage dropping from 1.8TB to 1.7TB. I re-ran my rsync, it completed without transferring any new data. I then did a du -s in the mountpoint for the original filesystem and it reported back 165GB which agrees with what rsync and df on the new filesystem report. My first thought was that I must have some sort of bizarre corruption on the original filesystem. And then I went to unmount it and it still has not returned. What I now suspect is going on is that while deleting the snapshots was quick, that probably kicks off a background thread which actually does the heavy lifting. I noticed a btrfs-cleaner process that was in an io wait state, which I presumed was the process in question. However, now 40 minutes later, my unmount is still hung and the btrfs-cleaner process is sleeping, so perhaps I am wrong.

You're right, umount will wake up the cleaner kthread to do the 'real work' of cleaning up the snapshots/subvolumes marked as deleted. But while btrfs-cleaner is sleeping, could you please show what unmount is waiting for?
Maybe 'cat /proc//stack' will be helpful in figuring out why. thanks, liubo

At this point I am going to powercycle my system, but I figured I would check and see if anyone else knew for certain if this was the type of behavior one would expect to see when removing large snapshots and then immediately trying to unmount the filesystem. If so, it seems like this is something that would need to change before someone would want to seriously consider using btrfs w/ snapshots in a production environment. I know btrfs is not considered production ready yet (well, at least not by the developers, regardless of what Oracle and Suse say). At the same time, I've not been able to find any mention of similar problems, so I figured it was worth mentioning. -- Michael Johnson - MJ
Re: [PATCH 3/3] btrfs-progs: use BTRFS_SCAN_BACKUP_SB flag in btrfs_scan_one_device
On 3/8/13 9:25 AM, Anand Jain wrote: bug: --- mkfs.btrfs /dev/sdb -f yes| mkfs.ext4 /dev/sdb mount /dev/sdb /ext4 mkfs.btrfs -f /dev/sdc /dev/sdd (run twice) mkfs.btrfs -f /dev/sdc /dev/sdd :: ERROR: unable to scan the device '/dev/sdb' - Device or resource busy ERROR: unable to scan the device '/dev/sdb' - Device or resource busy adding device /dev/sdd id 2 fs created label (null) on /dev/sdc nodesize 4096 leafsize 4096 sectorsize 4096 size 3.11GB Since we run mkfs.btrfs twice above, there is already a stale btrfs when mkfs.btrfs is run for the 2nd time. which kicks in btrfs_scan_for_fsid() to perform a system-wide scan to find the stale btrfs's partner (to check if that by any chance is mounted) which in process comes across /dev/sdb. Now when it finds /dev/sdb it finds that primary SB is not present and we need to stop him there. This is done by NOT setting BTRFS_SCAN_BACKUP_SB for the function btrfs_scan_for_fsid(). To ensure rest of the logic is unaffected, this patch will ensure BTRFS_SCAN_BACKUP_SB is set for all other places except at check_mounted_where(). Thanks, this seems like progress in the right direction. But that means that many other paths will still scan backups, right? In the following case sdb1 is an ext4-mounted partition w/ a stale btrfs backup superblock present in the middle. # mount /dev/sdb1 /mnt/test # mount | grep sdb1 /dev/sdb1 on /mnt/test type ext4 (rw) # btrfs device scan /dev/sdb1 Scanning for Btrfs filesystems in '/dev/sdb1' ERROR: unable to scan the device '/dev/sdb1' - Device or resource busy Perhaps this is ok since we explicitly told it to scan an ext4-mounted device. [[But, then if I unmount it: # btrfs device scan /dev/sdb1 Scanning for Btrfs filesystems in '/dev/sdb1' ERROR: unable to scan the device '/dev/sdb1' - Invalid argument weird, not sure where that came from. 
:( Unrelated to this question though.]] Also: # btrfs filesystem show /dev/sdb1 Label: none uuid: a96ea6e6-d3d5-444d-9aaf-057ec579dffe Total devices 1 FS bytes used 28.00KB devid1 size 4.00GB used 445.50MB path /dev/sdb1 whoa, ok, so it's a currently mounted ext4 device, but filesystem show tells me it's btrfs? How about this one: # mount /dev/sdb1 /mnt/test # mount | grep sdb1 /dev/sdb1 on /mnt/test type ext4 (rw) # btrfs check /dev/sdb1 Checking filesystem on /dev/sdb1 UUID: a96ea6e6-d3d5-444d-9aaf-057ec579dffe checking extents checking fs roots checking root refs found 28672 bytes used err is 0 total csum bytes: 0 total tree bytes: 28672 total fs tree bytes: 8192 btree space waste bytes: 22875 file data blocks allocated: 0 referenced 0 Btrfs v0.20-rc1-194-g3deeb4c So my mountged ext4 fs is also a perfectly consistent btrfs fs? and I think the list goes on. IMHO, nothing should be checking the backup superblocks unless explicitly told to. i.e. in e2fsprogs, e2fsck has: -b superblock Instead of using the normal superblock, use an alternative superblock specified by superblock. and debugfs has: -s superblock Causes the file system superblock to be read from the given block number, instead of using the primary superblock I think the backups need to be used for explicit recovery (and maybe to be checked once the primary has been confirmed) and never used during any normal operation, if the first one is found to be missing. 
-Eric Signed-off-by: Anand Jain anand.j...@oracle.com --- cmds-device.c | 3 ++- cmds-filesystem.c | 2 +- cmds-replace.c| 3 ++- disk-io.c | 7 --- find-root.c | 5 +++-- utils.c | 9 ++--- volumes.c | 4 ++-- volumes.h | 2 +- 8 files changed, 21 insertions(+), 14 deletions(-) diff --git a/cmds-device.c b/cmds-device.c index 1b8f378..9447e7f 100644 --- a/cmds-device.c +++ b/cmds-device.c @@ -203,7 +203,8 @@ static int cmd_scan_dev(int argc, char **argv) printf(Scanning for Btrfs filesystems\n); if(checklist) - ret = btrfs_scan_block_devices(BTRFS_SCAN_REGISTER); + ret = btrfs_scan_block_devices(BTRFS_SCAN_REGISTER| +BTRFS_SCAN_BACKUP_SB); else ret = btrfs_scan_one_dir(/dev, BTRFS_SCAN_REGISTER); if (ret){ diff --git a/cmds-filesystem.c b/cmds-filesystem.c index 2210020..d2e708d 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -257,7 +257,7 @@ static int cmd_show(int argc, char **argv) usage(cmd_show_usage); if(checklist) - ret = btrfs_scan_block_devices(0); + ret = btrfs_scan_block_devices(BTRFS_SCAN_BACKUP_SB); else ret = btrfs_scan_one_dir(/dev, 0); diff --git a/cmds-replace.c b/cmds-replace.c index 4cc32df..f6e1619 100644 --- a/cmds-replace.c +++ b/cmds-replace.c
Re: snapshot deletion / unmount slowness
On Mon, Mar 11, 2013 at 11:20:15AM +0100, Swâmi Petaramesh wrote: On 11/03/2013 07:47, Liu Bo wrote: A recent commit(commit fa6ac8765c48a06dfed914e8c8c3a903f9d313a0 Btrfs: fix cleaner thread not working with inode cache option) may improve the situation. Hi Liu, I have never seen this issue with btrfs-cleaner not working, when I delete snapshots it typically kicks in a few seconds later and works until done.

The 'not working' wording is a little confusing, sorry. It means that the cleaner thread does not do its work in time. When we delete a snapshot/subvolume, we a) invalidate all of the inodes that belong to it and then b) add it to a list for the cleaner thread to do the real work once the last inode is destroyed from memory. What the commit tries to fix is that the inode cache inode remains in memory, which keeps the snapshot/subvolume from being added to the cleanup list. The result is that our space is not freed as we wish. So back to the thread: if you notice that even the cleaner thread does not help you get free space after you've deleted the snapshot/subvolume, there should be some inodes of the snapshot/subvolume remaining in memory.

Does the bug you mention affect only specific kernel versions ?

Kernels since we have had the inode cache.

AFAIK I use inode_cache (it's not in my fstab but I mounted my FSes using it manually, and I believe it's a persistent option ? - I may possibly be wrong...)

It only works when you mount with it; it helps you reuse inode ids. thanks, liubo
[PATCH 1/2] btrfs: merge save_error_info helpers into one
Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/super.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 68a29a1..eed1464 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -91,7 +91,7 @@ static const char *btrfs_decode_error(int errno, char nbuf[16])
 	return errstr;
 }
 
-static void __save_error_info(struct btrfs_fs_info *fs_info)
+static void save_error_info(struct btrfs_fs_info *fs_info)
 {
 	/*
 	 * today we only save the error info into ram.  Long term we'll
@@ -100,11 +100,6 @@ static void __save_error_info(struct btrfs_fs_info *fs_info)
 	set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
 }
 
-static void save_error_info(struct btrfs_fs_info *fs_info)
-{
-	__save_error_info(fs_info);
-}
-
 /* btrfs handle error by forcing the filesystem readonly */
 static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
 {
-- 
1.7.9
[PATCH 2/2] btrfs: clean up transaction abort messages
The transaction abort stacktrace is printed only once per module lifetime, but we'd like to see it each time it happens per filesystem. Introduce a fs_state flag that records the state.

Tweak the messages around abort:
* add error number to the first abort
* print the exact negative errno from btrfs_decode_error and don't expect a simple snprintf to fail
* no dots at the end of the messages

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/ctree.h       |    1 +
 fs/btrfs/super.c       |   19 +++++++++++--------
 fs/btrfs/transaction.c |    5 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e391d6b..14d8f8d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -340,6 +340,7 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
  */
 #define BTRFS_FS_STATE_ERROR		0
 #define BTRFS_FS_STATE_REMOUNTING	1
+#define BTRFS_FS_STATE_TRANS_ABORTED	2
 
 /* Super block flags */
 /* Errors detected */
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index eed1464..fe0d6ce 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -65,7 +65,7 @@ static struct file_system_type btrfs_fs_type;
 
 static const char *btrfs_decode_error(int errno, char nbuf[16])
 {
-	char *errstr = NULL;
+	char *errstr = nbuf;
 
 	switch (errno) {
 	case -EIO:
@@ -81,10 +81,7 @@ static const char *btrfs_decode_error(int errno, char nbuf[16])
 		errstr = "Object already exists";
 		break;
 	default:
-		if (nbuf) {
-			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
-				errstr = nbuf;
-		}
+		snprintf(nbuf, 16, "error %d", errno);
 		break;
 	}
 
@@ -121,7 +118,6 @@ static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
 		 * mounted writeable again, the device replace
 		 * operation continues.
 		 */
-//		WARN_ON(1);
 	}
 }
 
@@ -247,7 +243,14 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *root, const char *function,
 			       unsigned int line, int errno)
 {
-	WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted\n");
+	/*
+	 * Report first abort since mount
+	 */
+	if (!test_and_set_bit(BTRFS_FS_STATE_TRANS_ABORTED,
+				&root->fs_info->fs_state)) {
+		WARN(1, KERN_DEBUG "btrfs: Transaction aborted (error %d)\n",
+				errno);
+	}
 	trans->aborted = errno;
 	/* Nothing used. The other threads that have joined this
 	 * transaction may be able to continue. */
@@ -257,7 +260,7 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 
 		errstr = btrfs_decode_error(errno, nbuf);
 		btrfs_printk(root->fs_info,
-			     "%s:%d: Aborting unused transaction(%s).\n",
+			     "%s:%d: Aborting unused transaction (%s)\n",
 			     function, line, errstr);
 		return;
 	}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a0467eb..a5bbda1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1810,7 +1810,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	ret = btrfs_write_and_wait_transaction(trans, root);
 	if (ret) {
 		btrfs_error(root->fs_info, ret,
-			    "Error while writing out transaction.");
+			    "Error while writing out transaction");
 		mutex_unlock(&root->fs_info->tree_log_mutex);
 		goto cleanup_transaction;
 	}
@@ -1866,8 +1866,7 @@ cleanup_transaction:
 		btrfs_qgroup_free(root, trans->qgroup_reserved);
 		trans->qgroup_reserved = 0;
 	}
-	btrfs_printk(root->fs_info, "Skipping commit of aborted transaction.\n");
-//	WARN_ON(1);
+	btrfs_printk(root->fs_info, "Skipping commit of aborted transaction\n");
 	if (current->journal_info == trans)
 		current->journal_info = NULL;
 	cleanup_transaction(trans, root, ret);
-- 
1.7.9
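The "report the first abort once per filesystem" idea from this patch can be tried outside the kernel. The following is a userspace sketch only: it stands in for test_and_set_bit() with C11 atomic_fetch_or(), and the struct, the helper names and the message destination are assumptions made for the example. Only the flag number 2 and the message text come from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit numbers as defined in the patch's ctree.h hunk. */
enum {
	FS_STATE_ERROR = 0,
	FS_STATE_REMOUNTING = 1,
	FS_STATE_TRANS_ABORTED = 2,
};

/* Illustrative stand-in for the per-filesystem state word. */
struct fs_info_model {
	atomic_ulong fs_state;
	const char *name;
};

/* Userspace model of test_and_set_bit(): atomically set bit nr and
 * return whether it was already set. */
static bool test_and_set_bit_model(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Returns true if this call was the first abort seen on this filesystem
 * and therefore printed the warning. */
static bool report_abort(struct fs_info_model *fs, int err)
{
	if (test_and_set_bit_model(FS_STATE_TRANS_ABORTED, &fs->fs_state))
		return false;
	fprintf(stderr, "%s: Transaction aborted (error %d)\n", fs->name, err);
	return true;
}
```

Because the flag lives in each filesystem's own state word rather than in a static variable, a second filesystem still gets its own first report, which is the difference from WARN_ONCE that the commit message describes.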
Re: Creating zero-filled file aborts after 20GB in a 4GB volume with compress=lzo
On Mon, Mar 11, 2013 at 04:16:34PM +0100, Clemens Eisserer wrote:
When running ... dd if=/dev/zero of=testfile bs=1M on a compressed btrfs volume of 4GB mounted with compress=lzo, dd aborts after about 20GB written.

# mkfs 4g
# dd if=/dev/zero of=testfile bs=1M
dd: writing `testfile': No space left on device
58623787008 bytes (59 GB) copied, 154.548 s, 379 MB/s
# btrfs fi df .
Data: total=2.04GB, used=1.71GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=204.75MB, used=77.56MB
Metadata: total=8.00MB, used=0.00
# pretty $(filesize testfile)
58,623,787,008 bytes (54.6 GiB)

I'm not sure why the enospc came so early (never seen that before *cough*), maybe the other tests running in parallel, so a usual

# cat /dev/zero > zerofill
cat: write error: No space left on device
# pretty $(filesize zerofill)
47648243712 bytes (44.4 GiB)
# btrfs fi df .
Data: total=3.32GB, used=3.10GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=332.75MB, used=144.19MB
Metadata: total=8.00MB, used=0.00

Now that looks like a full fs, with ~100GB worth of compressed zeros.

As I don't think this can be attributed to metadata consumed ... Any ideas why lzo achieves such a poor ratio for even highly compressible data?

What does your 'fi df' report before and after the test?

david
Re: [PATCH] Btrfs: get better concurrency for snapshot-aware defrag work
On Mon, Mar 11, 2013 at 05:20:58PM +0800, Liu Bo wrote: Using spinning case instead of blocking will result in better concurrency overall. Do you have numbers to support that? david
Re: snapshot deletion / unmount slowness
On Sun, Mar 10, 2013 at 10:31:08PM -0700, Michael Johnson - MJ wrote: I currently have a btrfs filesystem that I am unmounting and it has been unmounting for the last 20 minutes. I'm pretty sure I know exactly what is going on and in my current situation it's not a huge issue, but it would be a problem if this was a production system and I was trying to do a maintenance. Here is how I got into this situation: What I now suspect is going on is that while deleting the snapshots was quick, that probably kicks off a background thread which actually does the heavy lifting. I noticed a btrfs-cleaner process that was in an io wait state, which I presumed was the process in question. However, now 40 minutes later, my unmount is still hung and the btrfs-cleaner process is sleeping, so perhaps I am wrong.

The umount blocked by the cleaner is a known problem and I now have a patch ready to improve that: http://thread.gmane.org/gmane.comp.file-systems.btrfs/23212 With it, the cleaner does not wait to do the background work for all deleted snapshots and is able to return in the middle of processing the current one when the fs is going down. There's another umount blocker, when a huge orphan file is being cleaned up, but from a first look it also seems possible to exit early if umount is detected. david
Re: [PATCH 3/3] btrfs-progs: use BTRFS_SCAN_BACKUP_SB flag in btrfs_scan_one_device
On Mon, Mar 11, 2013 at 10:03:46AM -0500, Eric Sandeen wrote: IMHO, nothing should be checking the backup superblocks unless explicitly told to.

That's the whole point I believe: update the infrastructure so that every SB access looks at the first copy unless told otherwise by command line options. david
Re: [PATCH 2/2] btrfs: clean up transaction abort messages
* print the exact negative errno from btrfs_decode_error and don't expect a simple snprintf to fail

What an.. odd function. Looks like it was inherited from ext*. And the callers over in that neck of the woods also don't check for the implemented-but-basically-impossible snprintf failure that leads to returning null.

+ snprintf(nbuf, 16, "error %d", errno);

The buffer is only used to print the error number for unknown errors? If changing this function anyway, maybe you can find a few minutes to:
- drop the nbuf argument
- just return static strings for known errnos
- return "unknown" for unknown errors
- and have the callers always print the string and error: "%s (errno %d)"

No worries if you're not keen to fix it up, but it'd be nice. One less wart to be distracted by when stumbling through the code.

- z
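The four suggestions above could look roughly like this in a standalone form. It is a sketch of the proposed shape, not code from the kernel tree: the function names are invented, the parameter is called err because errno is a macro in userspace, and apart from "Object already exists" (which appears in the patch) the strings are illustrative.

```c
#include <errno.h>
#include <stdio.h>

/* No caller-supplied buffer: known errors map to static strings and
 * everything else maps to "unknown". Takes a negative errno value,
 * kernel style. */
static const char *decode_error(int err)
{
	switch (err) {
	case -EIO:
		return "IO failure";		/* illustrative wording */
	case -ENOMEM:
		return "Out of memory";		/* illustrative wording */
	case -EROFS:
		return "Readonly filesystem";	/* illustrative wording */
	case -EEXIST:
		return "Object already exists";
	default:
		return "unknown";
	}
}

/* Callers always print both the string and the number, so an unknown
 * error still shows its exact value. Returns what snprintf returns. */
static int format_error(char *buf, size_t len, int err)
{
	return snprintf(buf, len, "%s (errno %d)", decode_error(err), err);
}
```

The design point is that the number is never lost: decode_error() cannot fail and needs no scratch space, and the "(errno %d)" suffix carries the detail that the old nbuf path existed to preserve.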
converting to raid5
Hi,

Just installed 3.9.0-rc2 and the latest btrfs-progs. The filesystem is a 4 disk raid1 array.

First, I did `btrfs bal start -dconvert=raid5,usage=1` to convert the mostly empty chunks. This resulted in a lot of allocated space (10's of gigs), with only a few 100 meg used. I did `btrfs bal start -dusage=75` to clean things up.

Then I ran `btrfs bal start -dconvert=raid5,soft`. I noticed how the difference between total and used for raid5 kept growing. My guess is that it's taking 1 raid1 chunk (2x1 gig disk space, 1 gig data), and moving it to 1 raid5 chunk (4gig disk space, 3gig data), leaving all chunks 33% used.

This is what 3 calls of `btrfs file df /` look like, a few minutes after each other, with the balance still running:

Data, RAID1: total=807.00GB, used=805.70GB
Data, RAID5: total=543.00GB, used=192.81GB
System, RAID1: total=32.00MB, used=192.00KB
Metadata, RAID1: total=6.00GB, used=3.54GB
--
Data, RAID1: total=800.00GB, used=798.70GB
Data, RAID5: total=564.00GB, used=199.30GB
System, RAID1: total=32.00MB, used=192.00KB
Metadata, RAID1: total=6.00GB, used=3.53GB
--
Data, RAID1: total=795.00GB, used=793.70GB
Data, RAID5: total=579.00GB, used=204.81GB
System, RAID1: total=32.00MB, used=192.00KB
Metadata, RAID1: total=6.00GB, used=3.54GB

Remco
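The "33% used" guess can be checked against the three RAID5 data lines reported in this message. The sketch below just divides used by total for each sample; the struct and function names are invented for the example, and the figures are copied from the message.

```c
#include <stddef.h>
#include <stdio.h>

/* One "Data, RAID5" line from the df output, in GB. */
struct df_sample {
	double total_gb;
	double used_gb;
};

/* The three RAID5 samples quoted in the message. */
static const struct df_sample raid5_samples[] = {
	{ 543.00, 192.81 },
	{ 564.00, 199.30 },
	{ 579.00, 204.81 },
};

/* Fraction of the allocated RAID5 space that actually holds data. */
static double used_fraction(struct df_sample s)
{
	return s.used_gb / s.total_gb;
}

/* Print the fill level of every sample as a percentage. */
static void print_fill_levels(void)
{
	size_t n = sizeof(raid5_samples) / sizeof(raid5_samples[0]);

	for (size_t i = 0; i < n; i++)
		printf("sample %zu: %.1f%% used\n", i + 1,
		       100.0 * used_fraction(raid5_samples[i]));
}
```

All three samples come out at roughly 35% used, which is close to the one-third that the one-raid1-chunk-per-raid5-chunk explanation would predict.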
[PATCH v2 5/5] Add man page description for NcMsPp replication levels
Signed-off-by: Hugo Mills h...@carfax.org.uk --- man/btrfs.8.in | 16 man/mkfs.btrfs.8.in | 24 +++- 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/man/btrfs.8.in b/man/btrfs.8.in index 94f4ffe..4072510 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -25,6 +25,8 @@ btrfs \- control a btrfs filesystem [-s \fIstart\fR] [-t \fIsize\fR] -[vf] \fIfile\fR|\fIdir\fR \ [\fIfile\fR|\fIdir\fR...] .PP +\fBbtrfs\fP \fBfilesystem df\fP [-r|-e]\fI path \fP +.PP \fBbtrfs\fP \fBfilesystem sync\fP\fI path \fP .PP \fBbtrfs\fP \fBfilesystem resize\fP\fI [devid:][+/\-]size[gkm]|[devid:]max filesystem\fP @@ -217,6 +219,20 @@ don't use it if you use snapshots, have de-duplicated your data or made copies with \fBcp --reflink\fP. .TP +\fBfilesystem df\fR [-r|-e] \fIpath\fR +Show usage information for the filesystem identified by \fIpath\fR. + +\fB-r, --raid\fP Use old-style RAID-n terminology to show replication types + +\fB-e, --explain\fP Explain the new-style NcMsPp terminology in more +detail: Nc shows the number of copies of data; a trailing d +indicates reduced device redundancy (e.g. more than one of the copies +may live on a single device), Ms shows the number of data stripes per +copy (with Xs indicating as many as will fit across the available +devices), and Pp shows the number of parity stripes. + +.TP + \fBfilesystem sync\fR\fI path \fR Force a sync for the filesystem identified by \fIpath\fR. .TP diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in index 41163e0..6d1f5d0 100644 --- a/man/mkfs.btrfs.8.in +++ b/man/mkfs.btrfs.8.in @@ -37,7 +37,29 @@ mkfs.btrfs uses all the available storage for the filesystem. .TP \fB\-d\fR, \fB\-\-data \fItype\fR Specify how the data must be spanned across the devices specified. Valid -values are raid0, raid1, raid10 or single. +values are of the form nc[d][ms[pp]], where n is the number of copies +of data, m is the number of stripes per copy, and p is the number of parity +stripes. 
The m parameter must (currently) be a literal X, indicating that +as many stripes as possible will be used. The letter d may be added to the +number of copies, to indicate non-redundant copies (e.g. on the same device). + +The following deprecated values may also be used: +.RS 16 +.P +single 1c +.P +raid0 1cXs +.P +raid1 2c +.P +dup2cd +.P +raid10 2cXsS +.P +raid5 1cXs1p +.P +raid6 1cXs2p +.RS -16 .TP \fB\-f\fR Force overwrite when an existing filesystem is detected on the device. -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
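The table of deprecated names in this man page change amounts to a small lookup. Here is a sketch of it as a C table; the struct and function names are invented for the example, and the raid10 entry is written as 2cXs so that it fits the nc[d][ms[pp]] grammar described above, which is an assumption about the intended spelling.

```c
#include <stddef.h>
#include <string.h>

/* Pairs an old-style RAID name with its new-style replication string. */
struct profile_alias {
	const char *legacy;
	const char *ncmspp;
};

static const struct profile_alias profile_aliases[] = {
	{ "single", "1c" },
	{ "raid0",  "1cXs" },
	{ "raid1",  "2c" },
	{ "dup",    "2cd" },
	{ "raid10", "2cXs" },	/* assumed spelling, see the note above */
	{ "raid5",  "1cXs1p" },
	{ "raid6",  "1cXs2p" },
};

/* Returns the new-style name for a deprecated one, or NULL if the
 * name is not in the table. */
static const char *legacy_to_ncmspp(const char *legacy)
{
	size_t n = sizeof(profile_aliases) / sizeof(profile_aliases[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(profile_aliases[i].legacy, legacy) == 0)
			return profile_aliases[i].ncmspp;
	}
	return NULL;
}
```

Reading a row left to right gives the decoding: raid6 is one copy, striped across as many devices as fit, with two parity stripes.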
[PATCH v2 3/5] Convert balance filter parser to use common NcMsPp replication-level parser
Balance filters are the second location which takes user input of replication levels. Update this to use the common parser so that we can provide nCmSpP-style names. Signed-off-by: Hugo Mills h...@carfax.org.uk --- cmds-balance.c | 23 --- 1 file changed, 8 insertions(+), 15 deletions(-) diff --git a/cmds-balance.c b/cmds-balance.c index f5dc317..6186963 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -42,23 +42,16 @@ static const char balance_cmd_group_info[] = static int parse_one_profile(const char *profile, u64 *flags) { - if (!strcmp(profile, raid0)) { - *flags |= BTRFS_BLOCK_GROUP_RAID0; - } else if (!strcmp(profile, raid1)) { - *flags |= BTRFS_BLOCK_GROUP_RAID1; - } else if (!strcmp(profile, raid10)) { - *flags |= BTRFS_BLOCK_GROUP_RAID10; - } else if (!strcmp(profile, raid5)) { - *flags |= BTRFS_BLOCK_GROUP_RAID5; - } else if (!strcmp(profile, raid6)) { - *flags |= BTRFS_BLOCK_GROUP_RAID6; - } else if (!strcmp(profile, dup)) { - *flags |= BTRFS_BLOCK_GROUP_DUP; - } else if (!strcmp(profile, single)) { - *flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE; - } else { + u64 result; + + result = parse_profile(profile); + if (result == (u64)-1) { fprintf(stderr, Unknown profile '%s'\n, profile); return 1; + } else if (result == 0) { + *flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE; + } else { + *flags |= result; } return 0; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
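For anyone who wants to experiment with the name grammar that the common parser accepts, here is a standalone sketch. It is not the parse_profile() from this series: it returns a local enum instead of BTRFS_BLOCK_GROUP_* flags, handles only the new-style nc[d][ms[pp]] form, and is stricter about signs than strtoul-based parsing would be. All names in it are invented for the example.

```c
#include <ctype.h>
#include <stdlib.h>

enum profile {
	PROFILE_INVALID = -1,
	PROFILE_SINGLE,
	PROFILE_DUP,
	PROFILE_RAID0,
	PROFILE_RAID1,
	PROFILE_RAID10,
	PROFILE_RAID5,
	PROFILE_RAID6,
};

#define STRIPES_MAX (-1)	/* literal "X": as many stripes as will fit */

/* Map the parsed parameters onto the currently available profiles. */
static enum profile make_profile(long copies, int dup, long stripes, long parity)
{
	if (copies == 1 && !dup && stripes == 0 && parity == 0)
		return PROFILE_SINGLE;
	if (copies == 2 && dup && stripes == 0 && parity == 0)
		return PROFILE_DUP;
	if (copies == 2 && !dup && stripes == 0 && parity == 0)
		return PROFILE_RAID1;
	if (copies == 2 && !dup && stripes == STRIPES_MAX && parity == 0)
		return PROFILE_RAID10;
	if (copies == 1 && !dup && stripes == STRIPES_MAX && parity == 0)
		return PROFILE_RAID0;
	if (copies == 1 && !dup && stripes == STRIPES_MAX && parity == 1)
		return PROFILE_RAID5;
	if (copies == 1 && !dup && stripes == STRIPES_MAX && parity == 2)
		return PROFILE_RAID6;
	return PROFILE_INVALID;
}

/* Parse an unsigned decimal number at *s and advance *s past it.
 * Returns 0 on success, -1 if *s does not start with a digit. */
static int parse_number(const char **s, long *out)
{
	char *end;

	if (!isdigit((unsigned char)**s))
		return -1;
	*out = strtol(*s, &end, 10);
	*s = end;
	return 0;
}

static enum profile parse_ncmspp(const char *s)
{
	long copies = 0, stripes = 0, parity = 0;
	int dup = 0;

	/* "nc" is required */
	if (parse_number(&s, &copies) || (*s != 'c' && *s != 'C'))
		return PROFILE_INVALID;
	s++;
	/* optional "d" marks copies that may share a device */
	if (*s == 'd' || *s == 'D') {
		dup = 1;
		s++;
	}
	if (*s == '\0')
		goto done;

	/* optional "ms": m is a number or a literal X */
	if (*s == 'x' || *s == 'X') {
		stripes = STRIPES_MAX;
		s++;
	} else if (parse_number(&s, &stripes)) {
		return PROFILE_INVALID;
	}
	if (*s != 's' && *s != 'S')
		return PROFILE_INVALID;
	s++;
	if (*s == '\0')
		goto done;

	/* optional "pp": must be the last component */
	if (parse_number(&s, &parity) || (*s != 'p' && *s != 'P') || s[1] != '\0')
		return PROFILE_INVALID;
done:
	return make_profile(copies, dup, stripes, parity);
}
```

A caller in the style of parse_one_profile() would treat PROFILE_INVALID the way the patch treats (u64)-1, and PROFILE_SINGLE the way it treats a zero result.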
[PATCH v2 2/5] Move parse_profile to utils.c
Make parse_profile a shared function so it can be used across the
code-base.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>
---
 mkfs.c  |   94 ---
 utils.c |   94 +++
 utils.h |    1 +
 3 files changed, 95 insertions(+), 94 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 70df5db..0facf13 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -348,100 +348,6 @@ static void print_version(void)
 	exit(0);
 }
 
-static u64 make_profile(int copies, int dup, int stripes, int parity)
-{
-	if(copies == 1 && !dup && stripes == 0 && parity == 0)
-		return 0;
-	else if(copies == 2 && dup && stripes == 0 && parity == 0)
-		return BTRFS_BLOCK_GROUP_DUP;
-	else if(copies == 2 && !dup && stripes == 0 && parity == 0)
-		return BTRFS_BLOCK_GROUP_RAID1;
-	else if(copies == 2 && !dup && stripes == -1 && parity == 0)
-		return BTRFS_BLOCK_GROUP_RAID10;
-	else if(copies == 1 && !dup && stripes == -1 && parity == 0)
-		return BTRFS_BLOCK_GROUP_RAID0;
-	else if(copies == 1 && !dup && stripes == -1 && parity == 1)
-		return BTRFS_BLOCK_GROUP_RAID5;
-	else if(copies == 1 && !dup && stripes == -1 && parity == 2)
-		return BTRFS_BLOCK_GROUP_RAID6;
-
-	return (u64)-1;
-}
-
-static u64 parse_profile(const char *s)
-{
-	char *pos, *parse_end;
-	int copies = 1;
-	int stripes = 0;
-	int parity = 0;
-	int dup = 0;
-	u64 profile = (u64)-1;
-
-	/* Look for exact match with historical forms first */
-	if (strcmp(s, "raid0") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID0;
-	} else if (strcmp(s, "raid1") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID1;
-	} else if (strcmp(s, "raid5") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID5;
-	} else if (strcmp(s, "raid6") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID6;
-	} else if (strcmp(s, "raid10") == 0) {
-		return BTRFS_BLOCK_GROUP_RAID10;
-	} else if (strcmp(s, "dup") == 0) {
-		return BTRFS_BLOCK_GROUP_DUP;
-	} else if (strcmp(s, "single") == 0) {
-		return 0;
-	}
-
-	/* Attempt to parse new ncmspp form */
-	/* nc is required and n must be an unsigned decimal number */
-	copies = strtoul(s, &parse_end, 10);
-	if(parse_end == s || (*parse_end != 'c' && *parse_end != 'C'))
-		goto unknown;
-
-	/* c may be followed by d to indicate non-redundant/DUP */
-	pos = parse_end + 1;
-	if(*pos == 'd' || *pos == 'D') {
-		dup = 1;
-		pos++;
-	}
-	if(*pos == 0)
-		goto done;
-
-	/* ms is optional, and m may be an integer, or a literal x */
-	if(*pos == 'x' || *pos == 'X') {
-		stripes = -1;
-		parse_end = pos+1;
-	} else {
-		stripes = strtoul(pos, &parse_end, 10);
-	}
-	if(parse_end == pos || (*parse_end != 's' && *parse_end != 'S'))
-		goto unknown;
-
-	pos = parse_end + 1;
-	if(*pos == 0)
-		goto done;
-
-	/* pp is optional, and p must be an integer */
-	parity = strtoul(pos, &parse_end, 10);
-	if(parse_end == pos || (*parse_end != 'p' && *parse_end != 'P'))
-		goto unknown;
-	pos = parse_end + 1;
-	if(*pos != 0)
-		goto unknown;
-
-done:
-	profile = make_profile(copies, dup, stripes, parity);
-	if(profile == (u64)-1)
-		fprintf(stderr, "Unknown or unavailable profile '%s'\n", s);
-	return profile;
-
-unknown:
-	fprintf(stderr, "Unparseable profile '%s'\n", s);
-	return (u64)-1;
-}
-
 static char *parse_label(char *input)
 {
 	int len = strlen(input);
diff --git a/utils.c b/utils.c
index f68436d..f1d2432 100644
--- a/utils.c
+++ b/utils.c
@@ -1420,6 +1420,100 @@ u64 parse_size(char *s)
 	return strtoull(s, NULL, 10) * mult;
 }
 
+static u64 make_profile(int copies, int dup, int stripes, int parity)
+{
+	if(copies == 1 && !dup && stripes == 0 && parity == 0)
+		return 0;
+	else if(copies == 2 && dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_DUP;
+	else if(copies == 2 && !dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID1;
+	else if(copies == 2 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID10;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID0;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 1)
+		return BTRFS_BLOCK_GROUP_RAID5;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 2)
+		return BTRFS_BLOCK_GROUP_RAID6;
+
+	return (u64)-1;
+}
+
+u64 parse_profile(const char *s)
+{
+	char *pos, *parse_end;
+	int copies = 1;
[PATCH v2 4/5] Change output of btrfs fi df to report new (or old) RAID names
Signed-off-by: Hugo Mills <h...@carfax.org.uk>
---
 cmds-filesystem.c |  173 ++---
 1 file changed, 152 insertions(+), 21 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 2210020..3150ff7 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -18,6 +18,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <getopt.h>
 #include <sys/ioctl.h>
 #include <errno.h>
 #include <uuid/uuid.h>
@@ -39,11 +40,129 @@ static const char * const filesystem_cmd_group_usage[] = {
 };
 
 static const char * const cmd_df_usage[] = {
-	"btrfs filesystem df <path>",
+	"btrfs filesystem df [options] <path>",
 	"Show space usage information for a mount point",
+	"",
+	"-r      Use old-style RAID-n terminology",
+	"-e      Explain new-style NcMsPp terminology",
 	NULL
 };
 
+static const char *cmd_df_short_options = "re";
+static const struct option cmd_df_options[] = {
+	{ "raid",    no_argument, NULL, 'r' },
+	{ "explain", no_argument, NULL, 'e' },
+	{ NULL, 0, NULL, 0 }
+};
+
+#define RAID_NAMES_NEW 0
+#define RAID_NAMES_OLD 1
+#define RAID_NAMES_LONG 2
+
+static int write_raid_name(char* buffer, int size, u64 flags, int raid_format)
+{
+	int copies, stripes, parity;
+	int out;
+	int written = 0;
+
+	if (raid_format == RAID_NAMES_OLD) {
+		if (flags & BTRFS_BLOCK_GROUP_RAID0) {
+			return snprintf(buffer, size, "%s", "RAID0");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID1) {
+			return snprintf(buffer, size, "%s", "RAID1");
+		} else if (flags & BTRFS_BLOCK_GROUP_DUP) {
+			return snprintf(buffer, size, "%s", "DUP");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID10) {
+			return snprintf(buffer, size, "%s", "RAID10");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID5) {
+			return snprintf(buffer, size, "%s", "RAID5");
+		} else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
+			return snprintf(buffer, size, "%s", "RAID6");
+		}
+		return 0;
+	}
+
+	if (flags & (BTRFS_BLOCK_GROUP_RAID1
+		     | BTRFS_BLOCK_GROUP_RAID10
+		     | BTRFS_BLOCK_GROUP_DUP)) {
+		copies = 2;
+	} else {
+		copies = 1;
+	}
+
+	if (raid_format == RAID_NAMES_LONG)
+		out = snprintf(buffer, size, "%d copies", copies);
+	else
+		out = snprintf(buffer, size, "%dc", copies);
+	if (size < out)
+		return written + size;
+	written += out;
+	size -= out;
+
+	if (flags & BTRFS_BLOCK_GROUP_DUP) {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, "low redundancy");
+		else
+			out = snprintf(buffer+written, size, "d");
+		if (size < out)
+			return written + size;
+		written += out;
+		size -= out;
+	}
+
+	if (flags & (BTRFS_BLOCK_GROUP_RAID0
+		     | BTRFS_BLOCK_GROUP_RAID10
+		     | BTRFS_BLOCK_GROUP_RAID5
+		     | BTRFS_BLOCK_GROUP_RAID6)) {
+		stripes = -1;
+	} else {
+		stripes = 0;
+	}
+
+	if (stripes == -1) {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, ", fit stripes");
+		else
+			out = snprintf(buffer+written, size, "Xs");
+	} else if (stripes == 0) {
+		out = 0;
+	} else {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, ", %d stripes", stripes);
+		else
+			out = snprintf(buffer+written, size, "%ds", stripes);
+	}
+
+	if (size < out)
+		return written + size;
+	written += out;
+	size -= out;
+
+	if (flags & BTRFS_BLOCK_GROUP_RAID5) {
+		parity = 1;
+	} else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
+		parity = 2;
+	} else {
+		parity = 0;
+	}
+
+	if (parity == 0) {
+		out = 0;
+	} else {
+		if (raid_format == RAID_NAMES_LONG)
+			out = snprintf(buffer+written, size, ", %d parity", parity);
+		else
+			out = snprintf(buffer+written, size, "%dp", parity);
+	}
+
+	if (size < out)
+		return written + size;
+	written += out;
+	size -= out;
+
+	return written;
+}
+
 static int cmd_df(int argc, char **argv)
 {
 	struct btrfs_ioctl_space_args *sargs, *sargs_orig;
@@ -52,11 +171,32 @@ static int cmd_df(int argc, char **argv)
 	int fd;
 	int e;
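For readers following the buffer handling in write_raid_name(), here is a minimal standalone sketch of the same "append with snprintf and track what is left" idea. It is not the code from the patch: append_fmt() and format_short_name() are hypothetical names, and the sketch treats any truncation as an error rather than returning a partial length.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in btrfs-progs): append formatted text at
 * buf + *written. Returns 0 on success, -1 if the text did not fit.
 */
static int append_fmt(char *buf, size_t size, size_t *written,
		      const char *fmt, ...)
{
	va_list ap;
	int out;

	if (*written >= size)
		return -1;

	va_start(ap, fmt);
	out = vsnprintf(buf + *written, size - *written, fmt, ap);
	va_end(ap);

	/* vsnprintf reports the length it wanted, so >= means truncated */
	if (out < 0 || (size_t)out >= size - *written)
		return -1;

	*written += (size_t)out;
	return 0;
}

/*
 * Build a short name such as "1cXs2p" from already-decoded fields.
 * Returns the number of characters written, or -1 on truncation.
 */
static int format_short_name(char *buf, size_t size, int copies, int dup,
			     int striped, int parity)
{
	size_t written = 0;

	if (append_fmt(buf, size, &written, "%dc", copies))
		return -1;
	if (dup && append_fmt(buf, size, &written, "d"))
		return -1;
	if (striped && append_fmt(buf, size, &written, "Xs"))
		return -1;
	if (parity > 0 && append_fmt(buf, size, &written, "%dp", parity))
		return -1;

	return (int)written;
}
```

Keeping the remaining-space arithmetic in one helper means each field needs only a single check, which is the main design point of the sketch.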
[PATCH v2 0/5] RAID-level terminology change
Some time ago, and occasionally since, we've discussed altering the
RAID-n terminology to change it to an NcMsPp format, where N is the
number of copies, M is the number of (data) devices in a stripe per
copy, and P is the number of parity devices in a stripe.

The current kernel implementation uses as many devices as it can in the
striped modes (RAID-0, -10, -5, -6), and in this implementation, that is
written as Xs (with a literal X). The Ms and Pp sections are omitted if
the value is 1s or 0p.

The magic look-up table for old-style / new-style is:

single   1c (or omitted, in btrfs fi df output)
RAID-0   1cXs
RAID-1   2c
DUP      2cd
RAID-10  2cXs
RAID-5   1cXs1p
RAID-6   1cXs2p

The following patch set modifies userspace tools to accept c/s/p formats
in input (mkfs and the restriper). The older formats are also accepted.
It also prints the newer formats by default in btrfs fi df, with an
option to show the older format for the traditionalists, and to expand
the abbreviation verbosely for those unfamiliar with it.

v1 -> v2:
 - Changed to use lower-case letters for c/s/p, for readability
 - Changed mS to Xs for readability
 - Added explain option to df
 - Switched option parsing for df to getopt_long

Hugo.

Hugo Mills (5):
  Use NcMsPp format for mkfs
  Move parse_profile to utils.c
  Convert balance filter parser to use common NcMsPp replication-level parser
  Change output of btrfs fi df to report new (or old) RAID names
  Add man page description for NcMsPp replication levels

 cmds-balance.c      |   23 +++
 cmds-filesystem.c   |  173 ---
 man/btrfs.8.in      |   16 +
 man/mkfs.btrfs.8.in |   24 ++-
 mkfs.c              |   35 +++
 utils.c             |   94 
 utils.h             |    1 +
 7 files changed, 303 insertions(+), 63 deletions(-)

-- 
1.7.10.4
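The look-up table in the cover letter can be written down as data. The following is only an illustrative sketch, not part of the patch set: the struct and the new_style_name() helper are hypothetical names, and the old-style spellings use the lower-case forms that mkfs accepts.

```c
#include <stddef.h>
#include <string.h>

/* Old-style / new-style pairs, copied from the cover letter's table. */
struct raid_name_pair {
	const char *old_name;
	const char *new_name;
};

static const struct raid_name_pair raid_name_table[] = {
	{ "single", "1c"     },
	{ "raid0",  "1cXs"   },
	{ "raid1",  "2c"     },
	{ "dup",    "2cd"    },
	{ "raid10", "2cXs"   },
	{ "raid5",  "1cXs1p" },
	{ "raid6",  "1cXs2p" },
};

/* Hypothetical helper: new-style name for an old one, NULL if unknown. */
static const char *new_style_name(const char *old_name)
{
	size_t i;
	size_t n = sizeof(raid_name_table) / sizeof(raid_name_table[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(raid_name_table[i].old_name, old_name) == 0)
			return raid_name_table[i].new_name;
	}
	return NULL;
}
```

For example, new_style_name("raid10") yields "2cXs", while a name outside the table yields NULL.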
[PATCH v2 1/5] Use NcMsPp format for mkfs
Teach mkfs.btrfs about ncmspp format for replication levels, which
avoids the semantic uncertainty over the RAID-XYZ naming.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>
---
 mkfs.c |   91 +++-
 1 file changed, 84 insertions(+), 7 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index b2520ce..70df5db 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -326,7 +326,9 @@ static void print_usage(void)
 	fprintf(stderr, "options:\n");
 	fprintf(stderr, "\t -A --alloc-start the offset to start the FS\n");
 	fprintf(stderr, "\t -b --byte-count total number of bytes in the FS\n");
-	fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
+	fprintf(stderr, "\t -d --data data profile: nc[d][ms[pp]]\n");
+	fprintf(stderr, "\t\tfor n copies (d=reduced dev redundancy), m stripes, p parity stripes\n");
+	fprintf(stderr, "\t\tor raid0, raid1, raid10, dup or single (deprecated)\n");
 	fprintf(stderr, "\t -l --leafsize size of btree leaves\n");
 	fprintf(stderr, "\t -L --label set a label\n");
 	fprintf(stderr, "\t -m --metadata metadata profile, values like data profile\n");
@@ -346,8 +348,36 @@ static void print_version(void)
 	exit(0);
 }
 
-static u64 parse_profile(char *s)
+static u64 make_profile(int copies, int dup, int stripes, int parity)
 {
+	if(copies == 1 && !dup && stripes == 0 && parity == 0)
+		return 0;
+	else if(copies == 2 && dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_DUP;
+	else if(copies == 2 && !dup && stripes == 0 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID1;
+	else if(copies == 2 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID10;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 0)
+		return BTRFS_BLOCK_GROUP_RAID0;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 1)
+		return BTRFS_BLOCK_GROUP_RAID5;
+	else if(copies == 1 && !dup && stripes == -1 && parity == 2)
+		return BTRFS_BLOCK_GROUP_RAID6;
+
+	return (u64)-1;
+}
+
+static u64 parse_profile(const char *s)
+{
+	char *pos, *parse_end;
+	int copies = 1;
+	int stripes = 0;
+	int parity = 0;
+	int dup = 0;
+	u64 profile = (u64)-1;
+
+	/* Look for exact match with historical forms first */
 	if (strcmp(s, "raid0") == 0) {
 		return BTRFS_BLOCK_GROUP_RAID0;
 	} else if (strcmp(s, "raid1") == 0) {
@@ -362,12 +392,54 @@ static u64 parse_profile(char *s)
 		return BTRFS_BLOCK_GROUP_DUP;
 	} else if (strcmp(s, "single") == 0) {
 		return 0;
+	}
+
+	/* Attempt to parse new ncmspp form */
+	/* nc is required and n must be an unsigned decimal number */
+	copies = strtoul(s, &parse_end, 10);
+	if(parse_end == s || (*parse_end != 'c' && *parse_end != 'C'))
+		goto unknown;
+
+	/* c may be followed by d to indicate non-redundant/DUP */
+	pos = parse_end + 1;
+	if(*pos == 'd' || *pos == 'D') {
+		dup = 1;
+		pos++;
+	}
+	if(*pos == 0)
+		goto done;
+
+	/* ms is optional, and m may be an integer, or a literal x */
+	if(*pos == 'x' || *pos == 'X') {
+		stripes = -1;
+		parse_end = pos+1;
 	} else {
-		fprintf(stderr, "Unknown profile %s\n", s);
-		print_usage();
+		stripes = strtoul(pos, &parse_end, 10);
 	}
-	/* not reached */
-	return 0;
+	if(parse_end == pos || (*parse_end != 's' && *parse_end != 'S'))
+		goto unknown;
+
+	pos = parse_end + 1;
+	if(*pos == 0)
+		goto done;
+
+	/* pp is optional, and p must be an integer */
+	parity = strtoul(pos, &parse_end, 10);
+	if(parse_end == pos || (*parse_end != 'p' && *parse_end != 'P'))
+		goto unknown;
+	pos = parse_end + 1;
+	if(*pos != 0)
+		goto unknown;
+
+done:
+	profile = make_profile(copies, dup, stripes, parity);
+	if(profile == (u64)-1)
+		fprintf(stderr, "Unknown or unavailable profile '%s'\n", s);
+	return profile;
+
+unknown:
+	fprintf(stderr, "Unparseable profile '%s'\n", s);
+	return (u64)-1;
 }
 
 static char *parse_label(char *input)
@@ -1447,6 +1519,11 @@ int main(int ac, char **av)
 	printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION);
 	printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
 
+	if (data_profile == (u64)-1 || metadata_profile == (u64)-1) {
+		fprintf(stderr, "Cannot handle requested replication profile. Aborting\n");
+		exit(1);
+	}
+
 	if (source_dir == 0) {
 		file = av[optind++];
 		ret = is_swap_device(file);
@@ -1666,7 +1743,7 @@ raid_groups:
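To try the parsing steps outside of mkfs, the following is a rough standalone sketch of the same strtoul-based walk over the string. It is an approximation under stated assumptions, not the patch itself: struct profile_fields and parse_ncmspp() are hypothetical names, and the result is returned as plain fields instead of being mapped to BTRFS_BLOCK_GROUP_* flags.

```c
#include <stdlib.h>

/* Hypothetical result type; the patch maps to block group flags instead. */
struct profile_fields {
	int copies;
	int dup;
	int stripes;	/* -1: literal x (as many as fit), 0: omitted */
	int parity;
};

/* Returns 0 on success, -1 if s is not of the form nc[d][ms[pp]]. */
static int parse_ncmspp(const char *s, struct profile_fields *out)
{
	const char *pos;
	char *end;

	out->dup = 0;
	out->stripes = 0;
	out->parity = 0;

	/* nc is required; n must be a decimal number */
	out->copies = (int)strtoul(s, &end, 10);
	if (end == s || (*end != 'c' && *end != 'C'))
		return -1;

	/* c may be followed by d */
	pos = end + 1;
	if (*pos == 'd' || *pos == 'D') {
		out->dup = 1;
		pos++;
	}
	if (*pos == '\0')
		return 0;

	/* ms is optional; m is a number or a literal x */
	if (*pos == 'x' || *pos == 'X') {
		out->stripes = -1;
		end = (char *)pos + 1;
	} else {
		out->stripes = (int)strtoul(pos, &end, 10);
	}
	if (end == pos || (*end != 's' && *end != 'S'))
		return -1;

	pos = end + 1;
	if (*pos == '\0')
		return 0;

	/* pp is optional; p must be a number and must end the string */
	out->parity = (int)strtoul(pos, &end, 10);
	if (end == pos || (*end != 'p' && *end != 'P'))
		return -1;

	return end[1] == '\0' ? 0 : -1;
}
```

One caveat that applies to any strtoul-based parser: strtoul skips leading whitespace and accepts a sign, so a stricter implementation may want to check that the first character is a digit.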
Re: [PATCH] btrfs-progs: add options for changing size representations
On 11 March 2013 10:12, Audrius Butkevicius audrius.butkevic...@elastichosts.com wrote:
> Add '--si', '-h'/'--human-readable' and '--block-size' global options,
> which allow users to customize the way sizes are displayed.
>
> Options and their format tries to mimic GNU ls utility.
>
> Signed-off-by: Audrius Butkevicius <audrius.butkevic...@elastichosts.com>
> ---
>  btrfs.c |    3 ++
>  utils.c |  146 +++
>  utils.h |    6 +++
>  3 files changed, 138 insertions(+), 17 deletions(-)
>
> diff --git a/btrfs.c b/btrfs.c
> index 691adef..6a8fc30 100644
> --- a/btrfs.c
> +++ b/btrfs.c
> @@ -22,6 +22,8 @@
>  #include "crc32c.h"
>  #include "commands.h"
>  #include "version.h"
> +#include "ctree.h"
> +#include "utils.h"
>
>  static const char * const btrfs_cmd_group_usage[] = {
>  	"btrfs [--help] [--version] <group> [<group>...] <command> [<args>]",
> @@ -291,6 +293,7 @@ int main(int argc, char **argv)
>
>  	crc32c_optimization_init();
>
> +	handle_size_unit_args(&argc, &argv);
>  	fixup_argv0(argv, cmd->token);
>  	exit(cmd->fn(argc, argv));
>  }
> diff --git a/utils.c b/utils.c
> index d660507..58c1919 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -16,6 +16,7 @@
>   * Boston, MA 021110-1307, USA.
>   */
>
> +#define _GNU_SOURCE
>  #define _XOPEN_SOURCE 700
>  #define __USE_XOPEN2K8
>  #define __XOPEN2K8 /* due to an error in dirent.h, to get dirfd() */
> @@ -1095,33 +1096,144 @@ out:
>  	return ret;
>  }
>
> -static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
> -			    "PB", "EB", "ZB", "YB"};
> +static int sizes_format = SIZES_FORMAT_BYTES;
> +static u64 sizes_divisor = 1;
> +
> +void remove_arg(int i, int *argc, char ***argv)
> +{
> +	while (i++ < *argc)
> +		(*argv)[i - 1] = (*argv)[i];
> +	(*argc)--;
> +}
> +
> +void handle_size_unit_args(int *argc, char ***argv)
> +{
> +	int k;
> +	int base = 1024;
> +	char *suffix;
> +	char *block_size;
> +	u64 value;
> +
> +	for (k = *argc - 1; k >= 0; k--) {
> +		if (!strcmp((*argv)[k], "-h") ||
> +		    !strcmp((*argv)[k], "--human-readable")) {
> +			sizes_format = SIZES_FORMAT_HUMAN;
> +			remove_arg(k, argc, argv);
> +		} else if (!strcmp((*argv)[k], "--si")) {
> +			sizes_format = SIZES_FORMAT_SI;
> +			remove_arg(k, argc, argv);
> +		} else if (!strncmp((*argv)[k], "--block-size", 12)) {
> +			if (strlen((*argv)[k]) < 14 || (*argv)[k][12] != '=') {
> +				fprintf(stderr,
> +					"--block-size requires an argument\n");
> +				exit(1);
> +			}
> +
> +			sizes_format = SIZES_FORMAT_BLOCK;
> +			block_size = strchr((*argv)[k], '=');
> +
> +			errno = 0;
> +			value = strtoull(++block_size, &suffix, 10);
> +			if (errno == ERANGE && value == ULLONG_MAX) {
> +				fprintf(stderr,
> +					"--block-size argument '%s' too large\n",
> +					block_size);
> +				exit(1);
> +			}
> +			if (suffix == block_size)
> +				value = 1;
> +
> +			if (strlen(suffix) == 1 && value > 0) {
> +				base = 1024;
> +			} else if (strlen(suffix) == 2 && suffix[1] == 'B' &&
> +				   value > 0) {
> +				base = 1000;
> +			/* Allow non-zero values without a suffix */
> +			} else if (strlen(suffix) != 0 || value == 0) {
> +				fprintf(stderr,
> +					"invalid --block-size argument '%s'\n",
> +					block_size);
> +				exit(1);
> +			}
> +
> +			if (strlen(suffix) > 0) {
> +				switch(suffix[0]) {
> +				case 'E':
> +					sizes_divisor *= base;
> +				case 'P':
> +					sizes_divisor *= base;
> +				case 'T':
> +					sizes_divisor *= base;
> +				case 'G':
> +					sizes_divisor *= base;
> +				case 'M':
> +					sizes_divisor *= base;
> +				case 'K':
> +					sizes_divisor *= base;
> +					break;
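The switch at the end of the quoted hunk relies on deliberate case fall-through: each larger unit multiplies the divisor once more than the unit below it. A minimal standalone sketch of that technique follows; suffix_divisor() is a hypothetical name, and returning 0 for an unknown suffix is an assumption of the sketch, not behaviour taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: divisor for a unit suffix, using case
 * fall-through so that 'E' multiplies six times, 'P' five times, and
 * so on down to 'K'. base is 1024 for "K"-style suffixes and 1000 for
 * "KB"-style ones. Returns 0 for an unknown suffix.
 */
static uint64_t suffix_divisor(char suffix, unsigned int base)
{
	uint64_t divisor = 1;

	switch (suffix) {
	case 'E':
		divisor *= base;
		/* fall through */
	case 'P':
		divisor *= base;
		/* fall through */
	case 'T':
		divisor *= base;
		/* fall through */
	case 'G':
		divisor *= base;
		/* fall through */
	case 'M':
		divisor *= base;
		/* fall through */
	case 'K':
		divisor *= base;
		break;
	default:
		return 0;
	}
	return divisor;
}
```

Marking each fall-through with a comment keeps compilers and reviewers from mistaking it for a forgotten break.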
[PATCH] btrfs-progs: Add a rule to build a static mkfs.btrfs
Static mkfs.btrfs can be used to bootstrap a system from a live CD which
does not provide mkfs.btrfs. The executable produced is named
mkfs.btrfs.static and built by invoking the static make rule.

Signed-off-by: Antoine Sirinelli <anto...@monte-stello.com>
---
 Makefile |    9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index bea8ae9..e986e51 100644
--- a/Makefile
+++ b/Makefile
@@ -72,7 +72,7 @@ all: version.h $(progs) manpages
 # NOTE: For static compiles, you need to have all the required libs
 # 	static equivalent available
 #
-static: version.h btrfs.static
+static: version.h btrfs.static mkfs.btrfs.static
 
 version.h:
 	$(Q)bash version.sh
@@ -116,6 +116,11 @@ mkfs.btrfs: $(objects) mkfs.o
 	@echo "    [LD]     $@"
 	$(Q)$(CC) $(CFLAGS) -o mkfs.btrfs $(objects) mkfs.o $(LDFLAGS) $(LIBS) -lblkid
 
+mkfs.btrfs.static: $(static_objects) mkfs.static.o
+	@echo "    [LD]     $@"
+	$(Q)$(CC) $(STATIC_CFLAGS) -o mkfs.btrfs.static mkfs.static.o \
+		$(static_objects) $(STATIC_LDFLAGS) $(STATIC_LIBS)
+
 btrfs-debug-tree: $(objects) debug-tree.o
 	@echo "    [LD]     $@"
 	$(Q)$(CC) $(CFLAGS) -o btrfs-debug-tree $(objects) debug-tree.o $(LDFLAGS) $(LIBS)
@@ -178,7 +183,7 @@ clean :
 	@echo "Cleaning"
 	$(Q)rm -f $(progs) cscope.out *.o .*.d btrfs-convert btrfs-image btrfs-select-super \
 	      btrfs-zero-log btrfstune dir-test ioctl-test quick-test send-test btrfs.static btrfsck \
-	      version.h
+	      version.h mkfs.btrfs.static
 	$(Q)$(MAKE) $(MAKEOPTS) -C man $@
 
 install: $(progs) install-man
-- 
1.7.10.4
WARNING: at fs/btrfs/extent_map.c:77 free_extent_map
Since the updates for linux-3.9 I've had three or four system freezes
where only a reset (Magic SysRq) helped. After the reboot I found a
bunch of these in syslog:

Mar 11 21:56:09 localhost kernel: [ cut here ]
Mar 11 21:56:09 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 free_extent_map+0x64/0x76()
Mar 11 21:56:09 localhost kernel: Hardware name: EasyNote TK81
Mar 11 21:56:09 localhost kernel: Modules linked in: nfsv4 nfsd exportfs auth_rpcgss nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec ath9k_common ath9k_hw acer_wmi snd_hwdep snd_pcm ath sr_mod wmi broadcom snd_page_alloc snd_timer cdrom tg3 k10temp snd acpi_cpufreq ohci_hcd soundcore i2c_piix4 mperf
Mar 11 21:56:09 localhost kernel: Pid: 11260, comm: bogofilter Tainted: G W 3.9.0-rc2 #293
Mar 11 21:56:09 localhost kernel: Call Trace:
Mar 11 21:56:09 localhost kernel: [8102abc2] ? warn_slowpath_common+0x76/0x8c
Mar 11 21:56:09 localhost kernel: [8115dcff] ? free_extent_map+0x64/0x76
Mar 11 21:56:09 localhost kernel: [8115bc57] ? btrfs_drop_extent_cache+0x363/0x39f
Mar 11 21:56:09 localhost kernel: [81152db4] ? __cow_file_range+0x175/0x3c1
Mar 11 21:56:09 localhost kernel: [8114bb02] ? join_transaction.isra.34+0x30f/0x31a
Mar 11 21:56:09 localhost kernel: [8114d9f7] ? start_transaction+0x2d8/0x3e8
Mar 11 21:56:09 localhost kernel: [8115383e] ? cow_file_range+0xa9/0xc5
Mar 11 21:56:09 localhost kernel: [811538f7] ? run_delalloc_range+0x9d/0x33b
Mar 11 21:56:09 localhost kernel: [8116139b] ? free_extent_state+0x12/0x21
Mar 11 21:56:09 localhost kernel: [81163fa3] ? __extent_writepage+0x1a8/0x5d8
Mar 11 21:56:09 localhost kernel: [811635ae] ? end_extent_writepage+0x5d/0x5d
Mar 11 21:56:09 localhost kernel: [8116451d] ? extent_write_cache_pages.isra.29.constprop.47+0x14a/0x255
Mar 11 21:56:09 localhost kernel: [81164836] ? extent_writepages+0x49/0x60
Mar 11 21:56:09 localhost kernel: [81150146] ? btrfs_update_inode_item+0xde/0xde
Mar 11 21:56:09 localhost kernel: [8108fc58] ? __filemap_fdatawrite_range+0x4d/0x52
Mar 11 21:56:09 localhost kernel: [8115a192] ? btrfs_sync_file+0x48/0x203
Mar 11 21:56:09 localhost kernel: [810c85ff] ? vfs_write+0xaf/0xf8
Mar 11 21:56:09 localhost kernel: [810e783b] ? do_fsync+0x2b/0x50
Mar 11 21:56:09 localhost kernel: [810e7a42] ? sys_fdatasync+0xb/0xf
Mar 11 21:56:09 localhost kernel: [814877d2] ? system_call_fastpath+0x16/0x1b
Mar 11 21:56:09 localhost kernel: ---[ end trace 3eaea449d8d56f92 ]---

As far as I remember, it happened when fetching emails with claws. But
it's not a reliable testcase.

Another trace, from the first time I found it in the logs. Here the
system didn't hang:

Mar 4 14:28:35 localhost kernel: [ cut here ]
Mar 4 14:28:35 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 free_extent_map+0x64/0x76()
Mar 4 14:28:35 localhost kernel: Hardware name: EasyNote TK81
Mar 4 14:28:35 localhost kernel: Modules linked in: nfsd exportfs auth_rpcgss nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec ath9k_common snd_hwdep snd_pcm broadcom ath9k_hw snd_page_alloc ath sr_mod snd_timer acer_wmi snd cdrom wmi tg3 ohci_hcd soundcore k10temp edac_core acpi_cpufreq i2c_piix4 mperf
Mar 4 14:28:35 localhost kernel: Pid: 1574, comm: flush-btrfs-1 Not tainted 3.9.0-rc1 #289
Mar 4 14:28:35 localhost kernel: Call Trace:
Mar 4 14:28:35 localhost kernel: [8102ab92] ? warn_slowpath_common+0x76/0x8c
Mar 4 14:28:35 localhost kernel: [8115dc7b] ? free_extent_map+0x64/0x76
Mar 4 14:28:35 localhost kernel: [8115bbd3] ? btrfs_drop_extent_cache+0x363/0x39f
Mar 4 14:28:35 localhost kernel: [81152d2d] ? __cow_file_range+0x175/0x3c1
Mar 4 14:28:36 localhost kernel: [81487830] ? _raw_spin_unlock+0x1c/0x28
Mar 4 14:28:36 localhost kernel: [81160de3] ? release_extent_buffer.isra.25+0x90/0x97
Mar 4 14:28:36 localhost kernel: [81153673] ? run_delalloc_nocow+0x6fa/0x795
Mar 4 14:28:36 localhost kernel: [81153837] ? run_delalloc_range+0x64/0x33b
Mar 4 14:28:36 localhost kernel: [81161317] ? free_extent_state+0x12/0x21
Mar 4 14:28:36 localhost kernel: [81163f1f] ? __extent_writepage+0x1a8/0x5d8
Mar 4 14:28:36 localhost kernel: [8116352a] ? end_extent_writepage+0x5d/0x5d
Mar 4 14:28:36 localhost kernel: [811d4b69] ? cpumask_any_but+0x25/0x34
Mar 4 14:28:36 localhost kernel: [810a5259] ? vma_interval_tree_subtree_search+0x33/0x55
Mar 4 14:28:36 localhost kernel: [810b07b8] ? page_mkclean+0x107/0x119
Mar 4 14:28:36 localhost kernel: [81164499] ? extent_write_cache_pages.isra.29.constprop.47+0x14a/0x255
Mar 4 14:28:36
Re: [PATCH 2/2] btrfs: clean up transaction abort messages
On Mon, Mar 11, 2013 at 12:02:09PM -0700, Zach Brown wrote:
> No worries if you're not keen to fix it up, but it'd be nice. One less
> wart to be distracted by when stumbling through the code.

I'll gladly update the code, thanks for the hints and comments.

david
[PATCH 2/4] btrfs-progs: three new device/path helpers
Add 3 new helpers:

* is_block_device(), to test if a path is a block device.
* get_btrfs_mount(), to get the mountpoint of a device, if mounted.
* open_path_or_dev_mnt(path), to open either the pathname or, if it's a
  mounted btrfs dev, the mountpoint. Useful for some commands which can
  take either type of arg.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 utils.c |   84 +++
 utils.h |    3 ++
 2 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/utils.c b/utils.c
index 1c73d67..4bf457f 100644
--- a/utils.c
+++ b/utils.c
@@ -640,6 +640,90 @@ error:
 	return ret;
 }
 
+/*
+ * checks if a path is a block device node
+ * Returns negative errno on failure, otherwise
+ * returns 1 for blockdev, 0 for not-blockdev
+ */
+int is_block_device (const char *path) {
+	struct stat statbuf;
+
+	if (stat(path, &statbuf) < 0)
+		return -errno;
+
+	return (S_ISBLK(statbuf.st_mode));
+}
+
+/*
+ * Find the mount point for a mounted device.
+ * On success, returns 0 with mountpoint in *mp.
+ * On failure, returns -errno (not mounted yields -EINVAL)
+ * Is noisy on failures, expects to be given a mounted device.
+ */
+int get_btrfs_mount(const char *dev, char *mp, size_t mp_size) {
+	int ret;
+	int fd = -1;
+
+	ret = is_block_device(dev);
+	if (ret <= 0) {
+		if (!ret) {
+			fprintf(stderr, "%s is not a block device\n", dev);
+			ret = -EINVAL;
+		} else
+			fprintf(stderr, "Could not check %s: %s\n",
+				dev, strerror(-ret));
+		goto out;
+	}
+
+	fd = open(dev, O_RDONLY);
+	if (fd < 0) {
+		ret = -errno;
+		fprintf(stderr, "Could not open %s: %s\n", dev, strerror(errno));
+		goto out;
+	}
+
+	ret = check_mounted_where(fd, dev, mp, mp_size, NULL);
+	if (!ret) {
+		fprintf(stderr, "%s is not a mounted btrfs device\n", dev);
+		ret = -EINVAL;
+	} else /* mounted, all good */
+		ret = 0;
+out:
+	if (fd != -1)
+		close(fd);
+	if (ret)
+		fprintf(stderr, "Could not get mountpoint for %s\n", dev);
+	return ret;
+}
+
+/*
+ * Given a pathname, return a filehandle to:
+ * 	the original pathname or,
+ * 	if the pathname is a mounted btrfs device, to its mountpoint.
+ *
+ * On error, return -1, errno should be set.
+ */
+int open_path_or_dev_mnt(const char *path)
+{
+	char mp[BTRFS_PATH_NAME_MAX + 1];
+	int fdmnt;
+
+	if (is_block_device(path)) {
+		int ret;
+
+		ret = get_btrfs_mount(path, mp, sizeof(mp));
+		if (ret < 0) {
+			/* not a mounted btrfs dev */
+			errno = EINVAL;
+			return -1;
+		}
+		fdmnt = open(mp, O_RDWR);
+	} else
+		fdmnt = open_file_or_dir(path);
+
+	return fdmnt;
+}
+
 /* checks if a device is a loop device */
 int is_loop_device (const char* device) {
 	struct stat statbuf;
diff --git a/utils.h b/utils.h
index 0b681ed..8e0252b 100644
--- a/utils.h
+++ b/utils.h
@@ -56,6 +56,9 @@ int get_label(const char *btrfs_dev);
 int set_label(const char *btrfs_dev, const char *label);
 
 char *__strncpy__null(char *dest, const char *src, size_t n);
+int is_block_device(const char *file);
+int get_btrfs_mount(const char *path, char *mp, size_t mp_size);
+int open_path_or_dev_mnt(const char *path);
 int is_swap_device(const char *file);
 /* Helper to always get proper size of the destination string */
 #define strncpy_null(dest, src) __strncpy__null(dest, src, sizeof(dest))
-- 
1.7.1
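Since is_block_device() has three possible outcomes (negative errno, 0, or 1), callers that want to tell an error apart from "not a block device" need a three-way check. A small sketch of that pattern, under stated assumptions: is_block_device_sketch() and classify_path() are hypothetical names that exist only for this example, and the first mirrors the return convention described in the patch comment.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Sketch of the return convention from the patch comment: negative
 * errno on failure, 1 for a block device, 0 for anything else.
 */
static int is_block_device_sketch(const char *path)
{
	struct stat statbuf;

	if (stat(path, &statbuf) < 0)
		return -errno;

	return S_ISBLK(statbuf.st_mode) ? 1 : 0;
}

/* Hypothetical caller that keeps all three outcomes apart. */
static const char *classify_path(const char *path)
{
	int ret = is_block_device_sketch(path);

	if (ret < 0)
		return "error";
	return ret ? "block device" : "not a block device";
}
```

A plain truth test such as if (is_block_device(path)) also takes the branch for a negative errno, so it only works when the code inside the branch handles the error case itself.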
[PATCH 4/4] btrfs-progs: rework get_fs_info to remove side effects
get_fs_info() has been silently switching from a device to a mounted
path as needed; the caller's filehandle was unexpectedly closed &
reopened outside the caller's scope. Not so great.

The callers do want fdmnt to be the filehandle for the mount point in
all cases, though - the various ioctls act on this (not on an fd for
the device). But switching it in the local scope of get_fs_info is
incorrect; it just so happens that *usually* the fd number is unchanged.

So - use the new helpers to detect when an argument is a block device,
and open the mounted path more obviously / explicitly for ioctl use,
storing the filehandle in fdmnt.

Then, in get_fs_info, ignore the fd completely, and use the path on the
argument to determine if the caller wanted to act on just that device,
or on all devices for the filesystem.

Affects those commands which are documented to accept either a block
device or a path:

* btrfs device stats
* btrfs replace start
* btrfs scrub start
* btrfs scrub status

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 cmds-device.c  |    5 ++-
 cmds-replace.c |    6 +++-
 cmds-scrub.c   |   10 ---
 utils.c        |   73 +++
 utils.h        |    2 +-
 5 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index 58df6da..41e79d3 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -321,13 +321,14 @@ static int cmd_dev_stats(int argc, char **argv)
 
 	path = argv[optind];
 
-	fdmnt = open_file_or_dir(path);
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		fprintf(stderr, "ERROR: can't access '%s'\n", path);
 		return 12;
 	}
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for devstats failed: "
 				"%s\n", strerror(-ret));
diff --git a/cmds-replace.c b/cmds-replace.c
index 10030f6..6397bb5 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -168,7 +168,9 @@ static int cmd_start_replace(int argc, char **argv)
 	if (check_argc_exact(argc - optind, 3))
 		usage(cmd_start_replace_usage);
 	path = argv[optind + 2];
-	fdmnt = open_file_or_dir(path);
+
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		fprintf(stderr, "ERROR: can't access \"%s\": %s\n",
 			path, strerror(errno));
@@ -215,7 +217,7 @@ static int cmd_start_replace(int argc, char **argv)
 	}
 	start_args.start.srcdevid = (__u64)atoi(srcdev);
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for devstats failed: "
 				"%s\n", strerror(-ret));
diff --git a/cmds-scrub.c b/cmds-scrub.c
index e5fccc7..52264f1 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1101,13 +1101,14 @@ static int scrub_start(int argc, char **argv, int resume)
 
 	path = argv[optind];
 
-	fdmnt = open_file_or_dir(path);
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		ERR(!do_quiet, "ERROR: can't access '%s'\n", path);
 		return 12;
 	}
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		ERR(!do_quiet, "ERROR: getting dev info for scrub failed: "
 		    "%s\n", strerror(-ret));
@@ -1558,13 +1559,14 @@ static int cmd_scrub_status(int argc, char **argv)
 
 	path = argv[optind];
 
-	fdmnt = open_file_or_dir(path);
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		fprintf(stderr, "ERROR: can't access to '%s'\n", path);
 		return 12;
 	}
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for scrub failed: "
 				"%s\n", strerror(-ret));
diff --git a/utils.c b/utils.c
index 4bf457f..27cec56 100644
--- a/utils.c
+++ b/utils.c
@@ -717,7 +717,7 @@ int open_path_or_dev_mnt(const char *path)
 			errno = EINVAL;
 			return -1;
 		}
-		fdmnt = open(mp, O_RDWR);
+		fdmnt = open_file_or_dir(mp);
 	} else
 		fdmnt = open_file_or_dir(path);
 
@@ -1544,9 +1544,20 @@ int get_device_info(int fd, u64 devid,
 	return ret ? -errno : 0;
 }
 
-int get_fs_info(int fd, char *path, struct btrfs_ioctl_fs_info_args *fi_args,
+/*
+ * For a given path, fill in the ioctl fs_ and info_ args.
+ * If the path is a btrfs mountpoint, fill info for all devices.
+ * If the path is a btrfs
[PATCH 1/4] btrfs-progs: close fd on return from label get/set functions
Somehow missed these 2 in the last round.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 utils.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/utils.c b/utils.c
index f68436d..1c73d67 100644
--- a/utils.c
+++ b/utils.c
@@ -1217,6 +1217,7 @@ static int set_label_mounted(const char *mount_path, const char *label)
 		return -1;
 	}
 
+	close(fd);
 	return 0;
 }
 
@@ -1274,6 +1275,7 @@ static int get_label_mounted(const char *mount_path)
 	}
 
 	fprintf(stdout, "%s\n", label);
+	close(fd);
 	return 0;
 }
 
-- 
1.7.1
[PATCH 0/4] small cleanup + get_fs_info rework
The first patch is a trivial close of fd on function returns; somehow
I missed that last go-round.

The next 3 are a little more substantial, working to avoid the nasty
behavior of get_fs_info closing & re-opening the caller's filehandle
out of scope, if it needs to switch from device node to mountpoint.
(I suppose we could pass in *fd by reference, but this behavior just
seems like a wrong, magical side effect for get_fs_info.)

So instead, the callers use a helper to *always* wind up with the
mountpoint opened, and get_fs_info() now *only* - well - only gets
fs info. The previous behavior of "if given a device, act only on that
device; if given a mountpoint, act on all devices" should persist; I
guess that's the original intent.

It's really only lightly tested; it should mostly affect:

* btrfs device stats
* btrfs replace start
* btrfs scrub start
* btrfs scrub status

so any independent sanity testing of that would be great.

It'll be nice if/when we get xfstests coverage of some of this to make
it easier. :)

Thanks,
-Eric
[PATCH 3/4] btrfs-progs: don't open-code mountpoint discovery in scrub cancel
cmd_scrub_cancel had its own mountpoint discovery routine; just use open_path_or_dev_mnt() for that now.

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 cmds-scrub.c |   53 +++++++++++++++++------------------------------------
 1 files changed, 17 insertions(+), 36 deletions(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index b0fcde6..e5fccc7 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1459,56 +1459,37 @@ static int cmd_scrub_cancel(int argc, char **argv)
 {
 	char *path;
 	int ret;
-	int fdmnt;
-	int err;
-	char mp[BTRFS_PATH_NAME_MAX + 1];
-	struct btrfs_fs_devices *fs_devices_mnt = NULL;
+	int fdmnt = -1;
 
 	if (check_argc_exact(argc, 2))
 		usage(cmd_scrub_cancel_usage);
 
 	path = argv[1];
 
-again:
-	fdmnt = open_file_or_dir(path);
-	if (fdmnt < 0) {
-		perror("ERROR: scrub cancel failed:");
-		return 1;
-	}
+	fdmnt = open_path_or_dev_mnt(path);
+	if (fdmnt < 0) {
+		fprintf(stderr, "ERROR: could not open %s: %s\n",
+			path, strerror(errno));
+		ret = 1;
+		goto out;
+	}
 
 	ret = ioctl(fdmnt, BTRFS_IOC_SCRUB_CANCEL, NULL);
-	err = errno;
-
-	if (ret && err == EINVAL) {
-		/* path is not a btrfs mount point.  See if it's a device. */
-		ret = check_mounted_where(fdmnt, path, mp, sizeof(mp),
-					  &fs_devices_mnt);
-		if (ret > 0) {
-			/* It's a mounted btrfs device; retry w/ mountpoint. */
-			close(fdmnt);
-			path = mp;
-			goto again;
-		} else {
-			/* It's not a mounted btrfs device either */
-			fprintf(stderr,
-				"ERROR: %s is not a mounted btrfs device\n",
-				path);
-			ret = 1;
-			err = EINVAL;
-		}
-	}
 
-	close(fdmnt);
-
-	if (ret) {
+	if (ret < 0) {
 		fprintf(stderr, "ERROR: scrub cancel failed on %s: %s\n", path,
-			err == ENOTCONN ? "not running" : strerror(err));
-		return 1;
+			errno == ENOTCONN ? "not running" : strerror(errno));
+		ret = 1;
+		goto out;
 	}
 
+	ret = 0;
 	printf("scrub cancelled\n");
 
-	return 0;
+out:
+	if (fdmnt != -1)
+		close(fdmnt);
+	return ret;
 }
 
 static const char * const cmd_scrub_resume_usage[] = {
-- 
1.7.1
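The patch above replaces several early returns with a single exit label that closes the descriptor only if it was opened. A minimal standalone sketch of that cleanup pattern follows; check_is_directory() is a made-up function used purely for illustration and has nothing to do with scrub itself.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical example: return 0 if `path` is a directory, 1 otherwise.
 * Every failure path funnels through `out`, so the fd is closed exactly
 * once no matter where we bail.
 */
static int check_is_directory(const char *path)
{
	struct stat st;
	int ret;
	int fd = -1;		/* -1 means "nothing to close yet" */

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "ERROR: could not open %s: %s\n",
			path, strerror(errno));
		ret = 1;
		goto out;
	}

	if (fstat(fd, &st) < 0) {
		fprintf(stderr, "ERROR: could not stat %s: %s\n",
			path, strerror(errno));
		ret = 1;
		goto out;
	}

	ret = S_ISDIR(st.st_mode) ? 0 : 1;
out:
	if (fd != -1)
		close(fd);
	return ret;
}
```

Initializing the descriptor to -1 is what makes the shared exit safe: the label can be reached before open() has succeeded without calling close() on garbage.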
Re: [PATCH 4/4] btrfs-progs: rework get_fs_info to remove side effects
On 3/11/13 6:13 PM, Eric Sandeen wrote:
> get_fs_info() has been silently switching from a device to a mounted path as needed; the caller's filehandle was unexpectedly closed & reopened outside the caller's scope. Not so great.
>
> The callers do want fdmnt to be the filehandle for the mount point in all cases, though - the various ioctls act on this (not on an fd for the device). But switching it in the local scope of get_fs_info is incorrect; it just so happens that *usually* the fd number is unchanged.
>
> So - use the new helpers to detect when an argument is a block device, and open the mounted path more obviously / explicitly for ioctl use, storing the filehandle in fdmnt.
>
> Then, in get_fs_info, ignore the fd completely, and use the path on the argument to determine if the caller wanted to act on just that device, or on all devices for the filesystem.
>
> Affects those commands which are documented to accept either a block device or a path:

Following my tradition I'll (immediately) self-nak this one for now.

After I sent this, I thought to test:

# mkfs.btrfs /dev/sdb1 /dev/sdb2; mount /dev/sdb1 /mnt/test; btrfs stats /dev/sdb2

and that fails where it used to work.

So a) we could use a test for this, and b) I broke something.

If the overall idea of the change seems decent, I'll get it fixed up after I sort out what I broke. :/

-Eric

> * btrfs device stats
> * btrfs replace start
> * btrfs scrub start
> * btrfs scrub status
>
> Signed-off-by: Eric Sandeen sand...@redhat.com
> ---
>  cmds-device.c  |    5 ++-
>  cmds-replace.c |    6 +++-
>  cmds-scrub.c   |   10 ---
>  utils.c        |   73 +++
>  utils.h        |    2 +-
>  5 files changed, 66 insertions(+), 30 deletions(-)
>
> diff --git a/cmds-device.c b/cmds-device.c
> index 58df6da..41e79d3 100644
> --- a/cmds-device.c
> +++ b/cmds-device.c
> @@ -321,13 +321,14 @@ static int cmd_dev_stats(int argc, char **argv)
>  
>  	path = argv[optind];
>  
> -	fdmnt = open_file_or_dir(path);
> +	fdmnt = open_path_or_dev_mnt(path);
> +
>  	if (fdmnt < 0) {
>  		fprintf(stderr, "ERROR: can't access '%s'\n", path);
>  		return 12;
>  	}
>  
> -	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
> +	ret = get_fs_info(path, &fi_args, &di_args);
>  	if (ret) {
>  		fprintf(stderr, "ERROR: getting dev info for devstats failed: "
>  				"%s\n", strerror(-ret));
> diff --git a/cmds-replace.c b/cmds-replace.c
> index 10030f6..6397bb5 100644
> --- a/cmds-replace.c
> +++ b/cmds-replace.c
> @@ -168,7 +168,9 @@ static int cmd_start_replace(int argc, char **argv)
>  	if (check_argc_exact(argc - optind, 3))
>  		usage(cmd_start_replace_usage);
>  	path = argv[optind + 2];
> -	fdmnt = open_file_or_dir(path);
> +
> +	fdmnt = open_path_or_dev_mnt(path);
> +
>  	if (fdmnt < 0) {
>  		fprintf(stderr, "ERROR: can't access \"%s\": %s\n",
>  			path, strerror(errno));
> @@ -215,7 +217,7 @@ static int cmd_start_replace(int argc, char **argv)
>  		}
>  		start_args.start.srcdevid = (__u64)atoi(srcdev);
>  
> -		ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
> +		ret = get_fs_info(path, &fi_args, &di_args);
>  		if (ret) {
>  			fprintf(stderr, "ERROR: getting dev info for devstats failed: "
>  					"%s\n", strerror(-ret));
> diff --git a/cmds-scrub.c b/cmds-scrub.c
> index e5fccc7..52264f1 100644
> --- a/cmds-scrub.c
> +++ b/cmds-scrub.c
> @@ -1101,13 +1101,14 @@ static int scrub_start(int argc, char **argv, int resume)
>  
>  	path = argv[optind];
>  
> -	fdmnt = open_file_or_dir(path);
> +	fdmnt = open_path_or_dev_mnt(path);
> +
>  	if (fdmnt < 0) {
>  		ERR(!do_quiet, "ERROR: can't access '%s'\n", path);
>  		return 12;
>  	}
>  
> -	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
> +	ret = get_fs_info(path, &fi_args, &di_args);
>  	if (ret) {
>  		ERR(!do_quiet, "ERROR: getting dev info for scrub failed: "
>  		    "%s\n", strerror(-ret));
> @@ -1558,13 +1559,14 @@ static int cmd_scrub_status(int argc, char **argv)
>  
>  	path = argv[optind];
>  
> -	fdmnt = open_file_or_dir(path);
> +	fdmnt = open_path_or_dev_mnt(path);
> +
>  	if (fdmnt < 0) {
>  		fprintf(stderr, "ERROR: can't access to '%s'\n", path);
>  		return 12;
>  	}
>  
> -	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
> +	ret = get_fs_info(path, &fi_args, &di_args);
>  	if (ret) {
>  		fprintf(stderr, "ERROR: getting dev info for scrub failed: "
>  				"%s\n", strerror(-ret));
> diff --git a/utils.c b/utils.c
> index 4bf457f..27cec56 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -717,7 +717,7 @@ int open_path_or_dev_mnt(const char *path)
>  			errno = EINVAL;
>  			return -1;
>  		}
> -
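The remark that "*usually* the fd number is unchanged" follows from POSIX open() returning the lowest-numbered descriptor that is not currently open. The sketch below shows why a close-and-reopen hidden inside a callee can go unnoticed; switch_fd_behind_caller() is a made-up stand-in for the old side effect, not code from btrfs-progs.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the old get_fs_info() side effect: the fd
 * is passed by value, closed, and a different path is opened in its
 * place. Returns the new descriptor number, or -1 on failure.
 */
static int switch_fd_behind_caller(int fd, const char *other_path)
{
	assert(fd >= 0);

	close(fd);	/* the caller's copy of fd is now stale */

	/*
	 * open() picks the lowest free descriptor number. If nothing else
	 * was opened or closed in between, that is the number we just
	 * released, so the caller's stale value happens to work again.
	 */
	return open(other_path, O_RDONLY);
}
```

The coincidence breaks as soon as another descriptor is opened or closed between the close() and the open() (for example by another thread or a library call), at which point the caller ends up issuing ioctls on the wrong file or on a closed fd.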
Unable to boot btrfs filesystem, and btrfsck aborts
My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:

https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:

https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.

This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.

Thanks,

Matt
Re: Unable to boot btrfs filesystem, and btrfsck aborts
On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
> My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
>
> https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
>
> To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
>
> https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
>
> As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
>
> This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.
>
> Thanks,
>
> Matt

If you can, make a complete image backup of the drive before trying anything to bring it back.

Try mounting with -o nospace_cache; also try -o ro and -o recovery, as well as -o recovery,ro.

If you can bring it back in ro mode, you can at least copy your data out of it if all else fails...

I'm not a dev, just a random guy with an interest in btrfs, so if you don't have a backup and aren't able to create a dd copy of it right now, you might want to wait for a reply from someone who actually knows the code...

Good luck
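The mount attempts suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: try_mounts() and the attempts table are made up for illustration, the option names are taken verbatim from the message above (whether a given kernel accepts them is not something this code checks), and it needs root and should only be pointed at a device after the image backup has been made.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* One mount attempt: generic flags plus a filesystem option string. */
struct mount_attempt {
	unsigned long flags;	/* MS_RDONLY corresponds to "-o ro" */
	const char *data;	/* btrfs options, or NULL for none */
};

/* The option sets from the message, in the order they were listed. */
static const struct mount_attempt attempts[] = {
	{ 0,         "nospace_cache" },
	{ MS_RDONLY, NULL            },
	{ 0,         "recovery"      },
	{ MS_RDONLY, "recovery"      },
};

#define NUM_ATTEMPTS (sizeof(attempts) / sizeof(attempts[0]))

/*
 * Try each option set in turn. Returns the index of the first attempt
 * that mounts successfully, or -1 if every attempt fails.
 */
static int try_mounts(const char *dev, const char *target)
{
	size_t i;

	assert(dev != NULL && target != NULL);

	for (i = 0; i < NUM_ATTEMPTS; i++) {
		if (mount(dev, target, "btrfs", attempts[i].flags,
			  attempts[i].data) == 0)
			return (int)i;
		fprintf(stderr, "attempt %zu failed: %s\n",
			i, strerror(errno));
	}
	return -1;
}
```

A successful read-only attempt is the point at which to start copying data off, before experimenting with anything that writes to the filesystem.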
Re: Unable to boot btrfs filesystem, and btrfsck aborts
On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:
> On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
> > My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
> >
> > https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
> >
> > To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
> >
> > https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
> >
> > As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
> >
> > This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.
>
> If you can make a complete image backup of the drive before trying any things to bring it back.
>
> Try mounting with -o nospace_cache, also try -o ro and -o recovery as well as -o recovery,ro.

I think the bug happens during log recovery, so btrfs-zero-log might get it mountable again, with the caveat of losing the most recently fsynced changes.
Re: Unable to boot btrfs filesystem, and btrfsck aborts
If you are going to use btrfs-zero-log please create a btrfs-image first that you can then upload to a bug report so that this can be fixed.

# btrfs-image -c 9 -t 8 /dev/yourbtrfs /tmp/fs_image

On Mon, Mar 11, 2013 at 11:53 PM, Jan Steffens jan.steff...@gmail.com wrote:
> On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:
> > On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
> > > My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
> > >
> > > https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
> > >
> > > To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
> > >
> > > https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
> > >
> > > As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
> > >
> > > This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.
> >
> > If you can make a complete image backup of the drive before trying any things to bring it back.
> >
> > Try mounting with -o nospace_cache, also try -o ro and -o recovery as well as -o recovery,ro.
>
> I think the bug happens during log recovery, so btrfs-zero-log might get it mountable again, with the caveat of losing the most recently fsynced changes.
[PULL] Re: Integration branch of btrfs-progs 2013-02-27
Hi Chris,

please pull this integration branch

  git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130227

so far no problems reported (which may also mean that nobody is using it), worked in my test setups and I've tested the label get/set patches specifically.

thanks,
david

On Wed, Feb 27, 2013 at 06:14:45PM +0100, David Sterba wrote:
> Anand Jain (1):
>       Btrfs-progs: add correct indentation
>
> David Sterba (1):
>       btrfs-progs: don't link binaries to a dynamic library
>
> Eric Sandeen (16):
>       btrfs-progs: fix btrfs_get_subvol cut/paste error
>       btrfs-progs: Remove write-only var fdres in cmd_dev_stats()
>       btrfs-progs: btrfs_list_get_path_rootid error handling
>       btrfs-progs: avoid double-free in __btrfs_map_block
>       btrfs-progs: fix open error test in cmd_start_replace
>       btrfs-progs: fix close of error fd in scrub cancel
>       btrfs-progs: more scrub cancel error handling
>       btrfs-progs: free memory before error exit in read_whole_eb
>       btrfs-progs: don't call close on error fd
>       btrfs-progs: provide positive errno to strerror in cmd_restore
>       btrfs-progs: free allocated di_args in cmd_start_replace
>       btrfs-progs: close fd on cmd_subvol_get_default return
>       btrfs-progs: fix mem leak in resolve_root
>       btrfs-progs: Tidy up resolve_root
>       btrfs-progs: fix fd leak in cmd_subvol_set_default
>       btrfs-progs: initialize save_ptr prior to strtok_r
>
> Jeff Liu (5):
>       Btrfs-progs: Change the label of a mounted file system
>       Btrfs-progs: Fix set_label_unmounted() with label length validation
>       Btrfs-progs: fix cmd_label_usage to reflect this change.
>       btrfs-progs: refactor check_label()
>       btrfs-progs: move btrfslabel.[c|h] stuff to utils.[c|h]
>
> Mark Fasheh (1):
>       btrfs-progs: libify some parts of btrfs-progs
>
> Tsutomu Itoh (1):
>       Btrfs-progs: fix segmentation fault of btrfs check
>
> Wang Shilong (2):
>       Btrfs-progs: let the error message outputed only once
>       Btrfs-progs: output the error reason when qgroup_show fails
Re: Unable to boot btrfs filesystem, and btrfsck aborts
On Mon, Mar 11, 2013 at 04:44:58PM -0600, Matthew Booth wrote:
> My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
>
> https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
>
> To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
>
> https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
>
> As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
>
> This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.

This is fixed in 3.9, I'll send those patches back to -stable, sorry I should have done that before now. If you can't get a 3.9 kernel to boot then just use btrfs-zero-log and you'll be good to go.

Thanks,

Josef
Re: WARNING: at fs/btrfs/extent_map.c:77 free_extent_map
On Mon, Mar 11, 2013 at 10:34:04PM +0100, Johannes Hirte wrote:
> Since the updates for linux-3.9 I've had a system freeze three or four times, and only a reset (Magic SysRq) helped. After the reboot I found a bunch of these in syslog:
>
> Mar 11 21:56:09 localhost kernel: ------------[ cut here ]------------
> Mar 11 21:56:09 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 free_extent_map+0x64/0x76()
> Mar 11 21:56:09 localhost kernel: Hardware name: EasyNote TK81
> Mar 11 21:56:09 localhost kernel: Modules linked in: nfsv4 nfsd exportfs auth_rpcgss nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec ath9k_common ath9k_hw acer_wmi snd_hwdep snd_pcm ath sr_mod wmi broadcom snd_page_alloc snd_timer cdrom tg3 k10temp snd acpi_cpufreq ohci_hcd soundcore i2c_piix4 mperf
> Mar 11 21:56:09 localhost kernel: Pid: 11260, comm: bogofilter Tainted: G W 3.9.0-rc2 #293
> Mar 11 21:56:09 localhost kernel: Call Trace:
> Mar 11 21:56:09 localhost kernel: [8102abc2] ? warn_slowpath_common+0x76/0x8c
> Mar 11 21:56:09 localhost kernel: [8115dcff] ? free_extent_map+0x64/0x76
> Mar 11 21:56:09 localhost kernel: [8115bc57] ? btrfs_drop_extent_cache+0x363/0x39f
> Mar 11 21:56:09 localhost kernel: [81152db4] ? __cow_file_range+0x175/0x3c1
> Mar 11 21:56:09 localhost kernel: [8114bb02] ? join_transaction.isra.34+0x30f/0x31a
> Mar 11 21:56:09 localhost kernel: [8114d9f7] ? start_transaction+0x2d8/0x3e8
> Mar 11 21:56:09 localhost kernel: [8115383e] ? cow_file_range+0xa9/0xc5
> Mar 11 21:56:09 localhost kernel: [811538f7] ? run_delalloc_range+0x9d/0x33b
> Mar 11 21:56:09 localhost kernel: [8116139b] ? free_extent_state+0x12/0x21
> Mar 11 21:56:09 localhost kernel: [81163fa3] ? __extent_writepage+0x1a8/0x5d8
> Mar 11 21:56:09 localhost kernel: [811635ae] ? end_extent_writepage+0x5d/0x5d
> Mar 11 21:56:09 localhost kernel: [8116451d] ? extent_write_cache_pages.isra.29.constprop.47+0x14a/0x255
> Mar 11 21:56:09 localhost kernel: [81164836] ? extent_writepages+0x49/0x60
> Mar 11 21:56:09 localhost kernel: [81150146] ? btrfs_update_inode_item+0xde/0xde
> Mar 11 21:56:09 localhost kernel: [8108fc58] ? __filemap_fdatawrite_range+0x4d/0x52
> Mar 11 21:56:09 localhost kernel: [8115a192] ? btrfs_sync_file+0x48/0x203
> Mar 11 21:56:09 localhost kernel: [810c85ff] ? vfs_write+0xaf/0xf8
> Mar 11 21:56:09 localhost kernel: [810e783b] ? do_fsync+0x2b/0x50
> Mar 11 21:56:09 localhost kernel: [810e7a42] ? sys_fdatasync+0xb/0xf
> Mar 11 21:56:09 localhost kernel: [814877d2] ? system_call_fastpath+0x16/0x1b
> Mar 11 21:56:09 localhost kernel: ---[ end trace 3eaea449d8d56f92 ]---
>
> As far as I remember, it happened when fetching emails with claws. But it's not a reliable testcase.

Hi Johannes,

Could you please tell us what mount options you're using?

thanks,
liubo

> Another trace from the first time I found it in the logs. But here the system didn't hang:
>
> Mar 4 14:28:35 localhost kernel: ------------[ cut here ]------------
> Mar 4 14:28:35 localhost kernel: WARNING: at fs/btrfs/extent_map.c:77 free_extent_map+0x64/0x76()
> Mar 4 14:28:35 localhost kernel: Hardware name: EasyNote TK81
> Mar 4 14:28:35 localhost kernel: Modules linked in: nfsd exportfs auth_rpcgss nfs_acl fuse nfs lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec ath9k_common snd_hwdep snd_pcm broadcom ath9k_hw snd_page_alloc ath sr_mod snd_timer acer_wmi snd cdrom wmi tg3 ohci_hcd soundcore k10temp edac_core acpi_cpufreq i2c_piix4 mperf
> Mar 4 14:28:35 localhost kernel: Pid: 1574, comm: flush-btrfs-1 Not tainted 3.9.0-rc1 #289
> Mar 4 14:28:35 localhost kernel: Call Trace:
> Mar 4 14:28:35 localhost kernel: [8102ab92] ? warn_slowpath_common+0x76/0x8c
> Mar 4 14:28:35 localhost kernel: [8115dc7b] ? free_extent_map+0x64/0x76
> Mar 4 14:28:35 localhost kernel: [8115bbd3] ? btrfs_drop_extent_cache+0x363/0x39f
> Mar 4 14:28:35 localhost kernel: [81152d2d] ? __cow_file_range+0x175/0x3c1
> Mar 4 14:28:36 localhost kernel: [81487830] ? _raw_spin_unlock+0x1c/0x28
> Mar 4 14:28:36 localhost kernel: [81160de3] ? release_extent_buffer.isra.25+0x90/0x97
> Mar 4 14:28:36 localhost kernel: [81153673] ? run_delalloc_nocow+0x6fa/0x795
> Mar 4 14:28:36 localhost kernel: [81153837] ? run_delalloc_range+0x64/0x33b
> Mar 4 14:28:36 localhost kernel: [81161317] ? free_extent_state+0x12/0x21
> Mar 4 14:28:36 localhost kernel: [81163f1f] ? __extent_writepage+0x1a8/0x5d8
> Mar 4 14:28:36 localhost kernel: [8116352a] ? end_extent_writepage+0x5d/0x5d
> Mar 4 14:28:36 localhost kernel: [811d4b69] ? cpumask_any_but+0x25/0x34
> Mar 4 14:28:36 localhost kernel: [810a5259] ?
Re: [PATCH] Btrfs: get better concurrency for snapshot-aware defrag work
On Mon, Mar 11, 2013 at 06:26:40PM +0100, David Sterba wrote:
> On Mon, Mar 11, 2013 at 05:20:58PM +0800, Liu Bo wrote:
> > Using spinning case instead of blocking will result in better concurrency overall.
>
> Do you have numbers to support that?

Sorry, I don't; I'm just judging from what leave_spinning is designed for and from similar use cases, like insert_reserved_file_extents(), which is also involved in the endio write worker.

thanks,
liubo
[PATCH 4/4 V2] btrfs-progs: rework get_fs_info to remove side effects
get_fs_info() has been silently switching from a device to a mounted path as needed; the caller's filehandle was unexpectedly closed & reopened outside the caller's scope. Not so great.

The callers do want fdmnt to be the filehandle for the mount point in all cases, though - the various ioctls act on this (not on an fd for the device). But switching it in the local scope of get_fs_info is incorrect; it just so happens that *usually* the fd number is unchanged.

So - use the new helpers to detect when an argument is a block device, and open the mounted path more obviously / explicitly for ioctl use, storing the filehandle in fdmnt.

Then, in get_fs_info, ignore the fd completely, and use the path on the argument to determine if the caller wanted to act on just that device, or on all devices for the filesystem.

Affects those commands which are documented to accept either a block device or a path:

* btrfs device stats
* btrfs replace start
* btrfs scrub start
* btrfs scrub status

Signed-off-by: Eric Sandeen sand...@redhat.com
---

V2: don't call BTRFS_IOC_FS_INFO in the single device case after we change path/fd to be for the fs mount point. In the single device case we manually filled in fi_args; calling this ioctl after switching fd/path to the mount point overwrites that setup.
diff --git a/cmds-device.c b/cmds-device.c
index 58df6da..41e79d3 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -321,13 +321,14 @@ static int cmd_dev_stats(int argc, char **argv)
 
 	path = argv[optind];
 
-	fdmnt = open_file_or_dir(path);
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		fprintf(stderr, "ERROR: can't access '%s'\n", path);
 		return 12;
 	}
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for devstats failed: "
 				"%s\n", strerror(-ret));
diff --git a/cmds-replace.c b/cmds-replace.c
index 10030f6..6397bb5 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -168,7 +168,9 @@ static int cmd_start_replace(int argc, char **argv)
 	if (check_argc_exact(argc - optind, 3))
 		usage(cmd_start_replace_usage);
 	path = argv[optind + 2];
-	fdmnt = open_file_or_dir(path);
+
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		fprintf(stderr, "ERROR: can't access \"%s\": %s\n",
 			path, strerror(errno));
@@ -215,7 +217,7 @@ static int cmd_start_replace(int argc, char **argv)
 		}
 		start_args.start.srcdevid = (__u64)atoi(srcdev);
 
-		ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+		ret = get_fs_info(path, &fi_args, &di_args);
 		if (ret) {
 			fprintf(stderr, "ERROR: getting dev info for devstats failed: "
 					"%s\n", strerror(-ret));
diff --git a/cmds-scrub.c b/cmds-scrub.c
index e5fccc7..52264f1 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1101,13 +1101,14 @@ static int scrub_start(int argc, char **argv, int resume)
 
 	path = argv[optind];
 
-	fdmnt = open_file_or_dir(path);
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		ERR(!do_quiet, "ERROR: can't access '%s'\n", path);
 		return 12;
 	}
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		ERR(!do_quiet, "ERROR: getting dev info for scrub failed: "
 		    "%s\n", strerror(-ret));
@@ -1558,13 +1559,14 @@ static int cmd_scrub_status(int argc, char **argv)
 
 	path = argv[optind];
 
-	fdmnt = open_file_or_dir(path);
+	fdmnt = open_path_or_dev_mnt(path);
+
 	if (fdmnt < 0) {
 		fprintf(stderr, "ERROR: can't access to '%s'\n", path);
 		return 12;
 	}
 
-	ret = get_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = get_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for scrub failed: "
 				"%s\n", strerror(-ret));
diff --git a/utils.c b/utils.c
index 4bf457f..c756e23 100644
--- a/utils.c
+++ b/utils.c
@@ -717,7 +717,7 @@ int open_path_or_dev_mnt(const char *path)
 			errno = EINVAL;
 			return -1;
 		}
-		fdmnt = open(mp, O_RDWR);
+		fdmnt = open_file_or_dir(mp);
 	} else
 		fdmnt = open_file_or_dir(path);
 
@@ -1544,9 +1544,20 @@ int get_device_info(int fd, u64 devid,
 	return ret ? -errno : 0;
 }
 
-int get_fs_info(int fd, char *path, struct btrfs_ioctl_fs_info_args *fi_args,
+/*
+ * For a given path, fill in the ioctl fs_ and info_ args.
+ * If the path is a btrfs mountpoint, fill info for all devices.
+ * If the path