[PATCH v2] Btrfs-progs: update usage message for cmds-restore
Mention that a target path argument is mandatory unless the -l option is
supplied. Also mention the existence of the -l option, which was
previously undocumented.

    $ btrfs restore -v /dev/sdb3
    usage: btrfs restore [options] <device>

        Try to restore files from a damaged filesystem (unmounted)

        -s              get snapshots
        -v              verbose
        -i              ignore errors
        -o              overwrite
        -t              tree location
        -f <offset>     filesystem location
        -u <block>      super mirror
        -d              find dir
    $ echo $?
    129

After specifying a target path, the command works as expected:

    $ btrfs restore -v /dev/sdb3 files2/
    Restoring files2/file1
    Done searching
    $ echo $?
    0

V2: Updated command synopsis by suggestion of Anand Jain.

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---
 cmds-restore.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index eca528d..d362d79 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -956,7 +956,7 @@ out:
 }

 const char * const cmd_restore_usage[] = {
-	"btrfs restore [options] <device>",
+	"btrfs restore [options] <device> <path> | -l <device>",
 	"Try to restore files from a damaged filesystem (unmounted)",
 	"",
 	"-s              get snapshots",
@@ -967,6 +967,7 @@ const char * const cmd_restore_usage[] = {
 	"-f <offset>     filesystem location",
 	"-u <block>      super mirror",
 	"-d              find dir",
+	"-l              list roots",
 	NULL
 };

--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs crashes
I created a bug:
https://bugzilla.kernel.org/show_bug.cgi?id=60544

On 08.07.2013 15:24, Josef Bacik wrote:
> On Mon, Jul 08, 2013 at 08:46:17AM +0200, Franziska Näpelt wrote:
>> Hi everybody,
>>
>> we are using a btrfs RAID 1 with four 2TB hard drives on a Debian 7.1
>> (Kernel 3.9.6).
>> After about one year of working, there was an error in the messages log
>> and the filesystem was mounted read-only.
>>
>> After that I restarted the system, but that didn't fix the bug. The
>> btrfs filesystem couldn't be mounted.
>>
>> I attach four logs:
>> - when the error occurred
>> - after the error occurred
>> - when rebooting the system
>> - when I tried to mount manually
>>
>> Before I rebuilt the filesystem (formatting everything and creating a
>> new btrfs pool) I made a btrfs-image. I can provide it to you.
>
> Can you file a bugzilla at bugzilla.kernel.org (make sure the component
> is set to btrfs) with all of this information and a link to the image,
> and the output of btrfsck? Please use the most recent version of
> btrfs-progs
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
>
> Thanks,
>
> Josef
[PATCH] Btrfs: num_tolerated_disk_barrier_failures is incorrect for RAID6
Currently num_tolerated_disk_barrier_failures gets the value of
fs_devices->num_devices in the RAID6 case. But RAID6 can tolerate only
two simultaneous failures, so set it to 2.

CC: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b8b60b6..aecf788 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3258,7 +3258,7 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 						BTRFS_BLOCK_GROUP_RAID10)) {
 					num_tolerated_disk_barrier_failures = 1;
 				} else if (flags &
-					   BTRFS_BLOCK_GROUP_RAID5) {
+					   BTRFS_BLOCK_GROUP_RAID6) {
 					num_tolerated_disk_barrier_failures = 2;
 				}
 			}
--
1.7.10.4
Re: [PATCH] Btrfs: num_tolerated_disk_barrier_failures is incorrect for RAID6
On Wed, 10 Jul 2013 14:54:30 +0300, Ilya Dryomov wrote:
> Currently num_tolerated_disk_barrier_failures gets the value of
> fs_devices->num_devices in the RAID6 case. But, RAID6 can tolerate only
> two simultaneous failures, so set it to 2.
>
> CC: Stefan Behrens sbehr...@giantdisaster.de
> Signed-off-by: Ilya Dryomov idryo...@gmail.com
> ---
>  fs/btrfs/disk-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b8b60b6..aecf788 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3258,7 +3258,7 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
>  						BTRFS_BLOCK_GROUP_RAID10)) {
>  					num_tolerated_disk_barrier_failures = 1;
>  				} else if (flags &
> -					   BTRFS_BLOCK_GROUP_RAID5) {
> +					   BTRFS_BLOCK_GROUP_RAID6) {
>  					num_tolerated_disk_barrier_failures = 2;
>  				}
>  			}

ELATE :), Henrik Nordvik already fixed it with commit 15b0a89d7.
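The mapping that the patch restores can be sketched in isolation. This is only a simplified illustration, not the kernel function: the flag bits and the name sketch_tolerated_failures() are hypothetical, and the real code also takes the device count and the other block group types into account. The point is that RAID6 needs its own branch. Before the fix, the else-if repeated the RAID5 test, which the preceding RAID1/RAID5/RAID10 branch had already consumed, so the "2" assignment was unreachable.

```c
#include <stdint.h>

/*
 * Illustrative flag bits only. These are NOT the kernel's
 * BTRFS_BLOCK_GROUP_* values; they exist just for this sketch.
 */
enum {
	SKETCH_RAID0  = 1u << 0,
	SKETCH_RAID1  = 1u << 1,
	SKETCH_RAID10 = 1u << 2,
	SKETCH_RAID5  = 1u << 3,
	SKETCH_RAID6  = 1u << 4
};

/*
 * Return how many simultaneous device failures a profile survives.
 * Profiles without redundancy (single, RAID0) tolerate none.
 */
int sketch_tolerated_failures(uint32_t flags)
{
	if (flags & (SKETCH_RAID1 | SKETCH_RAID5 | SKETCH_RAID10))
		return 1;
	if (flags & SKETCH_RAID6)	/* the branch the patch makes reachable */
		return 2;
	return 0;
}
```

With these assumed flags, sketch_tolerated_failures(SKETCH_RAID6) yields 2, while RAID5 yields 1 and RAID0 yields 0.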
Re: performance loss with lots of snapshots
On Wed, Jul 10, 2013 at 12:54:44PM +1000, Russell Coker wrote:
> There are two uses of backups, recovering from user errors (i.e. deleting
> the wrong file) and recovering from sysadmin errors or hardware failures
> (i.e. disks are dead or wiped). For the former use I'm mainly using BTRFS
> snapshots on many systems.
>
> A problem that I have had on more than a few occasions (most recently on
> the latest Debian 3.9 kernel) is of severe performance loss. A few days
> ago this happened on a workstation running an Intel 120G SSD device for
> the root filesystem which was being used for basic workstation tasks
> (kmail, GIMP, OpenOffice, etc). The /home and / subvols had about 400
> snapshots between them (which doesn't seem like a huge number) when the
> system became unusably slow while running a scrub from a cron job;
> programs like GIMP became stuck in D state.
>
> The system in question has 8G of RAM and very light load, there shouldn't
> be any reason for it not giving good performance while the scrub was in
> progress and it definitely should have performed well when the scrub was
> cancelled. But it didn't return to decent performance until I deleted
> about 300 snapshots.
>
> This has happened to me often enough that I can probably reproduce it on
> a VM. What kernel should I use for such tests? If I get a virtual machine
> in a state where it has ongoing performance problems would any of the
> BTRFS developers like root access to debug it?

There is a memory leak-ish problem with scrub where it doesn't free up the
csums it has looked up until after it's done scrubbing an area, which can
lead to OOMs or degraded performance. Btrfs-next has the fix, as does the
pull request that just went to Linus, so pick whichever one you want, run
again, and see if that helps? I imagine you are probably seeing two
things, first that OOM-ish behavior and then some other performance gotcha
with a fair number of snapshots, but just in case.

Thanks,

Josef
Re: [PATCH V2 1/2] Btrfs-progs: make pretty_sizes() work less error prone
On Tue, Jul 09, 2013 at 01:24:43PM -0700, Zach Brown wrote:
> > The original codes don't handle error gracefully and some places
> > forget to free memory. We can allocate memory before calling
> > pretty_sizes(), for example, we can use static memory allocation and
> > we don't have to deal with memory allocation fails.
>
> I agree that callers shouldn't have to know to free allocated memory.
>
> But I think that we can do better and not have callers need to worry
> about per-call string storage at all. How about something like this?

Neat trick! A few neat-picks below. Besides, I guess we can use this
sort of trick with the fi-df patches.

> --- a/utils.c
> +++ b/utils.c
> @@ -1153,12 +1153,13 @@ out:
>  static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
>  			    "PB", "EB", "ZB", "YB"};

I'll drop the ZB, YB suffixes.

> --- a/utils.h
> +++ b/utils.h
> @@ -44,7 +44,15 @@ int check_mounted_where(int fd, const char *file, char *where, int size,
>  			struct btrfs_fs_devices **fs_devices_mnt);
>  int btrfs_device_already_in_root(struct btrfs_root *root, int fd,
>  				 int super_offset);
> -char *pretty_sizes(u64 size);
> +
> +void pretty_size_snprintf(u64 size, char *str, size_t str_bytes);
> +#define pretty_sizes(size) \

and rename it to pretty_size as it takes only one number

> +	({ \
> +		static __thread char _str[16]; \

16 is not enough for exabyte scale, that needs at least 20 bytes + 1 for
0.

len(str(2**64)) = 20 -> 24

> +		pretty_size_snprintf(size, _str, sizeof(_str)); \

		pretty_size_snprintf((size), _str, sizeof(_str)); \

As these are only trivial changes I'll fix them at commit time.

> +		_str; \
> +	})
> +
>  int get_mountpt(char *dev, char *mntpt, size_t size);
>  int btrfs_scan_block_devices(int run_ioctl);
>  u64 parse_size(char *s);
> --
> 1.7.11.7
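The storage trick and the buffer-size remark above can be tried out with a small stand-alone sketch. It is not the btrfs-progs code: the names sketch_size_snprintf() and sketch_pretty_size() are hypothetical, and the stand-in formatter just prints the raw byte count in decimal, which is the longest string a 64-bit value can produce (20 digits). Like the patch under review, it relies on the GCC/Clang statement-expression extension and on __thread, so it is an assumption that the compiler supports both.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Stand-in formatter (hypothetical): writes the size as a plain decimal
 * number. snprintf() never writes more than str_bytes bytes and always
 * NUL-terminates when str_bytes is non-zero.
 */
void sketch_size_snprintf(uint64_t size, char *str, size_t str_bytes)
{
	if (str_bytes == 0)
		return;
	snprintf(str, str_bytes, "%llu", (unsigned long long)size);
}

/*
 * Every expansion of the macro gets its own static, thread-local buffer,
 * so several results can be passed to a single printf() call without the
 * caller allocating or freeing anything. 24 bytes holds the 20 digits of
 * 2^64 - 1 plus the terminating NUL, with a little slack.
 */
#define sketch_pretty_size(size)					\
	({								\
		static __thread char _str[24];				\
		sketch_size_snprintf((size), _str, sizeof(_str));	\
		_str;							\
	})
```

A call such as printf("%s of %s\n", sketch_pretty_size(used), sketch_pretty_size(total)) uses two distinct buffers because the macro is expanded twice. The usual caveat applies: calling the same expansion again (for example in a loop) overwrites the string returned earlier from that call site.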
[PATCH] btrfs-progs: per-thread, per-call pretty buffer
From: Zach Brown z...@redhat.com

From: Zach Brown z...@redhat.com

We don't need callers to manage string storage for each pretty_sizes()
call. We can use a macro to have per-thread and per-call static storage
so that pretty_sizes() can be used as many times as needed in printf()
arguments without requiring a bunch of supporting variables.

This lets us have a natural interface at the cost of requiring __thread
and TLS from gcc and a small amount of static storage. This seems
better than the current code or doing something with illegible format
specifier macros.

Signed-off-by: Zach Brown z...@redhat.com
Signed-off-by: David Sterba dste...@suse.cz
---
I've updated the rest of pretty_size callers in targets that were not
built by default.

 btrfs-calc-size.c | 13
 btrfs-fragments.c |  2
 cmds-filesystem.c | 27
 cmds-scrub.c      |  8
 mkfs.c            |  4
 utils.c           | 19
 utils.h           | 10
 7 files changed, 37 insertions(+), 46 deletions(-)

diff --git a/btrfs-calc-size.c b/btrfs-calc-size.c
index c4adfb0..5aa0b70 100644
--- a/btrfs-calc-size.c
+++ b/btrfs-calc-size.c
@@ -162,18 +162,11 @@ out_print:
 		       stat.total_inline, stat.total_nodes, stat.total_leaves,
 		       level + 1);
 	} else {
-		char *total_size;
-		char *inline_size;
-
-		total_size = pretty_sizes(stat.total_bytes);
-		inline_size = pretty_sizes(stat.total_inline);
-
 		printf("\t%s total size, %s inline data, %Lu nodes, "
 		       "%Lu leaves, %d levels\n",
-		       total_size, inline_size, stat.total_nodes,
-		       stat.total_leaves, level + 1);
-		free(total_size);
-		free(inline_size);
+		       pretty_size(stat.total_bytes),
+		       pretty_size(stat.total_inline),
+		       stat.total_nodes, stat.total_leaves, level + 1);
 	}
 out:
 	btrfs_free_path(path);
diff --git a/btrfs-fragments.c b/btrfs-fragments.c
index a012fe1..7ec77e7 100644
--- a/btrfs-fragments.c
+++ b/btrfs-fragments.c
@@ -87,7 +87,7 @@ print_bg(FILE *html, char *name, u64 start, u64 len, u64 used, u64 flags,
 	fprintf(html, "<p>%s chunk starts at %lld, size is %s, %.2f%% used, "
 		"%.2f%% fragmented</p>\n", chunk_type(flags), start,
-		pretty_sizes(len), 100.0 * used / len, 100.0 * frag);
+		pretty_size(len), 100.0 * used / len, 100.0 * frag);
 	fprintf(html, "<img src=\"%s\" border=\"1\" />\n", name);
 }

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index f41a72a..222e458 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -111,8 +111,6 @@ static int cmd_df(int argc, char **argv)

 	for (i = 0; i < sargs->total_spaces; i++) {
 		char description[80];
-		char *total_bytes;
-		char *used_bytes;
 		int written = 0;
 		u64 flags = sargs->spaces[i].flags;

@@ -155,10 +153,9 @@ static int cmd_df(int argc, char **argv)
 			written += 7;
 		}

-		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes);
-		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes);
-		printf("%s: total=%s, used=%s\n", description, total_bytes,
-		       used_bytes);
+		printf("%s: total=%s, used=%s\n", description,
+		       pretty_size(sargs->spaces[i].total_bytes),
+		       pretty_size(sargs->spaces[i].used_bytes));
 	}
 	close(fd);
 	free(sargs);
@@ -192,7 +189,6 @@ static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
 	char uuidbuf[37];
 	struct list_head *cur;
 	struct btrfs_device *device;
-	char *super_bytes_used;
 	u64 devs_found = 0;
 	u64 total;

@@ -204,25 +200,20 @@ static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
 	else
 		printf("Label: none ");

-	super_bytes_used = pretty_sizes(device->super_bytes_used);

 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
-	       (unsigned long long)total, super_bytes_used);
-
-	free(super_bytes_used);
+	       (unsigned long long)total,
+	       pretty_size(device->super_bytes_used));

 	list_for_each(cur, &fs_devices->devices) {
-		char *total_bytes;
-		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes);
-		bytes_used = pretty_sizes(device->bytes_used);
+
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
-		       total_bytes, bytes_used, device->name);
-
Re: [PATCH] btrfs-progs: per-thread, per-call pretty buffer
Hello David,

> From: Zach Brown z...@redhat.com

duplicate information.

> From: Zach Brown z...@redhat.com
>
> We don't need callers to manage string storage for each pretty_sizes()
> call. We can use a macro to have per-thread and per-call static storage
> so that pretty_sizes() can be used as many times as needed in printf()
> arguments without requiring a bunch of supporting variables.
>
> This lets us have a natural interface at the cost of requiring __thread
> and TLS from gcc and a small amount of static storage. This seems
> better than the current code or doing something with illegible format
> specifier macros.
>
> Signed-off-by: Zach Brown z...@redhat.com
> Signed-off-by: David Sterba dste...@suse.cz

OK. Please add my tag:

Acked-by: Wang Shilong wangs.f...@cn.fujitsu.com

(I have given my tag in the previous thread to Zach, with a cc to you!)

Zach gives a better solution, but I at least reported the problem and
tried to fix it, didn't I?

Thanks
Wang

> ---
> I've updated the rest of pretty_size callers in targets that were not
> built by default.
[PATCH] Btrfs-progs: fix restore command leaving corrupted files
When there are files that have parts shared with snapshots, the
restore command was incorrectly restoring them, as it was not
taking into account the offset and number of bytes fields from
the file extent item. Besides leaving the recovered file corrupt,
it was also inefficient as it read and wrote more data than needed
(with each extent copy overwriting portions of the one previously
written).

The following steps and small C program show how to reproduce this
corruption issue:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ ./write_file /mnt/btrfs/foobar
    $ du -b /mnt/btrfs/foobar
    1048926 /mnt/btrfs/foobar
    $ md5sum /mnt/btrfs/foobar
    f9f778f3a7410c40e4ed104a3a63c3c4  /mnt/btrfs/foobar
    $ btrfs subvolume snapshot /mnt/btrfs /mnt/btrfs/my_snap
    $ perl -e 'open($f, "+<", "/mnt/btrfs/foobar"); seek($f, 4096, 0); print $f "\xff"; close($f);'
    $ md5sum /mnt/btrfs/foobar
    b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar
    $ umount /mnt/btrfs

    $ btrfs restore /dev/sdb3 /tmp/copy
    $ du -b /tmp/copy/foobar
    1048926 /tmp/copy/foobar
    $ md5sum /tmp/copy/foobar
    88db338cbc1c44dfabae083f1ce642d5  /tmp/copy/foobar
    $ od -t x1 -j 8192 -N 4 /tmp/copy/foobar
    0020000 41 00 00 00
    0020004
    $ mount /dev/sdb3 /mnt/btrfs
    $ od -t x1 -j 8192 -N 4 /mnt/btrfs/foobar
    0020000 00 00 00 00
    0020004
    $ md5sum /mnt/btrfs/foobar
    b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar

$ cat write_file.c:

    int main(int argc, char *argv[])
    {
            int fd;
            unsigned char buf[BUF_SIZE];

            if (argc < 2) {
                    fprintf(stderr, "Use: %s filepath\n", argv[0]);
                    return 1;
            }

            fd = open(argv[1], O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
            assert(fd >= 0);

            memset(buf, 0, BUF_SIZE);
            buf[0] = 65;
            assert(write(fd, buf, BUF_SIZE) == BUF_SIZE);
            assert(close(fd) == 0);

            return 0;
    }

Tested this change with zlib, lzo compression and file sizes larger
than 1GiB, and found no regression or other corruption issues (so far
at least).

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---
 cmds-restore.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e48df40..9688599 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -272,6 +272,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	u64 bytenr;
 	u64 ram_size;
 	u64 disk_size;
+	u64 num_bytes;
 	u64 length;
 	u64 size_left;
 	u64 dev_bytenr;
@@ -288,7 +289,9 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	disk_size = btrfs_file_extent_disk_num_bytes(leaf, fi);
 	ram_size = btrfs_file_extent_ram_bytes(leaf, fi);
 	offset = btrfs_file_extent_offset(leaf, fi);
-	size_left = disk_size;
+	num_bytes = btrfs_file_extent_num_bytes(leaf, fi);
+	size_left = num_bytes;
+	bytenr += offset;

 	if (offset)
 		printf("offset is %Lu\n", offset);
@@ -296,7 +299,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	if (disk_size == 0)
 		return 0;

-	inbuf = malloc(disk_size);
+	inbuf = malloc(size_left);
 	if (!inbuf) {
 		fprintf(stderr, "No memory\n");
 		return -1;
@@ -351,8 +354,8 @@ again:
 		goto again;

 	if (compress == BTRFS_COMPRESS_NONE) {
-		while (total < ram_size) {
-			done = pwrite(fd, inbuf+total, ram_size-total,
+		while (total < num_bytes) {
+			done = pwrite(fd, inbuf+total, num_bytes-total,
 				      pos+total);
 			if (done < 0) {
 				ret = -1;
@@ -365,7 +368,7 @@ again:
 		goto out;
 	}

-	ret = decompress(inbuf, outbuf, disk_size, ram_size, compress);
+	ret = decompress(inbuf, outbuf, num_bytes, ram_size, compress);
 	if (ret) {
 		num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
 					      bytenr, length);
--
1.7.9.5
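The arithmetic behind the fix can be shown with a simplified model. The struct and function names below are hypothetical and are not the btrfs-progs types. They only mirror the four file extent item fields the commit message talks about, and they cover the uncompressed case, where the bytes to copy start "offset" bytes into the on-disk extent and run for "num_bytes" bytes rather than spanning the whole extent.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a file extent item (illustration only). */
struct sketch_file_extent {
	uint64_t disk_bytenr;	 /* where the shared extent starts on disk */
	uint64_t disk_num_bytes; /* size of the whole extent on disk */
	uint64_t offset;	 /* start of this file's data inside the extent */
	uint64_t num_bytes;	 /* how many bytes this file item references */
};

/* The byte range that should actually be read and written back. */
struct sketch_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Uncompressed case only: skip "offset" bytes into the extent and copy
 * "num_bytes" bytes. Copying disk_num_bytes from disk_bytenr instead
 * reproduces the symptom in the report: stale data from the shared
 * extent overwrites neighbouring parts of the restored file.
 */
struct sketch_range sketch_copy_range(const struct sketch_file_extent *fe)
{
	struct sketch_range r;

	r.start = fe->disk_bytenr + fe->offset;
	r.len = fe->num_bytes;
	return r;
}
```

For an item that references 4096 bytes starting 8192 bytes into an extent located at byte 1048576, the range is 4096 bytes starting at 1056768, independent of how large the underlying extent is.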
Re: [PATCH] btrfs-progs: per-thread, per-call pretty buffer
On Wed, Jul 10, 2013 at 11:31:17PM +0800, Wang Shilong wrote:
> Hello David,
>
> > From: Zach Brown z...@redhat.com
>
> duplicate information.

git-send-email tricked me, the line is not present in the tree

> > From: Zach Brown z...@redhat.com
> >
> > We don't need callers to manage string storage for each pretty_sizes()
> > call. We can use a macro to have per-thread and per-call static storage
> > so that pretty_sizes() can be used as many times as needed in printf()
> > arguments without requiring a bunch of supporting variables.
> >
> > This lets us have a natural interface at the cost of requiring __thread
> > and TLS from gcc and a small amount of static storage. This seems
> > better than the current code or doing something with illegible format
> > specifier macros.
> >
> > Signed-off-by: Zach Brown z...@redhat.com
> > Signed-off-by: David Sterba dste...@suse.cz
>
> OK. Please add my tag:
>
> Acked-by: Wang Shilong wangs.f...@cn.fujitsu.com
>
> (I have given my tag in the previous thread to Zach, with a cc to you!)
>
> Zach gives a better solution, but I at least reported the problem and
> tried to fix it, didn't I?

Oh sorry, I'll add the tag of course, I was so excited with zach's patch
and missed it.

david
Re: [PATCH V2 1/2] Btrfs-progs: make pretty_sizes() work less error prone
> Neat trick! A few neat-picks below.

Indeed, those are all good fixes.

> As these are only trivial changes I'll fix them at commit time.

Great, thanks David!

- z
[PATCH v2] Btrfs-progs: fix restore command leaving corrupted files
When there are files that have parts shared with snapshots, the
restore command was incorrectly restoring them, as it was not
taking into account the offset and number of bytes fields from
the file extent item. Besides leaving the recovered file corrupt,
it was also inefficient as it read and wrote more data than needed
(with each extent copy overwriting portions of the one previously
written).

The following steps and small C program show how to reproduce this
corruption issue:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ ./write_file /mnt/btrfs/foobar
    $ du -b /mnt/btrfs/foobar
    1048926 /mnt/btrfs/foobar
    $ md5sum /mnt/btrfs/foobar
    f9f778f3a7410c40e4ed104a3a63c3c4  /mnt/btrfs/foobar
    $ btrfs subvolume snapshot /mnt/btrfs /mnt/btrfs/my_snap
    $ perl -e 'open($f, "+<", "/mnt/btrfs/foobar"); seek($f, 4096, 0); print $f "\xff"; close($f);'
    $ md5sum /mnt/btrfs/foobar
    b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar
    $ umount /mnt/btrfs

    $ btrfs restore /dev/sdb3 /tmp/copy
    $ du -b /tmp/copy/foobar
    1048926 /tmp/copy/foobar
    $ md5sum /tmp/copy/foobar
    88db338cbc1c44dfabae083f1ce642d5  /tmp/copy/foobar
    $ od -t x1 -j 8192 -N 4 /tmp/copy/foobar
    0020000 41 00 00 00
    0020004
    $ mount /dev/sdb3 /mnt/btrfs
    $ od -t x1 -j 8192 -N 4 /mnt/btrfs/foobar
    0020000 00 00 00 00
    0020004
    $ md5sum /mnt/btrfs/foobar
    b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar

$ cat write_file.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <string.h>
    #include <assert.h>

    #define BUF_SIZE (60 * 1024 * 1024 + 33350)

    int main(int argc, char *argv[])
    {
            int fd;
            unsigned char buf[BUF_SIZE];

            if (argc < 2) {
                    fprintf(stderr, "Use: %s filepath\n", argv[0]);
                    return 1;
            }

            fd = open(argv[1], O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
            assert(fd >= 0);

            memset(buf, 0, BUF_SIZE);
            buf[0] = 65;
            assert(write(fd, buf, BUF_SIZE) == BUF_SIZE);
            assert(close(fd) == 0);

            return 0;
    }

Tested this change with zlib, lzo compression and file sizes larger
than 1GiB, and found no regression or other corruption issues (so far
at least).

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: updated commit message to include the C preprocessor macros
    in the C program.

 cmds-restore.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e48df40..9688599 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -272,6 +272,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	u64 bytenr;
 	u64 ram_size;
 	u64 disk_size;
+	u64 num_bytes;
 	u64 length;
 	u64 size_left;
 	u64 dev_bytenr;
@@ -288,7 +289,9 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	disk_size = btrfs_file_extent_disk_num_bytes(leaf, fi);
 	ram_size = btrfs_file_extent_ram_bytes(leaf, fi);
 	offset = btrfs_file_extent_offset(leaf, fi);
-	size_left = disk_size;
+	num_bytes = btrfs_file_extent_num_bytes(leaf, fi);
+	size_left = num_bytes;
+	bytenr += offset;

 	if (offset)
 		printf("offset is %Lu\n", offset);
@@ -296,7 +299,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	if (disk_size == 0)
 		return 0;

-	inbuf = malloc(disk_size);
+	inbuf = malloc(size_left);
 	if (!inbuf) {
 		fprintf(stderr, "No memory\n");
 		return -1;
@@ -351,8 +354,8 @@ again:
 		goto again;

 	if (compress == BTRFS_COMPRESS_NONE) {
-		while (total < ram_size) {
-			done = pwrite(fd, inbuf+total, ram_size-total,
+		while (total < num_bytes) {
+			done = pwrite(fd, inbuf+total, num_bytes-total,
 				      pos+total);
 			if (done < 0) {
 				ret = -1;
@@ -365,7 +368,7 @@ again:
 		goto out;
 	}

-	ret = decompress(inbuf, outbuf, disk_size, ram_size, compress);
+	ret = decompress(inbuf, outbuf, num_bytes, ram_size, compress);
 	if (ret) {
 		num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
 					      bytenr, length);
--
1.7.9.5
Re: [PATCH 1/2] Btrfs-progs: remove duplicated code in cmds-restore.c
On Tue, Jul 09, 2013 at 07:49:53PM +0100, Filipe David Borba Manana wrote:
> The module cmds-restore.c was defining its own next_leaf() function,
> which did exactly the same as btrfs_next_leaf() from ctree.c.

This has been removed by Eric's patch present in the integration
branches:

  Btrfs-progs: remove cut & paste btrfs_next_leaf from restore
  http://www.spinics.net/lists/linux-btrfs/msg24477.html

but now Chris has a fix in the master branch,

  btrfs-restore: deal with NULL returns from read_node_slot
  https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=194aa4a1bd6447bb545286d0bcb0b0be8204d79f

the code of updated next_leaf is not identical to btrfs_next_leaf and I
think 'restore' could be more tolerant to partially corrupted
structures, so both functions could make sense in the end.

david
Re: [PATCH] btrfs-progs: per-thread, per-call pretty buffer
   Sorry to be a pain in the arse at this late stage of the patch, but
I've only just noticed.

On Wed, Jul 10, 2013 at 04:30:15PM +0200, David Sterba wrote:
>  static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
> -			    "PB", "EB", "ZB", "YB"};
> -char *pretty_sizes(u64 size)
> +			    "PB", "EB"};

   These are SI (power of 10) prefixes...

> +void pretty_size_snprintf(u64 size, char *str, size_t str_bytes)
>  {
>  	int num_divs = 0;
> -	int pretty_len = 16;
>  	float fraction;
> -	char *pretty;
> +
> +	if (str_bytes == 0)
> +		return;
>
>  	if( size < 1024 ){
>  		fraction = size;
> @@ -1172,13 +1173,13 @@ char *pretty_sizes(u64 size)
>  			num_divs ++;
>  		}
>
> -		if (num_divs >= ARRAY_SIZE(size_strs))
> -			return NULL;
> +		if (num_divs >= ARRAY_SIZE(size_strs)) {
> +			str[0] = '\0';
> +			return;
> +		}
>  		fraction = (float)last_size / 1024;

   ... and this is working in IEC (power of 2) units. Can we fix this
discrepancy, please? Also note that SI uses k for 10^3, but IEC uses K
for 2^10. Just inserting an "i" in the middle of each element of
size_strs should deal with the problem.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Charting the inexorable advance of Western syphilisation... ---
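Hugo's suggestion can be illustrated with a small stand-alone sketch: keep dividing by 1024, but label the result with IEC suffixes so that the unit matches the arithmetic. The function name sketch_pretty_size_iec() and the suffix table are hypothetical, and the code uses double rather than the float in the patch purely for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "size" (in bytes) using binary multiples and IEC suffixes.
 * Sizes below 1024 get no suffix, mirroring the empty first entry of
 * size_strs in the quoted patch. A 64-bit value never exceeds the EiB
 * range, so the table stops there.
 */
void sketch_pretty_size_iec(uint64_t size, char *str, size_t str_bytes)
{
	static const char *const suffix[] = {
		"", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	const size_t nr_suffixes = sizeof(suffix) / sizeof(suffix[0]);
	double value = (double)size;
	size_t idx = 0;

	if (str_bytes == 0)
		return;

	/* Each step is a power of two (2^10), hence the "i" in the unit. */
	while (value >= 1024.0 && idx + 1 < nr_suffixes) {
		value /= 1024.0;
		idx++;
	}
	snprintf(str, str_bytes, "%.2f%s", value, suffix[idx]);
}
```

With this sketch, 1536 bytes is rendered as 1.50KiB and 1048576 bytes as 1.00MiB. Whether the tool should instead switch to true SI units (dividing by 1000 and printing kB, MB, and so on) is a separate design choice that this example does not settle.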
[PATCH v3] Btrfs-progs: fix restore command leaving corrupted files
When there are files that have parts shared with snapshots, the
restore command was incorrectly restoring them, as it was not
taking into account the offset and number of bytes fields from
the file extent item. Besides leaving the recovered file corrupt,
it was also inefficient as it read and wrote more data than needed
(with each extent copy overwriting portions of the one previously
written).

The following steps and small C program show how to reproduce this
corruption issue:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ ./write_file /mnt/btrfs/foobar
    $ du -b /mnt/btrfs/foobar
    1048926 /mnt/btrfs/foobar
    $ md5sum /mnt/btrfs/foobar
    f9f778f3a7410c40e4ed104a3a63c3c4  /mnt/btrfs/foobar
    $ btrfs subvolume snapshot /mnt/btrfs /mnt/btrfs/my_snap
    $ perl -e 'open($f, "+<", "/mnt/btrfs/foobar"); seek($f, 4096, 0); print $f "\xff"; close($f);'
    $ md5sum /mnt/btrfs/foobar
    b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar
    $ umount /mnt/btrfs

    $ btrfs restore /dev/sdb3 /tmp/copy
    $ du -b /tmp/copy/foobar
    1048926 /tmp/copy/foobar
    $ md5sum /tmp/copy/foobar
    88db338cbc1c44dfabae083f1ce642d5  /tmp/copy/foobar
    $ od -t x1 -j 8192 -N 4 /tmp/copy/foobar
    0020000 41 00 00 00
    0020004
    $ mount /dev/sdb3 /mnt/btrfs
    $ od -t x1 -j 8192 -N 4 /mnt/btrfs/foobar
    0020000 00 00 00 00
    0020004
    $ md5sum /mnt/btrfs/foobar
    b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar

$ cat write_file.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <string.h>
    #include <assert.h>

    #define BUF_SIZE (1 * 1024 * 1024 + 350)

    int main(int argc, char *argv[])
    {
            int fd;
            unsigned char *buf = malloc(BUF_SIZE);

            assert(buf != NULL);

            if (argc < 2) {
                    fprintf(stderr, "Use: %s filepath\n", argv[0]);
                    return 1;
            }

            fd = open(argv[1], O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
            assert(fd >= 0);

            memset(buf, 0, BUF_SIZE);
            buf[0] = 65;
            assert(write(fd, buf, BUF_SIZE) == BUF_SIZE);
            assert(close(fd) == 0);

            return 0;
    }

Tested this change with zlib, lzo compression and file sizes larger
than 1GiB, and found no regression or other corruption issues (so far
at least).

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: updated commit message to include the C preprocessor macros
    in the C program.
V3: updated commit message again to reflect the file size used
    in the example in the C program macro.

 cmds-restore.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e48df40..9688599 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -272,6 +272,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	u64 bytenr;
 	u64 ram_size;
 	u64 disk_size;
+	u64 num_bytes;
 	u64 length;
 	u64 size_left;
 	u64 dev_bytenr;
@@ -288,7 +289,9 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	disk_size = btrfs_file_extent_disk_num_bytes(leaf, fi);
 	ram_size = btrfs_file_extent_ram_bytes(leaf, fi);
 	offset = btrfs_file_extent_offset(leaf, fi);
-	size_left = disk_size;
+	num_bytes = btrfs_file_extent_num_bytes(leaf, fi);
+	size_left = num_bytes;
+	bytenr += offset;

 	if (offset)
 		printf("offset is %Lu\n", offset);
@@ -296,7 +299,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	if (disk_size == 0)
 		return 0;

-	inbuf = malloc(disk_size);
+	inbuf = malloc(size_left);
 	if (!inbuf) {
 		fprintf(stderr, "No memory\n");
 		return -1;
@@ -351,8 +354,8 @@ again:
 		goto again;

 	if (compress == BTRFS_COMPRESS_NONE) {
-		while (total < ram_size) {
-			done = pwrite(fd, inbuf+total, ram_size-total,
+		while (total < num_bytes) {
+			done = pwrite(fd, inbuf+total, num_bytes-total,
 				      pos+total);
 			if (done < 0) {
 				ret = -1;
@@ -365,7 +368,7 @@ again:
 		goto out;
 	}

-	ret = decompress(inbuf, outbuf, disk_size, ram_size, compress);
+	ret = decompress(inbuf, outbuf, num_bytes, ram_size, compress);
 	if (ret) {
 		num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
 					      bytenr, length);
--
1.7.9.5
Re: [PATCH 1/2] Btrfs-progs: remove duplicated code in cmds-restore.c
On Wed, Jul 10, 2013 at 5:12 PM, David Sterba dste...@suse.cz wrote:
> On Tue, Jul 09, 2013 at 07:49:53PM +0100, Filipe David Borba Manana wrote:
>> The module cmds-restore.c was defining its own next_leaf() function,
>> which did exactly the same as btrfs_next_leaf() from ctree.c.
>
> This has been removed by Eric's patch present in the integration
> branches:
>
> Btrfs-progs: remove cut & paste btrfs_next_leaf from restore
> http://www.spinics.net/lists/linux-btrfs/msg24477.html

Oh, didn't notice that.

> but now Chris has a fix in the master branch,
>
> btrfs-restore: deal with NULL returns from read_node_slot
> https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=194aa4a1bd6447bb545286d0bcb0b0be8204d79f
>
> the code of updated next_leaf is not identical to btrfs_next_leaf and
> I think 'restore' could be more tolerant to partially corrupted
> structures, so both functions could make sense in the end.

Ok, I understand now why both exist.
So please just ignore this patch and the following one
(https://patchwork.kernel.org/patch/2825425/).

> thanks
> david

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Mon, Jul 08, 2013 at 11:54:46PM +0200, David Sterba wrote:
> On Thu, Jul 04, 2013 at 01:51:38PM +0400, Andrew Vagin wrote:
>> We are not the first to suffer from this problem:
>> https://bugzilla.redhat.com/show_bug.cgi?id=711881
>> http://marc.info/?l=linux-btrfs&m=130074451403261
>> https://bugzilla.openvz.org/show_bug.cgi?id=2653
>>
>> And about 2 years ago Mark Fasheh tried to fix this problem:
>> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps

And basically nobody cared :/

>> Eric Biederman suggested not creating a new method and using
>> vfs_getattr instead, but there are a few problems:
>> * fanotify doesn't have a dentry, but its fdinfo contains the device.
>> * vfs_getattr can fail, and which device should be shown in that case?
>> * vfs_getattr takes many more parameters, so there is a question
>>   about performance degradation.
>>
>> So I have a question: Can two inodes from different subvolumes have
>> equal inode numbers?
>
> Yes, subvolumes are separate inode number spaces.
>
>> If someone has any suggestions on how to fix this problem, or any
>> explanation why this is not a problem at all, please write here.
>
> The xstat syscall instead of the potentially heavyweight vfs_getattr
> could fix that, but it's not merged.
>
> For suse kernels we've taken the hackish approach of patching
> fs/proc/task_mmu.c:show_map_vma() (and the nommu variant) and use
> vfs_getattr only for btrfs.
>
> http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/btrfs-use-correct-device-for-maps.patch?id=2434fa6ee93a83b117461eb13f24272606677fec
>
> Only a temporary and not upstreamable solution, but without it the
> core packaging tool zypper would not work correctly.

As far as I can tell we'll be carrying this patch until a better
solution is possible. When that will happen, I don't know.
	--Mark

--
Mark Fasheh
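
For readers who want to observe the mismatch discussed in this thread from user space, the device shown in a /proc/<pid>/maps line can be compared with st_dev from stat(2). Below is a minimal C sketch of the parsing half. It assumes the usual maps line layout (address range, permissions, offset, hexadecimal major:minor, inode, path), and parse_maps_dev is a hypothetical helper name used only for illustration, not an existing API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse the "major:minor inode" fields of one
 * /proc/<pid>/maps line. major and minor are printed in hexadecimal.
 * Returns 0 on success, -1 if the line does not match the expected layout.
 */
static int parse_maps_dev(const char *line, unsigned int *major,
			  unsigned int *minor, unsigned long *ino)
{
	unsigned long start, end, pgoff;
	char perms[5];

	assert(line && major && minor && ino);
	if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu",
		   &start, &end, perms, &pgoff, major, minor, ino) != 7)
		return -1;
	return 0;
}
```

The parsed pair could then be compared against major(st.st_dev) and minor(st.st_dev) for the same path (glibc provides these macros in <sys/sysmacros.h>). On a file inside a btrfs subvolume the two are expected to differ, which is the behaviour this thread is about.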
[PATCH v4] Btrfs-progs: fix restore command leaving corrupted files
When there are files that have parts shared with snapshots, the
restore command was incorrectly restoring them, as it was not
taking into account the offset and number of bytes fields from
the file extent item. Besides leaving the recovered file corrupt,
it was also inefficient as it read and wrote more data than needed
(with each extent copy overwriting portions of the one previously
written).

The following steps and small C program show how to reproduce this
corruption issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ ./write_file /mnt/btrfs/foobar
$ du -b /mnt/btrfs/foobar
1048926	/mnt/btrfs/foobar
$ md5sum /mnt/btrfs/foobar
f9f778f3a7410c40e4ed104a3a63c3c4  /mnt/btrfs/foobar
$ btrfs subvolume snapshot /mnt/btrfs /mnt/btrfs/my_snap
$ perl -e 'open($f, "+<", "/mnt/btrfs/foobar"); seek($f, 4096, 0); print $f "\xff"; close($f);'
$ md5sum /mnt/btrfs/foobar
b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar
$ umount /mnt/btrfs
$ btrfs restore /dev/sdb3 /tmp/copy
$ du -b /tmp/copy/foobar
1048926	/tmp/copy/foobar
$ md5sum /tmp/copy/foobar
88db338cbc1c44dfabae083f1ce642d5  /tmp/copy/foobar
$ od -t x1 -j 8192 -N 4 /tmp/copy/foobar
0020000 41 00 00 00
0020004
$ mount /dev/sdb3 /mnt/btrfs
$ od -t x1 -j 8192 -N 4 /mnt/btrfs/foobar
0020000 00 00 00 00
0020004
$ md5sum /mnt/btrfs/foobar
b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar

$ cat write_file.c:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <assert.h>

#define BUF_SIZE (1 * 1024 * 1024 + 350)

int main(int argc, char *argv[]) {
  int fd;
  unsigned char *buf = malloc(BUF_SIZE);

  assert(buf != NULL);

  if (argc < 2) {
    fprintf(stderr, "Use: %s filepath\n", argv[0]);
    return 1;
  }
  fd = open(argv[1], O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
  assert(fd >= 0);

  memset(buf, 0, BUF_SIZE);
  buf[0] = 65;
  assert(write(fd, buf, BUF_SIZE) == BUF_SIZE);
  assert(close(fd) == 0);

  return 0;
}

Tested this change with zlib, lzo compression and file sizes larger
than 1GiB, and found no regression or other corruption issues (so far
at least).

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: updated commit message to include the C preprocessor macros in
    the C program.
V3: updated commit message again to reflect the file size used in
    the example in the C program macro.
V4: fixed wrong path in commit message in the perl command line.

 cmds-restore.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e48df40..9688599 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -272,6 +272,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	u64 bytenr;
 	u64 ram_size;
 	u64 disk_size;
+	u64 num_bytes;
 	u64 length;
 	u64 size_left;
 	u64 dev_bytenr;
@@ -288,7 +289,9 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	disk_size = btrfs_file_extent_disk_num_bytes(leaf, fi);
 	ram_size = btrfs_file_extent_ram_bytes(leaf, fi);
 	offset = btrfs_file_extent_offset(leaf, fi);
-	size_left = disk_size;
+	num_bytes = btrfs_file_extent_num_bytes(leaf, fi);
+	size_left = num_bytes;
+	bytenr += offset;
 
 	if (offset)
 		printf("offset is %Lu\n", offset);
@@ -296,7 +299,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	if (disk_size == 0)
 		return 0;
 
-	inbuf = malloc(disk_size);
+	inbuf = malloc(size_left);
 	if (!inbuf) {
 		fprintf(stderr, "No memory\n");
 		return -1;
@@ -351,8 +354,8 @@ again:
 		goto again;
 
 	if (compress == BTRFS_COMPRESS_NONE) {
-		while (total < ram_size) {
-			done = pwrite(fd, inbuf+total, ram_size-total,
+		while (total < num_bytes) {
+			done = pwrite(fd, inbuf+total, num_bytes-total,
 				      pos+total);
 			if (done < 0) {
 				ret = -1;
@@ -365,7 +368,7 @@ again:
 		goto out;
 	}
 
-	ret = decompress(inbuf, outbuf, disk_size, ram_size, compress);
+	ret = decompress(inbuf, outbuf, num_bytes, ram_size, compress);
 	if (ret) {
 		num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
 					      bytenr, length);
-- 
1.7.9.5
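
The idea behind the fix above can be illustrated outside of btrfs-progs: when a file extent item references only part of a shared extent, only the slice [offset, offset + num_bytes) belongs to the file at that position. The following is a small in-memory sketch under that assumption (uncompressed data only). struct extent_ref and copy_extent_slice are hypothetical names invented for this illustration and are not part of cmds-restore.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical view of a file extent item (uncompressed case). */
struct extent_ref {
	uint64_t file_pos;	/* where this piece lands in the file */
	uint64_t offset;	/* start of the piece inside the shared extent */
	uint64_t num_bytes;	/* length of the piece */
};

/* Copy only the referenced slice of `extent` into `file` at ref->file_pos. */
static void copy_extent_slice(uint8_t *file, size_t file_len,
			      const uint8_t *extent, size_t extent_len,
			      const struct extent_ref *ref)
{
	assert(ref->offset + ref->num_bytes <= extent_len);
	assert(ref->file_pos + ref->num_bytes <= file_len);
	memcpy(file + ref->file_pos, extent + ref->offset, ref->num_bytes);
}
```

Copying the whole extent from its beginning at every file position, instead of the slice, would place the extent's first byte (0x41 in the reproducer) at the start of each later piece, which matches the 41 seen at offset 8192 in the corrupted copy.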
Re: [PATCH] btrfs-progs: per-thread, per-call pretty buffer
On Wed, Jul 10, 2013 at 05:16:23PM +0100, Hugo Mills wrote:
> Sorry to be a pain in the arse at this late stage of the patch, but
> I've only just noticed.

No worries, good to have this one fixed.

david
[PATCH] btrfs-progs: use IEC units for sizes
As implemented now, we compute sizes with 1024-based units but report
them with 1000-based suffixes. Let's finally fix that and add optional
unit bases later.

Signed-off-by: David Sterba dste...@suse.cz
---
 utils.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index bce06f1..2e24cb0 100644
--- a/utils.c
+++ b/utils.c
@@ -1173,8 +1173,7 @@ out:
 	return ret;
 }
 
-static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
-				    "PB", "EB"};
+static char *size_strs[] = { "", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
 void pretty_size_snprintf(u64 size, char *str, size_t str_bytes)
 {
 	int num_divs = 0;
-- 
1.8.2
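
To make the distinction concrete, here is a small standalone C sketch of 1024-based size formatting with IEC suffixes. It is not the btrfs-progs implementation: iec_size_snprintf is a hypothetical name, and printing a "B" suffix for values below 1024 is an assumption of this sketch (the table in the patch uses an empty string there).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* IEC binary suffixes: each step is a factor of 1024. */
static const char *const iec_suffix[] = {
	"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
};

/*
 * Hypothetical helper: format `size` bytes into `str` using 1024-based
 * division and IEC suffixes. Returns what snprintf() returns.
 */
static int iec_size_snprintf(uint64_t size, char *str, size_t str_bytes)
{
	const size_t max_divs = sizeof(iec_suffix) / sizeof(iec_suffix[0]) - 1;
	size_t num_divs = 0;
	uint64_t last = size;	/* value before the final division */
	uint64_t cur = size;

	assert(str != NULL && str_bytes > 0);

	while (cur >= 1024 && num_divs < max_divs) {
		last = cur;
		cur /= 1024;
		num_divs++;
	}
	if (num_divs == 0)
		return snprintf(str, str_bytes, "%llu%s",
				(unsigned long long)size, iec_suffix[0]);

	/* Divide the last intermediate value as a double to keep two decimals. */
	return snprintf(str, str_bytes, "%.2f%s",
			(double)last / 1024.0, iec_suffix[num_divs]);
}
```

With this scheme 1536 bytes is reported as 1.50KiB, whereas a 1000-based (SI) reading of the same number would be 1.536 kB, which is the mismatch the patch addresses by renaming the suffixes.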
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Wed, Jul 10, 2013 at 09:31:05AM -0700, Mark Fasheh wrote: As far as I can tell we'll be carrying this patch until a better solution is possible. When that will happen, I don't know. --Mark Well, what do I get when I pretend I don't care any more? The little voice in my head says keep plugging away. Here's another attempt at fixing this problem in a sane manner. Basically, this time we're adding a flag to s_flags which btrfs sets. Proc will see the flag and call -getattr(). This compiles, but it needs testing (which I will get to soon). It still has a bunch of problems in my honest opinion but maybe if we get something acceptable upstream we can work from there. Also, as Andrew pointed out there's more than one place which is return different device than from stat(2) so I probably need to update more sites to deal with this. Does anyone see a problem with this approach? --Mark -- Mark Fasheh From: Mark Fasheh mfas...@suse.de vfs: allow /proc/PID/maps to get device from stat stat(2) on btrfs returns a custom device, but proc uses s_dev from the super block. This causes problems because software (and users) are not expecting the kernel to return different devices from these calls. This patch fixes the problem by adding a new superblock flag, MS_PROC_USE_ST. When the proc code sees this flag, it will call the file systems -getattr() method to extract a device as opposed to getting it directly from s_dev. 
Signed-off-by: Mark Fasheh mfas...@suse.de --- fs/btrfs/super.c| 1 + fs/proc/generic.c | 30 ++ fs/proc/internal.h | 1 + fs/proc/task_mmu.c | 2 +- fs/proc/task_nommu.c| 2 +- include/uapi/linux/fs.h | 1 + 6 files changed, 35 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index f0857e0..67be4ef 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -822,6 +822,7 @@ static int btrfs_fill_super(struct super_block *sb, sb-s_flags |= MS_POSIXACL; #endif sb-s_flags |= MS_I_VERSION; + sb-s_flags |= MS_PROC_USE_ST; err = open_ctree(sb, fs_devices, (char *)data); if (err) { printk(btrfs: open_ctree failed\n); diff --git a/fs/proc/generic.c b/fs/proc/generic.c index a2596af..eca8195 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -24,6 +24,8 @@ #include linux/spinlock.h #include linux/completion.h #include asm/uaccess.h +#include linux/fs.h +#include linux/dcache.h #include internal.h @@ -637,3 +639,31 @@ void *PDE_DATA(const struct inode *inode) return __PDE_DATA(inode); } EXPORT_SYMBOL(PDE_DATA); + +static dev_t proc_get_dev_from_stat(struct inode *inode) +{ + struct dentry *dentry = d_find_any_alias(inode); + struct kstat kstat; + + if (!dentry) + goto out_error; + + if (inode-i_op-getattr(NULL, dentry, kstat)) + goto out_error_dput; + + dput(dentry); + return kstat.dev; + +out_error_dput: + dput(dentry); +out_error: + return inode-i_sb-s_dev; +} + +dev_t proc_get_map_dev(struct inode *inode) +{ + if (inode-i_sb-s_flags MS_PROC_USE_ST) + return proc_get_dev_from_stat(inode); + else + return inode-i_sb-s_dev; +} diff --git a/fs/proc/internal.h b/fs/proc/internal.h index d600fb0..24808b0 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -192,6 +192,7 @@ static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde) return pde; } extern void pde_put(struct proc_dir_entry *); +dev_t proc_get_map_dev(struct inode *inode); /* * inode.c diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 3e636d8..9226600 100644 
--- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -272,7 +272,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) if (file) { struct inode *inode = file_inode(vma-vm_file); - dev = inode-i_sb-s_dev; + dev = proc_get_map_dev(inode); ino = inode-i_ino; pgoff = ((loff_t)vma-vm_pgoff) PAGE_SHIFT; } diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 56123a6..892d84a 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -150,7 +150,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, if (file) { struct inode *inode = file_inode(vma-vm_file); - dev = inode-i_sb-s_dev; + dev = proc_get_map_dev(inode); ino = inode-i_ino; pgoff = (loff_t)vma-vm_pgoff PAGE_SHIFT; } diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index a4ed56c..b4173a3 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -88,6 +88,7 @@ struct inodes_stat_t { #define MS_STRICTATIME (124) /* Always perform atime updates */ /* These sb flags are internal to the kernel */ +#define MS_PROC_USE_ST
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> Well, what do I get when I pretend I don't care any more? The little
> voice in my head says keep plugging away.
>
> Here's another attempt at fixing this problem in a sane manner.
> Basically, this time we're adding a flag to s_flags which btrfs sets.
> Proc will see the flag and call ->getattr().
>
> This compiles, but it needs testing (which I will get to soon). It
> still has a bunch of problems in my honest opinion but maybe if we get
> something acceptable upstream we can work from there.
>
> Also, as Andrew pointed out there's more than one place which is
> return different device than from stat(2) so I probably need to update
> more sites to deal with this.
>
> Does anyone see a problem with this approach?

The approach looks ok to me, the implementation is internal to vfs and
fairly minimal. The bit that bothers me is the name of the flag, it's
completely unobvious what it means.

There are some differences to the linked suse patch:

> +static dev_t proc_get_dev_from_stat(struct inode *inode)
> +{
> +	struct dentry *dentry = d_find_any_alias(inode);

This does the dentry - inode mapping, while originally there was

	file->f_path

passing just the inode to proc_get_dev_from_stat unnecessarily drops the
available information that's about to be retrieved again.

> +	struct kstat kstat;
> +
> +	if (!dentry)
> +		goto out_error;
> +	if (inode->i_op->getattr(NULL, dentry, &kstat))

The suse patch calls vfs_getattr that in turn calls

	security_inode_getattr(path->mnt, path->dentry);

That would be missing. Plus checks for presence of the ->getattr
operation. Though this is superfluous with btrfs, I suggest to use
vfs_getattr here, which will fix all of the above.

> +		goto out_error_dput;
> +
> +	dput(dentry);
> +	return kstat.dev;
> +
> +out_error_dput:
> +	dput(dentry);
> +out_error:
> +	return inode->i_sb->s_dev;
> +}
Re: lz4 status?
On Sun, Jun 30, 2013 at 12:35:09PM -0500, Mitch Harder wrote:
> There's been a parallel effort to incorporate a general set of lz4
> patches in the kernel.
>
> I see these patches are currently queued up in the linux-next tree, so
> we may see them in the 3.11 kernel.

The patches are now merged into 3.11.

> It looks like lz4 and lz4hc will be provided.

Regarding HC mode, there are some core compression code changes needed
in order to fully utilize its potential, namely a larger chunk size
that's compressed at a time. There was a tiny yet measurable gain of HC
over ordinary mode on the current 4k-at-a-time implementation, but the
space savings did not justify the speed drop of HC mode.

I can't say if the patchset will be ready for 3.12 though.

david
[PATCH] Btrfs-progs: restore can now recover file xattrs
This change adds a new option to the restore command, named -x, that makes it restore file extented attributes too. This is an optional behaviour and it's disabled by default. Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- cmds-restore.c | 113 +++- 1 file changed, 112 insertions(+), 1 deletion(-) diff --git a/cmds-restore.c b/cmds-restore.c index e48df40..0f6169e 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -30,6 +30,8 @@ #include lzo/lzoconf.h #include lzo/lzo1x.h #include zlib.h +#include sys/types.h +#include attr/xattr.h #include ctree.h #include disk-io.h @@ -47,6 +49,7 @@ static int get_snaps = 0; static int verbose = 0; static int ignore_errors = 0; static int overwrite = 0; +static int get_xattrs = 0; #define LZO_LEN 4 #define PAGE_CACHE_SIZE 4096 @@ -412,6 +415,105 @@ again: } +static int set_file_xattrs(struct btrfs_root *root, u64 inode, + int fd, const char *file_name) +{ + struct btrfs_key key; + struct btrfs_path *path; + struct extent_buffer *leaf; + struct btrfs_dir_item *di; + u32 name_len = 0; + u32 data_len = 0; + u32 len = 0; + char *name = NULL; + char *data = NULL; + int ret = 0; + + key.objectid = inode; + key.type = BTRFS_XATTR_ITEM_KEY; + key.offset = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + if (ret 0) + goto out; + + leaf = path-nodes[0]; + while (1) { + if (path-slots[0] = btrfs_header_nritems(leaf)) { + do { + ret = next_leaf(root, path); + if (ret 0) { + fprintf(stderr, Error searching for +extended attributes: %d\n, + ret); + goto out; + } else if (ret) { + /* No more leaves to search */ + goto out; + } + leaf = path-nodes[0]; + } while (!leaf); + continue; + } + + btrfs_item_key_to_cpu(leaf, key, path-slots[0]); + + if (key.type != BTRFS_XATTR_ITEM_KEY || key.objectid != inode) + break; + + di = btrfs_item_ptr(leaf, path-slots[0], + struct btrfs_dir_item); + + len = btrfs_dir_name_len(leaf, di); + if (len name_len) { + 
free(name); + name = (char *) malloc(len + 1); + if (!name) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, name, (unsigned long)(di + 1), len); + name[len] = '\0'; + name_len = len; + + len = btrfs_dir_data_len(leaf, di); + if (len data_len) { + free(data); + data = (char *) malloc(len); + if (!data) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, data, + (unsigned long)(di + 1) + name_len, len); + data_len = len; + + if (fsetxattr(fd, name, data, data_len, 0)) { + int err = errno; + + fprintf(stderr, Error setting extended attribute %s +on file %s: %s, name, file_name, + strerror(err)); + } + + path-slots[0]++; + } + ret = 0; +out: + btrfs_free_path(path); + free(name); + free(data); + + return ret; +} + + static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key, const char *file) { @@ -535,6 +637,11 @@ set_size: if (ret) return ret; } + if (get_xattrs) { + ret = set_file_xattrs(root, key-objectid, fd, file); + if (ret) + return ret; + } return 0; } @@ -966,6 +1073,7 @@ const char * const cmd_restore_usage[] = { Try to restore files from a damaged filesystem (unmounted), , -s get snapshots, + -x get extended attributes, -v verbose, -i ignore errors, -o overwrite, @@
[PATCH v2] Btrfs-progs: restore can now recover file xattrs
This change adds a new option to the restore command, named -x, that makes it restore file extented attributes too. This is an optional behaviour and it's disabled by default. Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: added missing new line at end of error message. cmds-restore.c | 113 +++- 1 file changed, 112 insertions(+), 1 deletion(-) diff --git a/cmds-restore.c b/cmds-restore.c index e48df40..cb8754a 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -30,6 +30,8 @@ #include lzo/lzoconf.h #include lzo/lzo1x.h #include zlib.h +#include sys/types.h +#include attr/xattr.h #include ctree.h #include disk-io.h @@ -47,6 +49,7 @@ static int get_snaps = 0; static int verbose = 0; static int ignore_errors = 0; static int overwrite = 0; +static int get_xattrs = 0; #define LZO_LEN 4 #define PAGE_CACHE_SIZE 4096 @@ -412,6 +415,105 @@ again: } +static int set_file_xattrs(struct btrfs_root *root, u64 inode, + int fd, const char *file_name) +{ + struct btrfs_key key; + struct btrfs_path *path; + struct extent_buffer *leaf; + struct btrfs_dir_item *di; + u32 name_len = 0; + u32 data_len = 0; + u32 len = 0; + char *name = NULL; + char *data = NULL; + int ret = 0; + + key.objectid = inode; + key.type = BTRFS_XATTR_ITEM_KEY; + key.offset = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + if (ret 0) + goto out; + + leaf = path-nodes[0]; + while (1) { + if (path-slots[0] = btrfs_header_nritems(leaf)) { + do { + ret = next_leaf(root, path); + if (ret 0) { + fprintf(stderr, Error searching for +extended attributes: %d\n, + ret); + goto out; + } else if (ret) { + /* No more leaves to search */ + goto out; + } + leaf = path-nodes[0]; + } while (!leaf); + continue; + } + + btrfs_item_key_to_cpu(leaf, key, path-slots[0]); + + if (key.type != BTRFS_XATTR_ITEM_KEY || key.objectid != inode) + break; + + di = btrfs_item_ptr(leaf, path-slots[0], + struct btrfs_dir_item); + + len = 
btrfs_dir_name_len(leaf, di); + if (len name_len) { + free(name); + name = (char *) malloc(len + 1); + if (!name) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, name, (unsigned long)(di + 1), len); + name[len] = '\0'; + name_len = len; + + len = btrfs_dir_data_len(leaf, di); + if (len data_len) { + free(data); + data = (char *) malloc(len); + if (!data) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, data, + (unsigned long)(di + 1) + name_len, len); + data_len = len; + + if (fsetxattr(fd, name, data, data_len, 0)) { + int err = errno; + + fprintf(stderr, Error setting extended attribute %s +on file %s: %s\n, name, file_name, + strerror(err)); + } + + path-slots[0]++; + } + ret = 0; +out: + btrfs_free_path(path); + free(name); + free(data); + + return ret; +} + + static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key, const char *file) { @@ -535,6 +637,11 @@ set_size: if (ret) return ret; } + if (get_xattrs) { + ret = set_file_xattrs(root, key-objectid, fd, file); + if (ret) + return ret; + } return 0; } @@ -966,6 +1073,7 @@ const char * const cmd_restore_usage[] = { Try to restore files from a damaged filesystem (unmounted), , -s get snapshots, + -x get extended attributes, -v verbose, -i
[PATCH v3] Btrfs-progs: restore can now recover file xattrs
This change adds a new option to the restore command, named -x, that makes it restore file extented attributes too. This is an optional behaviour and it's disabled by default. Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: added missing new line at end of error message. V3: return with 0 when there are no more leaves. cmds-restore.c | 113 +++- 1 file changed, 112 insertions(+), 1 deletion(-) diff --git a/cmds-restore.c b/cmds-restore.c index e48df40..32ba89d 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -30,6 +30,8 @@ #include lzo/lzoconf.h #include lzo/lzo1x.h #include zlib.h +#include sys/types.h +#include attr/xattr.h #include ctree.h #include disk-io.h @@ -47,6 +49,7 @@ static int get_snaps = 0; static int verbose = 0; static int ignore_errors = 0; static int overwrite = 0; +static int get_xattrs = 0; #define LZO_LEN 4 #define PAGE_CACHE_SIZE 4096 @@ -412,6 +415,105 @@ again: } +static int set_file_xattrs(struct btrfs_root *root, u64 inode, + int fd, const char *file_name) +{ + struct btrfs_key key; + struct btrfs_path *path; + struct extent_buffer *leaf; + struct btrfs_dir_item *di; + u32 name_len = 0; + u32 data_len = 0; + u32 len = 0; + char *name = NULL; + char *data = NULL; + int ret = 0; + + key.objectid = inode; + key.type = BTRFS_XATTR_ITEM_KEY; + key.offset = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + if (ret 0) + goto out; + + leaf = path-nodes[0]; + while (1) { + if (path-slots[0] = btrfs_header_nritems(leaf)) { + do { + ret = next_leaf(root, path); + if (ret 0) { + fprintf(stderr, Error searching for +extended attributes: %d\n, + ret); + goto out; + } else if (ret) { + /* No more leaves to search */ + ret = 0; + goto out; + } + leaf = path-nodes[0]; + } while (!leaf); + continue; + } + + btrfs_item_key_to_cpu(leaf, key, path-slots[0]); + + if (key.type != BTRFS_XATTR_ITEM_KEY || key.objectid != inode) + break; + + di = btrfs_item_ptr(leaf, 
path-slots[0], + struct btrfs_dir_item); + + len = btrfs_dir_name_len(leaf, di); + if (len name_len) { + free(name); + name = (char *) malloc(len + 1); + if (!name) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, name, (unsigned long)(di + 1), len); + name[len] = '\0'; + name_len = len; + + len = btrfs_dir_data_len(leaf, di); + if (len data_len) { + free(data); + data = (char *) malloc(len); + if (!data) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, data, + (unsigned long)(di + 1) + name_len, len); + data_len = len; + + if (fsetxattr(fd, name, data, data_len, 0)) { + int err = errno; + + fprintf(stderr, Error setting extended attribute %s +on file %s: %s\n, name, file_name, + strerror(err)); + } + + path-slots[0]++; + } +out: + btrfs_free_path(path); + free(name); + free(data); + + return ret; +} + + static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key, const char *file) { @@ -535,6 +637,11 @@ set_size: if (ret) return ret; } + if (get_xattrs) { + ret = set_file_xattrs(root, key-objectid, fd, file); + if (ret) + return ret; + } return 0; } @@ -966,6 +1073,7 @@ const char * const cmd_restore_usage[] = { Try to restore files from a damaged filesystem (unmounted), , -s get snapshots, + -x
[PATCH v4] Btrfs-progs: restore can now recover file xattrs
This change adds a new option to the restore command, named -x, that makes it restore file extented attributes too. This is an optional behaviour and it's disabled by default. Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: added missing new line at end of error message. V3: return with 0 when there are no more leaves. V4: fix back return value to 0 when no more xattrs are found. cmds-restore.c | 114 +++- 1 file changed, 113 insertions(+), 1 deletion(-) diff --git a/cmds-restore.c b/cmds-restore.c index e48df40..5199476 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -30,6 +30,8 @@ #include lzo/lzoconf.h #include lzo/lzo1x.h #include zlib.h +#include sys/types.h +#include attr/xattr.h #include ctree.h #include disk-io.h @@ -47,6 +49,7 @@ static int get_snaps = 0; static int verbose = 0; static int ignore_errors = 0; static int overwrite = 0; +static int get_xattrs = 0; #define LZO_LEN 4 #define PAGE_CACHE_SIZE 4096 @@ -412,6 +415,106 @@ again: } +static int set_file_xattrs(struct btrfs_root *root, u64 inode, + int fd, const char *file_name) +{ + struct btrfs_key key; + struct btrfs_path *path; + struct extent_buffer *leaf; + struct btrfs_dir_item *di; + u32 name_len = 0; + u32 data_len = 0; + u32 len = 0; + char *name = NULL; + char *data = NULL; + int ret = 0; + + key.objectid = inode; + key.type = BTRFS_XATTR_ITEM_KEY; + key.offset = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + if (ret 0) + goto out; + + leaf = path-nodes[0]; + while (1) { + if (path-slots[0] = btrfs_header_nritems(leaf)) { + do { + ret = next_leaf(root, path); + if (ret 0) { + fprintf(stderr, Error searching for +extended attributes: %d\n, + ret); + goto out; + } else if (ret) { + /* No more leaves to search */ + ret = 0; + goto out; + } + leaf = path-nodes[0]; + } while (!leaf); + continue; + } + + btrfs_item_key_to_cpu(leaf, key, path-slots[0]); + + if (key.type != BTRFS_XATTR_ITEM_KEY || 
key.objectid != inode) + break; + + di = btrfs_item_ptr(leaf, path-slots[0], + struct btrfs_dir_item); + + len = btrfs_dir_name_len(leaf, di); + if (len name_len) { + free(name); + name = (char *) malloc(len + 1); + if (!name) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, name, (unsigned long)(di + 1), len); + name[len] = '\0'; + name_len = len; + + len = btrfs_dir_data_len(leaf, di); + if (len data_len) { + free(data); + data = (char *) malloc(len); + if (!data) { + ret = -ENOMEM; + goto out; + } + } + read_extent_buffer(leaf, data, + (unsigned long)(di + 1) + name_len, len); + data_len = len; + + if (fsetxattr(fd, name, data, data_len, 0)) { + int err = errno; + + fprintf(stderr, Error setting extended attribute %s +on file %s: %s\n, name, file_name, + strerror(err)); + } + + path-slots[0]++; + } + ret = 0; +out: + btrfs_free_path(path); + free(name); + free(data); + + return ret; +} + + static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key, const char *file) { @@ -535,6 +638,11 @@ set_size: if (ret) return ret; } + if (get_xattrs) { + ret = set_file_xattrs(root, key-objectid, fd, file); + if (ret) + return ret; + } return 0; } @@ -966,6 +1074,7 @@ const char * const cmd_restore_usage[] = { Try to restore files from a damaged filesystem
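
The xattr patch above reads the attribute name from directly after the item header, (di + 1), and the value from directly after the name. A rough standalone C sketch of that "header | name | value" layout follows. struct demo_item is a deliberately simplified, hypothetical header (the real btrfs_dir_item carries more fields), and demo_item_name is an illustrative helper, not a btrfs-progs function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical header; the real btrfs_dir_item has more fields. */
struct demo_item {
	uint16_t data_len;
	uint16_t name_len;
};

/*
 * Extract a NUL-terminated copy of the name and a pointer to the value
 * from an item laid out as: header | name bytes | value bytes.
 * Returns a malloc()ed name the caller must free(), or NULL if the
 * allocation fails.
 */
static char *demo_item_name(const uint8_t *item, const uint8_t **value,
			    uint16_t *value_len)
{
	struct demo_item hdr;
	char *name;

	assert(item && value && value_len);
	memcpy(&hdr, item, sizeof(hdr));		/* avoid unaligned access */
	name = malloc((size_t)hdr.name_len + 1);	/* +1 for the terminator */
	if (!name)
		return NULL;
	memcpy(name, item + sizeof(hdr), hdr.name_len);
	name[hdr.name_len] = '\0';
	*value = item + sizeof(hdr) + hdr.name_len;
	*value_len = hdr.data_len;
	return name;
}
```

The name and value obtained this way are what a restore tool would then hand to fsetxattr(2) on the destination file descriptor.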
[PATCH 4/5] Btrfs: batch the extent state operation in the end io handle of the read page
It is unnecessary to unlock the extent one page at a time; we can do it
in batches, which makes random reads ~6% faster.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c | 70 ++--
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9f4dedf..8f95418 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -762,15 +762,6 @@ static void cache_state(struct extent_state *state,
 	}
 }
 
-static void uncache_state(struct extent_state **cached_ptr)
-{
-	if (cached_ptr && (*cached_ptr)) {
-		struct extent_state *state = *cached_ptr;
-		*cached_ptr = NULL;
-		free_extent_state(state);
-	}
-}
-
 /*
  * set some bits on a range in the tree.  This may require allocations or
  * sleeping, so the gfp mask is used to indicate what is allowed.
@@ -2395,6 +2386,18 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 	bio_put(bio);
 }
 
+static void
+endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
+			      int uptodate)
+{
+	struct extent_state *cached = NULL;
+	u64 end = start + len - 1;
+
+	if (uptodate && tree->track_uptodate)
+		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
+	unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
+}
+
 /*
  * after a readpage IO is done, we need to:
  * clear the uptodate bits on error
@@ -2417,6 +2420,8 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 	u64 start;
 	u64 end;
 	u64 len;
+	u64 extent_start = 0;
+	u64 extent_len = 0;
 	int mirror;
 	int ret;
 
@@ -2425,8 +2430,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 
 	do {
 		struct page *page = bvec->bv_page;
-		struct extent_state *cached = NULL;
-		struct extent_state *state;
 		struct inode *inode = page->mapping->host;
 
 		pr_debug("end_bio_extent_readpage: bi_sector=%llu, err=%d, "
@@ -2452,17 +2455,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		if (++bvec <= bvec_end)
 			prefetchw(&bvec->bv_page->flags);
 
-		spin_lock(&tree->lock);
-		state = find_first_extent_bit_state(tree, start, EXTENT_LOCKED);
-		if (likely(state && state->start == start)) {
-			/*
-			 * take a reference on the state, unlock will drop
-			 * the ref
-			 */
-			cache_state(state, &cached);
-		}
-		spin_unlock(&tree->lock);
-
 		mirror = io_bio->mirror_num;
 		if (likely(uptodate && tree->ops &&
 			   tree->ops->readpage_end_io_hook)) {
@@ -2501,18 +2493,11 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 					test_bit(BIO_UPTODATE, &bio->bi_flags);
 				if (err)
 					uptodate = 0;
-				uncache_state(&cached);
 				continue;
 			}
 		}
 readpage_ok:
-		if (uptodate && tree->track_uptodate) {
-			set_extent_uptodate(tree, start, end, &cached,
-					    GFP_ATOMIC);
-		}
-		unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
-
-		if (uptodate) {
+		if (likely(uptodate)) {
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
 			unsigned offset;
@@ -2528,8 +2513,33 @@ readpage_ok:
 		}
 		unlock_page(page);
 		offset += len;
+
+		if (unlikely(!uptodate)) {
+			if (extent_len) {
+				endio_readpage_release_extent(tree,
+							      extent_start,
+							      extent_len, 1);
+				extent_start = 0;
+				extent_len = 0;
+			}
+			endio_readpage_release_extent(tree, start,
+						      end - start + 1, 0);
+		} else if (!extent_len) {
+			extent_start = start;
+			extent_len = end + 1 - start;
+		} else if (extent_start + extent_len == start) {
+			extent_len += end + 1 - start;
+		} else {
+			endio_readpage_release_extent(tree, extent_start,
+
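The batching idea in the patch above can be tried outside the kernel. What follows is only a userspace sketch, not the kernel code: struct page_result, struct release_log, release_range() and release_batched() are hypothetical names, and release_range() merely counts calls where the patch would call endio_readpage_release_extent(). The archived diff is cut off inside the final else branch, so that branch and the flush after the loop are assumptions about how the batching is completed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One completed page: byte range [start, start + len) and its read status. */
struct page_result {
	uint64_t start;
	uint64_t len;
	int uptodate;
};

/* Hypothetical stand-in for the extent tree: it only records the calls. */
struct release_log {
	size_t calls;
	uint64_t bytes;
};

/* Hypothetical stand-in for endio_readpage_release_extent(). */
static void release_range(struct release_log *log, uint64_t start,
			  uint64_t len, int uptodate)
{
	(void)start;
	(void)uptodate;
	log->calls++;
	log->bytes += len;
}

/* Release pages in batches: contiguous up-to-date pages share one call. */
static void release_batched(struct release_log *log,
			    const struct page_result *pages, size_t n)
{
	uint64_t extent_start = 0;
	uint64_t extent_len = 0;
	size_t i;

	assert(log != NULL);
	for (i = 0; i < n; i++) {
		const struct page_result *p = &pages[i];

		if (!p->uptodate) {
			/* Flush the pending run, then release the bad page alone. */
			if (extent_len) {
				release_range(log, extent_start, extent_len, 1);
				extent_len = 0;
			}
			release_range(log, p->start, p->len, 0);
		} else if (!extent_len) {
			/* No pending run: start one at this page. */
			extent_start = p->start;
			extent_len = p->len;
		} else if (extent_start + extent_len == p->start) {
			/* Contiguous with the pending run: extend it. */
			extent_len += p->len;
		} else {
			/* Assumed: a gap flushes the old run and starts a new one. */
			release_range(log, extent_start, extent_len, 1);
			extent_start = p->start;
			extent_len = p->len;
		}
	}
	/* Assumed: whatever is still pending is released after the loop. */
	if (extent_len)
		release_range(log, extent_start, extent_len, 1);
}
```

Each release call in the kernel means taking the tree lock and updating the extent state tree, so fewer calls per bio is where the saving would come from. In this sketch, three contiguous good pages cost one call instead of three.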
[PATCH 2/5] Btrfs: add branch prediction hints in the read page end IO function
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4bfbcc5..c9b28cf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2503,7 +2503,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 
 		spin_lock(&tree->lock);
 		state = find_first_extent_bit_state(tree, start, EXTENT_LOCKED);
-		if (state && state->start == start) {
+		if (likely(state && state->start == start)) {
 			/*
 			 * take a reference on the state, unlock will drop
 			 * the ref
@@ -2513,7 +2513,8 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		spin_unlock(&tree->lock);
 
 		mirror = io_bio->mirror_num;
-		if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
+		if (likely(uptodate && tree->ops &&
+			   tree->ops->readpage_end_io_hook)) {
 			ret = tree->ops->readpage_end_io_hook(page, start, end,
 							      state, mirror);
 			if (ret)
@@ -2522,12 +2523,15 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 				clean_io_failure(start, page);
 		}
 
-		if (!uptodate && tree->ops && tree->ops->readpage_io_failed_hook) {
+		if (likely(uptodate))
+			goto readpage_ok;
+
+		if (tree->ops && tree->ops->readpage_io_failed_hook) {
 			ret = tree->ops->readpage_io_failed_hook(page, mirror);
 			if (!ret && !err &&
 			    test_bit(BIO_UPTODATE, &bio->bi_flags))
 				uptodate = 1;
-		} else if (!uptodate) {
+		} else {
 			/*
 			 * The generic bio_readpage_error handles errors the
 			 * following way: If possible, new read requests are
@@ -2548,7 +2552,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 				continue;
 			}
 		}
-
+readpage_ok:
 		if (uptodate && tree->track_uptodate) {
 			set_extent_uptodate(tree, start, end, &cached,
 					    GFP_ATOMIC);
-- 
1.8.1.4
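For readers unfamiliar with the hints used in this patch: on GCC and Clang, likely() and unlikely() are normally thin wrappers around __builtin_expect(). The userspace sketch below mirrors the shape of the change (a hinted fast path that jumps over the error handling), but struct read_stats and classify_pages() are hypothetical names made up for illustration and do not appear in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the usual kernel definitions on GCC/Clang; plain fallback elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)
#else
#define likely(x)	(!!(x))
#define unlikely(x)	(!!(x))
#endif

/* Hypothetical summary of one completed read request. */
struct read_stats {
	size_t ok;
	size_t failed;
};

/* Count good and bad pages, skipping the error path in the common case. */
static struct read_stats classify_pages(const int *uptodate, size_t n)
{
	struct read_stats stats = { 0, 0 };
	size_t i;

	assert(uptodate != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (likely(uptodate[i]))
			goto page_ok;

		/* Cold path: error handling is kept out of the common case. */
		stats.failed++;
		continue;
page_ok:
		stats.ok++;
	}
	return stats;
}
```

The hints do not change what the code computes. They only tell the compiler which branch to lay out as the fall-through path, so any gain depends on the prediction being right most of the time.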
[PATCH 3/5] Btrfs: don't cache the csum value into the extent state tree
Before this patch, we cached the csum value in the extent state tree
when reading data from disk, which increased lock contention on the
state tree. Now we store the csum value in the bio structure or another
unshared structure, which reduces the lock contention.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/btrfs_inode.h |  21 +
 fs/btrfs/ctree.h       |   4 +-
 fs/btrfs/disk-io.c     |   5 ++-
 fs/btrfs/extent_io.c   | 113 -
 fs/btrfs/extent_io.h   |  10 ++---
 fs/btrfs/file-item.c   |  81 +++
 fs/btrfs/inode.c       |  85 +++--
 fs/btrfs/volumes.h     |   7 +++
 8 files changed, 163 insertions(+), 163 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 08b286b..d0ae226 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -218,6 +218,27 @@ static inline int btrfs_inode_in_log(struct inode *inode, u64 generation)
 	return 0;
 }
 
+struct btrfs_dio_private {
+	struct inode *inode;
+	u64 logical_offset;
+	u64 disk_bytenr;
+	u64 bytes;
+	void *private;
+
+	/* number of bios pending for this dio */
+	atomic_t pending_bios;
+
+	/* IO errors */
+	int errors;
+
+	/* orig_bio is our btrfs_io_bio */
+	struct bio *orig_bio;
+
+	/* dio_bio came from fs/direct-io.c */
+	struct bio *dio_bio;
+	u8 csum[0];
+};
+
 /*
  * Disable DIO read nolock optimization, so new dio readers will be forced
  * to grab i_mutex. It is used to avoid the endless truncate due to
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f5b4b72..d52ec5d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3556,12 +3556,14 @@ int btrfs_find_name_in_ext_backref(struct btrfs_path *path,
 				   struct btrfs_inode_extref **extref_ret);
 
 /* file-item.c */
+struct btrfs_dio_private;
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
 int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
 			  struct bio *bio, u32 *dst);
 int btrfs_lookup_bio_sums_dio(struct btrfs_root *root, struct inode *inode,
-			      struct bio *bio, u64 logical_offset);
+			      struct btrfs_dio_private *dip, struct bio *bio,
+			      u64 logical_offset);
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     u64 objectid, u64 pos,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfe6864..290b83f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -576,8 +576,9 @@ static noinline int check_leaf(struct btrfs_root *root,
 	return 0;
 }
 
-static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end,
-			       struct extent_state *state, int mirror)
+static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
+				      u64 phy_offset, struct page *page,
+				      u64 start, u64 end, int mirror)
 {
 	struct extent_io_tree *tree;
 	u64 found_start;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9b28cf..9f4dedf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1837,64 +1837,6 @@ out:
 	return ret;
 }
 
-void extent_cache_csums_dio(struct extent_io_tree *tree, u64 start, u32 csums[],
-			    int count)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	BUG_ON(!node);
-
-	state = rb_entry(node, struct extent_state, rb_node);
-	BUG_ON(state->start != start);
-
-	while (count) {
-		state->private = *csums++;
-		count--;
-		state = next_state(state);
-	}
-	spin_unlock(&tree->lock);
-}
-
-static inline u64 __btrfs_get_bio_offset(struct bio *bio, int bio_index)
-{
-	struct bio_vec *bvec = bio->bi_io_vec + bio_index;
-
-	return page_offset(bvec->bv_page) + bvec->bv_offset;
-}
-
-void extent_cache_csums(struct extent_io_tree *tree, struct bio *bio, int bio_index,
-			u32 csums[], int count)
-{
-	struct rb_node *node;
-	struct extent_state *state = NULL;
-	u64 start;
-
-	spin_lock(&tree->lock);
-	do {
-		start = __btrfs_get_bio_offset(bio, bio_index);
-		if (state == NULL || state->start != start) {
-			node = tree_search(tree, start);
-			BUG_ON(!node);
-
-			state = rb_entry(node, struct
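The trailing u8 csum[0] member added to struct btrfs_dio_private suggests that the checksums are stored directly behind the private structure of each request, so no shared tree (and no tree lock) is needed to look them up. The userspace sketch below illustrates that layout under stated assumptions: struct dio_private and the dio_private_*() helpers are hypothetical, checksums are assumed to be 4 bytes each (as the u32 csums[] arguments in the removed code suggest), and a C99 flexible array member stands in for the zero-length array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of struct btrfs_dio_private. */
struct dio_private {
	uint64_t logical_offset;
	uint64_t bytes;
	size_t nr_csums;
	uint8_t csum[];	/* flexible array member; the patch uses csum[0] */
};

/* Allocate the request and its checksum storage as one block. */
static struct dio_private *dio_private_alloc(uint64_t logical_offset,
					     uint64_t bytes,
					     uint32_t sectorsize)
{
	struct dio_private *dip;
	size_t nr;

	assert(sectorsize > 0);
	/* Assumed: one 4-byte checksum per sector, rounding up. */
	nr = (size_t)((bytes + sectorsize - 1) / sectorsize);

	dip = calloc(1, sizeof(*dip) + nr * sizeof(uint32_t));
	if (!dip)
		return NULL;
	dip->logical_offset = logical_offset;
	dip->bytes = bytes;
	dip->nr_csums = nr;
	return dip;
}

/* Store the checksum of sector 'index'; memcpy avoids alignment issues. */
static void dio_private_set_csum(struct dio_private *dip, size_t index,
				 uint32_t csum)
{
	assert(index < dip->nr_csums);
	memcpy(dip->csum + index * sizeof(csum), &csum, sizeof(csum));
}

/* Read back the checksum of sector 'index'. */
static uint32_t dio_private_get_csum(const struct dio_private *dip,
				     size_t index)
{
	uint32_t csum;

	assert(index < dip->nr_csums);
	memcpy(&csum, dip->csum + index * sizeof(csum), sizeof(csum));
	return csum;
}
```

Because the checksum array belongs to a single request, readers and writers of different requests never touch the same memory. That is the property the commit message relies on to reduce contention.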
[PATCH 1/5] Btrfs: remove unnecessary argument of bio_readpage_error()
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f8586a9..4bfbcc5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2202,8 +2202,7 @@ out:
  */
 
 static int bio_readpage_error(struct bio *failed_bio, struct page *page,
-			      u64 start, u64 end, int failed_mirror,
-			      struct extent_state *state)
+			      u64 start, u64 end, int failed_mirror)
 {
 	struct io_failure_record *failrec = NULL;
 	u64 private;
@@ -2212,6 +2211,7 @@ static int bio_readpage_error(struct bio *failed_bio, struct page *page,
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+	struct extent_state *state;
 	struct bio *bio;
 	int num_copies;
 	int ret;
@@ -2297,21 +2297,18 @@ static int bio_readpage_error(struct bio *failed_bio, struct page *page,
 		 * matter what the error is, it is very likely to persist.
 		 */
 		pr_debug("bio_readpage_error: cannot repair, num_copies == 1. "
-			 "state=%p, num_copies=%d, next_mirror %d, "
-			 "failed_mirror %d\n", state, num_copies,
-			 failrec->this_mirror, failed_mirror);
+			 "num_copies=%d, next_mirror %d, failed_mirror %d\n",
+			 num_copies, failrec->this_mirror, failed_mirror);
 		free_io_failure(inode, failrec, 0);
 		return -EIO;
 	}
 
-	if (!state) {
-		spin_lock(&tree->lock);
-		state = find_first_extent_bit_state(tree, failrec->start,
-						    EXTENT_LOCKED);
-		if (state && state->start != failrec->start)
-			state = NULL;
-		spin_unlock(&tree->lock);
-	}
+	spin_lock(&tree->lock);
+	state = find_first_extent_bit_state(tree, failrec->start,
+					    EXTENT_LOCKED);
+	if (state && state->start != failrec->start)
+		state = NULL;
+	spin_unlock(&tree->lock);
 
 	/*
 	 * there are two premises:
@@ -2541,7 +2538,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			 * can't handle the error it will return -EIO and we
 			 * remain responsible for that page.
 			 */
-			ret = bio_readpage_error(bio, page, start, end, mirror, NULL);
+			ret = bio_readpage_error(bio, page, start, end, mirror);
 			if (ret == 0) {
 				uptodate =
 					test_bit(BIO_UPTODATE, &bio->bi_flags);
-- 
1.8.1.4