Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
Hi Chen, with all due respect, what do you mean by I see and OK for users? The semantics of the fallocate API with/without the FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I understand, the file system *must* adhere to the semantics of this API, as defined elsewhere. Looking, for example, here: http://man7.org/linux/man-pages/man2/fallocate.2.html Allocating disk space The default operation (i.e., mode is zero) of fallocate() allocates and initializes to zero the disk space within the range specified by offset and len. Deallocating file space Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial file system blocks are zeroed, and whole file system blocks are removed from the file. These are clearly two different modes of operation, and I don't think you or me can decide otherwise, at this point. However, I may be not knowledgeable enough to confirm this. Jan/Alexander, can you perhaps comment on this? Thanks, Alex. On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: 于 2013-1-23 19:56, Alex Lyakas 写道: Hi, On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: From: Chen Yang chenyang.f...@cn.fujitsu.com Date: Wed, 23 Jan 2013 11:21:51 +0800 Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for btrfs-send mechanism When sending a file with sparse or pre-allocated part, these parts will be sent as ZERO streams, and it's unnecessary. There are two ways to improve this, one is just skip the EMPTY parts, and the other one is to add a punch command to send, when an EMPTY parts was detected. But considering a case of incremental sends, if we choose the first one, when a hole got punched into the file after the initial send, the data will be unchanged on the receiving side when received incrementally. So the second choice is right. Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- fs/btrfs/send.c | 60 ++- fs/btrfs/send.h |3 +- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 5445454..31e9aef 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3585,6 +3585,52 @@ out: return ret; } +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len) +{ + int ret = 0; + struct fs_path *p; + mm_segment_t old_fs; + + p = fs_path_alloc(sctx); + if (!p) + return -ENOMEM; + + /* +* vfs normally only accepts user space buffers for security reasons. +* we only read from the file and also only provide the read_buf buffer +* to vfs. As this buffer does not come from a user space call, it's +* ok to temporary allow kernel space buffers. +*/ + old_fs = get_fs(); + set_fs(KERNEL_DS); + +verbose_printk(btrfs: send_fallocate offset=%llu, len=%d\n, offset, len); + + ret = open_cur_inode_file(sctx); + if (ret 0) + goto out; + + ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH); + if (ret 0) + goto out; + + ret = get_cur_path(sctx, sctx-cur_ino, sctx-cur_inode_gen, p); + if (ret 0) + goto out; + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); + + ret = send_cmd(sctx); + +tlv_put_failure: +out: + fs_path_free(sctx, p); + set_fs(old_fs); + return ret; +} + /* * Read some bytes from the current inode/file and send a write command to * user space. @@ -3718,6 +3764,7 @@ static int send_write_or_clone(struct send_ctx *sctx, u64 pos = 0; u64 len; u32 l; + u64 bytenr; u8 type; ei = btrfs_item_ptr(path-nodes[0], path-slots[0], @@ -3731,8 +3778,19 @@ static int send_write_or_clone(struct send_ctx *sctx, * sure to send the whole thing */ len = PAGE_CACHE_ALIGN(len); - } else { + } else if (type == BTRFS_FILE_EXTENT_REG) { len = btrfs_file_extent_num_bytes(path-nodes[0], ei); + bytenr = btrfs_file_extent_disk_bytenr(path-nodes[0], ei); + if (bytenr == 0) { + ret = send_punch(sctx, offset, len); + goto out; + } + } else if (type == BTRFS_FILE_EXTENT_PREALLOC) { + len = btrfs_file_extent_num_bytes(path-nodes[0], ei); + ret = send_punch(sctx, offset, len); + goto out; + } else { + BUG(); } Are these two cases really the same? In the bytenr
Warnings on file removal after Quota exceeded error (EDQUOT)
Hi all, I executed the following simple sequence of commands: # mount /dev/sda /mnt/btrfs # btrfs quota enable /mnt/btrfs/ # btrfs subvolume create /mnt/btrfs/SV Create subvolume '/mnt/btrfs/SV' # btrfs qgroup limit 1m /mnt/btrfs/SV # dd if=/dev/zero of=/mnt/btrfs/SV/file bs=64x1024 dd: writing `/mnt/btrfs/SV/file': Disk quota exceeded 16+0 records in 15+0 records out 983040 bytes (983 kB) copied, 0.00192474 s, 511 MB/s # rm /mnt/btrfs/SV/file The file was removed, but in the kern.log I see: [67577.594688] [ cut here ] [67577.594723] WARNING: at /source/fs/btrfs/inode.c:7177 btrfs_destroy_inode+0x266/0x300 [btrfs]() [67577.594726] Hardware name: Bochs [67577.594728] Modules linked in: btrfs(O) libcrc32c deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way glue_helper lrw xts gf128mul twofish_x86_64 twofish_common camellia_generic serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic des_generic xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo kvm ppdev microcode psmouse nfsv4 serio_raw nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc parport_pc mac_hid i2c_piix4 lp parport floppy [last unloaded: btrfs] [67577.594788] Pid: 10692, comm: rm Tainted: GW O 3.7.0-560-generic #57ba86c00f9573b63b8c06810d4f6915efed2442 [67577.594790] Call Trace: [67577.594804] [81056d4f] warn_slowpath_common+0x7f/0xc0 [67577.594810] [81056daa] warn_slowpath_null+0x1a/0x20 [67577.594830] [a039b296] btrfs_destroy_inode+0x266/0x300 [btrfs] [67577.594838] [8119e16c] destroy_inode+0x3c/0x70 [67577.594843] [8119e2d6] evict+0x136/0x1d0 [67577.594849] [8119eb92] iput_final+0xe2/0x170 [67577.594853] [8119ec5e] iput+0x3e/0x50 [67577.594859] [8119167b] do_unlinkat+0x12b/0x1d0 [67577.594867] [81698dde] ? do_page_fault+0xe/0x10 [67577.594871] [816984d5] ? do_async_page_fault+0x35/0xa0 [67577.594877] [81194042] sys_unlinkat+0x22/0x40 [67577.594883] [8169d719] system_call_fastpath+0x16/0x1b [67577.594886] ---[ end trace 0b7f0b925e0b6048 ]--- [67577.594888] [ cut here ] [67577.594906] WARNING: at /source/fs/btrfs/inode.c:7178 btrfs_destroy_inode+0x2ef/0x300 [btrfs]() [67577.594908] Hardware name: Bochs [67577.594910] Modules linked in: btrfs(O) libcrc32c deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way glue_helper lrw xts gf128mul twofish_x86_64 twofish_common camellia_generic serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic des_generic xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo kvm ppdev microcode psmouse nfsv4 serio_raw nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc parport_pc mac_hid i2c_piix4 lp parport floppy [last unloaded: btrfs] [67577.594979] Pid: 10692, comm: rm Tainted: GW O 3.7.0-560-generic #57ba86c00f9573b63b8c06810d4f6915efed2442 [67577.594981] Call Trace: [67577.594988] [81056d4f] warn_slowpath_common+0x7f/0xc0 [67577.594993] [81056daa] warn_slowpath_null+0x1a/0x20 [67577.595012] [a039b31f] btrfs_destroy_inode+0x2ef/0x300 [btrfs] [67577.595017] [8119e16c] destroy_inode+0x3c/0x70 [67577.595022] [8119e2d6] evict+0x136/0x1d0 [67577.595027] [8119eb92] iput_final+0xe2/0x170 [67577.595032] [8119ec5e] iput+0x3e/0x50 [67577.595037] [8119167b] do_unlinkat+0x12b/0x1d0 [67577.595042] [81698dde] ? do_page_fault+0xe/0x10 [67577.595046] [816984d5] ? do_async_page_fault+0x35/0xa0 [67577.595051] [81194042] sys_unlinkat+0x22/0x40 [67577.595056] [8169d719] system_call_fastpath+0x16/0x1b [67577.595059] ---[ end trace 0b7f0b925e0b6049 ]--- Then, when I delete the subvolume and unmount the file system I also get warnings: # btrfs subvolume delete /mnt/btrfs/SV Delete subvolume '/mnt/btrfs/SV' # umount /mnt/btrfs [67847.557519] [ cut here ] [67847.557550] WARNING: at /source/fs/btrfs/extent-tree.c:4366 btrfs_free_block_groups+0x2ce/0x370 [btrfs]() [67847.557553] Hardware name: Bochs [67847.557554] Modules linked in: btrfs(O) libcrc32c deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way glue_helper lrw xts gf128mul twofish_x86_64 twofish_common camellia_generic serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic des_generic xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo kvm ppdev microcode psmouse nfsv4 serio_raw nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc parport_pc mac_hid i2c_piix4 lp parport floppy [last unloaded: btrfs] [67847.557614] Pid: 10706, comm: umount Tainted: GW O 3.7.0-560-generic #57ba86c00f9573b63b8c06810d4f6915efed2442 [67847.557616] Call Trace: [67847.557629] [81056d4f] warn_slowpath_common+0x7f/0xc0 [67847.557635] [81056daa] warn_slowpath_null+0x1a/0x20 [67847.557651] [a037ba2e]
[PATCH] Btrfs-progs: we need to have the string null terminated
Bug: - btrfs subvolume list / -a ID 258 gen 4226 top level 384 path media/smbshare :: btrfs subvolume list /home -a ID 258 gen 4226 top level 5 path FS_TREE/__active/media/smbshare4.snap In the first command's output, this path is printed correctly, however in the second output it has 4.snap appended, similar to the names of the snapshots I made 22 hours ago. Signed-off-by: Anand Jain anand.j...@oracle.com Reported-by: Brendan Hide bren...@swiftspirit.co.za --- btrfs-list.c | 1 + 1 file changed, 1 insertion(+) diff --git a/btrfs-list.c b/btrfs-list.c index e5f0f96..5acba78 100644 --- a/btrfs-list.c +++ b/btrfs-list.c @@ -633,6 +633,7 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri, len = strlen(full_path); tmp = malloc(len + add_len + 2); memcpy(tmp + add_len + 1, full_path, len); + tmp[len + add_len + 1] = '\0'; tmp[add_len] = '/'; memcpy(tmp, p, add_len); free(full_path); -- 1.8.1.227.g44fe835 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: inconsistent output on sub list
Brendan, --- [root@watricky mnt]# btrfs subvolume list / -a ID 258 gen 4226 top level 384 path media/smbshare :: [root@watricky mnt]# btrfs subvolume list /home -a ID 258 gen 4226 top level 5 path FS_TREE/__active/media/smbshare4.snap --- This is definitely a bug. Thanks for reporting. I have made some fair-assumptions, and have sent out the patch[1] to fix this bug (ref this email thread). Could you kindly test it and report the result ? [1] [PATCH] Btrfs-progs: we need to have the string null terminated Thanks, Anand On 01/23/2013 03:42 AM, Brendan Hide wrote: Linux watricky 3.6.11-1-ARCH #1 SMP PREEMPT Tue Dec 18 08:57:15 CET 2012 x86_64 GNU/Linux In working on a snapshot maintenance script I've noticed some odd behaviour. Note the smbshare path. I've put this into its own subvolume as I don't plan on snapshotting it. In the first command's output, this path is printed correctly, however in the second output it has 4.snap appended, similar to the names of the snapshots I made 22 hours ago. If this is a documented issue with a fix then no worries. But if not and anyone wants me to check into any further specifics, please let me know. [root@watricky mnt]# btrfs subvolume list / -a ID 258 gen 4226 top level 384 path media/smbshare ID 259 gen 4337 top level 384 path home ID 384 gen 4321 top level 5 path FS_TREE/__active ID 392 gen 4337 top level 384 path var ID 393 gen 4267 top level 384 path usr ID 428 gen 4267 top level 5 path FS_TREE/__snapshot/__active.20130121-23h44.snap ID 429 gen 3980 top level 5 path FS_TREE/__snapshot/__active_home.20130121-23h45.snap ID 430 gen 4043 top level 5 path FS_TREE/__snapshot/__active_var.20130121-23h45.snap ID 431 gen 4267 top level 5 path FS_TREE/__snapshot/__active_usr.20130121-23h45.snap [root@watricky mnt]# btrfs subvolume list /home -a ID 258 gen 4226 top level 5 path FS_TREE/__active/media/smbshare4.snap ID 259 gen 4337 top level 5 path FS_TREE/__active/home ID 384 gen 4321 top level 5 path FS_TREE/__active ID 392 gen 4337 top level 5 path FS_TREE/__active/var ID 393 gen 4267 top level 5 path FS_TREE/__active/usr ID 428 gen 4267 top level 5 path FS_TREE/__snapshot/__active.20130121-23h44.snap ID 429 gen 3980 top level 5 path FS_TREE/__snapshot/__active_home.20130121-23h45.snap ID 430 gen 4043 top level 5 path FS_TREE/__snapshot/__active_var.20130121-23h45.snap ID 431 gen 4267 top level 5 path FS_TREE/__snapshot/__active_usr.20130121-23h45.snap [root@watricky mnt]# Note that the only directly mounted share is __active, mounted at /. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Open for contribution towards Btrfs testing
Hi, On 24.01.2013 06:10, praneeth u wrote: Hello, We are team of 5 students, interns at Green turtles technologies, interested in contributing to btrfs. Any space for contribution in btrfs testing ? we will be updating pogress twice in a week. Need suggestions on how to proceed. The quota subsystem (qgroups) is new and there are no tests for it yet. Tests for space tracking and limiting are needed. Also hierarchical quota need testing. Stress testing would be useful, too. Would you be interested in that? -Arne -- Praneeth U 9448804728 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Warnings on file removal after Quota exceeded error (EDQUOT)
On 24.01.2013 10:39, Lev Vainblat wrote: Hi all, I executed the following simple sequence of commands: # mount /dev/sda /mnt/btrfs # btrfs quota enable /mnt/btrfs/ # btrfs subvolume create /mnt/btrfs/SV Create subvolume '/mnt/btrfs/SV' # btrfs qgroup limit 1m /mnt/btrfs/SV # dd if=/dev/zero of=/mnt/btrfs/SV/file bs=64x1024 dd: writing `/mnt/btrfs/SV/file': Disk quota exceeded 16+0 records in 15+0 records out 983040 bytes (983 kB) copied, 0.00192474 s, 511 MB/s # rm /mnt/btrfs/SV/file The file was removed, but in the kern.log I see: [snip] Is this an expected behavior? Am I doing anything wrong here? I can reproduce it here, will look into it. Thanks for reporting! -Arne Thanks, -Lev. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote: Hi Chen, with all due respect, what do you mean by I see and OK for users? The semantics of the fallocate API with/without the FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I understand, the file system *must* adhere to the semantics of this API, as defined elsewhere. Looking, for example, here: http://man7.org/linux/man-pages/man2/fallocate.2.html Allocating disk space The default operation (i.e., mode is zero) of fallocate() allocates and initializes to zero the disk space within the range specified by offset and len. Deallocating file space Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial file system blocks are zeroed, and whole file system blocks are removed from the file. These are clearly two different modes of operation, and I don't think you or me can decide otherwise, at this point. I think send/receive commands just need ensure the content of the file is right, not the disk format. That is also the reason why the developers just send out the zero data when they meet a hole or pre-allocated extent. From this viewpoint, they are the same. Thanks Miao However, I may be not knowledgeable enough to confirm this. Jan/Alexander, can you perhaps comment on this? Thanks, Alex. On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: 于 2013-1-23 19:56, Alex Lyakas 写道: Hi, On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: From: Chen Yang chenyang.f...@cn.fujitsu.com Date: Wed, 23 Jan 2013 11:21:51 +0800 Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for btrfs-send mechanism When sending a file with sparse or pre-allocated part, these parts will be sent as ZERO streams, and it's unnecessary. There are two ways to improve this, one is just skip the EMPTY parts, and the other one is to add a punch command to send, when an EMPTY parts was detected. But considering a case of incremental sends, if we choose the first one, when a hole got punched into the file after the initial send, the data will be unchanged on the receiving side when received incrementally. So the second choice is right. Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- fs/btrfs/send.c | 60 ++- fs/btrfs/send.h |3 +- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 5445454..31e9aef 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3585,6 +3585,52 @@ out: return ret; } +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len) +{ + int ret = 0; + struct fs_path *p; + mm_segment_t old_fs; + + p = fs_path_alloc(sctx); + if (!p) + return -ENOMEM; + + /* +* vfs normally only accepts user space buffers for security reasons. +* we only read from the file and also only provide the read_buf buffer +* to vfs. As this buffer does not come from a user space call, it's +* ok to temporary allow kernel space buffers. +*/ + old_fs = get_fs(); + set_fs(KERNEL_DS); + +verbose_printk(btrfs: send_fallocate offset=%llu, len=%d\n, offset, len); + + ret = open_cur_inode_file(sctx); + if (ret 0) + goto out; + + ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH); + if (ret 0) + goto out; + + ret = get_cur_path(sctx, sctx-cur_ino, sctx-cur_inode_gen, p); + if (ret 0) + goto out; + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); + + ret = send_cmd(sctx); + +tlv_put_failure: +out: + fs_path_free(sctx, p); + set_fs(old_fs); + return ret; +} + /* * Read some bytes from the current inode/file and send a write command to * user space. @@ -3718,6 +3764,7 @@ static int send_write_or_clone(struct send_ctx *sctx, u64 pos = 0; u64 len; u32 l; + u64 bytenr; u8 type; ei = btrfs_item_ptr(path-nodes[0], path-slots[0], @@ -3731,8 +3778,19 @@ static int send_write_or_clone(struct send_ctx *sctx, * sure to send the whole thing */ len = PAGE_CACHE_ALIGN(len); - } else { + } else if (type == BTRFS_FILE_EXTENT_REG) { len = btrfs_file_extent_num_bytes(path-nodes[0], ei); + bytenr = btrfs_file_extent_disk_bytenr(path-nodes[0], ei); + if (bytenr == 0) { + ret = send_punch(sctx, offset,
[PATCH] Btrfs-prog/send: fix wrong best-parent assignment in, find_good_parent()
We use find_good_parent() to look for a suit snapshot in the clone source snapshots as the parent, not the source subvolume of the snapshot which is about to be sent. fix it Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- cmds-send.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/cmds-send.c b/cmds-send.c index 9b47e70..3a88c54 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -150,7 +150,7 @@ static int find_good_parent(struct btrfs_send *s, u64 root_id, u64 *found) if (tmp 0) tmp *= -1; if (tmp best_diff) { - best_parent = parent; + best_parent = parent2; best_diff = tmp; } } -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
On Thu, Jan 24, 2013 at 07:39:17PM +0800, Miao Xie wrote: On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote: Hi Chen, with all due respect, what do you mean by I see and OK for users? The semantics of the fallocate API with/without the FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I understand, the file system *must* adhere to the semantics of this API, as defined elsewhere. Looking, for example, here: http://man7.org/linux/man-pages/man2/fallocate.2.html Allocating disk space The default operation (i.e., mode is zero) of fallocate() allocates and initializes to zero the disk space within the range specified by offset and len. Deallocating file space Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial file system blocks are zeroed, and whole file system blocks are removed from the file. These are clearly two different modes of operation, and I don't think you or me can decide otherwise, at this point. I think send/receive commands just need ensure the content of the file is right, not the disk format. That is also the reason why the developers just send out the zero data when they meet a hole or pre-allocated extent. From this viewpoint, they are the same. I disagree on the content vs representation comment above. I would very much hope that the reference receive implementation can turn a string of zeroes (or a hole) back into a sparse file. As a user, I'd be quite irritated if, say, one of my sparse VM images ended up 5 times the size when I backed it up with send/receive, simply because it's gone from holey to a huge bunch of zeroes. That particular issue can be dealt with at the receiver level, though. Hugo. However, I may be not knowledgeable enough to confirm this. Jan/Alexander, can you perhaps comment on this? Thanks, Alex. On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: 于 2013-1-23 19:56, Alex Lyakas 写道: Hi, On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: From: Chen Yang chenyang.f...@cn.fujitsu.com Date: Wed, 23 Jan 2013 11:21:51 +0800 Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for btrfs-send mechanism When sending a file with sparse or pre-allocated part, these parts will be sent as ZERO streams, and it's unnecessary. There are two ways to improve this, one is just skip the EMPTY parts, and the other one is to add a punch command to send, when an EMPTY parts was detected. But considering a case of incremental sends, if we choose the first one, when a hole got punched into the file after the initial send, the data will be unchanged on the receiving side when received incrementally. So the second choice is right. Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- fs/btrfs/send.c | 60 ++- fs/btrfs/send.h |3 +- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 5445454..31e9aef 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3585,6 +3585,52 @@ out: return ret; } +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len) +{ + int ret = 0; + struct fs_path *p; + mm_segment_t old_fs; + + p = fs_path_alloc(sctx); + if (!p) + return -ENOMEM; + + /* +* vfs normally only accepts user space buffers for security reasons. +* we only read from the file and also only provide the read_buf buffer +* to vfs. As this buffer does not come from a user space call, it's +* ok to temporary allow kernel space buffers. +*/ + old_fs = get_fs(); + set_fs(KERNEL_DS); + +verbose_printk(btrfs: send_fallocate offset=%llu, len=%d\n, offset, len); + + ret = open_cur_inode_file(sctx); + if (ret 0) + goto out; + + ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH); + if (ret 0) + goto out; + + ret = get_cur_path(sctx, sctx-cur_ino, sctx-cur_inode_gen, p); + if (ret 0) + goto out; + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); + + ret = send_cmd(sctx); + +tlv_put_failure: +out: + fs_path_free(sctx, p); + set_fs(old_fs); + return ret; +} + /* * Read some bytes from the current inode/file and send a write command to * user space. @@ -3718,6 +3764,7 @@ static int send_write_or_clone(struct send_ctx *sctx,
Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
On thu, 24 Jan 2013 11:58:13 +, Hugo Mills wrote: On Thu, Jan 24, 2013 at 07:39:17PM +0800, Miao Xie wrote: On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote: Hi Chen, with all due respect, what do you mean by I see and OK for users? The semantics of the fallocate API with/without the FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I understand, the file system *must* adhere to the semantics of this API, as defined elsewhere. Looking, for example, here: http://man7.org/linux/man-pages/man2/fallocate.2.html Allocating disk space The default operation (i.e., mode is zero) of fallocate() allocates and initializes to zero the disk space within the range specified by offset and len. Deallocating file space Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial file system blocks are zeroed, and whole file system blocks are removed from the file. These are clearly two different modes of operation, and I don't think you or me can decide otherwise, at this point. I think send/receive commands just need ensure the content of the file is right, not the disk format. That is also the reason why the developers just send out the zero data when they meet a hole or pre-allocated extent. From this viewpoint, they are the same. I disagree on the content vs representation comment above. I would very much hope that the reference receive implementation can turn a string of zeroes (or a hole) back into a sparse file. As a user, I'd be quite irritated if, say, one of my sparse VM images ended up 5 times the size when I backed it up with send/receive, simply because it's gone from holey to a huge bunch of zeroes. That particular issue can be dealt with at the receiver level, though. My explanation is not clear, sorry! What I want to say is send command needn't differentiate between a hole and a pre-allocated extent, because both of them are considered as zero. So just send out a hole even we meet a pre-allocated extent is OK, I think. Thanks Miao Hugo. However, I may be not knowledgeable enough to confirm this. Jan/Alexander, can you perhaps comment on this? Thanks, Alex. On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: 于 2013-1-23 19:56, Alex Lyakas 写道: Hi, On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: From: Chen Yang chenyang.f...@cn.fujitsu.com Date: Wed, 23 Jan 2013 11:21:51 +0800 Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for btrfs-send mechanism When sending a file with sparse or pre-allocated part, these parts will be sent as ZERO streams, and it's unnecessary. There are two ways to improve this, one is just skip the EMPTY parts, and the other one is to add a punch command to send, when an EMPTY parts was detected. But considering a case of incremental sends, if we choose the first one, when a hole got punched into the file after the initial send, the data will be unchanged on the receiving side when received incrementally. So the second choice is right. Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- fs/btrfs/send.c | 60 ++- fs/btrfs/send.h |3 +- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 5445454..31e9aef 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3585,6 +3585,52 @@ out: return ret; } +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len) +{ + int ret = 0; + struct fs_path *p; + mm_segment_t old_fs; + + p = fs_path_alloc(sctx); + if (!p) + return -ENOMEM; + + /* +* vfs normally only accepts user space buffers for security reasons. +* we only read from the file and also only provide the read_buf buffer +* to vfs. As this buffer does not come from a user space call, it's +* ok to temporary allow kernel space buffers. +*/ + old_fs = get_fs(); + set_fs(KERNEL_DS); + +verbose_printk(btrfs: send_fallocate offset=%llu, len=%d\n, offset, len); + + ret = open_cur_inode_file(sctx); + if (ret 0) + goto out; + + ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH); + if (ret 0) + goto out; + + ret = get_cur_path(sctx, sctx-cur_ino, sctx-cur_inode_gen, p); + if (ret 0) + goto out; + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); + + ret = send_cmd(sctx); + +tlv_put_failure: +out: + fs_path_free(sctx, p); + set_fs(old_fs); +
Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
On Thu, Jan 24, 2013 at 08:13:34PM +0800, Miao Xie wrote: Onthu, 24 Jan 2013 11:58:13 +, Hugo Mills wrote: On Thu, Jan 24, 2013 at 07:39:17PM +0800, Miao Xie wrote: On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote: Hi Chen, with all due respect, what do you mean by I see and OK for users? The semantics of the fallocate API with/without the FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I understand, the file system *must* adhere to the semantics of this API, as defined elsewhere. Looking, for example, here: http://man7.org/linux/man-pages/man2/fallocate.2.html Allocating disk space The default operation (i.e., mode is zero) of fallocate() allocates and initializes to zero the disk space within the range specified by offset and len. Deallocating file space Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial file system blocks are zeroed, and whole file system blocks are removed from the file. These are clearly two different modes of operation, and I don't think you or me can decide otherwise, at this point. I think send/receive commands just need ensure the content of the file is right, not the disk format. That is also the reason why the developers just send out the zero data when they meet a hole or pre-allocated extent. From this viewpoint, they are the same. I disagree on the content vs representation comment above. I would very much hope that the reference receive implementation can turn a string of zeroes (or a hole) back into a sparse file. As a user, I'd be quite irritated if, say, one of my sparse VM images ended up 5 times the size when I backed it up with send/receive, simply because it's gone from holey to a huge bunch of zeroes. That particular issue can be dealt with at the receiver level, though. My explanation is not clear, sorry! What I want to say is send command needn't differentiate between a hole and a pre-allocated extent, because both of them are considered as zero. So just send out a hole even we meet a pre-allocated extent is OK, I think. Ah, gotcha. That sounds more sensible, and I agree with your point (for what little weight I carry -- I'm not as well-versed in the behaviour of fallocate as I might be). Hugo. Thanks Miao Hugo. However, I may be not knowledgeable enough to confirm this. Jan/Alexander, can you perhaps comment on this? Thanks, Alex. On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: 于 2013-1-23 19:56, Alex Lyakas 写道: Hi, On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang chenyang.f...@cn.fujitsu.com wrote: From: Chen Yang chenyang.f...@cn.fujitsu.com Date: Wed, 23 Jan 2013 11:21:51 +0800 Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for btrfs-send mechanism When sending a file with sparse or pre-allocated part, these parts will be sent as ZERO streams, and it's unnecessary. There are two ways to improve this, one is just skip the EMPTY parts, and the other one is to add a punch command to send, when an EMPTY parts was detected. But considering a case of incremental sends, if we choose the first one, when a hole got punched into the file after the initial send, the data will be unchanged on the receiving side when received incrementally. So the second choice is right. Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- fs/btrfs/send.c | 60 ++- fs/btrfs/send.h |3 +- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 5445454..31e9aef 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3585,6 +3585,52 @@ out: return ret; } +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len) +{ + int ret = 0; + struct fs_path *p; + mm_segment_t old_fs; + + p = fs_path_alloc(sctx); + if (!p) + return -ENOMEM; + + /* +* vfs normally only accepts user space buffers for security reasons. +* we only read from the file and also only provide the read_buf buffer +* to vfs. As this buffer does not come from a user space call, it's +* ok to temporary allow kernel space buffers. +*/ + old_fs = get_fs(); + set_fs(KERNEL_DS); + +verbose_printk(btrfs: send_fallocate offset=%llu, len=%d\n, offset, len); + + ret = open_cur_inode_file(sctx); + if (ret 0) + goto out; + + ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH); + if (ret 0) + goto out; + + ret = get_cur_path(sctx,
Re: unmountable filesystem with kernel 3.7.3
Am 24.01.2013 01:57, schrieb Liu Bo: On Wed, Jan 23, 2013 at 08:14:44PM +0100, F. Frederich wrote: Hi, os: Gentoo kernel: 3.7.3 from kernel.org 2 harddisks with btrfs and Raid1, lzo compression trying to start a virtual machine (qemu-kvm) the system crashed giving following messages: kernel: [177417.378526] [ cut here ] kernel: [177417.378562] kernel BUG at fs/btrfs/ctree.c:2950! kernel: [177417.378579] invalid opcode: [#1] SMP kernel: [177417.378599] Modules linked in: it87 hwmon_vid k10temp i2c_piix4 kernel: [177417.378630] CPU 2 kernel: [177417.378645] Pid: 13388, comm: qemu-system-x86 Not tainted 3.7.3-system #2 Gigabyte Technology Co., Ltd. GA-MA790XT-UD4P/GA-MA790XT-UD4P kernel: [177417.378673] RIP: 0010:[8137e278] [8137e278] btrfs_set_item_key_safe+0x168/0x170 kernel: [177417.378708] RSP: 0018:88011149d8a8 EFLAGS: 00010286 kernel: [177417.378723] RAX: RBX: 000c RCX: 3e999000 kernel: [177417.378744] RDX: RSI: 88011149d9d6 RDI: 88011149d887 kernel: [177417.378761] RBP: 88011149d908 R08: 0d3d R09: 88011149d8c8 kernel: [177417.378778] R10: R11: R12: 8802026aaec8 kernel: [177417.378795] R13: 88011149d9d6 R14: 88011149d8c7 R15: 880426c37b40 kernel: [177417.378813] FS: 7f3e7abdb700() GS:880437d0() knlGS: kernel: [177417.378831] CS: 0010 DS: ES: CR0: 8005003b kernel: [177417.378846] CR2: 7f3e3801e138 CR3: 0001cc7dc000 CR4: 07e0 kernel: [177417.378863] DR0: 0045 DR1: DR2: kernel: [177417.378880] DR3: 0005 DR6: 0ff0 DR7: 0400 kernel: [177417.378898] Process qemu-system-x86 (pid: 13388, threadinfo 88011149c000, task 8801a3053600) kernel: [177417.378916] Stack: kernel: [177417.378925] 88011149d908 8804064c1800 88002cbb4000 3000 kernel: [177417.378953] 6c000aef 3e998000 88011149d908 880426c37b40 kernel: [177417.378981] 8802026aaec8 3e998000 0001 kernel: [177417.379007] Call Trace: kernel: [177417.379025] [813b17fb] __btrfs_drop_extents+0x58b/0xb20 kernel: [177417.379046] [813d25c5] btrfs_log_changed_extents+0x625/0x690 kernel: [177417.379066] [813bc2b2] ? free_extent_buffer+0x32/0x90 kernel: [177417.379085] [813d45d3] btrfs_log_inode+0x513/0x5b0 kernel: [177417.379104] [819f1c84] ? __schedule+0x2a4/0x6a0 kernel: [177417.379123] [813d6198] btrfs_log_inode_parent+0x188/0x470 kernel: [177417.379142] [813d64bf] btrfs_log_dentry_safe+0x3f/0x60 kernel: [177417.379160] [813af2c2] btrfs_sync_file+0x122/0x230 kernel: [177417.379181] [81191d70] generic_write_sync+0x50/0x70 kernel: [177417.379198] [813b0886] btrfs_file_aio_write+0x2c6/0x490 kernel: [177417.379217] [810c3c4e] ? get_futex_key+0x7e/0x250 kernel: [177417.379237] [8116526b] do_sync_write+0x9b/0xe0 kernel: [177417.379255] [811658de] vfs_write+0xae/0x170 kernel: [177417.379272] [81165d6a] sys_pwrite64+0x9a/0xa0 kernel: [177417.379290] [819f3a92] system_call_fastpath+0x16/0x1b kernel: [177417.379305] Code: 89 d0 48 c1 e0 05 48 29 c8 b9 11 00 00 00 48 8d 54 02 65 e8 eb f0 03 00 4c 89 ee 4c 89 f7 e8 70 f2 ff ff 85 c0 0f 8f 41 ff ff ff 0f 0b 0f 0b 0f 1f 40 00 55 48 b8 00 00 00 00 00 16 00 00 48 89 kernel: [177417.379526] RIP [8137e278] btrfs_set_item_key_safe+0x168/0x170 kernel: [177417.379547] RSP 88011149d8a8 kernel: [177417.407514] ---[ end trace 9e7b00faf79e316d ]--- trying to mount the filesystem after a restart failed with following messages: kernel: [ 56.481202] device label BTRFS devid 3 transid 78332 /dev/sdc kernel: [ 56.482323] btrfs: use lzo compression kernel: [ 56.482330] btrfs: disk space caching is enabled kernel: [ 61.926886] btrfs: corrupt leaf, bad key order: block=2177786388480,root=1, slot=12 kernel: [ 61.926925] [ cut here ] kernel: [ 61.927356] kernel BUG at fs/btrfs/tree-log.c:3832! kernel: [ 61.927780] invalid opcode: [#1] SMP kernel: [ 61.927787] Modules linked in: it87 hwmon_vid k10temp i2c_piix4 kernel: [ 61.927797] CPU 1 kernel: [ 61.927797] Pid: 3816, comm: mount Not tainted 3.7.3-system #2 Gigabyte Technology Co., Ltd. GA-MA790XT-UD4P/GA-MA790XT-UD4P kernel: [ 61.927811] RIP: 0010:[813d685a] [813d685a] btrfs_recover_log_trees+0x37a/0x3e0 kernel: [ 61.927814] RSP: 0018:880417625958 EFLAGS: 00010282 kernel: [ 61.927824] RAX: fffb RBX: 880426c58900 RCX: 2c6f kernel: [ 61.927827] RDX: 2c6e RSI: 880426c58990 RDI: ea00109b1600 kernel: [ 61.927829] RBP: 880417625a18 R08: 00017970 R09: 8137b295 kernel: [
Re: Quota reached: can't delete
On 24.01.2013 16:12, Jerome M wrote: Hi, With the current btrfs quota implementation, when you reach a subvolume quota limit, you can't delete anything without first removing the limit or enlarge it: rm: cannot remove `testfile.bin': Disk quota exceeded Is there any plan to change that? Yes, there is. The problem is that even deletion needs space. So we need to allow remove to go over quota. The current implementation doesn't make this distinction. -Arne Thanks, Jerome -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: fix potential null pointer dereference bug
On Sat, Jan 19, 2013 at 08:27:45AM -0700, Cong Ding wrote: The bug happens when rb_node == NULL. It causes variable node to be NULL and then the NULL pointer is dereferenced this line: BUG_ON((struct btrfs_root *)node-data != root); Based on my analysis, function tree_search should not return NULL to variable rb_node in this case (otherwise here has to be something unknown thing wrong), so I replace if (rb_node) with UG_ON(!rb_node). Signed-off-by: Cong Ding ding...@gmail.com I don't want to add more BUG_ON()'s, just return an error. Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] V2 Avoid sending disknr==0 extents when possible.
On Mon, Jan 21, 2013 at 03:27:22AM -0700, Alex Lyakas wrote: In the following cases we do not send disknr==0 extents: 1) full send 2) new inode in a diff-send 3) when disknr==0 extents are added to the end of an inode Original-version-by: Chen Yang chenyang.f...@cn.fujitsu.com Signed-off-by: Alex Lyakas a...@zadarastorage.com Neither of these would apply to btrfs-next, please rebase against btrfs-next and resend. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: Fix minor problems with btrfslabel.c
Fixed indentation, replace spaces with tabs. Signed-off-by: Danny Kukawka danny.kuka...@bisect.de Rebased, also fixed compiler warnings Signed-off-by: Gene Czarcinski g...@czarc.net --- btrfslabel.c | 57 + 1 file changed, 29 insertions(+), 28 deletions(-) diff --git a/btrfslabel.c b/btrfslabel.c index 88a5196..bea1f89 100644 --- a/btrfslabel.c +++ b/btrfslabel.c @@ -48,42 +48,43 @@ static int change_label_unmounted(char *dev, char *nLabel) { - struct btrfs_root *root; - struct btrfs_trans_handle *trans; - - /* Open the super_block at the default location -* and as read-write. -*/ - root = open_ctree(dev, 0, 1); - if (!root) /* errors are printed by open_ctree() */ - return; - - trans = btrfs_start_transaction(root, 1); - strncpy(root-fs_info-super_copy.label, nLabel, BTRFS_LABEL_SIZE); - root-fs_info-super_copy.label[BTRFS_LABEL_SIZE-1] = 0; - btrfs_commit_transaction(trans, root); - - /* Now we close it since we are done. */ - close_ctree(root); + struct btrfs_root *root; + struct btrfs_trans_handle *trans; + + /* Open the super_block at the default location +* and as read-write. +*/ + root = open_ctree(dev, 0, 1); + if (!root) /* errors are printed by open_ctree() */ + return -1; + + trans = btrfs_start_transaction(root, 1); + strncpy(root-fs_info-super_copy.label, nLabel, BTRFS_LABEL_SIZE); + root-fs_info-super_copy.label[BTRFS_LABEL_SIZE-1] = 0; + btrfs_commit_transaction(trans, root); + + /* Now we close it since we are done. */ + close_ctree(root); + return 0; } int get_label_unmounted(char *dev) { - struct btrfs_root *root; + struct btrfs_root *root; - /* Open the super_block at the default location -* and as read-only. -*/ - root = open_ctree(dev, 0, 0); + /* Open the super_block at the default location +* and as read-only. +*/ + root = open_ctree(dev, 0, 0); - if(!root) - return -1; + if(!root) + return -1; - fprintf(stdout, %s\n, root-fs_info-super_copy.label); + fprintf(stdout, %s\n, root-fs_info-super_copy.label); - /* Now we close it since we are done. */ - close_ctree(root); - return 0; + /* Now we close it since we are done. */ + close_ctree(root); + return 0; } int get_label(char *btrfs_dev) -- 1.8.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Btrfs: fix repeated delalloc work allocation
Hi Miao, On Thu, Jan 24, 2013 at 4:14 AM, Miao Xie mi...@cn.fujitsu.com wrote: On wed, 23 Jan 2013 19:02:30 +0200, Alex Lyakas wrote: Hi Miao, I have tested your patch, and the two discussed issues (OOM and handling the same inode more than once) are solved with it. However, snap creation under IO still takes 5-10 minutes for me. Basically, now the situation is similar to kernel 3.6, before your change to push the work to the flush workers. I see that flush worker is stuck in one of two stacks like this: [81301f1d] get_request+0x14d/0x330 [813030a4] blk_queue_bio+0x84/0x3b0 [812ff9a7] generic_make_request.part.55+0x77/0xb0 [812fff48] generic_make_request+0x68/0x70 [812fffcb] submit_bio+0x7b/0x160 [a02fe76e] submit_stripe_bio+0x5e/0x80 [btrfs] [a03050db] btrfs_map_bio+0x1cb/0x420 [btrfs] [a02e1b14] btrfs_submit_bio_hook+0xb4/0x1d0 [btrfs] [a02f5b1a] submit_one_bio+0x6a/0xa0 [btrfs] [a02f99c4] submit_extent_page.isra.38+0xe4/0x200 [btrfs] [a02fa83c] __extent_writepage+0x5dc/0x750 [btrfs] [a02fac6a] extent_write_cache_pages.isra.29.constprop.42+0x2ba/0x410 [btrfs] [a02fb00e] extent_writepages+0x4e/0x70 [btrfs] [a02e0898] btrfs_writepages+0x28/0x30 [btrfs] [81136510] do_writepages+0x20/0x40 [8112c341] __filemap_fdatawrite_range+0x51/0x60 [8112cc3c] filemap_flush+0x1c/0x20 [a02e416d] btrfs_run_delalloc_work+0xad/0xe0 [btrfs] [a0308c1f] worker_loop+0x16f/0x5d0 [btrfs] [8107ba50] kthread+0xc0/0xd0 [8169d66c] ret_from_fork+0x7c/0xb0 [] 0x or [8112a58e] sleep_on_page+0xe/0x20 [8112a577] __lock_page+0x67/0x70 [a02fabe9] extent_write_cache_pages.isra.29.constprop.42+0x239/0x410 [btrfs] [a02fb00e] extent_writepages+0x4e/0x70 [btrfs] [a02e0898] btrfs_writepages+0x28/0x30 [btrfs] [81136510] do_writepages+0x20/0x40 [8112c341] __filemap_fdatawrite_range+0x51/0x60 [8112cc3c] filemap_flush+0x1c/0x20 [a02e416d] btrfs_run_delalloc_work+0xad/0xe0 [btrfs] [a0308c1f] worker_loop+0x16f/0x5d0 [btrfs] [8107ba50] kthread+0xc0/0xd0 [8169d66c] ret_from_fork+0x7c/0xb0 [] 0x while btrfs_start_delalloc_inodes() waits for it to complete in the commit thread. Also for some reason, I have only one flush_workers thread, so switching to another thread does not improve for me. it is because the number of the inodes is very few, so the worker don't create new workers to deal with the new works. Maybe we can change flush_workers-idle_thresh to improve this problem. Another operation that takes time (up to one minute) is btrfs_wait_ordered_extents(), which does similar thing by switching the work to the flush worker. In this case, the btrfs_start_ordered_extent() is stuck in stacks like follows: [a04e6fc5] btrfs_start_ordered_extent+0x85/0x130 [btrfs] [a04e70e1] btrfs_run_ordered_extent_work+0x71/0xd0 [btrfs] [a04facef] worker_loop+0x16f/0x5d0 [btrfs] [8107ba50] kthread+0xc0/0xd0 [8169d66c] ret_from_fork+0x7c/0xb0 [] 0x where it waits for BTRFS_ORDERED_COMPLETE. I have several questions, on how to possibly address this issue: - I see that btrfs_flush_all_pending_stuffs() is invoked at least twice during each transaction commit. It may be invoked more than twice if the do{ } while loop that waits for writes, performs more than one iteration. For me, each invocation takes a lot of time because of btrfs_start_delalloc_inodes() and btrfs_wait_ordered_extents(). Given the fixes you have made (handling each inode only once), is it still needed to call these functions several times during the same commit? Calling it in the while loop is because we don't know how much time we need to wait for the end of the other transaction handle, so we flush the delalloc inodes instead of just waiting. Maybe we need not call btrfs_wait_ordered_extents() in the while loop, just call btrfs_start_delalloc_inodes() to start flush. - I see that during a commit without pending snapshots, these two functions are not invoked at all. if (flush_on_commit || snap_pending) { ret = btrfs_start_delalloc_inodes(root, 1); if (ret) return ret; btrfs_wait_ordered_extents(root, 1); } The FLUSHONCOMMIT mount option is normally *not set*, I see in the wiki that it is Not needed with recent kernels, as it's the normal mode of operation. Can you pls explain why the delalloc is needed when there is a pending snap, but not with a non-snap commit? Or pls point me to the code, where I can better understand what delalloc inode is and how it is related to FLUSHONCOMMIT or snapshot? Can you pls explain what the
btrfs-undelete shell-script
Hi, I tried to recover an accidentally deleted text file from a btrfs volume using the trusty old 'grep --text -C 500' method and failed, since the filesystem was compressed. So I wrote a shell script that uses btrfs-progs for a proper undelete functionality. Attached is the script that implements a working btrfs-undelete using the find- root and restore tools from btrfs-progs. It is fairly complete and solid and it even has some command line help. It needs bash and common unix utilities (sed, grep, wc, dirname, sort). I have successfully used it to recover a couple of files I deleted accidentally and was able to recover 2/3 of them just fine. The rest was zero-sized, I assume that's because the file blocks have already been reused. If you like it, feel free to add it to btrfs-progs. I've chosen GPLv2 or later as license, as that's what btrfs-progs seems to use. Please CC me on replies, I am not subscribed (and don't intend to). -- Mit freundlichen Grüßen, Jörg Walter btrfs-undelete Description: application/shellscript
Re: BUG in btrfs_set_item_key_safe
Hi, On Fri, Oct 26, 2012 at 08:35:43AM +0200, Jan Schmidt wrote: while running extensive qgroup and tree mod log tests I got to the following BUG, which is probably not related to tree mod log: I've left the fsx DIO tester overnight, after Josef identified the fix for the csum mismatch bug, and found it crashed at the same point as Jan reported: [80582.012720] [ cut here ] [80582.013045] kernel BUG at fs/btrfs/ctree.c:2945! [80582.013045] invalid opcode: [#1] PREEMPT SMP io_blk virtio_pci sr_mod 8139cp parport microcode virtio_ring floppy virtio sg pcspkr button i2c_piix4 cdrom processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_ge neric ata_piix [last unloaded: btrfs] [80582.013045] CPU 16 [80582.013045] Pid: 13045, comm: btrfs-endio-wri Tainted: GW 3.8.0-rc4-desktop+ #54 Bochs Bochs [80582.013045] RIP: 0010:[a0393a25] [a0393a25] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs] [80582.013045] RSP: 0018:880adb9d9ad8 EFLAGS: 00010206 [80582.013045] RAX: 006c RBX: 880adb9d9c18 RCX: 012e [80582.013045] RDX: 00016000 RSI: 880ad076 RDI: 880adb9d9af9 [80582.013045] RBP: 880adb9d9b38 R08: 880ae4cd9dc0 R09: [80582.013045] R10: 880bee7c0f40 R11: 0015 R12: 0001 [80582.013045] R13: 880add976c78 R14: 880adb9d9ae8 R15: 880be92301c0 [80582.013045] FS: () GS:880c1f20() knlGS: [80582.013045] CS: 0010 DS: ES: CR0: 8005003b [80582.013045] CR2: ff600400 CR3: 000be6cb8000 CR4: 06e0 [80582.013045] DR0: DR1: DR2: [80582.013045] DR3: DR6: 0ff0 DR7: 0400 [80582.013045] Process btrfs-endio-wri (pid: 13045, threadinfo 880adb9d8000, task 880b301545c0) [80582.013045] Stack: [80582.013045] 880bec37a000 880b932a5170 012e 0160006c [80582.013045] 880adb9d9b00 a03a1ad0 012e 0001 [80582.013045] 880add976c78 880be92301c0 00016000 0f96 [80582.013045] Call Trace: [80582.013045] [a03a1ad0] ? btrfs_inc_extent_ref+0x90/0xb0 [btrfs] [80582.013045] [a03cdc50] __btrfs_drop_extents+0x530/0xb70 [btrfs] [80582.013045] [a03cecab] btrfs_drop_extents+0x6b/0x90 [btrfs] [80582.013045] [a03bce1a] insert_reserved_file_extent+0x8a/0x2a0 [btrfs] [80582.013045] [a03c0c6e] btrfs_finish_ordered_io+0x39e/0x3e0 [btrfs] [80582.013045] [a03c0cc0] finish_ordered_fn+0x10/0x20 [btrfs] [80582.013045] [a03e7205] worker_loop+0x165/0x4c0 [btrfs] [80582.013045] [a03e70a0] ? check_pending_worker_creates+0xe0/0xe0 [btrfs] [80582.013045] [a03e70a0] ? check_pending_worker_creates+0xe0/0xe0 [btrfs] [80582.013045] [8106d026] kthread+0xc6/0xd0 [80582.013045] [8106cf60] ? kthread_freezable_should_stop+0x70/0x70 [80582.013045] [815f7a7c] ret_from_fork+0x7c/0xb0 [80582.013045] [8106cf60] ? kthread_freezable_should_stop+0x70/0x70 [80582.013045] Code: 00 00 00 4c 89 ef 48 63 d2 4c 89 f6 48 8d 14 92 48 8d 54 92 65 e8 6c 0c 04 00 48 8b 0b 48 39 4d b0 48 8b 55 b9 0f b6 45 b8 76 0b 0f 0b eb fe 0f 1f 80 00 00 00 00 72 1e 3a 43 08 77 ee 66 0f 1f [80582.013045] RIP [a0393a25] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs] [80582.013045] RSP 880adb9d9ad8 [80582.071249] ---[ end trace 5360a95c1506db85 ]--- The lines are shifted, but it's the same code: 2933 void btrfs_set_item_key_safe(struct btrfs_trans_handle *trans, 2934 struct btrfs_root *root, struct btrfs_path *path, 2935 struct btrfs_key *new_key) 2936 { 2937 struct btrfs_disk_key disk_key; 2938 struct extent_buffer *eb; 2939 int slot; 2940 2941 eb = path-nodes[0]; 2942 slot = path-slots[0]; 2943 if (slot 0) { 2944 btrfs_item_key(eb, disk_key, slot - 1); 2945 BUG_ON(comp_keys(disk_key, new_key) = 0); 2946 } 2947 if (slot btrfs_header_nritems(eb) - 1) { 2948 btrfs_item_key(eb, disk_key, slot + 1); 2949 BUG_ON(comp_keys(disk_key, new_key) = 0); 2950 } 2951 2952 btrfs_cpu_key_to_disk(disk_key, new_key); 2953 btrfs_set_item_key(eb, disk_key, slot); 2954 btrfs_mark_buffer_dirty(eb); 2955 if (slot == 0) 2956 fixup_low_keys(trans, root, path, disk_key, 1); 2957 } It was unexpected so I haven't put any debugging into config or code. According to the timestamps, the test has been running for 16h 58m. Test setup: * kvm virtual machine with 48 cpus * 48G memory (single bit ECC) * 5x 136GB 15K SAS disks (virtio,cache=none) filesystem was freshly created before the test as
Re: [PATCH 2/2] Btrfs: fix memory leak on extent map after fsync
On Tue, Jan 08, 2013 at 07:49:21AM -0700, Liu Bo wrote: During fsync, we put the changed parts(i.e. extent map) into the log tree, and we ship these parts from a list of modified_extents to a local list to process, of course, we must increment the refs of the extent maps to avoid it from getting evicted from cache. The problem is we don't hold the tree writer lock all the time of iterating the local list, and it is possible that other threads hack in and delete the extent map from the local list silently. So we'll end up with memory leak here. I hit this when testing xfstest 274 with mount options 'autodefrag,compress=zlib'. With this fix, the memory leak has gone away. This isn't going to work, we use the LOGGING flag to make sure the em isn't merged as well. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: fix memory leak on extent map after fsync
On Tue, Jan 08, 2013 at 07:49:21AM -0700, Liu Bo wrote: During fsync, we put the changed parts(i.e. extent map) into the log tree, and we ship these parts from a list of modified_extents to a local list to process, of course, we must increment the refs of the extent maps to avoid it from getting evicted from cache. The problem is we don't hold the tree writer lock all the time of iterating the local list, and it is possible that other threads hack in and delete the extent map from the local list silently. So we'll end up with memory leak here. I hit this when testing xfstest 274 with mount options 'autodefrag,compress=zlib'. With this fix, the memory leak has gone away. And actually I fixed this in one of my other fsync patches that didn't get pulled in yet, so I'll break it out of that patch and we can send that along. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs-prog/send: fix wrong best-parent assignment in, find_good_parent()
On Thu, January 24, 2013 at 12:53 (+0100), Chen Yang wrote: We use find_good_parent() to look for a suit snapshot in the clone source snapshots as the parent, not the source subvolume of the snapshot which is about to be sent. fix it Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- cmds-send.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/cmds-send.c b/cmds-send.c index 9b47e70..3a88c54 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -150,7 +150,7 @@ static int find_good_parent(struct btrfs_send *s, u64 root_id, u64 *found) if (tmp 0) tmp *= -1; if (tmp best_diff) { - best_parent = parent; + best_parent = parent2; best_diff = tmp; } } Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net Thanks, -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
Hi Alex, On Thu, January 24, 2013 at 09:53 (+0100), Alex Lyakas wrote: Looking, for example, here: http://man7.org/linux/man-pages/man2/fallocate.2.html Allocating disk space The default operation (i.e., mode is zero) of fallocate() allocates and initializes to zero the disk space within the range specified by offset and len. Deallocating file space Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial file system blocks are zeroed, and whole file system blocks are removed from the file. These are clearly two different modes of operation, and I don't think you or me can decide otherwise, at this point. However, I may be not knowledgeable enough to confirm this. Jan/Alexander, can you perhaps comment on this? We don't transfer the metadata itself and that's for good reason. The data should look alike from a user's point of view where possible. In places where we have no good solution, we make compromises (inode numbers come to mind). So, as a general rule: If possible with reasonable effort, I like to keep as much of the structure as possible. Therefore, I'd rather not see a sparse detector in any receiver out there; if it's sparse, send it as sparse, and if it's on disk, send it as zeros on disk. Handling of preallocated space with this rule is, well, arguable. For me, such space is more on disk than it's not. Applications preallocating space might do so for a good reason, and I would not treat those blocks as if they were holes for send and receive. -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: put csums on the right ordered extent
On Wed, Jan 23, 2013 at 06:57:09PM -0500, Chris Mason wrote: Looks like Josef saw the same, so we have two different bugs (missing crcs vs incorrect crcs). Josef mentioned a much faster test for the incorrect crcs...Josef what was that test? Yeah, 2 bugs, this patch fixed the incorrect csums with DIO, your scriptlet was able to trigger it reliably. I don't see a mail from Josef about the 2nd problem, so let me tell the story: I've tested plain 3.7 with the script and also with the repeated xfstests (that reproduced the 1st csum problem) -- all clear after 5.5 hours. So we took it as a first good. All suspects in the post 3.7 queue. Josef had a quick reproducer and bisected it to commit 31e502298d80e2af9001d17dc419a3fd4b0bebef Author: Liu Bo bo.li@oracle.com Btrfs: put raid properties into global table Raid properties can be shared among raid calculation code, we can put them into a global table to keep it simple. https://patchwork.kernel.org/patch/1781061/ --- Which mistakenly specified 0 as a maximum number of devices in single profile and was fixed by Miao later on, but the fix was not merged due to Josef debugging the 1st csum problem. https://patchwork.kernel.org/patch/1987481/ - { 1, 1, 0, 1, 1, 1 /* single */ }, + { 1, 1, 1, 1, 1, 1 /* single */ }, does indeed fix the 2nd bug. The reproducer that worked for me was to create raid0/raid0 filesystem, run the 50x fsx DIO load and do balance convert to single/single. Then fsx fail. cheers, david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] Btrfs-progs: move open_file_or_dir() to utils.c
On 01/24/2013 10:23 AM, Stefan Behrens wrote: On Wed, 23 Jan 2013 22:39:29 -0600, Eric Sandeen wrote: instead of renaming keeping the btrfsctl.c copy There is a new momentum to improve the Btrfs-progs quality :) IMO, one step is to get rid of the legacy tools and sources. It wastes time to maintain them and these old tools cause confusion. btrfsctl.c, btrfs-vol.c and btrfs-show.c are not needed anymore. Please correct me if there are plans to use these old tools in future Linux distributions. The btrfs tool replaces the legacy btrfsctl, btrfs-vol and btrfs-show tools. Below, the usage text of the old tools is quoted. All these tasks are also offered in the btrfs tool, and this tool is the newer one. I fully agree: btrfsctl, btrfs-vol, btrfs-show are perfectly replaced by by btrfs. Moreover time to time the patches are more complex than the needing because exists these legacy programs. I checked the debian package, and to me seems that there is no need of {btrfsctl,btrfs-vol,btrfs-show} BR Goffredo -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
Hi Jan, On Thu, Jan 24, 2013 at 7:13 PM, Jan Schmidt list.bt...@jan-o-sch.net wrote: Hi Alex, On Thu, January 24, 2013 at 09:53 (+0100), Alex Lyakas wrote: Looking, for example, here: http://man7.org/linux/man-pages/man2/fallocate.2.html Allocating disk space The default operation (i.e., mode is zero) of fallocate() allocates and initializes to zero the disk space within the range specified by offset and len. Deallocating file space Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial file system blocks are zeroed, and whole file system blocks are removed from the file. These are clearly two different modes of operation, and I don't think you or me can decide otherwise, at this point. However, I may be not knowledgeable enough to confirm this. Jan/Alexander, can you perhaps comment on this? We don't transfer the metadata itself and that's for good reason. The data should look alike from a user's point of view where possible. In places where we have no good solution, we make compromises (inode numbers come to mind). So, as a general rule: If possible with reasonable effort, I like to keep as much of the structure as possible. Therefore, I'd rather not see a sparse detector in any receiver out there; if it's sparse, send it as sparse, and if it's on disk, send it as zeros on disk. Agree, but I don't think we have any such thing. Receiver just obeys the commands in the stream, not tries to analyze them. Or perhaps you mean something else by sparse detector? Handling of preallocated space with this rule is, well, arguable. For me, such space is more on disk than it's not. Applications preallocating space might do so for a good reason, and I would not treat those blocks as if they were holes for send and receive. So, if I understand you correctly, you do suggest to distinguish between punch-hole and preallocate on send side (e.g., by using different send commands), and do appropriate things on the receive side by using/not using the FALLOC_FL_PUNCH_HOLE flag to the fallocate API on the receive side. If yes, then this was also my opinion. Thanks! Alex. -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] Btrfs-progs: more patches for integration (integration-20130121)
On Tue, Jan 22, 2013 at 03:14:35PM -0500, Gene Czarcinski wrote: On 01/21/2013 01:40 PM, David Sterba wrote: On Sun, Jan 20, 2013 at 04:04:05PM -0500, Gene Czarcinski wrote: I have done some additional scraping of the mailing list to identify some low hanging fruit which I consider should be merged into the btrfs-progs repository. Thanks, I went through them and put together into an integration branch git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130121 There is a problem with one of the patches: detect if the disk we are formatting is a ssd The commit comments appear to come from the patch I submitted. However, as I said in the submission, the patch for mkfs.c has to be refitted because of a conflict. The conflict is a result of the patch: Fix compiler warnings on PPC64 which relocated the include kerncompat.h to a different line in the mkfs.c file. These merge conflicts are inevitable and normal (though sometimes not that trivial). My initial aim was to build one linear branch only with fixes and git-merge the patches that add features, but this will not be easy to merge for much longer, so I'm going to do a linear integration branch so others can take them as checkpoint to base the various patchsets. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Please revert 2794ed0 (fix permissions of empty files not affected by umask)
Hi, Please revert commit 2794ed013b3551cbae887ea1b93c52aaacb7370d. That patch was meant to be added to btrfs_create and not to btrfs_mknod. (It was indeed later added to btrfs_create on commit 9185aa587b7425f8f4520da2e66792f5f3c2b815, so that's already OK.) btrfs_mknod already has a call to btrfs_update_inode later in the code so I don't think the duplicate is really needed at that point. Thanks! Filipe -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Please revert 2794ed0 (fix permissions of empty files not affected by umask)
On Thu, Jan 24, 2013 at 11:32:17AM -0700, Filipe Brandenburger wrote: Hi, Please revert commit 2794ed013b3551cbae887ea1b93c52aaacb7370d. That patch was meant to be added to btrfs_create and not to btrfs_mknod. (It was indeed later added to btrfs_create on commit 9185aa587b7425f8f4520da2e66792f5f3c2b815, so that's already OK.) btrfs_mknod already has a call to btrfs_update_inode later in the code so I don't think the duplicate is really needed at that point. Hrm sorry I'll fix that in btrfs-next, I probably screwed this up when merging the patch, I remember having to manually apply this one. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] Btrfs-progs: add show subcommand to subvol cli
+ if (sv_id == 5) { + printf(%s is btrfs root\n, fullpath); + close(fd); + close(mntfd); + free(mnt); + free(fullpath); + return 1; Just wondering, at this point might a goto out; be cleaner error handling? Every error case is getting longer ;) No wondering needed! This pattern is a source of bugs. I'm hitting cases in the static analysis reports of people adding return paths without recognizing the pile of state they're expected to cleanup. Use safe unwinding in one exit path at the end, please. It's less maddening to audit and less likely to fail over time. - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] Btrfs-progs: move open_file_or_dir() to utils.c
On 1/24/13 11:57 AM, Goffredo Baroncelli wrote: On 01/24/2013 10:23 AM, Stefan Behrens wrote: On Wed, 23 Jan 2013 22:39:29 -0600, Eric Sandeen wrote: instead of renaming keeping the btrfsctl.c copy There is a new momentum to improve the Btrfs-progs quality :) IMO, one step is to get rid of the legacy tools and sources. It wastes time to maintain them and these old tools cause confusion. btrfsctl.c, btrfs-vol.c and btrfs-show.c are not needed anymore. Please correct me if there are plans to use these old tools in future Linux distributions. The btrfs tool replaces the legacy btrfsctl, btrfs-vol and btrfs-show tools. Below, the usage text of the old tools is quoted. All these tasks are also offered in the btrfs tool, and this tool is the newer one. I fully agree: btrfsctl, btrfs-vol, btrfs-show are perfectly replaced by by btrfs. Moreover time to time the patches are more complex than the needing because exists these legacy programs. I checked the debian package, and to me seems that there is no need of {btrfsctl,btrfs-vol,btrfs-show} Hm, they are shipped in the Fedora package. For backwards compat, could those be turned into shell scripts which invoke the btrfs tool? -Eric BR Goffredo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/10] add show sub-command for btrfs subvol cli
On 01/23/2013 03:12 AM, Anand Jain wrote: Please accept this patch set which is on top of git://repo.or.cz/btrfs-progs-unstable/devel.git for-chris to add 'show' sub-command feature for the btrfs subvol cli as in an example below.. OK, in spite of saying that these patches did not apply clean, it was not terribly difficult to refit what did not apply. I did this in two steps: 1. I created the equivalent of the for-chris branch. This did not apply clean which surprised me. However, minor things easily handled manually in the first patch for utils.c and utils.h plus btrfs-lists.c in a later patch. 2. After creating a new set of patches from the above effort, I then moved to my working branch with includes all of the patches in the integration-20130121 branch [careful, patch 0023 has a conflict in mkfs.c] please others. This time it was a very small problem in cmds-qgroup.c. I am running it now. Gene -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] Btrfs-progs: more patches for integration (integration-20130121)
On 01/24/2013 01:13 PM, David Sterba wrote: On Tue, Jan 22, 2013 at 03:14:35PM -0500, Gene Czarcinski wrote: On 01/21/2013 01:40 PM, David Sterba wrote: On Sun, Jan 20, 2013 at 04:04:05PM -0500, Gene Czarcinski wrote: I have done some additional scraping of the mailing list to identify some low hanging fruit which I consider should be merged into the btrfs-progs repository. Thanks, I went through them and put together into an integration branch git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130121 There is a problem with one of the patches: detect if the disk we are formatting is a ssd The commit comments appear to come from the patch I submitted. However, as I said in the submission, the patch for mkfs.c has to be refitted because of a conflict. The conflict is a result of the patch: Fix compiler warnings on PPC64 which relocated the include kerncompat.h to a different line in the mkfs.c file. These merge conflicts are inevitable and normal (though sometimes not that trivial). My initial aim was to build one linear branch only with fixes and git-merge the patches that add features, but this will not be easy to merge for much longer, so I'm going to do a linear integration branch so others can take them as checkpoint to base the various patchsets. Yes, trying to merge branches with lots of changes are effectively impossible (impractical?) as far as I am concerned. Unless it is simple I I understand exactly what is happening, I have found it easier/better to apply a set of patches one at a time and resolve problems manually. I just got the subvol show patches applied. I tried putting it into a separate branch and then doing a merge but I could not trust my results when I ran mergetool. And, BTW they subvol show patches to apply with not very many problems. My working branch currently has 44 commits over 91d9eec and subvol show added another ten. Gene -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Btrfs fixes
On Tue, Jan 22, 2013 at 05:48:33PM -0700, Chris Mason wrote: Hi Linus, My for-linus branch has our batch of btrfs fixes: git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus We've been hammering away at a crc corruption as well, which I was really hoping to get into this pull. It isn't nailed down yet, but we were finally able to get a solid way to reproduce. The only good news is it isn't a recent regression. Update on this, we've tracked down the crc errors and are doing final checks on the patches. Linus are you planning on taking this pull? If not I can just fold the new stuff into a bigger request. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] Btrfs-progs: move open_file_or_dir() to utils.c
On 01/24/2013 08:42 PM, Eric Sandeen wrote: On 1/24/13 11:57 AM, Goffredo Baroncelli wrote: On 01/24/2013 10:23 AM, Stefan Behrens wrote: On Wed, 23 Jan 2013 22:39:29 -0600, Eric Sandeen wrote: instead of renaming keeping the btrfsctl.c copy There is a new momentum to improve the Btrfs-progs quality :) IMO, one step is to get rid of the legacy tools and sources. It wastes time to maintain them and these old tools cause confusion. btrfsctl.c, btrfs-vol.c and btrfs-show.c are not needed anymore. Please correct me if there are plans to use these old tools in future Linux distributions. The btrfs tool replaces the legacy btrfsctl, btrfs-vol and btrfs-show tools. Below, the usage text of the old tools is quoted. All these tasks are also offered in the btrfs tool, and this tool is the newer one. I fully agree: btrfsctl, btrfs-vol, btrfs-show are perfectly replaced by by btrfs. Moreover time to time the patches are more complex than the needing because exists these legacy programs. I checked the debian package, and to me seems that there is no need of {btrfsctl,btrfs-vol,btrfs-show} Hm, they are shipped in the Fedora package. The same is true for the debian package, but are these used in Fedora ? For backwards compat, could those be turned into shell scripts which invoke the btrfs tool? I don't see any gain to maintains a script bash (which has to be written from scratch) instead of maintains the current C code. These programs were deprecated two years ago [1]. If some distribution need them, could maintain them as separate patch. But I think that the mainstream should remove. -Eric BR Goffredo [1] $ git log 002d021c^..002d021c commit 002d021c5f2d838394e850e304546ffad283518a Author: Goffredo Baroncelli kreij...@libero.it Date: Sun Dec 5 17:47:36 2010 + Deprecate btrfsctl, btrfs-show, btrfs-vol Hi all, the patch below deprecates the following programs * btrfsctl * btrfs-vol * btrfs-show the reason is simple, these programs are superseded by the btrfs utility, both in terms of documentation, usability and bug. The goal is to avoid to duplicate codes and avoid update two programs. The patch adds a warning in the man pages, in the INSTALL file and in the programs. [...] -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] Btrfs-progs: move open_file_or_dir() to utils.c
On Thu, Jan 24, 2013 at 03:09:53PM -0700, Goffredo Baroncelli wrote: On 01/24/2013 08:42 PM, Eric Sandeen wrote: On 1/24/13 11:57 AM, Goffredo Baroncelli wrote: On 01/24/2013 10:23 AM, Stefan Behrens wrote: On Wed, 23 Jan 2013 22:39:29 -0600, Eric Sandeen wrote: instead of renaming keeping the btrfsctl.c copy There is a new momentum to improve the Btrfs-progs quality :) IMO, one step is to get rid of the legacy tools and sources. It wastes time to maintain them and these old tools cause confusion. btrfsctl.c, btrfs-vol.c and btrfs-show.c are not needed anymore. Please correct me if there are plans to use these old tools in future Linux distributions. The btrfs tool replaces the legacy btrfsctl, btrfs-vol and btrfs-show tools. Below, the usage text of the old tools is quoted. All these tasks are also offered in the btrfs tool, and this tool is the newer one. I fully agree: btrfsctl, btrfs-vol, btrfs-show are perfectly replaced by by btrfs. Moreover time to time the patches are more complex than the needing because exists these legacy programs. I checked the debian package, and to me seems that there is no need of {btrfsctl,btrfs-vol,btrfs-show} Hm, they are shipped in the Fedora package. The same is true for the debian package, but are these used in Fedora ? For backwards compat, could those be turned into shell scripts which invoke the btrfs tool? I don't see any gain to maintains a script bash (which has to be written from scratch) instead of maintains the current C code. These programs were deprecated two years ago [1]. If some distribution need them, could maintain them as separate patch. But I think that the mainstream should remove. I'd say that if SuSE or oracle depend on them we keep them. Otherwise, I'm fine with removing them or just making the 3 line bash script. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] Btrfs-progs: move open_file_or_dir() to utils.c
On Thu, Jan 24, 2013 at 05:36:52PM -0500, Chris Mason wrote: On Thu, Jan 24, 2013 at 03:09:53PM -0700, Goffredo Baroncelli wrote: The same is true for the debian package, but are these used in Fedora ? For backwards compat, could those be turned into shell scripts which invoke the btrfs tool? I don't see any gain to maintains a script bash (which has to be written from scratch) instead of maintains the current C code. These programs were deprecated two years ago [1]. If some distribution need them, could maintain them as separate patch. But I think that the mainstream should remove. I'd say that if SuSE or oracle depend on them we keep them. Otherwise, I'm fine with removing them or just making the 3 line bash script. I wanted to remove them once from our packages, but some tool uses them. I'm fine with replacing them with a shellscript, this is just a syntactic conversion from btrfsctl style to the subcommands. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] Btrfs-progs: move open_file_or_dir() to utils.c
Hey Chris, On 25/01/2013, at 9:36 AM, Chris Mason chris.ma...@fusionio.com wrote: I'd say that if SuSE or oracle depend on them we keep them. Otherwise, I'm fine with removing them or just making the 3 line bash script. You can take this as an official response from Oracle: we don't need/want the old tools. :) All of our documentation uses the unified binary. Thanks, Avi -- Oracle http://www.oracle.com Avi Miller | Principal Program Manager | +61 (412) 229 687 Oracle Linux and Virtualization 417 St Kilda Road, Melbourne, Victoria 3004 Australia -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] Btrfs-progs: add show subcommand to subvol cli
On Thu, Jan 24, 2013 at 12:42:38PM -0700, Zach Brown wrote: + if (sv_id == 5) { + printf(%s is btrfs root\n, fullpath); + close(fd); + close(mntfd); + free(mnt); + free(fullpath); + return 1; Just wondering, at this point might a goto out; be cleaner error handling? Every error case is getting longer ;) No wondering needed! This pattern is a source of bugs. I'm hitting cases in the static analysis reports of people adding return paths without recognizing the pile of state they're expected to cleanup. Use safe unwinding in one exit path at the end, please. It's less maddening to audit and less likely to fail over time. Really, goto is your friend ;) -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] Btrfs-progs: add show subcommand to subvol cli
Use safe unwinding in one exit path at the end, please. It's less maddening to audit and less likely to fail over time. Really, goto is your friend ;) I'm not going to use the word frenemy. I'm just not :). - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs pretty/quiet build
A quieter build makes warnings more obvious. This could probably be improved, but just to see if people like this or if they hate it. :) make V=1 overrides it and gets you the full glory again. # make [CC] ctree.o [CC] disk-io.o [CC] radix-tree.o [CC] extent-tree.o ... Signed-off-by: Eric Sandeen sand...@redhat.com --- Makefile | 92 +- man/Makefile | 29 +++--- 2 files changed, 76 insertions(+), 45 deletions(-) diff --git a/Makefile b/Makefile index 4894903..73657bf 100644 --- a/Makefile +++ b/Makefile @@ -20,6 +20,21 @@ bindir = $(prefix)/bin LIBS=-luuid -lm RESTORE_LIBS=-lz +ifeq ($(origin V), command line) + BUILD_VERBOSE = $(V) +endif +ifndef BUILD_VERBOSE + BUILD_VERBOSE = 0 +endif + +ifeq ($(BUILD_VERBOSE),1) + Q = +else + Q = @ +endif + +MAKEOPTS = --no-print-directory Q=$(Q) + progs = btrfsctl mkfs.btrfs btrfs-debug-tree btrfs-show btrfs-vol btrfsck \ btrfs btrfs-map-logical btrfs-image btrfs-zero-log btrfs-convert \ btrfs-find-root btrfs-restore btrfstune @@ -28,90 +43,113 @@ progs = btrfsctl mkfs.btrfs btrfs-debug-tree btrfs-show btrfs-vol btrfsck \ ifdef C check = sparse $(CHECKFLAGS) else - check = ls + check = true endif .c.o: - $(check) $ - $(CC) $(DEPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c $ + $(Q)$(check) $ + @echo [CC] $@ + $(Q)$(CC) $(DEPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c $ all: version $(progs) manpages version: - bash version.sh + $(Q)bash version.sh btrfs: $(objects) btrfs.o help.o common.o $(cmds_objects) - $(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \ + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \ $(objects) $(LDFLAGS) $(LIBS) -lpthread calc-size: $(objects) calc-size.o - $(CC) $(CFLAGS) -o calc-size calc-size.o $(objects) $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o calc-size calc-size.o $(objects) $(LDFLAGS) $(LIBS) btrfs-find-root: $(objects) find-root.o - $(CC) $(CFLAGS) -o btrfs-find-root find-root.o $(objects) $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs-find-root find-root.o $(objects) $(LDFLAGS) $(LIBS) btrfs-restore: $(objects) restore.o - $(CC) $(CFLAGS) -o btrfs-restore restore.o $(objects) $(LDFLAGS) $(LIBS) $(RESTORE_LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs-restore restore.o $(objects) $(LDFLAGS) $(LIBS) $(RESTORE_LIBS) btrfsctl: $(objects) btrfsctl.o - $(CC) $(CFLAGS) -o btrfsctl btrfsctl.o $(objects) $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfsctl btrfsctl.o $(objects) $(LDFLAGS) $(LIBS) btrfs-vol: $(objects) btrfs-vol.o - $(CC) $(CFLAGS) -o btrfs-vol btrfs-vol.o $(objects) $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs-vol btrfs-vol.o $(objects) $(LDFLAGS) $(LIBS) btrfs-show: $(objects) btrfs-show.o - $(CC) $(CFLAGS) -o btrfs-show btrfs-show.o $(objects) $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs-show btrfs-show.o $(objects) $(LDFLAGS) $(LIBS) btrfsck: $(objects) btrfsck.o - $(CC) $(CFLAGS) -o btrfsck btrfsck.o $(objects) $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfsck btrfsck.o $(objects) $(LDFLAGS) $(LIBS) mkfs.btrfs: $(objects) mkfs.o - $(CC) $(CFLAGS) -o mkfs.btrfs $(objects) mkfs.o $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o mkfs.btrfs $(objects) mkfs.o $(LDFLAGS) $(LIBS) btrfs-debug-tree: $(objects) debug-tree.o - $(CC) $(CFLAGS) -o btrfs-debug-tree $(objects) debug-tree.o $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs-debug-tree $(objects) debug-tree.o $(LDFLAGS) $(LIBS) btrfs-zero-log: $(objects) btrfs-zero-log.o - $(CC) $(CFLAGS) -o btrfs-zero-log $(objects) btrfs-zero-log.o $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs-zero-log $(objects) btrfs-zero-log.o $(LDFLAGS) $(LIBS) btrfs-select-super: $(objects) btrfs-select-super.o - $(CC) $(CFLAGS) -o btrfs-select-super $(objects) btrfs-select-super.o $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfs-select-super $(objects) btrfs-select-super.o $(LDFLAGS) $(LIBS) btrfstune: $(objects) btrfstune.o - $(CC) $(CFLAGS) -o btrfstune $(objects) btrfstune.o $(LDFLAGS) $(LIBS) + @echo [LD] $@ + $(Q)$(CC) $(CFLAGS) -o btrfstune $(objects) btrfstune.o $(LDFLAGS) $(LIBS) btrfs-map-logical: $(objects) btrfs-map-logical.o - $(CC) $(CFLAGS) -o btrfs-map-logical $(objects) btrfs-map-logical.o $(LDFLAGS) $(LIBS) + @echo [LD] $@ +
Re: [PATCH] btrfs: fix potential null pointer dereference bug
On Thu, Jan 24, 2013 at 10:34:20AM -0500, Josef Bacik wrote: On Sat, Jan 19, 2013 at 08:27:45AM -0700, Cong Ding wrote: The bug happens when rb_node == NULL. It causes variable node to be NULL and then the NULL pointer is dereferenced this line: BUG_ON((struct btrfs_root *)node-data != root); Based on my analysis, function tree_search should not return NULL to variable rb_node in this case (otherwise here has to be something unknown thing wrong), so I replace if (rb_node) with UG_ON(!rb_node). Signed-off-by: Cong Ding ding...@gmail.com I don't want to add more BUG_ON()'s, just return an error. But rb_node really has no chance to be 0, so I think we should use BUG_ON rather than return an error. If we return an error number here, its caller should check the returned value and call BUG_ON(ret) - it makes no difference. The file system doesn't have any way to handle this kind of error (and I think if it happens, there must be some hardware error or some other program directly operates on the file system bypassing linux system call). If you don't want to add more BUG_ON, I suggest to check variable node in the existing BUG_ON as the following code. - BUG_ON((struct btrfs_root *)node-data != root); + BUG_ON(!node || (struct btrfs_root *)node-data != root); What's your opinion, or do you have a better solution? Thanks, - cong From 3a5b4df67dd177b7cbc61c555349fd7e87ef6b54 Mon Sep 17 00:00:00 2001 From: Cong Ding ding...@gmail.com Date: Thu, 24 Jan 2013 18:30:45 -0500 Subject: [PATCH] btrfs: fix potential null pointer dereference bug The bug happens when rb_node == NULL. It causes variable node to be NULL and then the NULL pointer is dereferenced this line: BUG_ON((struct btrfs_root *)node-data != root); So we check node before the dereference. Signed-off-by: Cong Ding ding...@gmail.com --- fs/btrfs/relocation.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 17c306b..938b037 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1269,7 +1269,7 @@ static int __update_reloc_root(struct btrfs_root *root, int del) } spin_unlock(rc-reloc_root_tree.lock); - BUG_ON((struct btrfs_root *)node-data != root); + BUG_ON(!node || (struct btrfs_root *)node-data != root); if (!del) { spin_lock(rc-reloc_root_tree.lock); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: fix memory leak on extent map after fsync
On Thu, Jan 24, 2013 at 11:44:33AM -0500, Josef Bacik wrote: On Tue, Jan 08, 2013 at 07:49:21AM -0700, Liu Bo wrote: During fsync, we put the changed parts(i.e. extent map) into the log tree, and we ship these parts from a list of modified_extents to a local list to process, of course, we must increment the refs of the extent maps to avoid it from getting evicted from cache. The problem is we don't hold the tree writer lock all the time of iterating the local list, and it is possible that other threads hack in and delete the extent map from the local list silently. So we'll end up with memory leak here. I hit this when testing xfstest 274 with mount options 'autodefrag,compress=zlib'. With this fix, the memory leak has gone away. This isn't going to work, we use the LOGGING flag to make sure the em isn't merged as well. Thanks, A quick grep shows, 1 16 fs/btrfs/extent_map.h GLOBAL #define EXTENT_FLAG_LOGGING 4 2406 fs/btrfs/extent_map.c remove_extent_mapping if (!test_bit(EXTENT_FLAG_LOGGING, em-flags)) 3 3403 fs/btrfs/tree-log.c btrfs_log_changed_extents set_bit(EXTENT_FLAG_LOGGING, em-flags); 4 3413 fs/btrfs/tree-log.c btrfs_log_changed_extents clear_bit(EXTENT_FLAG_LOGGING, em-flags); how does the flag avoid merging em? Seems we lost the check. static int mergable_maps(struct extent_map *prev, struct extent_map *next) { if (test_bit(EXTENT_FLAG_PINNED, prev-flags)) return 0; /* * don't merge compressed extents, we need to know their * actual size */ if (test_bit(EXTENT_FLAG_COMPRESSED, prev-flags)) return 0; ... } thanks, liubo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: fix memory leak on extent map after fsync
On Fri, Jan 25, 2013 at 09:38:06AM +0800, Liu Bo wrote: On Thu, Jan 24, 2013 at 11:44:33AM -0500, Josef Bacik wrote: On Tue, Jan 08, 2013 at 07:49:21AM -0700, Liu Bo wrote: During fsync, we put the changed parts(i.e. extent map) into the log tree, and we ship these parts from a list of modified_extents to a local list to process, of course, we must increment the refs of the extent maps to avoid it from getting evicted from cache. The problem is we don't hold the tree writer lock all the time of iterating the local list, and it is possible that other threads hack in and delete the extent map from the local list silently. So we'll end up with memory leak here. I hit this when testing xfstest 274 with mount options 'autodefrag,compress=zlib'. With this fix, the memory leak has gone away. This isn't going to work, we use the LOGGING flag to make sure the em isn't merged as well. Thanks, Well, never mind, I've seen the fix in your btrfs-next. But it'd be better if you can also send it to the list where everyone can review it easily. thanks, liubo A quick grep shows, 1 16 fs/btrfs/extent_map.h GLOBAL #define EXTENT_FLAG_LOGGING 4 2406 fs/btrfs/extent_map.c remove_extent_mapping if (!test_bit(EXTENT_FLAG_LOGGING, em-flags)) 3 3403 fs/btrfs/tree-log.c btrfs_log_changed_extents set_bit(EXTENT_FLAG_LOGGING, em-flags); 4 3413 fs/btrfs/tree-log.c btrfs_log_changed_extents clear_bit(EXTENT_FLAG_LOGGING, em-flags); how does the flag avoid merging em? Seems we lost the check. static int mergable_maps(struct extent_map *prev, struct extent_map *next) { if (test_bit(EXTENT_FLAG_PINNED, prev-flags)) return 0; /* * don't merge compressed extents, we need to know their * actual size */ if (test_bit(EXTENT_FLAG_COMPRESSED, prev-flags)) return 0; ... } thanks, liubo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs-progs: Fix minor problems with btrfslabel.c
Thanks. comments below. On 01/24/2013 11:52 PM, Gene Czarcinski wrote: Fixed indentation, replace spaces with tabs. Signed-off-by: Danny Kukawka danny.kuka...@bisect.de Rebased, also fixed compiler warnings Signed-off-by: Gene Czarcinski g...@czarc.net --- btrfslabel.c | 57 + 1 file changed, 29 insertions(+), 28 deletions(-) diff --git a/btrfslabel.c b/btrfslabel.c index 88a5196..bea1f89 100644 --- a/btrfslabel.c +++ b/btrfslabel.c @@ -48,42 +48,43 @@ static int change_label_unmounted(char *dev, char *nLabel) { - struct btrfs_root *root; - struct btrfs_trans_handle *trans; - - /* Open the super_block at the default location -* and as read-write. -*/ - root = open_ctree(dev, 0, 1); - if (!root) /* errors are printed by open_ctree() */ - return; - - trans = btrfs_start_transaction(root, 1); - strncpy(root-fs_info-super_copy.label, nLabel, BTRFS_LABEL_SIZE); - root-fs_info-super_copy.label[BTRFS_LABEL_SIZE-1] = 0; - btrfs_commit_transaction(trans, root); - - /* Now we close it since we are done. */ - close_ctree(root); + struct btrfs_root *root; + struct btrfs_trans_handle *trans; + + /* Open the super_block at the default location +* and as read-write. +*/ + root = open_ctree(dev, 0, 1); + if (!root) /* errors are printed by open_ctree() */ + return -1; + + trans = btrfs_start_transaction(root, 1); + strncpy(root-fs_info-super_copy.label, nLabel, BTRFS_LABEL_SIZE); + root-fs_info-super_copy.label[BTRFS_LABEL_SIZE-1] = 0; + btrfs_commit_transaction(trans, root); + + /* Now we close it since we are done. */ + close_ctree(root); + return 0; } int get_label_unmounted(char *dev) { - struct btrfs_root *root; + struct btrfs_root *root; - /* Open the super_block at the default location -* and as read-only. -*/ - root = open_ctree(dev, 0, 0); + /* Open the super_block at the default location +* and as read-only. +*/ + root = open_ctree(dev, 0, 0); - if(!root) - return -1; + if(!root) + return -1; minor: since you are here, should you add a space after if - fprintf(stdout, %s\n, root-fs_info-super_copy.label); + fprintf(stdout, %s\n, root-fs_info-super_copy.label); - /* Now we close it since we are done. */ - close_ctree(root); - return 0; + /* Now we close it since we are done. */ + close_ctree(root); + return 0; } int get_label(char *btrfs_dev) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Btrfs fixes
On Thu, Jan 24, 2013 at 1:52 PM, Chris Mason chris.ma...@fusionio.com wrote: Update on this, we've tracked down the crc errors and are doing final checks on the patches. Linus are you planning on taking this pull? If not I can just fold the new stuff into a bigger request. If you have them basically ready, add them to this, I haven't pulled yet. So I'll just ignore this and wait for another pull request. Linus -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] Btrfs fixes (v2)
Hi Linus, My for-linus branch has our batch of btrfs fixes: git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus It turns out that we had two crc bugs when running fsx-linux in a loop. Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it all down. Miao also has a new OOM fix in this v2 pull as well. Ilya fixed a regression Liu Bo found in the balance ioctls for pausing and resuming a running balance across drives. Josef's orphan truncate patch fixes an obscure corruption we'd see during xfstests. Arne's patches address problems with subvolume quotas. If the user destroys quota groups incorrectly the FS will refuse to mount. The rest are smaller fixes and plugs for memory leaks. Miao Xie (8) commits (+76/-24): Btrfs: fix missing write access release in btrfs_ioctl_resize() (+1/-0) Btrfs: do not delete a subvolume which is in a R/O subvolume (+5/-5) Btrfs: Add ACCESS_ONCE() to transaction-abort accesses (+3/-2) Btrfs: fix wrong max device number for single profile (+1/-1) Btrfs: fix repeated delalloc work allocation (+41/-14) Btrfs: fix missed transaction-aborted check (+16/-0) Btrfs: fix resize a readonly device (+4/-2) Btrfs: disable qgroup id 0 (+5/-0) Ilya Dryomov (6) commits (+94/-32): Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag (+9/-8) Btrfs: fix mutually exclusive op is running error code (+4/-4) Btrfs: fix a regression in balance usage filter (+8/-1) Btrfs: bring back balance pause/resume logic (+71/-17) Btrfs: fix unlock order in btrfs_ioctl_rm_dev (+1/-1) Btrfs: fix unlock order in btrfs_ioctl_resize (+1/-1) Liu Bo (5) commits (+23/-7): Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents (+14/-6) Btrfs: use right range to find checksum for compressed extents (+5/-0) Btrfs: let allocation start from the right raid type (+1/-1) Btrfs: reset path lock state to zero (+2/-0) Btrfs: fix off-by-one in lseek (+1/-0) Josef Bacik (5) commits (+69/-29): Btrfs: do not allow logged extents to be merged or removed (+16/-3) Btrfs: add orphan before truncating pagecache (+38/-15) Btrfs: set flushing if we're limited flushing (+1/-1) Btrfs: put csums on the right ordered extent (+2/-2) Btrfs: fix panic when recovering tree log (+12/-8) Arne Jansen (2) commits (+19/-1): Btrfs: prevent qgroup destroy when there are still relations (+12/-1) Btrfs: ignore orphan qgroup relations (+7/-0) Zach Brown (1) commits (+1/-0): btrfs: fix btrfs_cont_expand() freeing IS_ERR em Lukas Czerner (1) commits (+1/-1): btrfs: get the device in write mode when deleting it Eric Sandeen (1) commits (+14/-3): btrfs: update timestamps on truncate() Tsutomu Itoh (1) commits (+3/-1): Btrfs: fix memory leak in name_cache_insert() Total: (30) commits (+300/-98) fs/btrfs/extent-tree.c | 6 +- fs/btrfs/extent_map.c | 13 - fs/btrfs/extent_map.h | 1 + fs/btrfs/file-item.c| 4 +- fs/btrfs/file.c | 10 +++- fs/btrfs/free-space-cache.c | 20 --- fs/btrfs/inode.c| 137 +--- fs/btrfs/ioctl.c| 129 ++--- fs/btrfs/qgroup.c | 20 ++- fs/btrfs/send.c | 4 +- fs/btrfs/super.c| 2 +- fs/btrfs/transaction.c | 19 +- fs/btrfs/tree-log.c | 10 +++- fs/btrfs/volumes.c | 23 ++-- 14 files changed, 300 insertions(+), 98 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Btrfs: fix repeated delalloc work allocation
On thu, 24 Jan 2013 18:20:06 +0200, Alex Lyakas wrote: - Is there something that user-space can do to avoid flushing the delalloc during snap-commit? For example, if the user-space stops the IO and does a normal commit, this will not call btrfs_start_delalloc_inodes(), but this should ensure that *some* data is safe on disk. Then the user-space triggers snap creation (which is another commit), and then resume the IO? Because without the ongoing IO, snap creation is very fast. We can sync the fs before creating snapshot, in this way, we needn't flush the delalloc while committing the transaction. But I don't think it is good way. Because for the users, the page cache is transparent. I was always under impression that in Linux you must be aware of the page cache. For exampe, if a C program writes to a file, it is also responsible to fsync() the file (and check return value), to make sure that data has been persisted. However, for snap creation, it is perhaps better to do this automatically for the user. I have a idea to improve this problem: I think the below idea is very good: - introduce per-subvolume delalloc inode list Perfect. - flush the delalloc inode list of the source subvolume in create_snapshot(), not flush it in commit_transaction(). In this way, we just flush once. Perfect. - don't do the delalloc flush in commit_transaction() if there are pending snapshots Perfect. - If FLUSHONCOMMIT is set, just call btrfs_start_delalloc_inodes() before while loop, and call btrfs_wait_ordered_extents() after the while loop. In case FLUSHONCOMMIT is not set, but there are pending snapshots, is it also needed to call btrfs_wait_ordered_extents(), since we have started the delalloc IO in create_snapshot()? Yes, because FLUSHONCOMMIT will flush all delalloc inodes, including the source subvolumes and the others which are not being snapshoted. I have run additional tests with your patchset. It definitely improves the two bugs. However, I have a question: if the IO to the filesystem continues during extent_write_cache_pages() loop, can it add more dirty pages to the flush this function is doing? Basically, I am looking for some way to separate old delallocs that must be flushed vs new delallocs that were added after we started committing. We can not add dirty pages into the extents which are under the disk IO, but we can add dirty pages into the other extents which belong to the same file, but is not under the disk IO. (Some developers have discuss the issue(unstable page problem) that if we can dirty the extent that is under the disk IO or not in the past. Their approach is allocate a new page to store the new data. But some developers disagree this idea because the disk become more and more fast, it is unnecessary, and if the free memory is not enough, we still need wait the IO.) I think flushing the delalloc inodes at the beginning of btrfs_commit_transaction() is a simple way to separate old delallocs and new delallocs. For example, with your current patches, the extent_write_cache_pages() is called at least twice per inode (I know that your new idea will fix it). In my case, the first call submitted 22671 pages (inode is 10Gb) and the second call submitted 33705 pages. Between those two calls there were 6531 pages that were submitted twice. I noticed that if I stop the IO and then immediately trigger snap creation, much less pages are submitted. I played with the freeze_super() API, which also syncs the filesystem, so delalloc flush is not needed in this case, like you mentioned. However, the snap creation flow also calls mnt_want_write_file(), which blocks if the FS is freezed. Does it make sense to have some kind of a light-freeze functionality, which would only block the new writers (and possibly wait for in-flight writer threads to complete), but will not do the sync_filesystem part, but only the SB_FREEZE_WRITE part? I have implemented a light-freeze functionality for the R/W-R/O change of the subvolume(The patch will send out later). But I don't think it is necessary to introduce it to the snapshot creation. Thanks Miao -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html