Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)
On Tue, Jul 24, 2012 at 08:39:36PM -0400, Marios Titas wrote: > When I create a btrfs volume of size strictly less than 256 MiB then if I do > mount /dev/sdb1 /mnt/test > the kernel tries unsuccessfully to do the mount with many other file systems > before successfully trying with btrfs. For volumes of size larger than > or equal to > 256 MiB it just mounts the volume without doing that. Why is this discrepancy? Are you using the --mixed option when creating the filesystem? If not, you should do with something that small. Hugo. > Another possibly related symptom is that the volume does not appear in > /dev/disk/by-label and /dev/disk/by-uuid at all. This means that it is > impossible > to mount the volume by uuid or label. > > To make sure that this isn't a udev bug, I booted my system with > init=/bin/bash > in the kernel command line, and then I tried again to mount the > volume. This time > it would not mount it at all unless I explicitly specified the fs > type. On the other > hand, it could mount larger volumes without any issues. > > All the experiments were done in an initially zeroed out disk. I am > using 3.4.6 kernel > with btrfs from 3.5 and the latest btrfs-progs from git. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- In theory, theory and practice are the same. In --- practice, they're different. signature.asc Description: Digital signature
Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)
On Tue, Jul 24, 2012 at 6:39 PM, Marios Titas wrote: > When I create a btrfs volume of size strictly less than 256 MiB then if I do > mount /dev/sdb1 /mnt/test > the kernel tries unsuccessfully to do the mount with many other file systems > before successfully trying with btrfs. For volumes of size larger than > or equal to > 256 MiB it just mounts the volume without doing that. Why is this discrepancy? If I understand correctly, the kernel does not implement any fs detection; this is performed by the mount utility, which indeed may try a bunch of different filesystems until it finds one that works. >From man mount: If no -t option is given, or if the auto type is specified, mount will try to guess the desired type. Mount uses the blkid or volume_id library for guessing the filesystem type; if that does not turn up anything that looks familiar, mount will try to read the file /etc/filesystems, or, if that does not exist, /proc/filesystems. All of the filesystem types listed there will be tried, except for those that are labeled "nodev" (e.g., devpts, proc and nfs). If /etc/filesystems ends in a line with a single * only, mount will read /proc/filesystems afterwards. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 6/6] Btrfs-progs: add btrfs send/receive commands
Thanks! So now: A_PATH -> path -> full_path -> newpath A_PATH_LINK -> lnk -> full_link_path -> oldpath while I viewed it the other way around. I guess it's not important what is left/right, old/new :) as long as it's consistent. Alex. On Tue, Jul 24, 2012 at 11:27 PM, Alexander Block wrote: > On Thu, Jul 19, 2012 at 3:25 PM, Alex Lyakas > wrote: >> +static int process_link(const char *path, const char *lnk, void *user) >> +{ >> + int ret; >> + struct btrfs_receive *r = user; >> + char *full_path = path_cat(r->full_subvol_path, path); >> + >> + if (g_verbose >= 1) >> + fprintf(stderr, "link %s -> %s\n", path, lnk); >> + >> + ret = link(lnk, full_path); >> + if (ret < 0) { >> + ret = -errno; >> + fprintf(stderr, "ERROR: link %s -> %s failed. %s\n", path, >> + lnk, strerror(-ret)); >> + } >> >> Actually it has to be: >> char *full_link_path = path_cat(r->full_subvol_path, lnk); >> ... >> ret = link(full_path/*oldpath*/, full_link_path/*newpath*/); >> ... >> free(full_link_path); >> >> Thanks, >> Alex. > > Actually, the pathes got mixed up in-kernel. You'll find a pushed fix > in the kernel repo. I also pushed a fix to btrfs-progs containing the > full_link_path. Thanks again :) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)
On Wed, Jul 25, 2012 at 5:09 AM, cwillu wrote: > On Tue, Jul 24, 2012 at 6:39 PM, Marios Titas wrote: >> When I create a btrfs volume of size strictly less than 256 MiB then if I do >> mount /dev/sdb1 /mnt/test >> the kernel tries unsuccessfully to do the mount with many other file systems >> before successfully trying with btrfs. For volumes of size larger than >> or equal to >> 256 MiB it just mounts the volume without doing that. Why is this >> discrepancy? > > If I understand correctly, the kernel does not implement any fs > detection; this is performed by the mount utility, which indeed may > try a bunch of different filesystems until it finds one that works. > > From man mount: > If no -t option is given, or if the auto type is specified, > mount will try to guess the desired type. Mount uses the blkid > or volume_id library for guessing the filesystem type; if that > does not turn up anything that looks familiar, mount will try to > read the file /etc/filesystems, or, if that does not exist, > /proc/filesystems. All of the filesystem types listed there > will be tried, except for those that are labeled "nodev" (e.g., > devpts, proc and nfs). If /etc/filesystems ends in a line with > a single * only, mount will read /proc/filesystems afterwards. Thanks, that was helpful. It was a blkid bug. It was fixed [1] in util-linux 2.21. [1] https://git.kernel.org/?p=utils/util-linux/util-linux.git;a=commit;h=04f7020 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)
On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen wrote: > On 04.07.2012 15:39, Alexander Block wrote: >> Hello all, >> >> This is the user space side of btrfs send/receive. >> >> You can apply them manually or use my git repo: >> >> git://github.com/ablock84/btrfs-progs.git (branch send) >> >> The branch is based on Hugo's integration-20120605 branch. I had to add a >> temporary >> commit to fix a bug introduced in one of the strncpy/overflow patches that >> got into >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, but >> you'll >> probably need it if you want to base on the integration branch. I hope this >> is not >> required in the future when a new integration branch comes out. >> >> Example usage: >> >> Multiple snapshots at once: >> btrfs send /mnt/snap[123] > snap123.btrfs > > a) Do we really want a single token command here, not > btrfs filesystem send or subvol send? In my opinion the single token is easier to type and remember. But if enough speaks for normal subcommands this can be changed (but by someone else as I'm running out of time). > b) zfs makes sure stdout is not a tty, to prevent flooding > your console. This kinda makes sense. This makes sense. But again, this has to be done by someone else. > >> >> Single snapshot with manual parent: >> btrfs send -p /mnt/snap3 /mnt/snap4 > snap4.btrfs >> >> Receive both streams: >> btrfs receive /mnt2 < snap123.btrfs >> btrfs receive /mnt2 < snap4.btrfs >> >> (Please give suggestions for a file extension) >> >> Please read the kernel side email as well, especially the warnings! >> >> Alex. >> >> Alexander Block (6): >> Btrfs-progs: add BTRFS_IOC_SUBVOL_GET/SETFLAGS to ioctl.h >> Btrfs-progs: update ioctl.h to support clone range ioctl >> Btrfs-progs: print inode transid and dir item data field in >> debug-tree >> Btrfs-progs: update btrfs-progs for subvol uuid+times support >> Btrfs-progs: update ioctl.h to support btrfs send ioctl >> Btrfs-progs: add btrfs send/receive commands >> >> Makefile |7 +- >> btrfs.c|2 + >> cmds-receive.c | 910 >> >> cmds-send.c| 677 + >> commands.h |4 + >> ctree.h| 40 ++- >> ioctl.h| 35 ++- >> print-tree.c | 88 -- >> send-stream.c | 480 ++ >> send-stream.h | 58 >> send-utils.c | 337 + >> send-utils.h | 69 + >> send.h | 132 >> 13 files changed, 2815 insertions(+), 24 deletions(-) >> create mode 100644 cmds-receive.c >> create mode 100644 cmds-send.c >> create mode 100644 send-stream.c >> create mode 100644 send-stream.h >> create mode 100644 send-utils.c >> create mode 100644 send-utils.h >> create mode 100644 send.h >> > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times
On Tue, Jul 24, 2012 at 7:55 AM, Arne Jansen wrote: > On 23.07.2012 21:41, Alexander Block wrote: >> On Mon, Jul 16, 2012 at 4:56 PM, Arne Jansen wrote: >>> On 04.07.2012 15:38, Alexander Block wrote: + + ret = btrfs_update_root(trans, root->fs_info->tree_root, + &root->root_key, &root->root_item); + if (ret < 0) { + goto out; >>> >>> are you leaking a trans handle here? >>> >> btrfs_update_root is aborting the transaction in case of failure. Do I >> still need to call end_transaction? > > It's your handle, you should free it. Ahh...I thought abort_transaction already frees the handle. Fixed. > > ... > +struct btrfs_ioctl_received_subvol_args { + charuuid[BTRFS_UUID_SIZE]; /* in */ + __u64 stransid; /* in */ + __u64 rtransid; /* out */ + struct timespec stime; /* in */ + struct timespec rtime; /* out */ + __u64 reserved[16]; >>> >>> What is this reserved used for? I don't see a mechanism that could be >>> used to signal that there are useful information here, other than >>> using a different ioctl. >>> >> The reserved is a result of a suggestion made by David. I can remove >> it again if you want... > > I don't argue against some reserved space, I only have problems to > see how you can make use of them in the future when there's no way > to signal that they contain valid information. I should be sufficient > to define the reserved values to be 0 at the moment. Misunderstood that. Now I see the problem :) I've added a flags field before the reserved fields. It's unused for now but can later be used to signal that new fields are used. > +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -359,6 +368,10 @@ struct btrfs_ioctl_get_dev_stats { struct btrfs_ioctl_ino_path_args) #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \ struct btrfs_ioctl_ino_path_args) + +#define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \ + struct btrfs_ioctl_received_subvol_args) + #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \ struct btrfs_ioctl_get_dev_stats) #define BTRFS_IOC_GET_AND_RESET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 53, \ diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 24fb8ce..17d638e 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -16,6 +16,7 @@ * Boston, MA 021110-1307, USA. */ +#include #include "ctree.h" #include "transaction.h" #include "disk-io.h" @@ -25,6 +26,9 @@ * lookup the root with the highest offset for a given objectid. The key we do * find is copied into 'key'. If we find something return 0, otherwise 1, < 0 * on error. + * We also check if the root was once mounted with an older kernel. If we detect + * this, the new fields coming after 'level' get overwritten with zeros so to + * invalidate the fields. >>> >>> ... "This is detected by a mismatch of the 2 generation fields" ... or >>> something >>> like that. >>> >> The current version (found in git only) has this new function which is >> called in find_last_root: >> void btrfs_read_root_item(struct btrfs_root *root, >>struct extent_buffer *eb, int slot, >>struct btrfs_root_item *item) >> >> The comment above this function explains what happens. > > ok. Please regard most of my comments as an expression of my thoughts while > reading it. So they mark places where it might be useful to add comments > to make it easier for the next reader :) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Question about btrfs metadata structure
Hi! I'm writing my bsc thesis about btrfs and I'm analyzing the btrfs metadata structure at the moment. I'm using fedora 17 with kernel 3.4 and btrfs-prog from your git-repository. For an overview I use "btrfs-debug-tree" and here are my questions: The root tree contains some inode_items, but I don't understand the meaning of them: item 4 key (ROOT_TREE_DIR INODE_ITEM 0) itemoff 3101 itemsize 160 inode generation 3 size 0 block group 0 mode 40555 links 1 item 5 key (ROOT_TREE_DIR INODE_REF 6) itemoff 3089 itemsize 12 inode ref index 0 namelen 2 name: .. The key for an inode_item is objectid=ROOT_TREE_DIR. ROOT-TREE-DIR is the inode-Nr.? What does offset=6 regarding INODE_REF mean? --> item 5 Also placed in the root tree, I've found entries of (FREE_SPACE UNTYPED ) What does UNTYPED mean in this context and the offset is the logical address of a block group (I guess...)? This item has a pointer to an inode_item, is that right? And last question about this entries: item 2 key (FS_TREE INODE_REF 6) itemoff 3500 itemsize 17 inode ref index 0 namelen 7 name: default ... item 6 key (ROOT_TREE_DIR DIR_ITEM 2378154706) itemoff 3052 itemsize 37 location key (FS_TREE ROOT_ITEM 18446744073709551615) type 2 namelen 7 datalen 0 name: default There is an INODE_REF (again, offset 6...?) pointing to which inode? I cannot find some... What's about that offset in location key (FS_TREE) Thx in advance for your patience and answers! Regards Guenther -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Btrfs-progs: segfault when re-creating filesystem with 3+ devices
Hello, Running on Linux 3.5.0, with the latest btrfs-progs from today. I was doing some tests with my 7 devices, and mkfs.btrfs segfaults when executed twice, creating a filesystem with 3+ devices: # ./mkfs.btrfs /dev/sdc /dev/sdd /dev/sde WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using adding device /dev/sdd id 2 adding device /dev/sde id 3 fs created label (null) on /dev/sdc nodesize 4096 leafsize 4096 sectorsize 4096 size 8.19TB Btrfs Btrfs v0.19 # ./mkfs.btrfs /dev/sdc /dev/sdd /dev/sde WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using adding device /dev/sdd id 2 Segmentation fault If executed a third time, everything is fine and the filesystem is created normally. The segfault doesn't happen with 2 or less devices (but still happens with 4 or more devices). Backtrace: #0 0x7fba810faaf0 in strcmp () from /lib/libc.so.6 #1 0x0041e17c in device_list_add (path=0x7fff10a8a810 "/dev/sdd", disk_super=0xac9570, devid=2, fs_devices_ret=0x7fff10a8ac88) at volumes.c:119 #2 0x0041e556 in btrfs_scan_one_device (fd=11, path=0x7fff10a8a810 "/dev/sdd", fs_devices_ret=0x7fff10a8ac88, total_devs=0x7fff10a8ac80, super_offset=65536) at volumes.c:221 #3 0x0042607b in btrfs_scan_block_devices (run_ioctl=1) at utils.c:1209 #4 0x00425bf0 in btrfs_scan_for_fsid (fs_devices=0xab5020, total_devs=7, run_ioctls=1) at utils.c:1057 #5 0x0042551e in check_mounted_where (fd=9, file=0x7fff10a8ca01 "/dev/sde", where=0x0, size=0, fs_dev_ret=0x0) at utils.c:843 #6 0x00425494 in check_mounted (file=0x7fff10a8ca01 "/dev/sde") at utils.c:820 #7 0x0042be74 in main (ac=4, av=0x7fff10a8b058) at mkfs.c:1402 Thanks. -- Cyril B. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
mkfs devices ordering relevant with devices of different sizes?
Hello, When creating a filesystem with devices of different sizes, the resulting filesystem total size depends on the device order specified to mkfs. When the smaller device is specified first, the second (larger) device is seen as the same size as the first. This doesn't occur when the order is reversed. It's confusing. Is this expected? I'm using the latest btrfs-progs and Linux 3.5. # ./mkfs.btrfs /dev/sda4 /dev/sdc WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using adding device /dev/sdc id 2 fs created label (null) on /dev/sda4 nodesize 4096 leafsize 4096 sectorsize 4096 size 3.97TB Btrfs Btrfs v0.19 backup6:~/btrfs-progs# ./btrfs fi show /dev/sdc Label: none uuid: 806b237e-53ed-409e-a7c1-02f101798384 Total devices 2 FS bytes used 28.00KB devid2 size 1.98TB used 2.01GB path /dev/sdc devid1 size 1.98TB used 2.03GB path /dev/sda4 Btrfs Btrfs v0.19 # ./mkfs.btrfs /dev/sdc /dev/sda4 WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using adding device /dev/sda4 id 2 fs created label (null) on /dev/sdc nodesize 4096 leafsize 4096 sectorsize 4096 size 4.71TB Btrfs Btrfs v0.19 backup6:~/btrfs-progs# ./btrfs fi show /dev/sdc Label: none uuid: 8f99c072-521b-4827-a2be-41de6ab11b4f Total devices 2 FS bytes used 28.00KB devid1 size 2.73TB used 2.03GB path /dev/sdc devid2 size 1.98TB used 2.01GB path /dev/sda4 Btrfs Btrfs v0.19 Thanks. -- Cyril B. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
On Tue, Jul 10, 2012 at 5:26 PM, Alex Lyakas wrote: > Alexander, > this focuses on area of sending file extents: > >> +static int is_extent_unchanged(struct send_ctx *sctx, >> + struct btrfs_path *left_path, >> + struct btrfs_key *ekey) >> +{ >> + int ret = 0; >> + struct btrfs_key key; >> + struct btrfs_path *path = NULL; >> + struct extent_buffer *eb; >> + int slot; >> + struct btrfs_key found_key; >> + struct btrfs_file_extent_item *ei; >> + u64 left_disknr; >> + u64 right_disknr; >> + u64 left_offset; >> + u64 right_offset; >> + u64 left_len; >> + u64 right_len; >> + u8 left_type; >> + u8 right_type; >> + >> + path = alloc_path_for_send(); >> + if (!path) >> + return -ENOMEM; >> + >> + eb = left_path->nodes[0]; >> + slot = left_path->slots[0]; >> + >> + ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); >> + left_type = btrfs_file_extent_type(eb, ei); >> + left_disknr = btrfs_file_extent_disk_bytenr(eb, ei); >> + left_len = btrfs_file_extent_num_bytes(eb, ei); >> + left_offset = btrfs_file_extent_offset(eb, ei); >> + >> + if (left_type != BTRFS_FILE_EXTENT_REG) { >> + ret = 0; >> + goto out; >> + } >> + >> + key.objectid = ekey->objectid; >> + key.type = BTRFS_EXTENT_DATA_KEY; >> + key.offset = ekey->offset; >> + >> + while (1) { >> + ret = btrfs_search_slot_for_read(sctx->parent_root, &key, >> path, >> + 0, 0); >> + if (ret < 0) >> + goto out; >> + if (ret) { >> + ret = 0; >> + goto out; >> + } >> + btrfs_item_key_to_cpu(path->nodes[0], &found_key, >> + path->slots[0]); >> + if (found_key.objectid != key.objectid || >> + found_key.type != key.type) { >> + ret = 0; >> + goto out; >> + } >> + >> + eb = path->nodes[0]; >> + slot = path->slots[0]; >> + >> + ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); >> + right_type = btrfs_file_extent_type(eb, ei); >> + right_disknr = btrfs_file_extent_disk_bytenr(eb, ei); >> + right_len = btrfs_file_extent_num_bytes(eb, ei); >> + right_offset = btrfs_file_extent_offset(eb, ei); >> + btrfs_release_path(path); >> + >> + if (right_type != BTRFS_FILE_EXTENT_REG) { >> + ret = 0; >> + goto out; >> + } >> + >> + if (left_disknr != right_disknr) { >> + ret = 0; >> + goto out; >> + } >> + >> + key.offset = found_key.offset + right_len; >> + if (key.offset >= ekey->offset + left_len) { >> + ret = 1; >> + goto out; >> + } >> + } >> + >> +out: >> + btrfs_free_path(path); >> + return ret; >> +} >> + > > Should we always treat left extent with bytenr==0 as not changed? No, as we may have bytenr!=0 on the right side. > Because right now, it simply reads and sends data of such extent, > while bytenr==0 means "no data allocated here". Since we always do > send_truncate() afterwards, file size will always be correct, so we > can just skip bytenr==0 extents. This is something that could be done for full sends only. Full sends however do not call is_extent_unchanged, so the optimization is something for process_extent. In the incremental case, it may happen that left_disknr==0 and right_disknr!=0 or vice versa, so we need to do the compare no matter if one of them is ==0. process_extents could then again do some optimization and send a special command to instruct a preallocated zero block. > Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those > also don't contain real data. > So something like: > if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) { > ret = 1; > goto out; > } Do you mean "|| left_type == BTRFS_FILE_EXTENT_PREALLOC"? > before we check for BTRFS_FILE_EXTENT_REG. > > Now I have a question about the rest of the logic that decides that > extent is unchanged. I understand that if we see the same extent (same > disk_bytenr) shared between parent_root and send_root, then it must > contain the same data, even in nodatacow mode, because on a first > write to such shared extent, it is cow'ed even with nodatacow. > > However, shouldn't we check btrfs_file_extent_offset(), to make sure > that both send_root and parent_root point at the same offset into > extent from the same file offset? Because if extent_of
Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)
On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote: > On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen wrote: > > On 04.07.2012 15:39, Alexander Block wrote: > >> Hello all, > >> > >> This is the user space side of btrfs send/receive. > >> > >> You can apply them manually or use my git repo: > >> > >> git://github.com/ablock84/btrfs-progs.git (branch send) > >> > >> The branch is based on Hugo's integration-20120605 branch. I had to add a > >> temporary > >> commit to fix a bug introduced in one of the strncpy/overflow patches that > >> got into > >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, but > >> you'll > >> probably need it if you want to base on the integration branch. I hope > >> this is not > >> required in the future when a new integration branch comes out. > >> > >> Example usage: > >> > >> Multiple snapshots at once: > >> btrfs send /mnt/snap[123] > snap123.btrfs > > > > a) Do we really want a single token command here, not > > btrfs filesystem send or subvol send? > In my opinion the single token is easier to type and remember. But if > enough speaks for normal subcommands this can be changed (but by > someone else as I'm running out of time). Since everything else is two commands, yes, I think we need it for consistency. (And, since it's a publically-visible interface, for acceptance of the patches -- we don't want to be changing the way the commands work after the fact). > > b) zfs makes sure stdout is not a tty, to prevent flooding > > your console. This kinda makes sense. > This makes sense. But again, this has to be done by someone else. Can you keep a brief list of such cleanups/features and dump it on the wiki as a proposed project when your time does run out, please. That way the details don't get lost, and they can be found by other people and dealt with independently. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Turning, pages turning in the widening bath, / The spine --- cannot bear the humidity. / Books fall apart; the binding cannot hold. / Page 129 is loosed upon the world. signature.asc Description: Digital signature
[PATCH 07/16] btrfs: nuke write_super from comments
From: Artem Bityutskiy The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from btrfs. Cc: Chris Mason Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy --- I expect this patch to be merged via Al Viro's VFS tree. fs/btrfs/super.c |4 fs/btrfs/volumes.c |4 2 files changed, 8 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index e239915..ad31627 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -100,10 +100,6 @@ static void __save_error_info(struct btrfs_fs_info *fs_info) fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR; } -/* NOTE: - * We move write_super stuff at umount in order to avoid deadlock - * for umount hold all lock. - */ static void save_error_info(struct btrfs_fs_info *fs_info) { __save_error_info(fs_info); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ecaad40..9f2416c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1738,10 +1738,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) device->fs_devices = root->fs_info->fs_devices; - /* -* we don't want write_supers to jump in here with our device -* half setup -*/ mutex_lock(&root->fs_info->fs_devices->device_list_mutex); list_add_rcu(&device->dev_list, &root->fs_info->fs_devices->devices); list_add(&device->dev_alloc_list, -- 1.7.10 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/16] btrfs: nuke pdflush from comments
From: Artem Bityutskiy The pdflush thread is long gone, so this patch removes references to pdflush from btrfs comments. Cc: Chris Mason Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy --- I expect this patch to be merged via Al Viro's VFS tree. fs/btrfs/inode.c|3 ++- fs/btrfs/ordered-data.c |2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a7d1921..ca8b759 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -324,7 +324,8 @@ static noinline int add_async_extent(struct async_cow *cow, * If this code finds it can't get good compression, it puts an * entry onto the work queue to write the uncompressed bytes. This * makes sure that both compressed inodes and uncompressed inodes - * are written in the same order that pdflush sent them down. + * are written in the same order that the flusher thread sent them + * down. */ static noinline int compress_file_range(struct inode *inode, struct page *locked_page, diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 643335a..051c7fe 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -596,7 +596,7 @@ void btrfs_start_ordered_extent(struct inode *inode, /* * pages in the range can be dirty, clean or writeback. We * start IO on any dirty ones so the wait doesn't stall waiting -* for pdflush to find them +* for the flusher thread to find them */ if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags)) filemap_fdatawrite_range(inode->i_mapping, start, end); -- 1.7.10 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)
On Wed, Jul 25, 2012 at 08:00:36AM -0600, Hugo Mills wrote: > On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote: > > On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen wrote: > > > On 04.07.2012 15:39, Alexander Block wrote: > > >> Hello all, > > >> > > >> This is the user space side of btrfs send/receive. > > >> > > >> You can apply them manually or use my git repo: > > >> > > >> git://github.com/ablock84/btrfs-progs.git (branch send) > > >> > > >> The branch is based on Hugo's integration-20120605 branch. I had to add > > >> a temporary > > >> commit to fix a bug introduced in one of the strncpy/overflow patches > > >> that got into > > >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, > > >> but you'll > > >> probably need it if you want to base on the integration branch. I hope > > >> this is not > > >> required in the future when a new integration branch comes out. > > >> > > >> Example usage: > > >> > > >> Multiple snapshots at once: > > >> btrfs send /mnt/snap[123] > snap123.btrfs > > > > > > a) Do we really want a single token command here, not > > > btrfs filesystem send or subvol send? > > In my opinion the single token is easier to type and remember. But if > > enough speaks for normal subcommands this can be changed (but by > > someone else as I'm running out of time). > >Since everything else is two commands, yes, I think we need it for > consistency. (And, since it's a publically-visible interface, for > acceptance of the patches -- we don't want to be changing the way the > commands work after the fact). I've been sending and receiving while getting this code integrated. These are really first class operations, and I'd prefer they not be sub-commands. I'm afraid there isn't a lot of logic here, just what feels good to type. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/16] btrfs: nuke write_super from comments
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index ecaad40..9f2416c 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -1738,10 +1738,6 @@ int btrfs_init_new_device(struct btrfs_root *root, > char *device_path) > > device->fs_devices = root->fs_info->fs_devices; > > - /* > -* we don't want write_supers to jump in here with our device > -* half setup > -*/ > mutex_lock(&root->fs_info->fs_devices->device_list_mutex); > list_add_rcu(&device->dev_list, &root->fs_info->fs_devices->devices); > list_add(&device->dev_alloc_list, Is the locking still required for approximately the same reason? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/16] btrfs: nuke write_super from comments
On Wed, 2012-07-25 at 09:46 -0600, cwillu wrote: > > mutex_lock(&root->fs_info->fs_devices->device_list_mutex); > > list_add_rcu(&device->dev_list, > > &root->fs_info->fs_devices->devices); > > list_add(&device->dev_alloc_list, > > Is the locking still required for approximately the same reason? I do not know, I assume Chris would check that. -- Best Regards, Artem Bityutskiy signature.asc Description: This is a digitally signed message part
Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)
On Wed, Jul 25, 2012 at 4:00 PM, Hugo Mills wrote: > On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote: >> On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen wrote: >> > On 04.07.2012 15:39, Alexander Block wrote: >> >> Hello all, >> >> >> >> This is the user space side of btrfs send/receive. >> >> >> >> You can apply them manually or use my git repo: >> >> >> >> git://github.com/ablock84/btrfs-progs.git (branch send) >> >> >> >> The branch is based on Hugo's integration-20120605 branch. I had to add a >> >> temporary >> >> commit to fix a bug introduced in one of the strncpy/overflow patches >> >> that got into >> >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, but >> >> you'll >> >> probably need it if you want to base on the integration branch. I hope >> >> this is not >> >> required in the future when a new integration branch comes out. >> >> >> >> Example usage: >> >> >> >> Multiple snapshots at once: >> >> btrfs send /mnt/snap[123] > snap123.btrfs >> > >> > a) Do we really want a single token command here, not >> > btrfs filesystem send or subvol send? >> In my opinion the single token is easier to type and remember. But if >> enough speaks for normal subcommands this can be changed (but by >> someone else as I'm running out of time). > >Since everything else is two commands, yes, I think we need it for > consistency. (And, since it's a publically-visible interface, for > acceptance of the patches -- we don't want to be changing the way the > commands work after the fact). > >> > b) zfs makes sure stdout is not a tty, to prevent flooding >> > your console. This kinda makes sense. >> This makes sense. But again, this has to be done by someone else. > >Can you keep a brief list of such cleanups/features and dump it on > the wiki as a proposed project when your time does run out, please. > That way the details don't get lost, and they can be found by other > people and dealt with independently. Added a page to the wiki: https://btrfs.wiki.kernel.org/index.php/Btrfs_Send/Receive > >Hugo. > > -- > === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === > PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk > --- Turning, pages turning in the widening bath, / The spine --- > cannot bear the humidity. / Books fall apart; the binding > cannot hold. / Page 129 is loosed upon the world. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)
Alexander, can you pls let know like a day or two before you run out of time? I have compiled a list of questions, but also want to do more testing before I publish them all. Thanks for your work, Alex. On Wed, Jul 25, 2012 at 7:56 PM, Alexander Block wrote: > On Wed, Jul 25, 2012 at 4:00 PM, Hugo Mills wrote: >> On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote: >>> On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen wrote: >>> > On 04.07.2012 15:39, Alexander Block wrote: >>> >> Hello all, >>> >> >>> >> This is the user space side of btrfs send/receive. >>> >> >>> >> You can apply them manually or use my git repo: >>> >> >>> >> git://github.com/ablock84/btrfs-progs.git (branch send) >>> >> >>> >> The branch is based on Hugo's integration-20120605 branch. I had to add >>> >> a temporary >>> >> commit to fix a bug introduced in one of the strncpy/overflow patches >>> >> that got into >>> >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, >>> >> but you'll >>> >> probably need it if you want to base on the integration branch. I hope >>> >> this is not >>> >> required in the future when a new integration branch comes out. >>> >> >>> >> Example usage: >>> >> >>> >> Multiple snapshots at once: >>> >> btrfs send /mnt/snap[123] > snap123.btrfs >>> > >>> > a) Do we really want a single token command here, not >>> > btrfs filesystem send or subvol send? >>> In my opinion the single token is easier to type and remember. But if >>> enough speaks for normal subcommands this can be changed (but by >>> someone else as I'm running out of time). >> >>Since everything else is two commands, yes, I think we need it for >> consistency. (And, since it's a publically-visible interface, for >> acceptance of the patches -- we don't want to be changing the way the >> commands work after the fact). >> >>> > b) zfs makes sure stdout is not a tty, to prevent flooding >>> > your console. This kinda makes sense. >>> This makes sense. But again, this has to be done by someone else. >> >>Can you keep a brief list of such cleanups/features and dump it on >> the wiki as a proposed project when your time does run out, please. >> That way the details don't get lost, and they can be found by other >> people and dealt with independently. > Added a page to the wiki: > https://btrfs.wiki.kernel.org/index.php/Btrfs_Send/Receive >> >>Hugo. >> >> -- >> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === >> PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk >> --- Turning, pages turning in the widening bath, / The spine --- >> cannot bear the humidity. / Books fall apart; the binding >> cannot hold. / Page 129 is loosed upon the world. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)
On Wed, Jul 25, 2012 at 7:10 PM, Alex Lyakas wrote: > Alexander, > can you pls let know like a day or two before you run out of time? > I have compiled a list of questions, but also want to do more testing > before I publish them all. My flight goes on 6. August...after that I don't know when I'm back. I try my best to work on the most important issues until then, but time is very limited already now due to the preparations I need to take care of. > > Thanks for your work, > Alex. > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
Alexander, >> Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those >> also don't contain real data. >> So something like: >> if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) { >> ret = 1; >> goto out; >> } > Do you mean "|| left_type == BTRFS_FILE_EXTENT_PREALLOC"? I see your point about bytenr==0, I missed that on the parent tree it can be something else. As for PREALLOC: can it happen that on differential send we see extent of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had some real data extent in that place? I don't know the answer, but if yes, then we must treat PREALLOC as normal extent. So this case is similar to bytenr==0. Thanks, Alex. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 6/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 1)
Thanks for the review :) On 07/18/2012 08:59 AM, Arne Jansen wrote: On 04.07.2012 15:38, Alexander Block wrote: This patch introduces the BTRFS_IOC_SEND ioctl that is required for send. It allows btrfs-progs to implement full and incremental sends. Patches for btrfs-progs will follow. I had to split the patch as it got larger then 100k which is the limit for the mailing list. The first part only contains the send.h header and the helper functions for TLV handling and long path name handling and some other helpers. The second part contains the actual send logic from send.c Signed-off-by: Alexander Block --- fs/btrfs/Makefile |2 +- fs/btrfs/ioctl.h | 10 + fs/btrfs/send.c | 1009 + fs/btrfs/send.h | 126 +++ 4 files changed, 1146 insertions(+), 1 deletion(-) create mode 100644 fs/btrfs/send.c create mode 100644 fs/btrfs/send.h diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 0c4fa2b..f740644 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -8,7 +8,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ - reada.o backref.o ulist.o + reada.o backref.o ulist.o send.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index c9e3fac..282bc64 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -304,6 +304,15 @@ struct btrfs_ioctl_received_subvol_args { __u64 reserved[16]; }; +struct btrfs_ioctl_send_args { + __s64 send_fd; /* in */ + __u64 clone_sources_count; /* in */ + __u64 __user *clone_sources;/* in */ + __u64 parent_root; /* in */ + __u64 flags;/* in */ + __u64 reserved[4]; /* in */ +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -371,6 +380,7 @@ struct btrfs_ioctl_received_subvol_args { #define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \ struct btrfs_ioctl_received_subvol_args) +#define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct btrfs_ioctl_send_args) #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \ struct btrfs_ioctl_get_dev_stats) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c new file mode 100644 index 000..47a2557 --- /dev/null +++ b/fs/btrfs/send.c @@ -0,0 +1,1009 @@ +/* + * Copyright (C) 2012 Alexander Block. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "send.h" +#include "backref.h" +#include "locking.h" +#include "disk-io.h" +#include "btrfs_inode.h" +#include "transaction.h" + +static int g_verbose = 0; + +#define verbose_printk(...) if (g_verbose) printk(__VA_ARGS__) Maybe pr_debug is interesting to you. The advantage of this solution was that I could enable verbose output while I was in the debugger and single stepping. Not sure if I could do that with pr_debug. When send/receive gets stable and less debugging is needed, we can change this to pr_debug. + +/* + * A fs_path is a helper to dynamically build path names with unknown size. + * It reallocates the internal buffer on demand. + * It allows fast adding of path elements on the right side (normal path) and + * fast adding to the left side (reversed path). A reversed path can also be + * unreversed if needed. + */ +struct fs_path { + union { + struct { + char *start; + char *end; + char *prepared; + + char *buf; + int buf_len; + int reversed:1; + int virtual_mem:1; s/int/unsigned int/ Changed for the bitfields but not for buf_len. + char inline_buf[]; +
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
On Wed, Jul 25, 2012 at 7:20 PM, Alex Lyakas wrote: > Alexander, > >>> Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those >>> also don't contain real data. >>> So something like: >>> if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) { >>> ret = 1; >>> goto out; >>> } >> Do you mean "|| left_type == BTRFS_FILE_EXTENT_PREALLOC"? > > I see your point about bytenr==0, I missed that on the parent tree it > can be something else. > > As for PREALLOC: can it happen that on differential send we see extent > of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had > some real data extent in that place? I don't know the answer, but if > yes, then we must treat PREALLOC as normal extent. So this case is > similar to bytenr==0. > I also don't know if that may happen. Currently, only REG extents are checked by is_extent_unchanged. All other types are regarded as changed and will be sent. So in the worst case the stream gets larget then it should be, but we won't loose data. I need to leave in a few minutes and will continue working on btrfs send/receive v2 later today. We should probably postpone "optimizations" (actually bug fixing) here for later...don't know if I find enough time to investigate more. > Thanks, > Alex. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] Btrfs: fix error path in create_pending_snapshot()
This patch fixes the following problem: - If we failed to deal with the delayed dir items, we should abort transaction, just as its comment said. Fix it. - If root reference or root back reference insertion failed, we should abort transaction. Fix it. - Do not restore the trans->rsv if we doesn't change it. - make the error path more clearly. Signed-off-by: Miao Xie --- fs/btrfs/transaction.c | 38 +- 1 files changed, 17 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index b72b068..6d89603 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -932,18 +932,16 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, u64 objectid; u64 root_flags; - rsv = trans->block_rsv; - new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS); if (!new_root_item) { ret = pending->error = -ENOMEM; - goto fail; + goto root_item_alloc_fail; } ret = btrfs_find_free_objectid(tree_root, &objectid); if (ret) { pending->error = ret; - goto fail; + goto no_free_objectid; } btrfs_reloc_pre_snapshot(trans, pending, &to_reserve); @@ -953,7 +951,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, to_reserve); if (ret) { pending->error = ret; - goto fail; + goto no_free_objectid; } } @@ -961,6 +959,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, key.offset = (u64)-1; key.type = BTRFS_ROOT_ITEM_KEY; + rsv = trans->block_rsv; trans->block_rsv = &pending->block_rsv; dentry = pending->dentry; @@ -980,17 +979,16 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, BTRFS_FT_DIR, index); if (ret == -EEXIST) { pending->error = -EEXIST; - dput(parent); goto fail; } else if (ret) { - goto abort_trans_dput; + goto abort_trans; } btrfs_i_size_write(parent_inode, parent_inode->i_size + dentry->d_name.len * 2); ret = btrfs_update_inode(trans, parent_root, parent_inode); if (ret) - goto abort_trans_dput; + goto abort_trans; /* * pull in the delayed directory update @@ -999,10 +997,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, * snapshot */ ret = btrfs_run_delayed_items(trans, root); - if (ret) { /* Transaction aborted */ - dput(parent); - goto fail; - } + if (ret)/* Transaction aborted */ + goto abort_trans; record_root_in_trans(trans, root); btrfs_set_root_last_snapshot(&root->root_item, trans->transid); @@ -1021,7 +1017,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, if (ret) { btrfs_tree_unlock(old); free_extent_buffer(old); - goto abort_trans_dput; + goto abort_trans; } btrfs_set_lock_blocking(old); @@ -1031,7 +1027,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, btrfs_tree_unlock(old); free_extent_buffer(old); if (ret) - goto abort_trans_dput; + goto abort_trans; /* see comments in should_cow_block() */ root->force_cow = 1; @@ -1044,7 +1040,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, btrfs_tree_unlock(tmp); free_extent_buffer(tmp); if (ret) - goto abort_trans_dput; + goto abort_trans; /* * insert root back/forward references @@ -1053,9 +1049,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, parent_root->root_key.objectid, btrfs_ino(parent_inode), index, dentry->d_name.name, dentry->d_name.len); - dput(parent); if (ret) - goto fail; + goto abort_trans; key.offset = (u64)-1; pending->snap = btrfs_read_fs_root_no_name(root->fs_info, &key); @@ -1067,16 +1062,17 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, ret = btrfs_reloc_post_snapshot(trans, pending); if (ret) goto abort_trans; - ret = 0; fail: - kfree(new_root_item); + dput(parent); trans->block_rsv = rs
[PATCH 2/2] Btrfs: fix the snapshot that should not exist
The snapshot should be the image of the fs tree before it was created, so the metadata of the snapshot should not exist in the its tree. But now, we found the directory item and directory name index is in both the snapshot tree and the fs tree. It introduces some problem and makes the users feel strange: # mkfs.btrfs /dev/sda1 # mount /dev/sda1 /mnt # mkdir /mnt/1 # cd /mnt/1 # btrfs subvolume snapshot /mnt snap0 # ll /mnt/1 total 0 drwxr-xr-x 1 root root 10 Ju1 24 12:11 1 ^^^ # ll /mnt/1/snap0/ total 0 drwxr-xr-x 1 root root 10 Ju1 24 12:11 1 ^^^ It is also 10, but... # ll /mnt/1/snap0/1 total 0 [None] # cd /mnt/1/snap0/1/snap0 [Enter a unexisted directory successfully...] There is nothing in the directory 1 in snap0, but btrfs told the length of this directory is 10. Beside that, we can enter an unexisted directory, it is very strange to the users. # btrfs subvolume snapshot /mnt/1/snap0 /mnt/snap1 # ll /mnt/1/snap0/1/ total 0 [None] # ll /mnt/snap1/1/ total 0 drwxr-xr-x 1 root root 0 Ju1 24 12:14 snap0 And the source of snap1 did have any directory in Directory 1, but snap1 have a snap0, it is different between the source and the snapshot. So I think we should insert directory item and directory name index and update the parent inode as the last step of snapshot creation, and do not leave the useless metadata in the tree. Signed-off-by: Miao Xie --- fs/btrfs/transaction.c | 52 ++- 1 files changed, 37 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 6d89603..e9eceee 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -922,6 +922,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, struct btrfs_root *parent_root; struct btrfs_block_rsv *rsv; struct inode *parent_inode; + struct btrfs_path *path; + struct btrfs_dir_item *dir_item; struct dentry *parent; struct dentry *dentry; struct extent_buffer *tmp; @@ -932,6 +934,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, u64 objectid; u64 root_flags; + path = btrfs_alloc_path(); + if (!path) { + ret = pending->error = -ENOMEM; + goto path_alloc_fail; + } + new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS); if (!new_root_item) { ret = pending->error = -ENOMEM; @@ -973,22 +981,20 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, */ ret = btrfs_set_inode_index(parent_inode, &index); BUG_ON(ret); /* -ENOMEM */ - ret = btrfs_insert_dir_item(trans, parent_root, - dentry->d_name.name, dentry->d_name.len, - parent_inode, &key, - BTRFS_FT_DIR, index); - if (ret == -EEXIST) { + + /* check if there is a file/dir which has the same name. */ + dir_item = btrfs_lookup_dir_item(NULL, parent_root, path, +btrfs_ino(parent_inode), +dentry->d_name.name, +dentry->d_name.len, 0); + if (dir_item != NULL && !IS_ERR(dir_item)) { pending->error = -EEXIST; goto fail; - } else if (ret) { + } else if (IS_ERR(dir_item)) { + ret = PTR_ERR(dir_item); goto abort_trans; } - - btrfs_i_size_write(parent_inode, parent_inode->i_size + -dentry->d_name.len * 2); - ret = btrfs_update_inode(trans, parent_root, parent_inode); - if (ret) - goto abort_trans; + btrfs_release_path(path); /* * pull in the delayed directory update @@ -1062,17 +1068,33 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, ret = btrfs_reloc_post_snapshot(trans, pending); if (ret) goto abort_trans; + + ret = btrfs_insert_dir_item(trans, parent_root, + dentry->d_name.name, dentry->d_name.len, + parent_inode, &key, + BTRFS_FT_DIR, index); + /* We have check the name at the beginning, so it is impossible. */ + BUG_ON(ret == -EEXIST); + if (ret) + goto abort_trans; + + btrfs_i_size_write(parent_inode, parent_inode->i_size + +dentry->d_name.len * 2); + ret = btrfs_update_inode(trans, parent_root, parent_inode); + if (ret) + goto abort_trans; fail: dput(parent); trans->block_rsv = rsv; no_free_objectid: kfree(new_root_item); root_item_a