Re: btrfs send problems
I'm on my phone so apologies for top posting but please try btrfs-next, I recently fixed a pretty epic performance problem with send which should help you, I'd like to see how much. Thanks, Josef Jim Salter wrote: Hi list - I'm having problems with btrfs send in general, and incremental send in particular. 1. Performance: in kernel 3.11, btrfs send would send data at 500+MB/sec from a Samsung 840 series solid state drive. In kernel 3.12 and up, btrfs send will only send 30-ish MB/sec from the same drive - though if you interrupt a btrfs send in progress, it will "catch up" to where it was at 500+ MB/sec. This is pretty weird and frustrating. Even weirder and more frustrating, even at 30-ish MB/sec, a btrfs send has a very significant performance impact on the underlying system - which is very, very odd; 30MB/sec isn't even a tiny fraction of the throughput that drive is capable of, and being an SSD, it isn't really subject to degradation with a little extra IOPS concurrency. 2. Precalculation: There's no way that I'm aware of currently to pre-determine the size of an incremental send, so I can't get any kind of predictive progress bar; this is something I SORELY miss from ZFS. It also makes snapshot management more difficult, because AFAICT there's no way to see how much space on disk is referenced solely by a given snapshot. 3. Incremental sends too big?: incremental btrfs send appears to be sending too much data. I have a "test production" system with a couple of Windows 2008 VMs on it, and it takes hourly rolling snapshots, then does an incremental btrfs send to another system from each snapshot to the next periodically. Problem is, EACH hourly snapshot replication is running 6-10GB of data, which seems like far too much. I don't have any particular way to prove it, since I don't know of a great way to actually calculate the number of changed blocks - but the two Windows 2008 VMs have no native pagefile, so they aren't burning data that way, they're each running VirtIO drivers, and the users aren't changing 6-10GB of data per DAY, much less per hour. Finally, the 6-10GB incremental send size doesn't change significantly whether the increment in question is during the middle of the working day, or in the middle of the night when no users are connected (and when it isn't Patch Tuesday, so it's not like jillions of Windows Updates are coming in either - not that they constitute 120GB-240GB of data!) I know that last is maddeningly vague, but FWIW I have 30-ish similar setups on ZFS, operating the same way, each with roughly the same number of users running roughly the same set of applications, and those ZFS incrementals are all very consistent; middle-of-the-night incrementals on ZFS running well under 100MB apiece and total bandwidth for an entire day's incremental replication being well under how much bandwidth btrfs send is eating every hour. =\ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at https://urldefense.proofpoint.com/v1/url?u=http://vger.kernel.org/majordomo-info.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=yEXVJ85k3S52RAxVbFHaIbe5eV6dKbkfn%2FXLiZd%2BG8E%3D%0A&s=43ef0c011bfafafb636ec0fa76c0e5076fba95df51b64302e1632478b5880fb4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] Add command btrfs filesystem disk-usage
On Feb 14, 2014, at 11:34 AM, Hugo Mills wrote: > On Fri, Feb 14, 2014 at 07:27:57PM +0100, Goffredo Baroncelli wrote: >> On 02/14/2014 07:11 PM, Roman Mamedov wrote: >>> On Fri, 14 Feb 2014 18:57:03 +0100 >>> Goffredo Baroncelli wrote: >>> On 02/13/2014 10:00 PM, Roman Mamedov wrote: > On Thu, 13 Feb 2014 20:49:08 +0100 > Goffredo Baroncelli wrote: > >> Thanks for the comments, however I don't like du not usage; but you are >> right >> when you don't like "disk-usage". What about "btrfs filesystem >> chunk-usage" ? > > Personally I don't see the point of being super-pedantic here, i.e. "look > this > is not just filesystem usage, this is filesystem CHUNK usage"... > Consistency > of having a matching "dev usage" and "fi usage" would have been nicer. What about "btrfs filesystem chunk-usage" ? >>> >>> Uhm? Had to reread this several times, but it looks like you're repeating >>> exactly the same question that I was already answering in the quoted part. >>> >>> To clarify even more, personally I'd like if there would have been "btrfs >>> dev >>> usage" and "btrfs fi usage". Do not see the need to specifically make the >>> 2nd >>> one "chunk-usage" instead of simply "usage". >> >> I don't like "usage" because it to me seems to be too much generic. >> Because both "btrfs filesystem disk-usage" and "btrfs device disk-usage" >> report about chunk (and/or block group) infos, I am investigating >> about >> - btrfs filesystem chunk-usage >> - btrfs device chunk-usage > > Most people aren't going to know (or care) what a chunk is. I'm > much happier with Roman's suggestion of btrfs {fi,dev} usage. Or btrfs filesystem examine, or btrfs filesystem detail, which are semi-consistent with mdadm for obtaining similar data. Chris Murphy-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfstests: test for atime-related mount options
On Fri, Feb 14, 2014 at 09:02:08PM -0600, Eric Sandeen wrote: > On 2/14/14, 7:39 PM, Dave Chinner wrote: > > On Fri, Feb 14, 2014 at 05:48:59PM -0600, Eric Sandeen wrote: > >> On 2/14/14, 4:24 PM, Dave Chinner wrote: > >>> On Fri, Feb 14, 2014 at 10:41:16AM -0600, Eric Sandeen wrote: > On 2/14/14, 10:39 AM, David Sterba wrote: > > On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote: > >>> +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> > >>> $seqres.full > >>> +[ $? -ne 0 ] && echo "The relatime mount option should be the > >>> default." > >> > >> Ok, I guess "relatime" in /proc/mounts is from core vfs code and > >> should be there for the foreseeable future, so seems ok. > >> > >> But - relatime was added in v2.6.20, and made default in 2.6.30. So > >> testing older kernels may not go as expected; it'd probably be best to > >> catch situations where relatime isn't available (< 2.6.20) or not > >> default (< 2.6.30), by explicitly mounting with relatime, and skipping > >> relatime/strictatime tests if that fails? > > > > Is there some consensus what's the lowest kernel version to be supported > > by xfstests? 2.6.32 is the lowest base for kernels in use today, so > > worrying about anything older does not seem necessary. > > > > I don't know that it's been discussed - selfishly, I know our QE uses > xfstests on RHEL5, which is 2.6.18-based. > >>> > >>> Sure, but they can just add the test to a "rhel5-expunged" file and > >>> they don't have to care about tests that won't work on RHEL 5 or > >>> other older kernels. Or to send patches to add "_requires_relatime" > >>> so that it automatically does the right thing for older kernels. > >> > >> sure but some of this test is still valid on a kernel w/o relatime. > >> And since it's the default, "relatime" might disappear from /proc/mounts > >> some day anyway, so explicitly mounting with the option & failing > >> if that fails might be good future-proofind in any case. > >> > >> *shrug* > >> > >> It was just a request, not a demand. :) Koen, you can do with > >> it whatever you like. Reviews aren't ultimatums. :) > >> > >> If xfstests upstream is only targeted at the current kernel, that's > >> fine, but maye we should make that a little more explicit. > > > > That's not what I meant. ;) > > > > Really, all I'm saying is that we can't expect people who are > > writing tests that work on current kernels to know what is necessary > > to make tests work on 7 year old distros that don't support a > > feature that has been in mainline for 5 years. Hence that shouldn't > > be a barrier to having a test committed as we have mechanisms for > > distro QE to handle these sorts of issues... > > Sure, that's perfectly fair. > > I wasn't really thinking of RHEL5 when I made my first comment, > just general portability across kernels. dsterba suggested that > 2.6.32 is the oldest kernel used, and I pointed out that we do > still use 2.6.18. :) > > Anyway, for general portability across releases, perhaps rather than: > > +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full > +[ $? -ne 0 ] && echo "The relatime mount option should be the default." > > which would fail the test, it should just [notrun] if relatime > isn't there, for any reason, on any kernel, if relatime is not > default as expected for the test framework. i.e. > > +[ $? -ne 0 ] && _notrun "The relatime mount option is not the default." I disagree - the test doesn't need to care what the default mount option is - it's a relatively silly thing to test because it doesn't determine whether the behaviour of the option is correct or not. Especially as the test is checking the behaviour of specific atime mount options so it should just specify each one it is testing and ignoring what the default is. IOWs, the default atime behaviour just doesn't matter for the purpose of the test What the test actually cares about is this: _require_relatime() { _scratch_mkfs > /dev/null 2>&1 _mount -t $FSTYP -o relatime $SCRATCH_DEV $SCRATCH_MNT || \ _notrun "relatime not supported by the current kernel" } Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs send problems
Hi list - I'm having problems with btrfs send in general, and incremental send in particular. 1. Performance: in kernel 3.11, btrfs send would send data at 500+MB/sec from a Samsung 840 series solid state drive. In kernel 3.12 and up, btrfs send will only send 30-ish MB/sec from the same drive - though if you interrupt a btrfs send in progress, it will "catch up" to where it was at 500+ MB/sec. This is pretty weird and frustrating. Even weirder and more frustrating, even at 30-ish MB/sec, a btrfs send has a very significant performance impact on the underlying system - which is very, very odd; 30MB/sec isn't even a tiny fraction of the throughput that drive is capable of, and being an SSD, it isn't really subject to degradation with a little extra IOPS concurrency. 2. Precalculation: There's no way that I'm aware of currently to pre-determine the size of an incremental send, so I can't get any kind of predictive progress bar; this is something I SORELY miss from ZFS. It also makes snapshot management more difficult, because AFAICT there's no way to see how much space on disk is referenced solely by a given snapshot. 3. Incremental sends too big?: incremental btrfs send appears to be sending too much data. I have a "test production" system with a couple of Windows 2008 VMs on it, and it takes hourly rolling snapshots, then does an incremental btrfs send to another system from each snapshot to the next periodically. Problem is, EACH hourly snapshot replication is running 6-10GB of data, which seems like far too much. I don't have any particular way to prove it, since I don't know of a great way to actually calculate the number of changed blocks - but the two Windows 2008 VMs have no native pagefile, so they aren't burning data that way, they're each running VirtIO drivers, and the users aren't changing 6-10GB of data per DAY, much less per hour. Finally, the 6-10GB incremental send size doesn't change significantly whether the increment in question is during the middle of the working day, or in the middle of the night when no users are connected (and when it isn't Patch Tuesday, so it's not like jillions of Windows Updates are coming in either - not that they constitute 120GB-240GB of data!) I know that last is maddeningly vague, but FWIW I have 30-ish similar setups on ZFS, operating the same way, each with roughly the same number of users running roughly the same set of applications, and those ZFS incrementals are all very consistent; middle-of-the-night incrementals on ZFS running well under 100MB apiece and total bandwidth for an entire day's incremental replication being well under how much bandwidth btrfs send is eating every hour. =\ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl
On Sat, Feb 15, 2014 at 09:42:30PM +0200, Alex Lyakas wrote: > Hello Hugo, > > Is this issue specific to the receive ioctl? Yes. Everything else I've tried has worked perfectly on that test system. The issue is not pointer size (which is, I think, your thunking idea below), but structure alignment. > Because what you are describing applies to any user-kernel ioctl-based > interface, when you compile the user-space as 32-bit, which the kernel > space has been compiled as 64-bit. For that purpose, I believe, there > exists the "compat_ioctl" callback. It's implementation should do > "thunking", i.e., treat the user-space structure as if it were > compiled for 32-bit, and unpack it properly. > > Today, however, btrfs supplies the same callback both for > "unlocked_ioctl" and "compat_ioctl". So we should see the same problem > with all ioctls, if I am not missing anything. As far as I can see, the other ioctls are using structures which end up being aligned to 8-byte boundaries regardless of the bitness of the compiler target. Also, we don't tend to pass pointers in the btrfs ioctls, so we don't have that problem. Hugo. > On Mon, Jan 6, 2014 at 10:50 AM, Hugo Mills wrote: > > On Sun, Jan 05, 2014 at 06:26:11PM +, Hugo Mills wrote: > >> On Sun, Jan 05, 2014 at 05:55:27PM +, Hugo Mills wrote: > >> > The structure for BTRFS_SET_RECEIVED_IOCTL packs differently on 32-bit > >> > and 64-bit systems. This means that it is impossible to use btrfs > >> > receive on a system with a 64-bit kernel and 32-bit userspace, because > >> > the structure size (and hence the ioctl number) is different. > >> > > >> > This patch adds a compatibility structure and ioctl to deal with the > >> > above case. > >> > >>Oops, forgot to mention -- this has been compile tested, but not > >> actually run yet. The machine in question is several miles away and is > >> a production machine (it's my work desktop, and I can't afford much > >> downtime on it). > > > >... And it doesn't even compile properly, now I come to build a > > .deb. I'm still interested in comments about the general approach, but > > the specific patch is a load of balls. > > > >Hugo. > > > >>Hugo. > >> > >> > Signed-off-by: Hugo Mills > >> > --- > >> > fs/btrfs/ioctl.c | 95 > >> > +++- > >> > 1 file changed, 87 insertions(+), 8 deletions(-) > >> > > >> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > >> > index 21da576..e186439 100644 > >> > --- a/fs/btrfs/ioctl.c > >> > +++ b/fs/btrfs/ioctl.c > >> > @@ -57,6 +57,32 @@ > >> > #include "send.h" > >> > #include "dev-replace.h" > >> > > >> > +#ifdef CONFIG_64BIT > >> > +/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI > >> > + * structures are incorrect, as the timespec structure from userspace > >> > + * is 4 bytes too small. We define these alternatives here to teach > >> > + * the kernel about the 32-bit struct packing. > >> > + */ > >> > +struct btrfs_ioctl_timespec { > >> > + __u64 sec; > >> > + __u32 nsec; > >> > +} ((__packed__)); > >> > + > >> > +struct btrfs_ioctl_received_subvol_args { > >> > + charuuid[BTRFS_UUID_SIZE]; /* in */ > >> > + __u64 stransid; /* in */ > >> > + __u64 rtransid; /* out */ > >> > + struct btrfs_ioctl_timespec stime; /* in */ > >> > + struct btrfs_ioctl_timespec rtime; /* out */ > >> > + __u64 flags; /* in */ > >> > + __u64 reserved[16]; /* in */ > >> > +} ((__packed__)); > >> > +#endif > >> > + > >> > +#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \ > >> > + struct btrfs_ioctl_received_subvol_args_32) > >> > + > >> > + > >> > static int btrfs_clone(struct inode *src, struct inode *inode, > >> >u64 off, u64 olen, u64 olen_aligned, u64 destoff); > >> > > >> > @@ -4313,10 +4339,69 @@ static long btrfs_ioctl_quota_rescan_wait(struct > >> > file *file, void __user *arg) > >> > return btrfs_qgroup_wait_for_completion(root->fs_info); > >> > } > >> > > >> > +#ifdef CONFIG_64BIT > >> > +static long btrfs_ioctl_set_received_subvol_32(struct file *file, > >> > + void __user *arg) > >> > +{ > >> > + struct btrfs_ioctl_received_subvol_args_32 *args32 = NULL; > >> > + struct btrfs_ioctl_received_subvol_args *args64 = NULL; > >> > + int ret = 0; > >> > + > >> > + args32 = memdup_user(arg, sizeof(*args32)); > >> > + if (IS_ERR(args32)) { > >> > + ret = PTR_ERR(args32); > >> > + args32 = NULL; > >> > + goto out; > >> > + } > >> > + > >> > + args64 = malloc(sizeof(*args64)); > >> > + if (IS_ERR(args64)) { > >> > + ret = PTR_ERR(args64); > >> > + args64 = NULL; > >> > + goto out; > >> > + } > >> > + > >> > + memcpy(args64->uuid, args32->uuid, BTRFS_UUID_SIZE); > >> > + args64->stransid = args32->stransid; > >>
Re: [PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl
Hello Hugo, Is this issue specific to the receive ioctl? Because what you are describing applies to any user-kernel ioctl-based interface, when you compile the user-space as 32-bit, which the kernel space has been compiled as 64-bit. For that purpose, I believe, there exists the "compat_ioctl" callback. It's implementation should do "thunking", i.e., treat the user-space structure as if it were compiled for 32-bit, and unpack it properly. Today, however, btrfs supplies the same callback both for "unlocked_ioctl" and "compat_ioctl". So we should see the same problem with all ioctls, if I am not missing anything. Thanks, Alex. On Mon, Jan 6, 2014 at 10:50 AM, Hugo Mills wrote: > On Sun, Jan 05, 2014 at 06:26:11PM +, Hugo Mills wrote: >> On Sun, Jan 05, 2014 at 05:55:27PM +, Hugo Mills wrote: >> > The structure for BTRFS_SET_RECEIVED_IOCTL packs differently on 32-bit >> > and 64-bit systems. This means that it is impossible to use btrfs >> > receive on a system with a 64-bit kernel and 32-bit userspace, because >> > the structure size (and hence the ioctl number) is different. >> > >> > This patch adds a compatibility structure and ioctl to deal with the >> > above case. >> >>Oops, forgot to mention -- this has been compile tested, but not >> actually run yet. The machine in question is several miles away and is >> a production machine (it's my work desktop, and I can't afford much >> downtime on it). > >... And it doesn't even compile properly, now I come to build a > .deb. I'm still interested in comments about the general approach, but > the specific patch is a load of balls. > >Hugo. > >>Hugo. >> >> > Signed-off-by: Hugo Mills >> > --- >> > fs/btrfs/ioctl.c | 95 >> > +++- >> > 1 file changed, 87 insertions(+), 8 deletions(-) >> > >> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >> > index 21da576..e186439 100644 >> > --- a/fs/btrfs/ioctl.c >> > +++ b/fs/btrfs/ioctl.c >> > @@ -57,6 +57,32 @@ >> > #include "send.h" >> > #include "dev-replace.h" >> > >> > +#ifdef CONFIG_64BIT >> > +/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI >> > + * structures are incorrect, as the timespec structure from userspace >> > + * is 4 bytes too small. We define these alternatives here to teach >> > + * the kernel about the 32-bit struct packing. >> > + */ >> > +struct btrfs_ioctl_timespec { >> > + __u64 sec; >> > + __u32 nsec; >> > +} ((__packed__)); >> > + >> > +struct btrfs_ioctl_received_subvol_args { >> > + charuuid[BTRFS_UUID_SIZE]; /* in */ >> > + __u64 stransid; /* in */ >> > + __u64 rtransid; /* out */ >> > + struct btrfs_ioctl_timespec stime; /* in */ >> > + struct btrfs_ioctl_timespec rtime; /* out */ >> > + __u64 flags; /* in */ >> > + __u64 reserved[16]; /* in */ >> > +} ((__packed__)); >> > +#endif >> > + >> > +#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \ >> > + struct btrfs_ioctl_received_subvol_args_32) >> > + >> > + >> > static int btrfs_clone(struct inode *src, struct inode *inode, >> >u64 off, u64 olen, u64 olen_aligned, u64 destoff); >> > >> > @@ -4313,10 +4339,69 @@ static long btrfs_ioctl_quota_rescan_wait(struct >> > file *file, void __user *arg) >> > return btrfs_qgroup_wait_for_completion(root->fs_info); >> > } >> > >> > +#ifdef CONFIG_64BIT >> > +static long btrfs_ioctl_set_received_subvol_32(struct file *file, >> > + void __user *arg) >> > +{ >> > + struct btrfs_ioctl_received_subvol_args_32 *args32 = NULL; >> > + struct btrfs_ioctl_received_subvol_args *args64 = NULL; >> > + int ret = 0; >> > + >> > + args32 = memdup_user(arg, sizeof(*args32)); >> > + if (IS_ERR(args32)) { >> > + ret = PTR_ERR(args32); >> > + args32 = NULL; >> > + goto out; >> > + } >> > + >> > + args64 = malloc(sizeof(*args64)); >> > + if (IS_ERR(args64)) { >> > + ret = PTR_ERR(args64); >> > + args64 = NULL; >> > + goto out; >> > + } >> > + >> > + memcpy(args64->uuid, args32->uuid, BTRFS_UUID_SIZE); >> > + args64->stransid = args32->stransid; >> > + args64->rtransid = args32->rtransid; >> > + args64->stime.sec = args32->stime.sec; >> > + args64->stime.nsec = args32->stime.nsec; >> > + args64->rtime.sec = args32->rtime.sec; >> > + args64->rtime.nsec = args32->rtime.nsec; >> > + args64->flags = args32->flags; >> > + >> > + ret = _btrfs_ioctl_set_received_subvol(file, args64); >> > + >> > +out: >> > + kfree(args32); >> > + kfree(args64); >> > + return ret; >> > +} >> > +#endif >> > + >> > static long btrfs_ioctl_set_received_subvol(struct file *file, >> > void __user *arg) >> > { >> > struct btrfs_ioctl_received_subvol_args *sa = NULL; >> > + int ret = 0; >> > + >> > + s
Re: [PATCH] Btrfs: more send support for parent/child dir relationship inversion
On Sat, Feb 1, 2014 at 2:02 AM, Filipe David Borba Manana wrote: > The commit titled "Btrfs: fix infinite path build loops in incremental send" > didn't cover a particular case where the parent-child relationship inversion > of directories doesn't imply a rename of the new parent directory. This was > due to a simple logic mistake, a logical and instead of a logical or. > > Steps to reproduce: > > $ mkfs.btrfs -f /dev/sdb3 > $ mount /dev/sdb3 /mnt/btrfs > $ mkdir -p /mnt/btrfs/a/b/bar1/bar2/bar3/bar4 > $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1 > $ mv /mnt/btrfs/a/b/bar1/bar2/bar3/bar4 /mnt/btrfs/a/b/k44 > $ mv /mnt/btrfs/a/b/bar1/bar2/bar3 /mnt/btrfs/a/b/k44 > $ mv /mnt/btrfs/a/b/bar1/bar2 /mnt/btrfs/a/b/k44/bar3 > $ mv /mnt/btrfs/a/b/bar1 /mnt/btrfs/a/b/k44/bar3/bar2/k11 > $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2 > $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send > > A patch to update the test btrfs/030 from xfstests, so that it covers > this case, will be submitted soon. > > Signed-off-by: Filipe David Borba Manana > --- Hi, Should this go to 3.14 too? It is covered by the test case btrfs/030 for xfstests. thanks > fs/btrfs/send.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index e0b49f6..c47fcef 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -3041,8 +3041,8 @@ static int wait_for_parent_move(struct send_ctx *sctx, > > len1 = fs_path_len(path_before); > len2 = fs_path_len(path_after); > - if ((parent_ino_before != parent_ino_after) && (len1 != len2 || > -memcmp(path_before->start, path_after->start, len1))) { > + if (parent_ino_before != parent_ino_after || len1 != len2 || > +memcmp(path_before->start, path_after->start, len1)) { > ret = 1; > goto out; > } > -- > 1.7.9.5 > -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix send dealing with file renames and directory moves
On Sat, Feb 1, 2014 at 2:00 AM, Filipe David Borba Manana wrote: > This fixes a case that the commit titled: > >Btrfs: fix infinite path build loops in incremental send > > didn't cover. If the parent-child relationship between 2 directories > is inverted, both get renamed, and the former parent has a file that > got renamed too (but remains a child of that directory), the incremental > send operation would use the file's old path after sending an unlink > operation for that old path, causing receive to fail on future operations > like changing owner, permissions or utimes of the corresponding inode. > > This is not a regression from the commit mentioned before, as without > that commit we would fall into the issues that commit fixed, so it's > just one case that wasn't covered before. > > Simple steps to reproduce this issue are: > > $ mkfs.btrfs -f /dev/sdb3 > $ mount /dev/sdb3 /mnt/btrfs > $ mkdir -p /mnt/btrfs/a/b/c/d > $ touch /mnt/btrfs/a/b/c/d/file > $ mkdir -p /mnt/btrfs/a/b/x > $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1 > $ mv /mnt/btrfs/a/b/x /mnt/btrfs/a/b/c/x2 > $ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c/x2/d2 > $ mv /mnt/btrfs/a/b/c/x2/d2/file /mnt/btrfs/a/b/c/x2/d2/file2 > $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2 > $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > > /tmp/incremental.send > > A patch to update the test btrfs/030 from xfstests, so that it covers > this case, will be submitted soon. > > Signed-off-by: Filipe David Borba Manana Hi, Should this go to 3.14? It is covered by the test case btrfs/030 for xfstests. thanks > --- > fs/btrfs/send.c | 10 +- > 1 file changed, 1 insertion(+), 9 deletions(-) > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index 7250d86..e0b49f6 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -2121,8 +2121,6 @@ static int get_cur_path(struct send_ctx *sctx, u64 ino, > u64 gen, > u64 parent_inode = 0; > u64 parent_gen = 0; > int stop = 0; > - u64 start_ino = ino; > - u64 start_gen = gen; > int skip_name_cache = 0; > > name = fs_path_alloc(); > @@ -2134,7 +2132,6 @@ static int get_cur_path(struct send_ctx *sctx, u64 ino, > u64 gen, > if (is_waiting_for_move(sctx, ino)) > skip_name_cache = 1; > > -again: > dest->reversed = 1; > fs_path_reset(dest); > > @@ -2149,13 +2146,8 @@ again: > stop = 1; > > if (!skip_name_cache && > - is_waiting_for_move(sctx, parent_inode)) { > - ino = start_ino; > - gen = start_gen; > - stop = 0; > + is_waiting_for_move(sctx, parent_inode)) > skip_name_cache = 1; > - goto again; > - } > > ret = fs_path_add_path(dest, name); > if (ret < 0) > -- > 1.7.9.5 > -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: throttle delayed refs better
On Fri, 14 Feb 2014 14:29:35 -0500 Josef Bacik wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > > > On 02/14/2014 02:25 PM, Johannes Hirte wrote: > > On Thu, 6 Feb 2014 16:19:46 -0500 Josef Bacik > > wrote: > > > >> Ok so I thought I reproduced the problem but I just reproduced a > >> different problem. Please undo any changes you've made and > >> apply this patch and reproduce and then provide me with any debug > >> output that gets spit out. I'm sending this via thunderbird with > >> 6 different extensions to make sure it comes out right so if it > >> doesn't work let me know and I'll just paste it somewhere. > >> Thanks, > > > > Sorry for the long delay. Was to busy last week. > > > > Ok perfect this is fixed by > > [PATCH] Btrfs: don't loop forever if we can't run because of the tree > mod log > > and it went into -rc2 iirc, so give that a whirl and make sure it > fixes your problem. Thanks, Yes, seems to be fixed now. I wasn't able to reproduce it anymore. regards, Johannes -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: don't insert useless holes when punching beyond the inode's size
If we punch beyond the size of an inode, we'll correctly remove any prealloc extents, but we'll also insert file extent items representing holes (disk bytenr == 0) that start with a key offset that lies beyond the inode's size and are not contiguous with the last file extent item. Example: $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo $XFS_IO_PROG -c "fpunch 582007 864596" $SCRATCH_MNT/foo $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo btrfs-debug-tree output: item 4 key (257 INODE_ITEM 0) itemoff 15885 itemsize 160 inode generation 6 transid 6 size 132254 block group 0 mode 100600 links 1 item 5 key (257 INODE_REF 256) itemoff 15872 itemsize 13 inode ref index 2 namelen 3 name: foo item 6 key (257 EXTENT_DATA 0) itemoff 15819 itemsize 53 extent data disk byte 0 nr 0 gen 6 extent data offset 0 nr 90112 ram 122880 extent compression 0 item 7 key (257 EXTENT_DATA 90112) itemoff 15766 itemsize 53 extent data disk byte 12845056 nr 4096 gen 6 extent data offset 0 nr 45056 ram 45056 extent compression 2 item 8 key (257 EXTENT_DATA 585728) itemoff 15713 itemsize 53 extent data disk byte 0 nr 0 gen 6 extent data offset 0 nr 860160 ram 860160 extent compression 0 The last extent item, which represents a hole, is useless as it lies beyond the inode's size. Signed-off-by: Filipe David Borba Manana --- fs/btrfs/file.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 006af2f..5133d71 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2165,6 +2165,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) bool same_page = ((offset >> PAGE_CACHE_SHIFT) == ((offset + len - 1) >> PAGE_CACHE_SHIFT)); bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES); + u64 ino_size = round_up(inode->i_size, PAGE_CACHE_SIZE); ret = btrfs_wait_ordered_range(inode, offset, len); if (ret) @@ -2180,14 +2181,14 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) * entire page. */ if (same_page && len < PAGE_CACHE_SIZE) { - if (offset < round_up(inode->i_size, PAGE_CACHE_SIZE)) + if (offset < ino_size) ret = btrfs_truncate_page(inode, offset, len, 0); mutex_unlock(&inode->i_mutex); return ret; } /* zero back part of the first page */ - if (offset < round_up(inode->i_size, PAGE_CACHE_SIZE)) { + if (offset < ino_size) { ret = btrfs_truncate_page(inode, offset, 0, 0); if (ret) { mutex_unlock(&inode->i_mutex); @@ -2196,7 +2197,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) } /* zero the front end of the last page */ - if (offset + len < round_up(inode->i_size, PAGE_CACHE_SIZE)) { + if (offset + len < ino_size) { ret = btrfs_truncate_page(inode, offset + len, 0, 1); if (ret) { mutex_unlock(&inode->i_mutex); @@ -2285,10 +2286,13 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) trans->block_rsv = &root->fs_info->trans_block_rsv; - ret = fill_holes(trans, inode, path, cur_offset, drop_end); - if (ret) { - err = ret; - break; + if (cur_offset < ino_size) { + ret = fill_holes(trans, inode, path, cur_offset, +drop_end); + if (ret) { + err = ret; + break; + } } cur_offset = drop_end; @@ -2321,10 +2325,12 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) } trans->block_rsv = &root->fs_info->trans_block_rsv; - ret = fill_holes(trans, inode, path, cur_offset, drop_end); - if (ret) { - err = ret; - goto out_trans; + if (cur_offset < ino_size) { + ret = fill_holes(trans, inode, path, cur_offset, drop_end); + if (ret) { + err = ret; + goto out_trans; + } } out_trans: -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: use right clone root offset for compressed extents
For non compressed extents, iterate_extent_inodes() gives us offsets that take into account the data offset from the file extent items, while for compressed extents it doesn't. Therefore we have to adjust them before placing them in a send clone instruction. Not doing this adjustment leads to the receiving end requesting for a wrong a file range to the clone ioctl, which results in different file content from the one in the original send root. Issue reproducible with the following excerpt from the test I made for xfstests: _scratch_mkfs _scratch_mount "-o compress-force=lzo" $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 $XFS_IO_PROG -c "pwrite -S 0x3e -b 8 20 8" $SCRATCH_MNT/foo $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT $XFS_IO_PROG -c "pwrite -S 0xdc -b 1 25 1" $SCRATCH_MNT/foo $XFS_IO_PROG -c "pwrite -S 0xff -b 1 30 1" $SCRATCH_MNT/foo # will be used for incremental send to be able to issue clone operations $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2 $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \ -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2 $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \ -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap _scratch_unmount _scratch_mkfs _scratch_mount $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full Signed-off-by: Filipe David Borba Manana --- V2: Updated commit message and the comment. fs/btrfs/send.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index f46c43f..4addace 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1295,6 +1295,16 @@ verbose_printk(KERN_DEBUG "btrfs: find_extent_clone: data_offset=%llu, " } if (cur_clone_root) { + if (compressed != BTRFS_COMPRESS_NONE) { + /* +* Offsets given by iterate_extent_inodes() are relative +* to the start of the extent, we need to add logical +* offset from the file extent item. +* (See why at backref.c:check_extent_in_eb()) +*/ + cur_clone_root->offset += btrfs_file_extent_offset(eb, + fi); + } *found = cur_clone_root; ret = 0; } else { -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] xfstests: add regression test for btrfs incremental send
Test for a btrfs incremental send issue where we end up sending a wrong section of data from a file extent if the corresponding file extent is compressed and the respective file extent item has a non zero data offset. Fixed by the following linux kernel btrfs patch: Btrfs: use right clone root offset for compressed extents Signed-off-by: Filipe David Borba Manana --- V2: Made the test more reliable. Now it doesn't depend anymore of btrfs' hole punch implementation leaving hole file extent items when we punch beyond the file's current size. tests/btrfs/040 | 115 +++ tests/btrfs/040.out |1 + tests/btrfs/group |1 + 3 files changed, 117 insertions(+) create mode 100755 tests/btrfs/040 create mode 100644 tests/btrfs/040.out diff --git a/tests/btrfs/040 b/tests/btrfs/040 new file mode 100755 index 000..d6b37bf --- /dev/null +++ b/tests/btrfs/040 @@ -0,0 +1,115 @@ +#! /bin/bash +# FS QA Test No. btrfs/040 +# +# Test for a btrfs incremental send issue where we end up sending a +# wrong section of data from a file extent if the corresponding file +# extent is compressed and the respective file extent item has a non +# zero data offset. +# +# Fixed by the following linux kernel btrfs patch: +# +# Btrfs: use right clone root offset for compressed extents +# +#--- +# Copyright (c) 2014 Filipe Manana. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=`mktemp -d` +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ +rm -fr $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +_need_to_be_root + +FSSUM_PROG=$here/src/fssum +[ -x $FSSUM_PROG ] || _notrun "fssum not built" + +rm -f $seqres.full + +_scratch_mkfs >/dev/null 2>&1 +_scratch_mount "-o compress-force=lzo" + +run_check $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo +run_check $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" \ + $SCRATCH_MNT/foo + +run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \ + $SCRATCH_MNT/mysnap1 + +run_check $XFS_IO_PROG -c "pwrite -S 0x3e -b 8 20 8" \ + $SCRATCH_MNT/foo +run_check $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT +run_check $XFS_IO_PROG -c "pwrite -S 0xdc -b 1 25 1" \ + $SCRATCH_MNT/foo +run_check $XFS_IO_PROG -c "pwrite -S 0xff -b 1 30 1" \ + $SCRATCH_MNT/foo + +# will be used for incremental send to be able to issue clone operations +run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \ + $SCRATCH_MNT/clones_snap + +run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \ + $SCRATCH_MNT/mysnap2 + +run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 +run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ + -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2 +run_check $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \ + -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2 + +run_check $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap +run_check $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap +run_check $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \ + -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap + +_scratch_unmount +_check_btrfs_filesystem $SCRATCH_DEV + +_scratch_mkfs >/dev/null 2>&1 +_scratch_mount + +run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap +run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full + +run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap +run_check $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full + +run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap +run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full + +_scratch_unmount +_check_btrfs_filesystem $SCRATCH_DEV + +status=0 +exit diff --git a/tests/btrfs/040.out b/tests/btrfs/040