Re: btrfs send problems

2014-02-15 Thread Josef Bacik
I'm on my phone so apologies for top posting but please try btrfs-next, I 
recently fixed a pretty epic performance problem with send which should help 
you, I'd like to see how much.  Thanks,

Josef

Jim Salter  wrote:


Hi list - I'm having problems with btrfs send in general, and
incremental send in particular.

1. Performance: in kernel 3.11, btrfs send would send data at 500+MB/sec
from a Samsung 840 series solid state drive.  In kernel 3.12 and up,
btrfs send will only send 30-ish MB/sec from the same drive - though if
you interrupt a btrfs send in progress, it will "catch up" to where it
was at 500+ MB/sec.  This is pretty weird and frustrating.  Even weirder
and more frustrating, even at 30-ish MB/sec, a btrfs send has a very
significant performance impact on the underlying system - which is very,
very odd; 30MB/sec isn't even a tiny fraction of the throughput that
drive is capable of, and being an SSD, it isn't really subject to
degradation with a little extra IOPS concurrency.

2. Precalculation: There's no way that I'm aware of currently to
pre-determine the size of an incremental send, so I can't get any kind
of predictive progress bar; this is something I SORELY miss from ZFS. It
also makes snapshot management more difficult, because AFAICT there's no
way to see how much space on disk is referenced solely by a given snapshot.

3. Incremental sends too big?: incremental btrfs send appears to be
sending too much data.  I have a "test production" system with a couple
of Windows 2008 VMs on it, and it takes hourly rolling snapshots, then
does an incremental btrfs send to another system from each snapshot to
the next periodically.  Problem is, EACH hourly snapshot replication is
running 6-10GB of data, which seems like far too much.  I don't have any
particular way to prove it, since I don't know of a great way to
actually calculate the number of changed blocks - but the two Windows
2008 VMs have no native pagefile, so they aren't burning data that way,
they're each running VirtIO drivers, and the users aren't changing
6-10GB of data per DAY, much less per hour.  Finally, the 6-10GB
incremental send size doesn't change significantly whether the increment
in question is during the middle of the working day, or in the middle of
the night when no users are connected (and when it isn't Patch Tuesday,
so it's not like jillions of Windows Updates are coming in either - not
that they constitute 120GB-240GB of data!)

I know that last is maddeningly vague, but FWIW I have 30-ish similar
setups on ZFS, operating the same way, each with roughly the same number
of users running roughly the same set of applications, and those ZFS
incrementals are all very consistent; middle-of-the-night incrementals
on ZFS running well under 100MB apiece and total bandwidth for an entire
day's incremental replication being well under how much bandwidth btrfs
send is eating every hour. =\
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  
https://urldefense.proofpoint.com/v1/url?u=http://vger.kernel.org/majordomo-info.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=yEXVJ85k3S52RAxVbFHaIbe5eV6dKbkfn%2FXLiZd%2BG8E%3D%0A&s=43ef0c011bfafafb636ec0fa76c0e5076fba95df51b64302e1632478b5880fb4
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] Add command btrfs filesystem disk-usage

2014-02-15 Thread Chris Murphy

On Feb 14, 2014, at 11:34 AM, Hugo Mills  wrote:

> On Fri, Feb 14, 2014 at 07:27:57PM +0100, Goffredo Baroncelli wrote:
>> On 02/14/2014 07:11 PM, Roman Mamedov wrote:
>>> On Fri, 14 Feb 2014 18:57:03 +0100
>>> Goffredo Baroncelli  wrote:
>>> 
 On 02/13/2014 10:00 PM, Roman Mamedov wrote:
> On Thu, 13 Feb 2014 20:49:08 +0100
> Goffredo Baroncelli  wrote:
> 
>> Thanks for the comments, however I don't like du not usage; but you are 
>> right 
>> when you don't like "disk-usage". What about "btrfs filesystem 
>> chunk-usage" ?
> 
> Personally I don't see the point of being super-pedantic here, i.e. "look 
> this
> is not just filesystem usage, this is filesystem CHUNK usage"... 
> Consistency
> of having a matching "dev usage" and "fi usage" would have been nicer.
 
 
 What about "btrfs filesystem chunk-usage" ? 
>>> 
>>> Uhm? Had to reread this several times, but it looks like you're repeating
>>> exactly the same question that I was already answering in the quoted part.
>>> 
>>> To clarify even more, personally I'd like if there would have been "btrfs 
>>> dev
>>> usage" and "btrfs fi usage". Do not see the need to specifically make the 
>>> 2nd
>>> one "chunk-usage" instead of simply "usage".
>> 
>> I don't like "usage" because it to me seems to be too much generic.
>> Because both "btrfs filesystem disk-usage" and "btrfs device disk-usage"
>> report about chunk (and/or block group) infos, I am investigating 
>> about 
>> - btrfs filesystem chunk-usage
>> - btrfs device chunk-usage
> 
>   Most people aren't going to know (or care) what a chunk is. I'm
> much happier with Roman's suggestion of btrfs {fi,dev} usage.

Or btrfs filesystem examine, or btrfs filesystem detail, which are 
semi-consistent with mdadm for obtaining similar data.


Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests: test for atime-related mount options

2014-02-15 Thread Dave Chinner
On Fri, Feb 14, 2014 at 09:02:08PM -0600, Eric Sandeen wrote:
> On 2/14/14, 7:39 PM, Dave Chinner wrote:
> > On Fri, Feb 14, 2014 at 05:48:59PM -0600, Eric Sandeen wrote:
> >> On 2/14/14, 4:24 PM, Dave Chinner wrote:
> >>> On Fri, Feb 14, 2014 at 10:41:16AM -0600, Eric Sandeen wrote:
>  On 2/14/14, 10:39 AM, David Sterba wrote:
> > On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote:
> >>> +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> 
> >>> $seqres.full
> >>> +[ $? -ne 0 ] && echo "The relatime mount option should be the 
> >>> default."
> >>
> >> Ok, I guess "relatime" in /proc/mounts is from core vfs code and
> >> should be there for the foreseeable future, so seems ok.
> >>
> >> But - relatime was added in v2.6.20, and made default in 2.6.30.  So
> >> testing older kernels may not go as expected; it'd probably be best to
> >> catch situations where relatime isn't available (< 2.6.20) or not
> >> default (< 2.6.30), by explicitly mounting with relatime, and skipping
> >> relatime/strictatime tests if that fails?
> >
> > Is there some consensus what's the lowest kernel version to be supported
> > by xfstests? 2.6.32 is the lowest base for kernels in use today, so
> > worrying about anything older does not seem necessary.
> >
> 
>  I don't know that it's been discussed - selfishly, I know our QE uses
>  xfstests on RHEL5, which is 2.6.18-based.
> >>>
> >>> Sure, but they can just add the test to a "rhel5-expunged" file and
> >>> they don't have to care about tests that won't work on RHEL 5 or
> >>> other older kernels. Or to send patches to add "_requires_relatime"
> >>> so that it automatically does the right thing for older kernels.
> >>
> >> sure but some of this test is still valid on a kernel w/o relatime.
> >> And since it's the default, "relatime" might disappear from /proc/mounts
> >> some day anyway, so explicitly mounting with the option & failing
> >> if that fails might be good future-proofind in any case.
> >>
> >> *shrug*
> >>
> >> It was just a request, not a demand.  :)  Koen, you can do with
> >> it whatever you like.  Reviews aren't ultimatums.  :)
> >>
> >> If xfstests upstream is only targeted at the current kernel, that's
> >> fine, but maye we should make that a little more explicit.
> > 
> > That's not what I meant. ;)
> > 
> > Really, all I'm saying is that we can't expect people who are
> > writing tests that work on current kernels to know what is necessary
> > to make tests work on 7 year old distros that don't support a
> > feature that has been in mainline for 5 years. Hence that shouldn't
> > be a barrier to having a test committed as we have mechanisms for
> > distro QE to handle these sorts of issues...
> 
> Sure, that's perfectly fair.
> 
> I wasn't really thinking of RHEL5 when I made my first comment,
> just general portability across kernels.  dsterba suggested that
> 2.6.32 is the oldest kernel used, and I pointed out that we do
> still use 2.6.18.  :)
> 
> Anyway, for general portability across releases, perhaps rather than:
> 
> +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
> +[ $? -ne 0 ] && echo "The relatime mount option should be the default."
> 
> which would fail the test, it should just [notrun] if relatime
> isn't there, for any reason, on any kernel, if relatime is not
> default as expected for the test framework.  i.e.
> 
> +[ $? -ne 0 ] && _notrun "The relatime mount option is not the default."

I disagree - the test doesn't need to care what the default mount
option is - it's a relatively silly thing to test because it doesn't
determine whether the behaviour of the option is correct or not.
Especially as the test is checking the behaviour of specific atime
mount options so it should just specify each one it is testing and
ignoring what the default is. IOWs, the default atime behaviour just
doesn't matter for the purpose of the test

What the test actually cares about is this:

_require_relatime()
{
_scratch_mkfs > /dev/null 2>&1
_mount -t $FSTYP -o relatime $SCRATCH_DEV $SCRATCH_MNT || \
_notrun "relatime not supported by the current kernel"
}

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs send problems

2014-02-15 Thread Jim Salter
Hi list - I'm having problems with btrfs send in general, and 
incremental send in particular.


1. Performance: in kernel 3.11, btrfs send would send data at 500+MB/sec 
from a Samsung 840 series solid state drive.  In kernel 3.12 and up, 
btrfs send will only send 30-ish MB/sec from the same drive - though if 
you interrupt a btrfs send in progress, it will "catch up" to where it 
was at 500+ MB/sec.  This is pretty weird and frustrating.  Even weirder 
and more frustrating, even at 30-ish MB/sec, a btrfs send has a very 
significant performance impact on the underlying system - which is very, 
very odd; 30MB/sec isn't even a tiny fraction of the throughput that 
drive is capable of, and being an SSD, it isn't really subject to 
degradation with a little extra IOPS concurrency.


2. Precalculation: There's no way that I'm aware of currently to 
pre-determine the size of an incremental send, so I can't get any kind 
of predictive progress bar; this is something I SORELY miss from ZFS. It 
also makes snapshot management more difficult, because AFAICT there's no 
way to see how much space on disk is referenced solely by a given snapshot.


3. Incremental sends too big?: incremental btrfs send appears to be 
sending too much data.  I have a "test production" system with a couple 
of Windows 2008 VMs on it, and it takes hourly rolling snapshots, then 
does an incremental btrfs send to another system from each snapshot to 
the next periodically.  Problem is, EACH hourly snapshot replication is 
running 6-10GB of data, which seems like far too much.  I don't have any 
particular way to prove it, since I don't know of a great way to 
actually calculate the number of changed blocks - but the two Windows 
2008 VMs have no native pagefile, so they aren't burning data that way, 
they're each running VirtIO drivers, and the users aren't changing 
6-10GB of data per DAY, much less per hour.  Finally, the 6-10GB 
incremental send size doesn't change significantly whether the increment 
in question is during the middle of the working day, or in the middle of 
the night when no users are connected (and when it isn't Patch Tuesday, 
so it's not like jillions of Windows Updates are coming in either - not 
that they constitute 120GB-240GB of data!)


I know that last is maddeningly vague, but FWIW I have 30-ish similar 
setups on ZFS, operating the same way, each with roughly the same number 
of users running roughly the same set of applications, and those ZFS 
incrementals are all very consistent; middle-of-the-night incrementals 
on ZFS running well under 100MB apiece and total bandwidth for an entire 
day's incremental replication being well under how much bandwidth btrfs 
send is eating every hour. =\

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl

2014-02-15 Thread Hugo Mills
On Sat, Feb 15, 2014 at 09:42:30PM +0200, Alex Lyakas wrote:
> Hello Hugo,
> 
> Is this issue specific to the receive ioctl?

   Yes. Everything else I've tried has worked perfectly on that test
system. The issue is not pointer size (which is, I think, your
thunking idea below), but structure alignment.

> Because what you are describing applies to any user-kernel ioctl-based
> interface, when you compile the user-space as 32-bit, which the kernel
> space has been compiled as 64-bit. For that purpose, I believe, there
> exists the "compat_ioctl" callback. It's implementation should do
> "thunking", i.e., treat the user-space structure as if it were
> compiled for 32-bit, and unpack it properly.
> 
> Today, however, btrfs supplies the same callback both for
> "unlocked_ioctl" and "compat_ioctl". So we should see the same problem
> with all ioctls, if I am not missing anything.

   As far as I can see, the other ioctls are using structures which
end up being aligned to 8-byte boundaries regardless of the bitness of
the compiler target. Also, we don't tend to pass pointers in the btrfs
ioctls, so we don't have that problem.

   Hugo.

> On Mon, Jan 6, 2014 at 10:50 AM, Hugo Mills  wrote:
> > On Sun, Jan 05, 2014 at 06:26:11PM +, Hugo Mills wrote:
> >> On Sun, Jan 05, 2014 at 05:55:27PM +, Hugo Mills wrote:
> >> > The structure for BTRFS_SET_RECEIVED_IOCTL packs differently on 32-bit
> >> > and 64-bit systems. This means that it is impossible to use btrfs
> >> > receive on a system with a 64-bit kernel and 32-bit userspace, because
> >> > the structure size (and hence the ioctl number) is different.
> >> >
> >> > This patch adds a compatibility structure and ioctl to deal with the
> >> > above case.
> >>
> >>Oops, forgot to mention -- this has been compile tested, but not
> >> actually run yet. The machine in question is several miles away and is
> >> a production machine (it's my work desktop, and I can't afford much
> >> downtime on it).
> >
> >... And it doesn't even compile properly, now I come to build a
> > .deb. I'm still interested in comments about the general approach, but
> > the specific patch is a load of balls.
> >
> >Hugo.
> >
> >>Hugo.
> >>
> >> > Signed-off-by: Hugo Mills 
> >> > ---
> >> >  fs/btrfs/ioctl.c | 95 
> >> > +++-
> >> >  1 file changed, 87 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> >> > index 21da576..e186439 100644
> >> > --- a/fs/btrfs/ioctl.c
> >> > +++ b/fs/btrfs/ioctl.c
> >> > @@ -57,6 +57,32 @@
> >> >  #include "send.h"
> >> >  #include "dev-replace.h"
> >> >
> >> > +#ifdef CONFIG_64BIT
> >> > +/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
> >> > + * structures are incorrect, as the timespec structure from userspace
> >> > + * is 4 bytes too small. We define these alternatives here to teach
> >> > + * the kernel about the 32-bit struct packing.
> >> > + */
> >> > +struct btrfs_ioctl_timespec {
> >> > +   __u64 sec;
> >> > +   __u32 nsec;
> >> > +} ((__packed__));
> >> > +
> >> > +struct btrfs_ioctl_received_subvol_args {
> >> > +   charuuid[BTRFS_UUID_SIZE];  /* in */
> >> > +   __u64   stransid;   /* in */
> >> > +   __u64   rtransid;   /* out */
> >> > +   struct btrfs_ioctl_timespec stime; /* in */
> >> > +   struct btrfs_ioctl_timespec rtime; /* out */
> >> > +   __u64   flags;  /* in */
> >> > +   __u64   reserved[16];   /* in */
> >> > +} ((__packed__));
> >> > +#endif
> >> > +
> >> > +#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \
> >> > +   struct btrfs_ioctl_received_subvol_args_32)
> >> > +
> >> > +
> >> >  static int btrfs_clone(struct inode *src, struct inode *inode,
> >> >u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> >> >
> >> > @@ -4313,10 +4339,69 @@ static long btrfs_ioctl_quota_rescan_wait(struct 
> >> > file *file, void __user *arg)
> >> > return btrfs_qgroup_wait_for_completion(root->fs_info);
> >> >  }
> >> >
> >> > +#ifdef CONFIG_64BIT
> >> > +static long btrfs_ioctl_set_received_subvol_32(struct file *file,
> >> > +   void __user *arg)
> >> > +{
> >> > +   struct btrfs_ioctl_received_subvol_args_32 *args32 = NULL;
> >> > +   struct btrfs_ioctl_received_subvol_args *args64 = NULL;
> >> > +   int ret = 0;
> >> > +
> >> > +   args32 = memdup_user(arg, sizeof(*args32));
> >> > +   if (IS_ERR(args32)) {
> >> > +   ret = PTR_ERR(args32);
> >> > +   args32 = NULL;
> >> > +   goto out;
> >> > +   }
> >> > +
> >> > +   args64 = malloc(sizeof(*args64));
> >> > +   if (IS_ERR(args64)) {
> >> > +   ret = PTR_ERR(args64);
> >> > +   args64 = NULL;
> >> > +   goto out;
> >> > +   }
> >> > +
> >> > +   memcpy(args64->uuid, args32->uuid, BTRFS_UUID_SIZE);
> >> > +   args64->stransid = args32->stransid;
> >> 

Re: [PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl

2014-02-15 Thread Alex Lyakas
Hello Hugo,

Is this issue specific to the receive ioctl?

Because what you are describing applies to any user-kernel ioctl-based
interface, when you compile the user-space as 32-bit, which the kernel
space has been compiled as 64-bit. For that purpose, I believe, there
exists the "compat_ioctl" callback. It's implementation should do
"thunking", i.e., treat the user-space structure as if it were
compiled for 32-bit, and unpack it properly.

Today, however, btrfs supplies the same callback both for
"unlocked_ioctl" and "compat_ioctl". So we should see the same problem
with all ioctls, if I am not missing anything.

Thanks,
Alex.



On Mon, Jan 6, 2014 at 10:50 AM, Hugo Mills  wrote:
> On Sun, Jan 05, 2014 at 06:26:11PM +, Hugo Mills wrote:
>> On Sun, Jan 05, 2014 at 05:55:27PM +, Hugo Mills wrote:
>> > The structure for BTRFS_SET_RECEIVED_IOCTL packs differently on 32-bit
>> > and 64-bit systems. This means that it is impossible to use btrfs
>> > receive on a system with a 64-bit kernel and 32-bit userspace, because
>> > the structure size (and hence the ioctl number) is different.
>> >
>> > This patch adds a compatibility structure and ioctl to deal with the
>> > above case.
>>
>>Oops, forgot to mention -- this has been compile tested, but not
>> actually run yet. The machine in question is several miles away and is
>> a production machine (it's my work desktop, and I can't afford much
>> downtime on it).
>
>... And it doesn't even compile properly, now I come to build a
> .deb. I'm still interested in comments about the general approach, but
> the specific patch is a load of balls.
>
>Hugo.
>
>>Hugo.
>>
>> > Signed-off-by: Hugo Mills 
>> > ---
>> >  fs/btrfs/ioctl.c | 95 
>> > +++-
>> >  1 file changed, 87 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> > index 21da576..e186439 100644
>> > --- a/fs/btrfs/ioctl.c
>> > +++ b/fs/btrfs/ioctl.c
>> > @@ -57,6 +57,32 @@
>> >  #include "send.h"
>> >  #include "dev-replace.h"
>> >
>> > +#ifdef CONFIG_64BIT
>> > +/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
>> > + * structures are incorrect, as the timespec structure from userspace
>> > + * is 4 bytes too small. We define these alternatives here to teach
>> > + * the kernel about the 32-bit struct packing.
>> > + */
>> > +struct btrfs_ioctl_timespec {
>> > +   __u64 sec;
>> > +   __u32 nsec;
>> > +} ((__packed__));
>> > +
>> > +struct btrfs_ioctl_received_subvol_args {
>> > +   charuuid[BTRFS_UUID_SIZE];  /* in */
>> > +   __u64   stransid;   /* in */
>> > +   __u64   rtransid;   /* out */
>> > +   struct btrfs_ioctl_timespec stime; /* in */
>> > +   struct btrfs_ioctl_timespec rtime; /* out */
>> > +   __u64   flags;  /* in */
>> > +   __u64   reserved[16];   /* in */
>> > +} ((__packed__));
>> > +#endif
>> > +
>> > +#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \
>> > +   struct btrfs_ioctl_received_subvol_args_32)
>> > +
>> > +
>> >  static int btrfs_clone(struct inode *src, struct inode *inode,
>> >u64 off, u64 olen, u64 olen_aligned, u64 destoff);
>> >
>> > @@ -4313,10 +4339,69 @@ static long btrfs_ioctl_quota_rescan_wait(struct 
>> > file *file, void __user *arg)
>> > return btrfs_qgroup_wait_for_completion(root->fs_info);
>> >  }
>> >
>> > +#ifdef CONFIG_64BIT
>> > +static long btrfs_ioctl_set_received_subvol_32(struct file *file,
>> > +   void __user *arg)
>> > +{
>> > +   struct btrfs_ioctl_received_subvol_args_32 *args32 = NULL;
>> > +   struct btrfs_ioctl_received_subvol_args *args64 = NULL;
>> > +   int ret = 0;
>> > +
>> > +   args32 = memdup_user(arg, sizeof(*args32));
>> > +   if (IS_ERR(args32)) {
>> > +   ret = PTR_ERR(args32);
>> > +   args32 = NULL;
>> > +   goto out;
>> > +   }
>> > +
>> > +   args64 = malloc(sizeof(*args64));
>> > +   if (IS_ERR(args64)) {
>> > +   ret = PTR_ERR(args64);
>> > +   args64 = NULL;
>> > +   goto out;
>> > +   }
>> > +
>> > +   memcpy(args64->uuid, args32->uuid, BTRFS_UUID_SIZE);
>> > +   args64->stransid = args32->stransid;
>> > +   args64->rtransid = args32->rtransid;
>> > +   args64->stime.sec = args32->stime.sec;
>> > +   args64->stime.nsec = args32->stime.nsec;
>> > +   args64->rtime.sec = args32->rtime.sec;
>> > +   args64->rtime.nsec = args32->rtime.nsec;
>> > +   args64->flags = args32->flags;
>> > +
>> > +   ret = _btrfs_ioctl_set_received_subvol(file, args64);
>> > +
>> > +out:
>> > +   kfree(args32);
>> > +   kfree(args64);
>> > +   return ret;
>> > +}
>> > +#endif
>> > +
>> >  static long btrfs_ioctl_set_received_subvol(struct file *file,
>> > void __user *arg)
>> >  {
>> > struct btrfs_ioctl_received_subvol_args *sa = NULL;
>> > +   int ret = 0;
>> > +
>> > +   s

Re: [PATCH] Btrfs: more send support for parent/child dir relationship inversion

2014-02-15 Thread Filipe David Manana
On Sat, Feb 1, 2014 at 2:02 AM, Filipe David Borba Manana
 wrote:
> The commit titled "Btrfs: fix infinite path build loops in incremental send"
> didn't cover a particular case where the parent-child relationship inversion
> of directories doesn't imply a rename of the new parent directory. This was
> due to a simple logic mistake, a logical and instead of a logical or.
>
> Steps to reproduce:
>
>   $ mkfs.btrfs -f /dev/sdb3
>   $ mount /dev/sdb3 /mnt/btrfs
>   $ mkdir -p /mnt/btrfs/a/b/bar1/bar2/bar3/bar4
>   $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
>   $ mv /mnt/btrfs/a/b/bar1/bar2/bar3/bar4 /mnt/btrfs/a/b/k44
>   $ mv /mnt/btrfs/a/b/bar1/bar2/bar3 /mnt/btrfs/a/b/k44
>   $ mv /mnt/btrfs/a/b/bar1/bar2 /mnt/btrfs/a/b/k44/bar3
>   $ mv /mnt/btrfs/a/b/bar1 /mnt/btrfs/a/b/k44/bar3/bar2/k11
>   $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
>   $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
>
> A patch to update the test btrfs/030 from xfstests, so that it covers
> this case, will be submitted soon.
>
> Signed-off-by: Filipe David Borba Manana 
> ---

Hi,

Should this go to 3.14 too? It is covered by the test case btrfs/030
for xfstests.

thanks

>  fs/btrfs/send.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index e0b49f6..c47fcef 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -3041,8 +3041,8 @@ static int wait_for_parent_move(struct send_ctx *sctx,
>
> len1 = fs_path_len(path_before);
> len2 = fs_path_len(path_after);
> -   if ((parent_ino_before != parent_ino_after) && (len1 != len2 ||
> -memcmp(path_before->start, path_after->start, len1))) {
> +   if (parent_ino_before != parent_ino_after || len1 != len2 ||
> +memcmp(path_before->start, path_after->start, len1)) {
> ret = 1;
> goto out;
> }
> --
> 1.7.9.5
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix send dealing with file renames and directory moves

2014-02-15 Thread Filipe David Manana
On Sat, Feb 1, 2014 at 2:00 AM, Filipe David Borba Manana
 wrote:
> This fixes a case that the commit titled:
>
>Btrfs: fix infinite path build loops in incremental send
>
> didn't cover. If the parent-child relationship between 2 directories
> is inverted, both get renamed, and the former parent has a file that
> got renamed too (but remains a child of that directory), the incremental
> send operation would use the file's old path after sending an unlink
> operation for that old path, causing receive to fail on future operations
> like changing owner, permissions or utimes of the corresponding inode.
>
> This is not a regression from the commit mentioned before, as without
> that commit we would fall into the issues that commit fixed, so it's
> just one case that wasn't covered before.
>
> Simple steps to reproduce this issue are:
>
>   $ mkfs.btrfs -f /dev/sdb3
>   $ mount /dev/sdb3 /mnt/btrfs
>   $ mkdir -p /mnt/btrfs/a/b/c/d
>   $ touch /mnt/btrfs/a/b/c/d/file
>   $ mkdir -p /mnt/btrfs/a/b/x
>   $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
>   $ mv /mnt/btrfs/a/b/x /mnt/btrfs/a/b/c/x2
>   $ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c/x2/d2
>   $ mv /mnt/btrfs/a/b/c/x2/d2/file /mnt/btrfs/a/b/c/x2/d2/file2
>   $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
>   $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > 
> /tmp/incremental.send
>
> A patch to update the test btrfs/030 from xfstests, so that it covers
> this case, will be submitted soon.
>
> Signed-off-by: Filipe David Borba Manana 

Hi,

Should this go to 3.14? It is covered by the test case btrfs/030 for xfstests.

thanks

> ---
>  fs/btrfs/send.c |   10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 7250d86..e0b49f6 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -2121,8 +2121,6 @@ static int get_cur_path(struct send_ctx *sctx, u64 ino, 
> u64 gen,
> u64 parent_inode = 0;
> u64 parent_gen = 0;
> int stop = 0;
> -   u64 start_ino = ino;
> -   u64 start_gen = gen;
> int skip_name_cache = 0;
>
> name = fs_path_alloc();
> @@ -2134,7 +2132,6 @@ static int get_cur_path(struct send_ctx *sctx, u64 ino, 
> u64 gen,
> if (is_waiting_for_move(sctx, ino))
> skip_name_cache = 1;
>
> -again:
> dest->reversed = 1;
> fs_path_reset(dest);
>
> @@ -2149,13 +2146,8 @@ again:
> stop = 1;
>
> if (!skip_name_cache &&
> -   is_waiting_for_move(sctx, parent_inode)) {
> -   ino = start_ino;
> -   gen = start_gen;
> -   stop = 0;
> +   is_waiting_for_move(sctx, parent_inode))
> skip_name_cache = 1;
> -   goto again;
> -   }
>
> ret = fs_path_add_path(dest, name);
> if (ret < 0)
> --
> 1.7.9.5
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-15 Thread Johannes Hirte
On Fri, 14 Feb 2014 14:29:35 -0500
Josef Bacik  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> 
> 
> On 02/14/2014 02:25 PM, Johannes Hirte wrote:
> > On Thu, 6 Feb 2014 16:19:46 -0500 Josef Bacik 
> > wrote:
> > 
> >> Ok so I thought I reproduced the problem but I just reproduced a 
> >> different problem.  Please undo any changes you've made and
> >> apply this patch and reproduce and then provide me with any debug
> >> output that gets spit out.  I'm sending this via thunderbird with
> >> 6 different extensions to make sure it comes out right so if it
> >> doesn't work let me know and I'll just paste it somewhere.
> >> Thanks,
> > 
> > Sorry for the long delay. Was to busy last week.
> > 
> 
> Ok perfect this is fixed by
> 
> [PATCH] Btrfs: don't loop forever if we can't run because of the tree
> mod log
> 
> and it went into -rc2 iirc, so give that a whirl and make sure it
> fixes your problem.  Thanks,

Yes, seems to be fixed now. I wasn't able to reproduce it anymore.

regards,
  Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: don't insert useless holes when punching beyond the inode's size

2014-02-15 Thread Filipe David Borba Manana
If we punch beyond the size of an inode, we'll correctly remove any prealloc 
extents,
but we'll also insert file extent items representing holes (disk bytenr == 0) 
that start
with a key offset that lies beyond the inode's size and are not contiguous with 
the last
file extent item.

Example:

  $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "fpunch 582007 864596" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo

btrfs-debug-tree output:

  item 4 key (257 INODE_ITEM 0) itemoff 15885 itemsize 160
inode generation 6 transid 6 size 132254 block group 0 mode 100600 
links 1
  item 5 key (257 INODE_REF 256) itemoff 15872 itemsize 13
inode ref index 2 namelen 3 name: foo
  item 6 key (257 EXTENT_DATA 0) itemoff 15819 itemsize 53
extent data disk byte 0 nr 0 gen 6
extent data offset 0 nr 90112 ram 122880
extent compression 0
  item 7 key (257 EXTENT_DATA 90112) itemoff 15766 itemsize 53
extent data disk byte 12845056 nr 4096 gen 6
extent data offset 0 nr 45056 ram 45056
extent compression 2
  item 8 key (257 EXTENT_DATA 585728) itemoff 15713 itemsize 53
extent data disk byte 0 nr 0 gen 6
extent data offset 0 nr 860160 ram 860160
extent compression 0

The last extent item, which represents a hole, is useless as it lies beyond the 
inode's
size.

Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/file.c |   28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 006af2f..5133d71 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2165,6 +2165,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t 
offset, loff_t len)
bool same_page = ((offset >> PAGE_CACHE_SHIFT) ==
  ((offset + len - 1) >> PAGE_CACHE_SHIFT));
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
+   u64 ino_size = round_up(inode->i_size, PAGE_CACHE_SIZE);
 
ret = btrfs_wait_ordered_range(inode, offset, len);
if (ret)
@@ -2180,14 +2181,14 @@ static int btrfs_punch_hole(struct inode *inode, loff_t 
offset, loff_t len)
 * entire page.
 */
if (same_page && len < PAGE_CACHE_SIZE) {
-   if (offset < round_up(inode->i_size, PAGE_CACHE_SIZE))
+   if (offset < ino_size)
ret = btrfs_truncate_page(inode, offset, len, 0);
mutex_unlock(&inode->i_mutex);
return ret;
}
 
/* zero back part of the first page */
-   if (offset < round_up(inode->i_size, PAGE_CACHE_SIZE)) {
+   if (offset < ino_size) {
ret = btrfs_truncate_page(inode, offset, 0, 0);
if (ret) {
mutex_unlock(&inode->i_mutex);
@@ -2196,7 +2197,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t 
offset, loff_t len)
}
 
/* zero the front end of the last page */
-   if (offset + len < round_up(inode->i_size, PAGE_CACHE_SIZE)) {
+   if (offset + len < ino_size) {
ret = btrfs_truncate_page(inode, offset + len, 0, 1);
if (ret) {
mutex_unlock(&inode->i_mutex);
@@ -2285,10 +2286,13 @@ static int btrfs_punch_hole(struct inode *inode, loff_t 
offset, loff_t len)
 
trans->block_rsv = &root->fs_info->trans_block_rsv;
 
-   ret = fill_holes(trans, inode, path, cur_offset, drop_end);
-   if (ret) {
-   err = ret;
-   break;
+   if (cur_offset < ino_size) {
+   ret = fill_holes(trans, inode, path, cur_offset,
+drop_end);
+   if (ret) {
+   err = ret;
+   break;
+   }
}
 
cur_offset = drop_end;
@@ -2321,10 +2325,12 @@ static int btrfs_punch_hole(struct inode *inode, loff_t 
offset, loff_t len)
}
 
trans->block_rsv = &root->fs_info->trans_block_rsv;
-   ret = fill_holes(trans, inode, path, cur_offset, drop_end);
-   if (ret) {
-   err = ret;
-   goto out_trans;
+   if (cur_offset < ino_size) {
+   ret = fill_holes(trans, inode, path, cur_offset, drop_end);
+   if (ret) {
+   err = ret;
+   goto out_trans;
+   }
}
 
 out_trans:
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: use right clone root offset for compressed extents

2014-02-15 Thread Filipe David Borba Manana
For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting for a wrong a file range to the clone ioctl,
which results in different file content from the one in the original send
root.

Issue reproducible with the following excerpt from the test I made for
xfstests:

  _scratch_mkfs
  _scratch_mount "-o compress-force=lzo"

  $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1

  $XFS_IO_PROG -c "pwrite -S 0x3e -b 8 20 8" $SCRATCH_MNT/foo
  $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
  $XFS_IO_PROG -c "pwrite -S 0xdc -b 1 25 1" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0xff -b 1 30 1" $SCRATCH_MNT/foo

  # will be used for incremental send to be able to issue clone operations
  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2

  $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
  $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
  -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
  $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
  -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2

  $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
  $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
  $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
  -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap

  _scratch_unmount
  _scratch_mkfs
  _scratch_mount

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
  $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
  $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
  $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full

Signed-off-by: Filipe David Borba Manana 
---

V2: Updated commit message and the comment.

 fs/btrfs/send.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f46c43f..4addace 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1295,6 +1295,16 @@ verbose_printk(KERN_DEBUG "btrfs: find_extent_clone: 
data_offset=%llu, "
}
 
if (cur_clone_root) {
+   if (compressed != BTRFS_COMPRESS_NONE) {
+   /*
+* Offsets given by iterate_extent_inodes() are relative
+* to the start of the extent, we need to add logical
+* offset from the file extent item.
+* (See why at backref.c:check_extent_in_eb())
+*/
+   cur_clone_root->offset += btrfs_file_extent_offset(eb,
+  fi);
+   }
*found = cur_clone_root;
ret = 0;
} else {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] xfstests: add regression test for btrfs incremental send

2014-02-15 Thread Filipe David Borba Manana
Test for a btrfs incremental send issue where we end up sending a
wrong section of data from a file extent if the corresponding file
extent is compressed and the respective file extent item has a non
zero data offset.

Fixed by the following linux kernel btrfs patch:

   Btrfs: use right clone root offset for compressed extents

Signed-off-by: Filipe David Borba Manana 
---

V2: Made the test more reliable. Now it doesn't depend anymore of btrfs'
hole punch implementation leaving hole file extent items when we punch
beyond the file's current size.

 tests/btrfs/040 |  115 +++
 tests/btrfs/040.out |1 +
 tests/btrfs/group   |1 +
 3 files changed, 117 insertions(+)
 create mode 100755 tests/btrfs/040
 create mode 100644 tests/btrfs/040.out

diff --git a/tests/btrfs/040 b/tests/btrfs/040
new file mode 100755
index 000..d6b37bf
--- /dev/null
+++ b/tests/btrfs/040
@@ -0,0 +1,115 @@
+#! /bin/bash
+# FS QA Test No. btrfs/040
+#
+# Test for a btrfs incremental send issue where we end up sending a
+# wrong section of data from a file extent if the corresponding file
+# extent is compressed and the respective file extent item has a non
+# zero data offset.
+#
+# Fixed by the following linux kernel btrfs patch:
+#
+#   Btrfs: use right clone root offset for compressed extents
+#
+#---
+# Copyright (c) 2014 Filipe Manana.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+FSSUM_PROG=$here/src/fssum
+[ -x $FSSUM_PROG ] || _notrun "fssum not built"
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount "-o compress-force=lzo"
+
+run_check $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
+run_check $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" \
+   $SCRATCH_MNT/foo
+
+run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/mysnap1
+
+run_check $XFS_IO_PROG -c "pwrite -S 0x3e -b 8 20 8" \
+   $SCRATCH_MNT/foo
+run_check $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
+run_check $XFS_IO_PROG -c "pwrite -S 0xdc -b 1 25 1" \
+   $SCRATCH_MNT/foo
+run_check $XFS_IO_PROG -c "pwrite -S 0xff -b 1 30 1" \
+   $SCRATCH_MNT/foo
+
+# will be used for incremental send to be able to issue clone operations
+run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/clones_snap
+
+run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/mysnap2
+
+run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
+run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
+   -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
+run_check $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
+   -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2
+
+run_check $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
+run_check $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
+run_check $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
+   -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap
+
+_scratch_unmount
+_check_btrfs_filesystem $SCRATCH_DEV
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
+run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
+
+run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
+run_check $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> 
$seqres.full
+
+run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
+run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
+
+_scratch_unmount
+_check_btrfs_filesystem $SCRATCH_DEV
+
+status=0
+exit
diff --git a/tests/btrfs/040.out b/tests/btrfs/040