date:20140214

Re: [PATCH] Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol

2014-02-14 Thread J. Bruce Fields

On Fri, Feb 14, 2014 at 05:40:55PM -0800, Eric W. Biederman wrote:
> "J. Bruce Fields"  writes:
> 
> > On Fri, Feb 14, 2014 at 01:43:48PM -0500, Josef Bacik wrote:
> >> A user was running into errors from an NFS export of a subvolume that had a
> >> default subvol set.  When we mount a default subvol we will use 
> >> d_obtain_alias()
> >> to find an existing dentry for the subvolume in the case that the root 
> >> subvol
> >> has already been mounted, or a dummy one is allocated in the case that the 
> >> root
> >> subvol has not already been mounted.  This allows us to connect the dentry 
> >> later
> >> on if we wander into the path.  However if we don't ever wander into the 
> >> path we
> >> will keep DCACHE_DISCONNECTED set for a long time, which angers NFS.  It 
> >> doesn't
> >> appear to cause any problems but it is annoying nonetheless, so simply 
> >> unset
> >> DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() 
> >> to
> >> use d_materialise_unique() instead which will make everything play nicely
> >> together and reconnect stuff if we wander into the defaul subvol path from 
> >> a
> >> different way.  With this patch I'm no longer getting the NFS errors when
> >> exporting a volume that has been mounted with a default subvol set.  
> >> Thanks,
> >
> > Looks obviously correct, but based on a quick grep, there are four
> > d_obtain_alias callers outside export methods:
> >
> > - btrfs/super.c:get_default_root()
> > - fs/ceph/super.c:open_root_dentry()
> > - fs/nfs/getroot.c:nfs_get_root()
> > - fs/nilfs2/super.c:nilfs_get_root_dentry()
> >
> > It'd be nice to give them a common d_obtain_alias variant instead of
> > making them all clear this by hand.
> 
> I am in favor of one small fix at a time, so that progress is made and
> fixing something just for btrfs seems reasonable for the short term.
> 
> > Of those nilfs2 also uses d_splice_alias.  I think that problem would
> > best be solved by fixing d_splice_alias not to require a
> > DCACHE_DISCONNECTED dentry; IS_ROOT() on its own should be fine.
> 
> You mean by renaming d_splice_alias d_materialise_unique?
> 
> Or is there a useful distinction you see that should be preserved
> between the two methods?
> 
> Right now my inclination is that everyone should just use
> d_materialise_unique and we should kill d_splice_alias.

Probably.  One remaining distinction:

- In the local filesystem case if you discover a directory is
  already aliased elsewhere, you have a corrupted filesystem and
  want to error out the lookup.  (Didn't you propose a patch to
  do something like that before?)
- In the distributed filesystem this is perfectly normal and we
  want to do our best to fix up our local cache to represent
  remote reality.

> And by everyone I mean all file systems that are either distributed
> (implementing d_revalidate) or exportable by knfsd.
> 
> One of the interesting things that d_materialise_unique does is get the
> lazy rename case correct for a distributed filesystem.
> check_submounts_and_drop can drop a directory when it is found not to be
> accessible by that name, but later when we look it up
> d_materialise_uniuqe will resuscciate the existing dentry.

OK.  I'm not sure I understand how that helps.

Ugly untested draft follows.

--b.

diff --git a/fs/dcache.c b/fs/dcache.c
index 265e0ce..b4572fa 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1905,58 +1905,6 @@ struct dentry *d_obtain_alias(struct inode *inode)
 EXPORT_SYMBOL(d_obtain_alias);
 
 /**
- * d_splice_alias - splice a disconnected dentry into the tree if one exists
- * @inode:  the inode which may have a disconnected dentry
- * @dentry: a negative dentry which we want to point to the inode.
- *
- * If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and
- * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
- * and return it, else simply d_add the inode to the dentry and return NULL.
- *
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
- *
- * If a dentry was found and moved, then it is returned.  Otherwise NULL
- * is returned.  This matches the expected return value of ->lookup.
- *
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
- */
-struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
-{
-   struct dentry *new = NULL;
-
-   if (IS_ERR(inode))
-   return ERR_CAST(inode);
-
-   if (inode && S_ISDIR(inode->i_mode)) {
-   spin_lock(&inode->i_lock);
-   new = __d_find_alias(inode, 1);
-   if (new) {
-   BU

Re: [PATCH] xfstests: test for atime-related mount options

2014-02-14 Thread Eric Sandeen

On 2/14/14, 7:39 PM, Dave Chinner wrote:
> On Fri, Feb 14, 2014 at 05:48:59PM -0600, Eric Sandeen wrote:
>> On 2/14/14, 4:24 PM, Dave Chinner wrote:
>>> On Fri, Feb 14, 2014 at 10:41:16AM -0600, Eric Sandeen wrote:
 On 2/14/14, 10:39 AM, David Sterba wrote:
> On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote:
>>> +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
>>> +[ $? -ne 0 ] && echo "The relatime mount option should be the default."
>>
>> Ok, I guess "relatime" in /proc/mounts is from core vfs code and
>> should be there for the foreseeable future, so seems ok.
>>
>> But - relatime was added in v2.6.20, and made default in 2.6.30.  So
>> testing older kernels may not go as expected; it'd probably be best to
>> catch situations where relatime isn't available (< 2.6.20) or not
>> default (< 2.6.30), by explicitly mounting with relatime, and skipping
>> relatime/strictatime tests if that fails?
>
> Is there some consensus what's the lowest kernel version to be supported
> by xfstests? 2.6.32 is the lowest base for kernels in use today, so
> worrying about anything older does not seem necessary.
>

 I don't know that it's been discussed - selfishly, I know our QE uses
 xfstests on RHEL5, which is 2.6.18-based.
>>>
>>> Sure, but they can just add the test to a "rhel5-expunged" file and
>>> they don't have to care about tests that won't work on RHEL 5 or
>>> other older kernels. Or to send patches to add "_requires_relatime"
>>> so that it automatically does the right thing for older kernels.
>>
>> sure but some of this test is still valid on a kernel w/o relatime.
>> And since it's the default, "relatime" might disappear from /proc/mounts
>> some day anyway, so explicitly mounting with the option & failing
>> if that fails might be good future-proofind in any case.
>>
>> *shrug*
>>
>> It was just a request, not a demand.  :)  Koen, you can do with
>> it whatever you like.  Reviews aren't ultimatums.  :)
>>
>> If xfstests upstream is only targeted at the current kernel, that's
>> fine, but maye we should make that a little more explicit.
> 
> That's not what I meant. ;)
> 
> Really, all I'm saying is that we can't expect people who are
> writing tests that work on current kernels to know what is necessary
> to make tests work on 7 year old distros that don't support a
> feature that has been in mainline for 5 years. Hence that shouldn't
> be a barrier to having a test committed as we have mechanisms for
> distro QE to handle these sorts of issues...

Sure, that's perfectly fair.

I wasn't really thinking of RHEL5 when I made my first comment,
just general portability across kernels.  dsterba suggested that
2.6.32 is the oldest kernel used, and I pointed out that we do
still use 2.6.18.  :)

Anyway, for general portability across releases, perhaps rather than:

+cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
+[ $? -ne 0 ] && echo "The relatime mount option should be the default."

which would fail the test, it should just [notrun] if relatime
isn't there, for any reason, on any kernel, if relatime is not
default as expected for the test framework.  i.e.

+[ $? -ne 0 ] && _notrun "The relatime mount option is not the default."

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: user creation/deletion of snapshots permissions bug

2014-02-14 Thread Russell Coker

On Fri, 14 Feb 2014 18:26:27 David Sterba wrote:
> On Fri, Feb 07, 2014 at 03:41:09PM +1100, Russell Coker wrote:
> > $ /sbin/btrfs subvol create /tmp/test
> > Create subvolume '/tmp/test'
> > $ /sbin/btrfs subvol delete /tmp/test
> > Delete subvolume '/tmp/test'
> > ERROR: cannot delete '/tmp/test' - Operation not permitted
> > 
> > The above is when running Debian kernel 3.12 based on Linux upstream
> > 3.12.8. I believe that the BTRFS kernel code should do a capabilities
> > check for CAP_SYS_ADMIN (which is used for mount/umount among many other
> > things) before creating a snapshot.  Currently it appears that the only
> > access control is write access to the parent directory.
> 
> This is going to be partially fixed in 3.14 and the patch backported to
> older stable trees
> 
> http://www.spinics.net/lists/linux-btrfs/msg30815.html

Great, thanks for that information.

> the user has to own the snapshot source, or be capable to do so. The
> requirement of admin capabilities to delete a subvolume is still there,
> but I guess it can go away under same checks (ie. owner or capable).
> 
> The admin capability requirement to create a subvolume/snapshot seems
> too restrictive. Although a subvolume is not as lightweight as a
> directory, it allows some convenience to do "reflink" copy of a deep
> directory structure in one go, followed by small changes (eg. git trees).

If you have hostile local users then a script to create a large number of 
subvolumes or snapshots is going to be really annoying as long as there's no 
tool to automatically delete them.  Something conceptually equivalent to rm -
rf but for subvols would be good.

> > There is some possibility of debate about the access control needed for
> > creating a subvol.  I want to use capabilities set by SE Linux policy to
> > prevent unwanted actions by hostile root processes and I think that such
> > use of capabilities (which is used by more than just SE Linux) should be
> > supported.  I don't think that there is any downside to such checks.
> I agree, making this tunable whom to allow to manipulate with subvolumes
> is a good thing. However there's no separate operation regarding
> subvolumes (like mkdir/rmdir), so this needs to add them so SElinux and
> the like can hook into there.

Are you suggesting that we have an option to determine whether a capability is 
needed to create a subvol or that we have a separate hook?

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Bloghttp://doc.coker.com.au/

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol

2014-02-14 Thread Eric W. Biederman

"J. Bruce Fields"  writes:

> On Fri, Feb 14, 2014 at 01:43:48PM -0500, Josef Bacik wrote:
>> A user was running into errors from an NFS export of a subvolume that had a
>> default subvol set.  When we mount a default subvol we will use 
>> d_obtain_alias()
>> to find an existing dentry for the subvolume in the case that the root subvol
>> has already been mounted, or a dummy one is allocated in the case that the 
>> root
>> subvol has not already been mounted.  This allows us to connect the dentry 
>> later
>> on if we wander into the path.  However if we don't ever wander into the 
>> path we
>> will keep DCACHE_DISCONNECTED set for a long time, which angers NFS.  It 
>> doesn't
>> appear to cause any problems but it is annoying nonetheless, so simply unset
>> DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
>> use d_materialise_unique() instead which will make everything play nicely
>> together and reconnect stuff if we wander into the defaul subvol path from a
>> different way.  With this patch I'm no longer getting the NFS errors when
>> exporting a volume that has been mounted with a default subvol set.  Thanks,
>
> Looks obviously correct, but based on a quick grep, there are four
> d_obtain_alias callers outside export methods:
>
>   - btrfs/super.c:get_default_root()
>   - fs/ceph/super.c:open_root_dentry()
>   - fs/nfs/getroot.c:nfs_get_root()
>   - fs/nilfs2/super.c:nilfs_get_root_dentry()
>
> It'd be nice to give them a common d_obtain_alias variant instead of
> making them all clear this by hand.

I am in favor of one small fix at a time, so that progress is made and
fixing something just for btrfs seems reasonable for the short term.

> Of those nilfs2 also uses d_splice_alias.  I think that problem would
> best be solved by fixing d_splice_alias not to require a
> DCACHE_DISCONNECTED dentry; IS_ROOT() on its own should be fine.

You mean by renaming d_splice_alias d_materialise_unique?

Or is there a useful distinction you see that should be preserved
between the two methods?

Right now my inclination is that everyone should just use
d_materialise_unique and we should kill d_splice_alias.

And by everyone I mean all file systems that are either distributed
(implementing d_revalidate) or exportable by knfsd.

One of the interesting things that d_materialise_unique does is get the
lazy rename case correct for a distributed filesystem.
check_submounts_and_drop can drop a directory when it is found not to be
accessible by that name, but later when we look it up
d_materialise_uniuqe will resuscciate the existing dentry.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] xfstests: test for atime-related mount options

2014-02-14 Thread Dave Chinner

On Fri, Feb 14, 2014 at 05:48:59PM -0600, Eric Sandeen wrote:
> On 2/14/14, 4:24 PM, Dave Chinner wrote:
> > On Fri, Feb 14, 2014 at 10:41:16AM -0600, Eric Sandeen wrote:
> >> On 2/14/14, 10:39 AM, David Sterba wrote:
> >>> On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote:
> > +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
> > +[ $? -ne 0 ] && echo "The relatime mount option should be the default."
> 
>  Ok, I guess "relatime" in /proc/mounts is from core vfs code and
>  should be there for the foreseeable future, so seems ok.
> 
>  But - relatime was added in v2.6.20, and made default in 2.6.30.  So
>  testing older kernels may not go as expected; it'd probably be best to
>  catch situations where relatime isn't available (< 2.6.20) or not
>  default (< 2.6.30), by explicitly mounting with relatime, and skipping
>  relatime/strictatime tests if that fails?
> >>>
> >>> Is there some consensus what's the lowest kernel version to be supported
> >>> by xfstests? 2.6.32 is the lowest base for kernels in use today, so
> >>> worrying about anything older does not seem necessary.
> >>>
> >>
> >> I don't know that it's been discussed - selfishly, I know our QE uses
> >> xfstests on RHEL5, which is 2.6.18-based.
> > 
> > Sure, but they can just add the test to a "rhel5-expunged" file and
> > they don't have to care about tests that won't work on RHEL 5 or
> > other older kernels. Or to send patches to add "_requires_relatime"
> > so that it automatically does the right thing for older kernels.
> 
> sure but some of this test is still valid on a kernel w/o relatime.
> And since it's the default, "relatime" might disappear from /proc/mounts
> some day anyway, so explicitly mounting with the option & failing
> if that fails might be good future-proofind in any case.
> 
> *shrug*
> 
> It was just a request, not a demand.  :)  Koen, you can do with
> it whatever you like.  Reviews aren't ultimatums.  :)
> 
> If xfstests upstream is only targeted at the current kernel, that's
> fine, but maye we should make that a little more explicit.

That's not what I meant. ;)

Really, all I'm saying is that we can't expect people who are
writing tests that work on current kernels to know what is necessary
to make tests work on 7 year old distros that don't support a
feature that has been in mainline for 5 years. Hence that shouldn't
be a barrier to having a test committed as we have mechanisms for
distro QE to handle these sorts of issues...

Indeed, I'm quite happy to host distro specific test expunge files
in the upstream repo so anyone can see what tests are expected to
pass/run on various distros

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] xfstests: test for atime-related mount options

2014-02-14 Thread Eric Sandeen

On 2/14/14, 4:24 PM, Dave Chinner wrote:
> On Fri, Feb 14, 2014 at 10:41:16AM -0600, Eric Sandeen wrote:
>> On 2/14/14, 10:39 AM, David Sterba wrote:
>>> On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote:
> +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
> +[ $? -ne 0 ] && echo "The relatime mount option should be the default."

 Ok, I guess "relatime" in /proc/mounts is from core vfs code and
 should be there for the foreseeable future, so seems ok.

 But - relatime was added in v2.6.20, and made default in 2.6.30.  So
 testing older kernels may not go as expected; it'd probably be best to
 catch situations where relatime isn't available (< 2.6.20) or not
 default (< 2.6.30), by explicitly mounting with relatime, and skipping
 relatime/strictatime tests if that fails?
>>>
>>> Is there some consensus what's the lowest kernel version to be supported
>>> by xfstests? 2.6.32 is the lowest base for kernels in use today, so
>>> worrying about anything older does not seem necessary.
>>>
>>
>> I don't know that it's been discussed - selfishly, I know our QE uses
>> xfstests on RHEL5, which is 2.6.18-based.
> 
> Sure, but they can just add the test to a "rhel5-expunged" file and
> they don't have to care about tests that won't work on RHEL 5 or
> other older kernels. Or to send patches to add "_requires_relatime"
> so that it automatically does the right thing for older kernels.

sure but some of this test is still valid on a kernel w/o relatime.
And since it's the default, "relatime" might disappear from /proc/mounts
some day anyway, so explicitly mounting with the option & failing
if that fails might be good future-proofind in any case.

*shrug*

It was just a request, not a demand.  :)  Koen, you can do with
it whatever you like.  Reviews aren't ultimatums.  :)

If xfstests upstream is only targeted at the current kernel, that's
fine, but maye we should make that a little more explicit.

-Eric

> Ultimately, upstream developers can't do all the work necessary to
> support distros - that's why the distros have their own engineers
> and QE to make sure the upstream code works correctly when they
> backport it. xfstests is no different. ;)
> 
> IOWs, if someone wants to run a modern test suite on a 7 year old
> distro, then they need to make sure that the test suite does the
> right thing for their distro. We'll take the patches that make it
> work, but we can't expect upstream developers to know what old
> distros require, let alone test and make stuff work on them...
> 
> Just my 2c worth.
> 
> Cheers,
> 
> Dave.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] xfstests: test for atime-related mount options

2014-02-14 Thread Dave Chinner

On Fri, Feb 14, 2014 at 10:41:16AM -0600, Eric Sandeen wrote:
> On 2/14/14, 10:39 AM, David Sterba wrote:
> > On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote:
> >>> +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
> >>> +[ $? -ne 0 ] && echo "The relatime mount option should be the default."
> >>
> >> Ok, I guess "relatime" in /proc/mounts is from core vfs code and
> >> should be there for the foreseeable future, so seems ok.
> >>
> >> But - relatime was added in v2.6.20, and made default in 2.6.30.  So
> >> testing older kernels may not go as expected; it'd probably be best to
> >> catch situations where relatime isn't available (< 2.6.20) or not
> >> default (< 2.6.30), by explicitly mounting with relatime, and skipping
> >> relatime/strictatime tests if that fails?
> > 
> > Is there some consensus what's the lowest kernel version to be supported
> > by xfstests? 2.6.32 is the lowest base for kernels in use today, so
> > worrying about anything older does not seem necessary.
> > 
> 
> I don't know that it's been discussed - selfishly, I know our QE uses
> xfstests on RHEL5, which is 2.6.18-based.

Sure, but they can just add the test to a "rhel5-expunged" file and
they don't have to care about tests that won't work on RHEL 5 or
other older kernels. Or to send patches to add "_requires_relatime"
so that it automatically does the right thing for older kernels.

Ultimately, upstream developers can't do all the work necessary to
support distros - that's why the distros have their own engineers
and QE to make sure the upstream code works correctly when they
backport it. xfstests is no different. ;)

IOWs, if someone wants to run a modern test suite on a 7 year old
distro, then they need to make sure that the test suite does the
right thing for their distro. We'll take the patches that make it
work, but we can't expect upstream developers to know what old
distros require, let alone test and make stuff work on them...

Just my 2c worth.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol

2014-02-14 Thread J. Bruce Fields

On Fri, Feb 14, 2014 at 01:43:48PM -0500, Josef Bacik wrote:
> A user was running into errors from an NFS export of a subvolume that had a
> default subvol set.  When we mount a default subvol we will use 
> d_obtain_alias()
> to find an existing dentry for the subvolume in the case that the root subvol
> has already been mounted, or a dummy one is allocated in the case that the 
> root
> subvol has not already been mounted.  This allows us to connect the dentry 
> later
> on if we wander into the path.  However if we don't ever wander into the path 
> we
> will keep DCACHE_DISCONNECTED set for a long time, which angers NFS.  It 
> doesn't
> appear to cause any problems but it is annoying nonetheless, so simply unset
> DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
> use d_materialise_unique() instead which will make everything play nicely
> together and reconnect stuff if we wander into the defaul subvol path from a
> different way.  With this patch I'm no longer getting the NFS errors when
> exporting a volume that has been mounted with a default subvol set.  Thanks,

Looks obviously correct, but based on a quick grep, there are four
d_obtain_alias callers outside export methods:

- btrfs/super.c:get_default_root()
- fs/ceph/super.c:open_root_dentry()
- fs/nfs/getroot.c:nfs_get_root()
- fs/nilfs2/super.c:nilfs_get_root_dentry()

It'd be nice to give them a common d_obtain_alias variant instead of
making them all clear this by hand.

Of those nilfs2 also uses d_splice_alias.  I think that problem would
best be solved by fixing d_splice_alias not to require a
DCACHE_DISCONNECTED dentry; IS_ROOT() on its own should be fine.

--b.

> 
> cc: bfie...@fieldses.org
> cc: ebied...@xmission.com
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/inode.c | 2 +-
>  fs/btrfs/super.c | 9 -
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 197edee..8dba152 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5157,7 +5157,7 @@ static struct dentry *btrfs_lookup(struct inode *dir, 
> struct dentry *dentry,
>   return ERR_CAST(inode);
>   }
>  
> - return d_splice_alias(inode, dentry);
> + return d_materialise_unique(dentry, inode);
>  }
>  
>  unsigned char btrfs_filetype_table[] = {
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 147ca1d..dc0a315 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -855,6 +855,7 @@ static struct dentry *get_default_root(struct super_block 
> *sb,
>   struct btrfs_path *path;
>   struct btrfs_key location;
>   struct inode *inode;
> + struct dentry *dentry;
>   u64 dir_id;
>   int new = 0;
>  
> @@ -925,7 +926,13 @@ setup_root:
>   return dget(sb->s_root);
>   }
>  
> - return d_obtain_alias(inode);
> + dentry = d_obtain_alias(inode);
> + if (!IS_ERR(dentry)) {
> + spin_lock(&dentry->d_lock);
> + dentry->d_flags &= ~DCACHE_DISCONNECTED;
> + spin_unlock(&dentry->d_lock);
> + }
> + return dentry;
>  }
>  
>  static int btrfs_fill_super(struct super_block *sb,
> -- 
> 1.8.3.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] xfstests: add regression test for btrfs incremental send

2014-02-14 Thread Filipe David Borba Manana

Test for a btrfs incremental send issue where we end up sending a
wrong section of data from a file extent if the corresponding file
extent is compressed and the respective file extent item has a non
zero data offset.

Fixed by the following linux kernel btrfs patch:

   Btrfs: use right clone root offset for compressed extents

Signed-off-by: Filipe David Borba Manana 
---
 tests/btrfs/040 |  103 +++
 tests/btrfs/040.out |1 +
 tests/btrfs/group   |1 +
 3 files changed, 105 insertions(+)
 create mode 100755 tests/btrfs/040
 create mode 100644 tests/btrfs/040.out

diff --git a/tests/btrfs/040 b/tests/btrfs/040
new file mode 100755
index 000..c1f3d13
--- /dev/null
+++ b/tests/btrfs/040
@@ -0,0 +1,103 @@
+#! /bin/bash
+# FS QA Test No. btrfs/040
+#
+# Test for a btrfs incremental send issue where we end up sending a
+# wrong section of data from a file extent if the corresponding file
+# extent is compressed and the respective file extent item has a non
+# zero data offset.
+#
+# Fixed by the following linux kernel btrfs patch:
+#
+#   Btrfs: use right clone root offset for compressed extents
+#
+#---
+# Copyright (c) 2014 Filipe Manana.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+FSSUM_PROG=$here/src/fssum
+[ -x $FSSUM_PROG ] || _notrun "fssum not built"
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount "-o compress-force=lzo"
+
+run_check $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
+$XFS_IO_PROG -c "fpunch 582007 864596" $SCRATCH_MNT/foo
+run_check $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" \
+   $SCRATCH_MNT/foo
+
+run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/mysnap1
+
+run_check $XFS_IO_PROG -c "pwrite -S 0xe1 -b 38804 1119395 38804" \
+   $SCRATCH_MNT/foo
+run_check $XFS_IO_PROG -c "pwrite -S 0x0e -b 41125 80802 41125" \
+   $SCRATCH_MNT/foo
+
+run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/mysnap2
+
+run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
+run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
+   $SCRATCH_MNT/mysnap2
+
+run_check $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
+run_check $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
+   -f $tmp/2.snap
+
+_scratch_unmount
+_check_btrfs_filesystem $SCRATCH_DEV
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
+run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
+
+run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
+run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
+
+_scratch_unmount
+_check_btrfs_filesystem $SCRATCH_DEV
+
+status=0
+exit
diff --git a/tests/btrfs/040.out b/tests/btrfs/040.out
new file mode 100644
index 000..7740549
--- /dev/null
+++ b/tests/btrfs/040.out
@@ -0,0 +1 @@
+QA output created by 040
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 2ca2225..a687634 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -38,3 +38,4 @@
 033 auto quick
 034 auto quick
 036 auto quick
+040 auto quick
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: use right clone root offset for compressed extents

2014-02-14 Thread Filipe David Borba Manana

For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting for a wrong a file range to the clone ioctl,
which results in different file content from the one in the original send
root.

Issue reproducible with the following xfstest:

  _scratch_mkfs >/dev/null 2>&1
  _scratch_mount "-o compress-force=lzo"

  run_check $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "fpunch 582007 864596" $SCRATCH_MNT/foo
  run_check $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" \
  $SCRATCH_MNT/foo

  run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
  $SCRATCH_MNT/mysnap1

  run_check $XFS_IO_PROG -c "pwrite -S 0xe1 -b 38804 1119395 38804" \
  $SCRATCH_MNT/foo
  run_check $XFS_IO_PROG -c "pwrite -S 0x0e -b 41125 80802 41125" \
  $SCRATCH_MNT/foo

  run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
  $SCRATCH_MNT/mysnap2

  run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
  run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
  $SCRATCH_MNT/mysnap2

  run_check $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
  run_check $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
  -f $tmp/2.snap

  _scratch_unmount
  _scratch_mkfs >/dev/null 2>&1
  _scratch_mount

  run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
  run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full

  run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
  run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full

Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/send.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f46c43f..6447ce6 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1295,6 +1295,13 @@ verbose_printk(KERN_DEBUG "btrfs: find_extent_clone: 
data_offset=%llu, "
}
 
if (cur_clone_root) {
+   if (compressed != BTRFS_COMPRESS_NONE) {
+   /*
+* Compensate the offsets set by:
+*backref.c:check_extent_in_eb()
+*/
+   cur_clone_root->offset += logical - found_key.objectid;
+   }
*found = cur_clone_root;
ret = 0;
} else {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-14 Thread Josef Bacik

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



On 02/14/2014 02:25 PM, Johannes Hirte wrote:
> On Thu, 6 Feb 2014 16:19:46 -0500 Josef Bacik 
> wrote:
> 
>> Ok so I thought I reproduced the problem but I just reproduced a 
>> different problem.  Please undo any changes you've made and
>> apply this patch and reproduce and then provide me with any debug
>> output that gets spit out.  I'm sending this via thunderbird with
>> 6 different extensions to make sure it comes out right so if it
>> doesn't work let me know and I'll just paste it somewhere.
>> Thanks,
> 
> Sorry for the long delay. Was to busy last week.
> 

Ok perfect this is fixed by

[PATCH] Btrfs: don't loop forever if we can't run because of the tree
mod log

and it went into -rc2 iirc, so give that a whirl and make sure it
fixes your problem.  Thanks,

Josef
-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJS/m6fAAoJEANb+wAKly3B73IP/052xDlBavgP5GTMhwnn2+yA
fY862NUlwQbb+5MlMi1DseG0lBp1/j0M8XkMq/F0btZSrAJcem+mZPSfeHHbYoxG
4kO5pjXQY3ha1Wj8Lc30HqF2hGGIIfr9zOyNq1d7t/w2wXXi84VkwRJkBlZWHROy
RjoK2eKv94MJtMnL4FRxew4Pkvg2y+kqnZeaL6DL84fno6wPIqf09RXwy6i5AZMD
AuOpbs5HFkQC2tb/C1ZvWZibDSXeI/nvQPDFMaFPtD4vRLT1KdpxceNErNtMGDTK
D6YmD+XYdFkg9kNPvgeRQOPyhcdEPWvUI5mWC6lRmQu/CK+7Qf5HPoHbHr+vZB1m
IwvO34bzUVDLAHkr9kCP4+QAz+GDm7LuhvFcc2uhaZqlLYZzTszG/HqXCNBx86+f
Y8RjJvSmU+j23bQlvso1FsHUP5d0ihUaEtU+FvG0mCtFMb3gOOqTusEEH0k2x0rD
SR12DCyR9nV/lSPXEtso+8Mtrkjarw76ZV7IJnZoAxOlHsK3vvuO1xNdJGxG45aV
k+hLuoXjuQtULydkkGPgQzfzd7s9Ol2NuvhezFjCF/0nC44UWtS4LcA1W41Xcy2M
3FeuKdWsBucvHwGAc/GSAS8U6oKvCAIUeFTD3Ui2OcXBDiMQYI9jPzGoBmyCnUVQ
gBiCLWxGejAMN8z2qfCZ
=7BA7
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-14 Thread Johannes Hirte

On Thu, 6 Feb 2014 16:19:46 -0500
Josef Bacik  wrote:

> Ok so I thought I reproduced the problem but I just reproduced a
> different problem.  Please undo any changes you've made and apply
> this patch and reproduce and then provide me with any debug output
> that gets spit out.  I'm sending this via thunderbird with 6
> different extensions to make sure it comes out right so if it doesn't
> work let me know and I'll just paste it somewhere.  Thanks,

Sorry for the long delay. Was to busy last week.

Here is the output:

[   25.240971] looped a lot, count 14, nr 32, no_selected_ref 99986
[   25.267639] looped a lot, count 14, nr 32, no_selected_ref 199987
[   25.294308] looped a lot, count 14, nr 32, no_selected_ref 299988
[   25.320605] looped a lot, count 14, nr 32, no_selected_ref 399989
[   25.346639] looped a lot, count 14, nr 32, no_selected_ref 40
[   25.372517] looped a lot, count 14, nr 32, no_selected_ref 51
[   25.398924] looped a lot, count 14, nr 32, no_selected_ref 62
[   25.425443] looped a lot, count 14, nr 32, no_selected_ref 73
[   25.451344] looped a lot, count 14, nr 32, no_selected_ref 84
[   25.477350] looped a lot, count 14, nr 32, no_selected_ref 95
[   25.503069] looped a lot, count 14, nr 32, no_selected_ref 106
[   25.529372] looped a lot, count 14, nr 32, no_selected_ref 117
[   25.49] looped a lot, count 14, nr 32, no_selected_ref 128
[   25.581418] looped a lot, count 14, nr 32, no_selected_ref 139
[   25.607514] looped a lot, count 14, nr 32, no_selected_ref 150
[   25.633794] looped a lot, count 14, nr 32, no_selected_ref 161
[   25.659699] looped a lot, count 14, nr 32, no_selected_ref 172
[   25.686095] looped a lot, count 14, nr 32, no_selected_ref 183
[   25.711906] looped a lot, count 14, nr 32, no_selected_ref 194
[   25.752255] looped a lot, count 14, nr 32, no_selected_ref 205
[   25.788077] looped a lot, count 0, nr 32, no_selected_ref 10
[   25.811966] looped a lot, count 14, nr 32, no_selected_ref 216
[  360.749227] looped a lot, count 8, nr 32, no_selected_ref 2
[  360.770434] looped a lot, count 8, nr 32, no_selected_ref 13
[  360.792136] looped a lot, count 8, nr 32, no_selected_ref 24
[  360.813571] looped a lot, count 8, nr 32, no_selected_ref 35
[  360.834932] looped a lot, count 8, nr 32, no_selected_ref 46
[  360.856085] looped a lot, count 8, nr 32, no_selected_ref 57
[  360.877374] looped a lot, count 8, nr 32, no_selected_ref 68
[  360.899455] looped a lot, count 8, nr 32, no_selected_ref 79
[  360.921175] looped a lot, count 8, nr 32, no_selected_ref 90
[  360.942409] looped a lot, count 8, nr 32, no_selected_ref 101
[  360.963800] looped a lot, count 8, nr 32, no_selected_ref 112
[  360.985397] looped a lot, count 8, nr 32, no_selected_ref 123
[  361.007148] looped a lot, count 8, nr 32, no_selected_ref 134
[  361.028789] looped a lot, count 8, nr 32, no_selected_ref 145
[  361.050564] looped a lot, count 8, nr 32, no_selected_ref 156
[  361.072008] looped a lot, count 8, nr 32, no_selected_ref 167
[  361.093269] looped a lot, count 8, nr 32, no_selected_ref 178
[  361.114645] looped a lot, count 8, nr 32, no_selected_ref 189
[  361.136099] looped a lot, count 8, nr 32, no_selected_ref 1900010
[  361.157566] looped a lot, count 8, nr 32, no_selected_ref 211
[  361.178969] looped a lot, count 8, nr 32, no_selected_ref 2100012
[  361.200397] looped a lot, count 8, nr 32, no_selected_ref 2200013
[  361.221980] looped a lot, count 8, nr 32, no_selected_ref 2300014
[  361.243435] looped a lot, count 8, nr 32, no_selected_ref 2400015
[  361.264777] looped a lot, count 8, nr 32, no_selected_ref 2500016
[  361.286518] looped a lot, count 8, nr 32, no_selected_ref 2600017
[  361.308240] looped a lot, count 8, nr 32, no_selected_ref 2700018
[  361.329850] looped a lot, count 8, nr 32, no_selected_ref 2800019
[  361.351420] looped a lot, count 8, nr 32, no_selected_ref 2900020
[  361.372633] looped a lot, count 8, nr 32, no_selected_ref 321
[  361.394330] looped a lot, count 8, nr 32, no_selected_ref 3100022
[  361.416039] looped a lot, count 8, nr 32, no_selected_ref 3200023
[  361.437659] looped a lot, count 8, nr 32, no_selected_ref 3300024
[  361.459181] looped a lot, count 8, nr 32, no_selected_ref 3400025
[  361.481058] looped a lot, count 8, nr 32, no_selected_ref 3500026
[  361.502441] looped a lot, count 8, nr 32, no_selected_ref 3600027
[  361.523964] looped a lot, count 8, nr 32, no_selected_ref 3700028
[  361.545387] looped a lot, count 8, nr 32, no_selected_ref 3800029
[  361.566717] looped a lot, count 8, nr 32, no_selected_ref 3900030
[  361.588079] looped a lot, count 8, nr 32, no_selected_ref 431
[  361.609673] looped a lot, count 8, nr 32, no_selected_ref 4100032
[  361.631028] looped a lot, count 8, nr 32, no_selected_ref 4200033
[  361.652498] looped a lot, count 8, nr 32, no_selected

Re: [PATCH v2] btrfs-progs: add dry-run option to restore command

2014-02-14 Thread Justin Maggard

On Fri, Feb 14, 2014 at 10:59 AM, David Sterba  wrote:
> On Fri, Feb 14, 2014 at 10:40:47AM -0800, Justin Maggard wrote:
>> Sometimes it is useful to see what btrfs restore is going to do
>> before provisioning enough external storage to restore onto.
>> Add a dry-run option so we can see what files and paths are found
>> by restore, without actually restoring any data.
>>
>> Signed-off-by: Justin Maggard 
>
> Thanks, I've added a
>
> +   if (dry_run)
> +   printf("This is a dry-run, no files are going to be 
> restored\n");
> +
>
> before the actual restoring starts so user knows.

Sounds good to me.  Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs-progs: add dry-run option to restore command

2014-02-14 Thread David Sterba

On Fri, Feb 14, 2014 at 10:40:47AM -0800, Justin Maggard wrote:
> Sometimes it is useful to see what btrfs restore is going to do
> before provisioning enough external storage to restore onto.
> Add a dry-run option so we can see what files and paths are found
> by restore, without actually restoring any data.
> 
> Signed-off-by: Justin Maggard 

Thanks, I've added a

+   if (dry_run)
+   printf("This is a dry-run, no files are going to be 
restored\n");
+

before the actual restoring starts so user knows.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol

2014-02-14 Thread Eric W. Biederman

Josef Bacik  writes:

> A user was running into errors from an NFS export of a subvolume that had a
> default subvol set.  When we mount a default subvol we will use 
> d_obtain_alias()
> to find an existing dentry for the subvolume in the case that the root subvol
> has already been mounted, or a dummy one is allocated in the case that the 
> root
> subvol has not already been mounted.  This allows us to connect the dentry 
> later
> on if we wander into the path.  However if we don't ever wander into the path 
> we
> will keep DCACHE_DISCONNECTED set for a long time, which angers NFS.  It 
> doesn't
> appear to cause any problems but it is annoying nonetheless, so simply unset
> DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
> use d_materialise_unique() instead which will make everything play nicely
> together and reconnect stuff if we wander into the defaul subvol path from a
> different way.  With this patch I'm no longer getting the NFS errors when
> exporting a volume that has been mounted with a default subvol set.  Thanks,
>
> cc: bfie...@fieldses.org
> cc: ebied...@xmission.com
Acked-by: "Eric W. Biederman" 

> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/inode.c | 2 +-
>  fs/btrfs/super.c | 9 -
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 197edee..8dba152 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5157,7 +5157,7 @@ static struct dentry *btrfs_lookup(struct inode *dir, 
> struct dentry *dentry,
>   return ERR_CAST(inode);
>   }
>  
> - return d_splice_alias(inode, dentry);
> + return d_materialise_unique(dentry, inode);
>  }
>  
>  unsigned char btrfs_filetype_table[] = {
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 147ca1d..dc0a315 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -855,6 +855,7 @@ static struct dentry *get_default_root(struct super_block 
> *sb,
>   struct btrfs_path *path;
>   struct btrfs_key location;
>   struct inode *inode;
> + struct dentry *dentry;
>   u64 dir_id;
>   int new = 0;
>  
> @@ -925,7 +926,13 @@ setup_root:
>   return dget(sb->s_root);
>   }
>  
> - return d_obtain_alias(inode);
> + dentry = d_obtain_alias(inode);
> + if (!IS_ERR(dentry)) {
> + spin_lock(&dentry->d_lock);
> + dentry->d_flags &= ~DCACHE_DISCONNECTED;
> + spin_unlock(&dentry->d_lock);
> + }
> + return dentry;
>  }
>  
>  static int btrfs_fill_super(struct super_block *sb,
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol

2014-02-14 Thread Josef Bacik

A user was running into errors from an NFS export of a subvolume that had a
default subvol set.  When we mount a default subvol we will use d_obtain_alias()
to find an existing dentry for the subvolume in the case that the root subvol
has already been mounted, or a dummy one is allocated in the case that the root
subvol has not already been mounted.  This allows us to connect the dentry later
on if we wander into the path.  However if we don't ever wander into the path we
will keep DCACHE_DISCONNECTED set for a long time, which angers NFS.  It doesn't
appear to cause any problems but it is annoying nonetheless, so simply unset
DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
use d_materialise_unique() instead which will make everything play nicely
together and reconnect stuff if we wander into the defaul subvol path from a
different way.  With this patch I'm no longer getting the NFS errors when
exporting a volume that has been mounted with a default subvol set.  Thanks,

cc: bfie...@fieldses.org
cc: ebied...@xmission.com
Signed-off-by: Josef Bacik 
---
 fs/btrfs/inode.c | 2 +-
 fs/btrfs/super.c | 9 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 197edee..8dba152 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5157,7 +5157,7 @@ static struct dentry *btrfs_lookup(struct inode *dir, 
struct dentry *dentry,
return ERR_CAST(inode);
}
 
-   return d_splice_alias(inode, dentry);
+   return d_materialise_unique(dentry, inode);
 }
 
 unsigned char btrfs_filetype_table[] = {
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 147ca1d..dc0a315 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -855,6 +855,7 @@ static struct dentry *get_default_root(struct super_block 
*sb,
struct btrfs_path *path;
struct btrfs_key location;
struct inode *inode;
+   struct dentry *dentry;
u64 dir_id;
int new = 0;
 
@@ -925,7 +926,13 @@ setup_root:
return dget(sb->s_root);
}
 
-   return d_obtain_alias(inode);
+   dentry = d_obtain_alias(inode);
+   if (!IS_ERR(dentry)) {
+   spin_lock(&dentry->d_lock);
+   dentry->d_flags &= ~DCACHE_DISCONNECTED;
+   spin_unlock(&dentry->d_lock);
+   }
+   return dentry;
 }
 
 static int btrfs_fill_super(struct super_block *sb,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] btrfs-progs: add dry-run option to restore command

2014-02-14 Thread Justin Maggard

Sometimes it is useful to see what btrfs restore is going to do
before provisioning enough external storage to restore onto.
Add a dry-run option so we can see what files and paths are found
by restore, without actually restoring any data.

Signed-off-by: Justin Maggard 
---
 cmds-restore.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index 1748262..26792a8 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -53,6 +53,7 @@ static int verbose = 0;
 static int ignore_errors = 0;
 static int overwrite = 0;
 static int get_xattrs = 0;
+static int dry_run = 0;
 
 #define LZO_LEN 4
 #define PAGE_CACHE_SIZE 4096
@@ -801,6 +802,8 @@ static int search_dir(struct btrfs_root *root, struct 
btrfs_key *key,
}
if (verbose)
printf("Restoring %s\n", path_name);
+   if (dry_run)
+   goto next;
fd = open(path_name, O_CREAT|O_WRONLY, 0644);
if (fd < 0) {
fprintf(stderr, "Error creating %s: %d\n",
@@ -873,7 +876,10 @@ static int search_dir(struct btrfs_root *root, struct 
btrfs_key *key,
printf("Restoring %s\n", path_name);
 
errno = 0;
-   ret = mkdir(path_name, 0755);
+   if (dry_run)
+   ret = 0;
+   else
+   ret = mkdir(path_name, 0755);
if (ret && errno != EEXIST) {
free(dir);
fprintf(stderr, "Error mkdiring %s: %d\n",
@@ -1090,6 +1096,7 @@ out:
 
 static struct option long_options[] = {
{ "path-regex", 1, NULL, 256},
+   { "dry-run", 0, NULL, 'D'},
{ NULL, 0, NULL, 0}
 };
 
@@ -1105,9 +1112,10 @@ const char * const cmd_restore_usage[] = {
"-ttree location",
"-f  filesystem location",
"-u   super mirror",
-   "-r  root objectid",
+   "-r  root objectid",
"-d  find dir",
"-l  list tree roots",
+   "-D|--dry-rundry run (only list files that would be recovered)",
"--path-regex ",
"restore only filenames matching regex,",
"you have to use following syntax (possibly quoted):",
@@ -1135,7 +1143,7 @@ int cmd_restore(int argc, char **argv)
regex_t match_reg, *mreg = NULL;
char reg_err[256];
 
-   while ((opt = getopt_long(argc, argv, "sxviot:u:df:r:lc", long_options,
+   while ((opt = getopt_long(argc, argv, "sxviot:u:df:r:lDc", long_options,
&option_index)) != -1) {
 
switch (opt) {
@@ -1191,6 +1199,9 @@ int cmd_restore(int argc, char **argv)
case 'l':
list_roots = 1;
break;
+   case 'D':
+   dry_run = 1;
+   break;
case 'c':
match_cflags |= REG_ICASE;
break;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/8] Add command btrfs filesystem disk-usage

2014-02-14 Thread Hugo Mills

On Fri, Feb 14, 2014 at 07:27:57PM +0100, Goffredo Baroncelli wrote:
> On 02/14/2014 07:11 PM, Roman Mamedov wrote:
> > On Fri, 14 Feb 2014 18:57:03 +0100
> > Goffredo Baroncelli  wrote:
> > 
> >> On 02/13/2014 10:00 PM, Roman Mamedov wrote:
> >>> On Thu, 13 Feb 2014 20:49:08 +0100
> >>> Goffredo Baroncelli  wrote:
> >>>
>  Thanks for the comments, however I don't like du not usage; but you are 
>  right 
>  when you don't like "disk-usage". What about "btrfs filesystem 
>  chunk-usage" ?
> >>>
> >>> Personally I don't see the point of being super-pedantic here, i.e. "look 
> >>> this
> >>> is not just filesystem usage, this is filesystem CHUNK usage"... 
> >>> Consistency
> >>> of having a matching "dev usage" and "fi usage" would have been nicer.
> >>
> >>
> >> What about "btrfs filesystem chunk-usage" ? 
> > 
> > Uhm? Had to reread this several times, but it looks like you're repeating
> > exactly the same question that I was already answering in the quoted part.
> > 
> > To clarify even more, personally I'd like if there would have been "btrfs 
> > dev
> > usage" and "btrfs fi usage". Do not see the need to specifically make the 
> > 2nd
> > one "chunk-usage" instead of simply "usage".
> 
> I don't like "usage" because it to me seems to be too much generic.
> Because both "btrfs filesystem disk-usage" and "btrfs device disk-usage"
> report about chunk (and/or block group) infos, I am investigating 
> about 
> - btrfs filesystem chunk-usage
> - btrfs device chunk-usage

   Most people aren't going to know (or care) what a chunk is. I'm
much happier with Roman's suggestion of btrfs {fi,dev} usage.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Nostalgia isn't what it used to be. ---   


signature.asc
Description: Digital signature

Re: [PATCH 5/8] Add command btrfs filesystem disk-usage

2014-02-14 Thread Goffredo Baroncelli

On 02/14/2014 07:11 PM, Roman Mamedov wrote:
> On Fri, 14 Feb 2014 18:57:03 +0100
> Goffredo Baroncelli  wrote:
> 
>> On 02/13/2014 10:00 PM, Roman Mamedov wrote:
>>> On Thu, 13 Feb 2014 20:49:08 +0100
>>> Goffredo Baroncelli  wrote:
>>>
 Thanks for the comments, however I don't like du not usage; but you are 
 right 
 when you don't like "disk-usage". What about "btrfs filesystem 
 chunk-usage" ?
>>>
>>> Personally I don't see the point of being super-pedantic here, i.e. "look 
>>> this
>>> is not just filesystem usage, this is filesystem CHUNK usage"... Consistency
>>> of having a matching "dev usage" and "fi usage" would have been nicer.
>>
>>
>> What about "btrfs filesystem chunk-usage" ? 
> 
> Uhm? Had to reread this several times, but it looks like you're repeating
> exactly the same question that I was already answering in the quoted part.
> 
> To clarify even more, personally I'd like if there would have been "btrfs dev
> usage" and "btrfs fi usage". Do not see the need to specifically make the 2nd
> one "chunk-usage" instead of simply "usage".

I don't like "usage" because it to me seems to be too much generic.
Because both "btrfs filesystem disk-usage" and "btrfs device disk-usage"
report about chunk (and/or block group) infos, I am investigating 
about 
- btrfs filesystem chunk-usage
- btrfs device chunk-usage

Regards
GB


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Daniel Lee

On 02/14/2014 09:53 AM, Axelle wrote:
> Hi Daniel,
>
> This is what it answers now:
>
> sudo btrfs filesystem df /samples
> [sudo] password for axelle:
> Data, RAID0: total=252.00GB, used=108.99GB
> System, RAID1: total=8.00MB, used=28.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=5.25GB, used=3.71GB
So the issue here is that your data is raid0 which will not tolerate any
loss of a device. I'd recommend trashing the current filesystem and
creating a new one with some redundancy (use raid1 not raid0, don't add
more than one partition from the same disk to a btrfs filesystem, etc.)
so you can recover from this sort of scenario in the future. To do this,
use wipefs on the remaining partitions to remove all traces of the
current btrfs filesystem.

> By the way, I was happy to recover most of my data :)

This is the nice thing about the checksumming in btrfs, knowing that
what data you did read off is correct. :)

> Of course, I still can't add my new /dev/sdb to /samples because it's 
> read-only:
> sudo btrfs device add /dev/sdb /samples
> ERROR: error adding the device '/dev/sdb' - Read-only file system
>
> Regards
> Axelle
>
> On Fri, Feb 14, 2014 at 5:19 PM, Daniel Lee  wrote:
>> On 02/14/2014 07:22 AM, Axelle wrote:
 Did the crashed /dev/sdb have more than 1 partitions in your raid1
 filesystem?
>>> No, only 1 - as far as I recall.
>>>
>>> -- Axelle.
>> What does:
>>
>> btrfs filesystem df /samples
>>
>> say now that you've mounted the fs readonly?
>>> On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee  wrote:
 On 02/14/2014 03:04 AM, Axelle wrote:
> Hi Hugo,
>
> Thanks for your answer.
> Unfortunately, I had also tried
>
> sudo mount -o degraded /dev/sdc1 /samples
> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>missing codepage or helper program, or other error
>In some cases useful info is found in syslog - try
>dmesg | tail  or so
>
> and dmesg says:
> [ 1177.695773] btrfs: open_ctree failed
> [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 1 transid 31105 /dev/sdc6
> [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 1 transid 31105 /dev/sdc6
> [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 4013.408280] btrfs: allowing degraded mounts
> [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, 
> gen 0
> [ 4015.600424] Btrfs: too many missing devices, writeable mount is not 
> allowed
> [ 4015.630841] btrfs: open_ctree failed
 Did the crashed /dev/sdb have more than 1 partitions in your raid1
 filesystem?
> Yes, I know, I'll probably be losing a lot of data, but it's not "too
> much" my concern because I had a backup (sooo happy about that :D). If
> I can manage to recover a little more on the btrfs volume it's bonus,
> but in the event I do not, I'll be using my backup.
>
> So, how do I fix my volume? I guess there would be a solution apart
> from scratching/deleting everything and starting again...
>
>
> Regards,
> Axelle
>
>
>
> On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills  wrote:
>> On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
>>> Hi,
>>> I've just encountered a hard disk crash in one of my btrfs pools.
>>>
>>> sudo btrfs filesystem show
>>> failed to open /dev/sr0: No medium found
>>> Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
>>> Total devices 3 FS bytes used 112.70GB
>>> devid1 size 100.61GB used 89.26GB path /dev/sdc6
>>> devid2 size 93.13GB used 84.00GB path /dev/sdc1
>>> *** Some devices missing
>>>
>>> The device which is missing is /dev/sdb. I have replaced it with a new
>>> hard disk. How do I add it back to the volume and fix the device
>>> missing?
>>> The pool is expected to mount to /samples (it is not mounted yet).
>>>
>>> I tried this - which fails:
>>> sudo btrfs device add /dev/sdb /samples
>>> ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for 
>>> device
>>>
>>> Why isn't this working?
>>Because it's not mounted. :)
>>
>>> I also tried this:
>>> sudo mount -o recovery /dev/sdc1 /samples
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>>missing codepage or helper program, or other error
>>>In some cases useful info is found in syslog - try
>>>dmesg | tail  or so
>>> same with /dev/sdc6
>>Close, but what you want here is:
>>
>> mount -o degraded /dev/sdc1 /sam

Re: [PATCH 5/8] Add command btrfs filesystem disk-usage

2014-02-14 Thread Roman Mamedov

On Fri, 14 Feb 2014 18:57:03 +0100
Goffredo Baroncelli  wrote:

> On 02/13/2014 10:00 PM, Roman Mamedov wrote:
> > On Thu, 13 Feb 2014 20:49:08 +0100
> > Goffredo Baroncelli  wrote:
> > 
> >> Thanks for the comments, however I don't like du not usage; but you are 
> >> right 
> >> when you don't like "disk-usage". What about "btrfs filesystem 
> >> chunk-usage" ?
> > 
> > Personally I don't see the point of being super-pedantic here, i.e. "look 
> > this
> > is not just filesystem usage, this is filesystem CHUNK usage"... Consistency
> > of having a matching "dev usage" and "fi usage" would have been nicer.
> 
> 
> What about "btrfs filesystem chunk-usage" ? 

Uhm? Had to reread this several times, but it looks like you're repeating
exactly the same question that I was already answering in the quoted part.

To clarify even more, personally I'd like if there would have been "btrfs dev
usage" and "btrfs fi usage". Do not see the need to specifically make the 2nd
one "chunk-usage" instead of simply "usage".

-- 
With respect,
Roman

signature.asc
Description: PGP signature

Re: [PATCH 1/4] btrfs-progs: use usage() to replace the warning msg on no-arg usage

2014-02-14 Thread David Sterba

On Thu, Feb 13, 2014 at 11:16:35AM +0800, Gui Hecheng wrote:
> --- a/cmds-receive.c
> +++ b/cmds-receive.c
> @@ -951,10 +951,8 @@ int cmd_receive(int argc, char **argv)
>   }
>   }
>  
> - if (optind + 1 != argc) {
> - fprintf(stderr, "ERROR: receive needs path to subvolume\n");
> - return 1;
> - }
> + if (optind + 1 != argc)

FYI, I've replaced this with check_argc_exact

> + usage(cmd_receive_usage);
>  
>   tomnt = argv[optind];
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/8] Add command btrfs filesystem disk-usage

2014-02-14 Thread Goffredo Baroncelli

On 02/13/2014 10:00 PM, Roman Mamedov wrote:
> On Thu, 13 Feb 2014 20:49:08 +0100
> Goffredo Baroncelli  wrote:
> 
>> Thanks for the comments, however I don't like du not usage; but you are 
>> right 
>> when you don't like "disk-usage". What about "btrfs filesystem chunk-usage" ?
> 
> Personally I don't see the point of being super-pedantic here, i.e. "look this
> is not just filesystem usage, this is filesystem CHUNK usage"... Consistency
> of having a matching "dev usage" and "fi usage" would have been nicer.


What about "btrfs filesystem chunk-usage" ? 

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Axelle

Hi Daniel,

This is what it answers now:

sudo btrfs filesystem df /samples
[sudo] password for axelle:
Data, RAID0: total=252.00GB, used=108.99GB
System, RAID1: total=8.00MB, used=28.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=5.25GB, used=3.71GB

By the way, I was happy to recover most of my data :)

Of course, I still can't add my new /dev/sdb to /samples because it's read-only:
sudo btrfs device add /dev/sdb /samples
ERROR: error adding the device '/dev/sdb' - Read-only file system

Regards
Axelle

On Fri, Feb 14, 2014 at 5:19 PM, Daniel Lee  wrote:
> On 02/14/2014 07:22 AM, Axelle wrote:
>>> Did the crashed /dev/sdb have more than 1 partitions in your raid1
>>> filesystem?
>> No, only 1 - as far as I recall.
>>
>> -- Axelle.
> What does:
>
> btrfs filesystem df /samples
>
> say now that you've mounted the fs readonly?
>> On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee  wrote:
>>> On 02/14/2014 03:04 AM, Axelle wrote:
 Hi Hugo,

 Thanks for your answer.
 Unfortunately, I had also tried

 sudo mount -o degraded /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

 and dmesg says:
 [ 1177.695773] btrfs: open_ctree failed
 [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 4013.408280] btrfs: allowing degraded mounts
 [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, 
 gen 0
 [ 4015.600424] Btrfs: too many missing devices, writeable mount is not 
 allowed
 [ 4015.630841] btrfs: open_ctree failed
>>> Did the crashed /dev/sdb have more than 1 partitions in your raid1
>>> filesystem?
 Yes, I know, I'll probably be losing a lot of data, but it's not "too
 much" my concern because I had a backup (sooo happy about that :D). If
 I can manage to recover a little more on the btrfs volume it's bonus,
 but in the event I do not, I'll be using my backup.

 So, how do I fix my volume? I guess there would be a solution apart
 from scratching/deleting everything and starting again...


 Regards,
 Axelle



 On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills  wrote:
> On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
>> Hi,
>> I've just encountered a hard disk crash in one of my btrfs pools.
>>
>> sudo btrfs filesystem show
>> failed to open /dev/sr0: No medium found
>> Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
>> Total devices 3 FS bytes used 112.70GB
>> devid1 size 100.61GB used 89.26GB path /dev/sdc6
>> devid2 size 93.13GB used 84.00GB path /dev/sdc1
>> *** Some devices missing
>>
>> The device which is missing is /dev/sdb. I have replaced it with a new
>> hard disk. How do I add it back to the volume and fix the device
>> missing?
>> The pool is expected to mount to /samples (it is not mounted yet).
>>
>> I tried this - which fails:
>> sudo btrfs device add /dev/sdb /samples
>> ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for 
>> device
>>
>> Why isn't this working?
>Because it's not mounted. :)
>
>> I also tried this:
>> sudo mount -o recovery /dev/sdc1 /samples
>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>missing codepage or helper program, or other error
>>In some cases useful info is found in syslog - try
>>dmesg | tail  or so
>> same with /dev/sdc6
>Close, but what you want here is:
>
> mount -o degraded /dev/sdc1 /samples
>
> not "recovery". That will tell the FS that there's a missing disk, and
> it should mount without complaining. If your data is not RAID-1 or
> RAID-10, then you will almost certainly have lost some data.
>
>At that point, since you've removed the dead disk, you can do:
>
> btrfs device delete missing /samples
>
> which forcibly removes the record of the missing device.
>
>Then you can add the new device:
>
> btrfs device add /dev/sdb /samples
>
>And finally balance to repair the RAID:
>
> btrfs balance start /samples
>
>It's worth noting that even if you have RAID-1 data and metadata,
> losing /dev/sdc in your curren

[PATCH] Allow forced conversion of metadata to dup profile on multiple devices

2014-02-14 Thread Austin S Hemmelgarn

Currently, btrfs balance start fails when trying to convert metadata or
system chunks to dup profile on filesystems with multiple devices.  This
requires that a conversion from a multi-device filesystem to a single
device filesystem use the following methodology:
1. btrfs balance start -dconvert=single -mconvert=single \
   -sconvert=single -f /
2. btrfs device delete / /dev/sdx
3. btrfs balance start -mconvert=dup -sconvert=dup /
This results in a period of time (possibly very long if the devices are
big) where you don't have the protection guarantees of multiple copies
of metadata chunks.

After applying this patch, one can instead use the following methodology
for conversion from a multi-device filesystem to a single device
filesystem:
1. btrfs balance start -dconvert=single -mconvert=dup \
   -sconvert=dup -f /
2. btrfs device delete / /dev/sdx
This greatly reduces the chances of the operation causing data loss due
to a read error during the device delete.

Signed-off-by: Austin S. Hemmelgarn 
---
 fs/btrfs/volumes.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 07629e9..38a9522 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3152,10 +3152,8 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
num_devices--;
}
btrfs_dev_replace_unlock(&fs_info->dev_replace);
-   allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
-   if (num_devices == 1)
-   allowed |= BTRFS_BLOCK_GROUP_DUP;
-   else if (num_devices > 1)
+   allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP;
+   if (num_devices > 1)
allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
if (num_devices > 2)
allowed |= BTRFS_BLOCK_GROUP_RAID5;
@@ -3221,6 +3219,21 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
goto out;
}
}
+   if (((bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+   (bctl->sys.target & ~BTRFS_BLOCK_GROUP_DUP) ||
+   (bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+   (bctl->meta.target & ~BTRFS_BLOCK_GROUP_DUP)) &&
+   (num_devs > 1)) {
+   if (bctl->flags & BTRFS_BALANCE_FORCE) {
+   btrfs_info(fs_info, "force conversion of 
metadata "
+  "to dup profile on multiple 
devices");
+   } else {
+   btrfs_err(fs_info, "balance will reduce 
metadata "
+ "integrity, use force if you want 
this");
+   ret = -EINVAL;
+   goto out;
+   }
+   }
} while (read_seqretry(&fs_info->profiles_lock, seq));
 
if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-- 
1.8.5.4


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] btrfs-progs: fix fsck leaks on error returns

2014-02-14 Thread David Sterba

On Thu, Feb 13, 2014 at 11:16:38AM +0800, Gui Hecheng wrote:
> @@ -6460,6 +6460,7 @@ int cmd_check(int argc, char **argv)
>   !extent_buffer_uptodate(info->dev_root->node) ||
>   !extent_buffer_uptodate(info->chunk_root->node)) {
>   fprintf(stderr, "Critical roots corrupted, unable to fsck the 
> FS\n");
> + close_ctree(info->fs_root);
>   return -EIO;

Can you please convert it to the 'goto + single return' pattern?

The other patches are ok, adding them to integration.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: user creation/deletion of snapshots permissions bug

2014-02-14 Thread David Sterba

On Fri, Feb 07, 2014 at 03:41:09PM +1100, Russell Coker wrote:
> $ /sbin/btrfs subvol create /tmp/test
> Create subvolume '/tmp/test'
> $ /sbin/btrfs subvol delete /tmp/test
> Delete subvolume '/tmp/test'
> ERROR: cannot delete '/tmp/test' - Operation not permitted
> 
> The above is when running Debian kernel 3.12 based on Linux upstream 3.12.8.  
> I believe that the BTRFS kernel code should do a capabilities check for 
> CAP_SYS_ADMIN (which is used for mount/umount among many other things) before 
> creating a snapshot.  Currently it appears that the only access control is 
> write access to the parent directory.

This is going to be partially fixed in 3.14 and the patch backported to
older stable trees

http://www.spinics.net/lists/linux-btrfs/msg30815.html

the user has to own the snapshot source, or be capable to do so. The
requirement of admin capabilities to delete a subvolume is still there,
but I guess it can go away under same checks (ie. owner or capable).

The admin capability requirement to create a subvolume/snapshot seems
too restrictive. Although a subvolume is not as lightweight as a
directory, it allows some convenience to do "reflink" copy of a deep
directory structure in one go, followed by small changes (eg. git trees).

> There is some possibility of debate about the access control needed for 
> creating a subvol.  I want to use capabilities set by SE Linux policy to 
> prevent unwanted actions by hostile root processes and I think that such use 
> of capabilities (which is used by more than just SE Linux) should be 
> supported.  I don't think that there is any downside to such checks.

I agree, making this tunable whom to allow to manipulate with subvolumes
is a good thing. However there's no separate operation regarding
subvolumes (like mkdir/rmdir), so this needs to add them so SElinux and
the like can hook into there.

> In any case allowing a subvol to be created but not deleted with the same 
> level of access is obviously a bug.

Agreed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] xfstests: test for atime-related mount options

2014-02-14 Thread David Sterba

On Fri, Feb 14, 2014 at 10:41:16AM -0600, Eric Sandeen wrote:
> I don't know that it's been discussed - selfishly, I know our QE uses
> xfstests on RHEL5, which is 2.6.18-based.

Ok then.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] xfstests: test for atime-related mount options

2014-02-14 Thread Eric Sandeen

On 2/14/14, 10:39 AM, David Sterba wrote:
> On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote:
>>> +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
>>> +[ $? -ne 0 ] && echo "The relatime mount option should be the default."
>>
>> Ok, I guess "relatime" in /proc/mounts is from core vfs code and
>> should be there for the foreseeable future, so seems ok.
>>
>> But - relatime was added in v2.6.20, and made default in 2.6.30.  So
>> testing older kernels may not go as expected; it'd probably be best to
>> catch situations where relatime isn't available (< 2.6.20) or not
>> default (< 2.6.30), by explicitly mounting with relatime, and skipping
>> relatime/strictatime tests if that fails?
> 
> Is there some consensus what's the lowest kernel version to be supported
> by xfstests? 2.6.32 is the lowest base for kernels in use today, so
> worrying about anything older does not seem necessary.
> 

I don't know that it's been discussed - selfishly, I know our QE uses
xfstests on RHEL5, which is 2.6.18-based.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] xfstests: test for atime-related mount options

2014-02-14 Thread David Sterba

On Thu, Feb 13, 2014 at 10:42:55AM -0600, Eric Sandeen wrote:
> > +cat /proc/mounts | grep "$SCRATCH_MNT" | grep relatime >> $seqres.full
> > +[ $? -ne 0 ] && echo "The relatime mount option should be the default."
> 
> Ok, I guess "relatime" in /proc/mounts is from core vfs code and
> should be there for the foreseeable future, so seems ok.
> 
> But - relatime was added in v2.6.20, and made default in 2.6.30.  So
> testing older kernels may not go as expected; it'd probably be best to
> catch situations where relatime isn't available (< 2.6.20) or not
> default (< 2.6.30), by explicitly mounting with relatime, and skipping
> relatime/strictatime tests if that fails?

Is there some consensus what's the lowest kernel version to be supported
by xfstests? 2.6.32 is the lowest base for kernels in use today, so
worrying about anything older does not seem necessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: add dry-run option to restore command

2014-02-14 Thread David Sterba

On Fri, Feb 07, 2014 at 09:12:03AM -0800, Justin Maggard wrote:
> Sometimes it is useful to see what btrfs restore is going to do
> before provisioning enough external storage to restore onto.
> Add a dry-run option so we can see what files and paths are found
> by restore, without actually restoring any data.

Ok, makes sense. I suggest to add the long option --dry-run as well. The
-D option sounds like "no data", so I'm ok to keep it as you've
proposed.

Please resend the patch and add your Signed-off-by line.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Daniel Lee

On 02/14/2014 07:22 AM, Axelle wrote:
>> Did the crashed /dev/sdb have more than 1 partitions in your raid1
>> filesystem?
> No, only 1 - as far as I recall.
>
> -- Axelle.
What does:

btrfs filesystem df /samples

say now that you've mounted the fs readonly?
> On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee  wrote:
>> On 02/14/2014 03:04 AM, Axelle wrote:
>>> Hi Hugo,
>>>
>>> Thanks for your answer.
>>> Unfortunately, I had also tried
>>>
>>> sudo mount -o degraded /dev/sdc1 /samples
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>>missing codepage or helper program, or other error
>>>In some cases useful info is found in syslog - try
>>>dmesg | tail  or so
>>>
>>> and dmesg says:
>>> [ 1177.695773] btrfs: open_ctree failed
>>> [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>>> 2 transid 31105 /dev/sdc1
>>> [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>>> 1 transid 31105 /dev/sdc6
>>> [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>>> 2 transid 31105 /dev/sdc1
>>> [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>>> 1 transid 31105 /dev/sdc6
>>> [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>>> 2 transid 31105 /dev/sdc1
>>> [ 4013.408280] btrfs: allowing degraded mounts
>>> [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, 
>>> gen 0
>>> [ 4015.600424] Btrfs: too many missing devices, writeable mount is not 
>>> allowed
>>> [ 4015.630841] btrfs: open_ctree failed
>> Did the crashed /dev/sdb have more than 1 partitions in your raid1
>> filesystem?
>>> Yes, I know, I'll probably be losing a lot of data, but it's not "too
>>> much" my concern because I had a backup (sooo happy about that :D). If
>>> I can manage to recover a little more on the btrfs volume it's bonus,
>>> but in the event I do not, I'll be using my backup.
>>>
>>> So, how do I fix my volume? I guess there would be a solution apart
>>> from scratching/deleting everything and starting again...
>>>
>>>
>>> Regards,
>>> Axelle
>>>
>>>
>>>
>>> On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills  wrote:
 On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
> Hi,
> I've just encountered a hard disk crash in one of my btrfs pools.
>
> sudo btrfs filesystem show
> failed to open /dev/sr0: No medium found
> Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
> Total devices 3 FS bytes used 112.70GB
> devid1 size 100.61GB used 89.26GB path /dev/sdc6
> devid2 size 93.13GB used 84.00GB path /dev/sdc1
> *** Some devices missing
>
> The device which is missing is /dev/sdb. I have replaced it with a new
> hard disk. How do I add it back to the volume and fix the device
> missing?
> The pool is expected to mount to /samples (it is not mounted yet).
>
> I tried this - which fails:
> sudo btrfs device add /dev/sdb /samples
> ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device
>
> Why isn't this working?
Because it's not mounted. :)

> I also tried this:
> sudo mount -o recovery /dev/sdc1 /samples
> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>missing codepage or helper program, or other error
>In some cases useful info is found in syslog - try
>dmesg | tail  or so
> same with /dev/sdc6
Close, but what you want here is:

 mount -o degraded /dev/sdc1 /samples

 not "recovery". That will tell the FS that there's a missing disk, and
 it should mount without complaining. If your data is not RAID-1 or
 RAID-10, then you will almost certainly have lost some data.

At that point, since you've removed the dead disk, you can do:

 btrfs device delete missing /samples

 which forcibly removes the record of the missing device.

Then you can add the new device:

 btrfs device add /dev/sdb /samples

And finally balance to repair the RAID:

 btrfs balance start /samples

It's worth noting that even if you have RAID-1 data and metadata,
 losing /dev/sdc in your current configuration is likely to cause
 severe data loss -- probably making the whole FS unrecoverable. This
 is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
 and will happily put both copies of a piece of RAID-1 data (or
 metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
 wouldn't recommend running like that for very long.

Hugo.

 --
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- All hope abandon,  Ye who press Enter here. ---
>>> --
>>> To unsubscribe from this list: send the line "unsubsc

Re: Possible to wait for snapshot deletion?

2014-02-14 Thread David Sterba

On Thu, Feb 13, 2014 at 08:02:43PM +0100, Kai Krakow wrote:
> Is it technically possible to wait for a snapshot completely purged from 
> disk? I imagine an option like "--wait" for btrfs delete subvolume.

I have the patch WIP, will look at it again.

> This would fit some purposes I'm planning to implement:
> 
> * In a backup scenario have a subprocess which deletes snapshots one by one,
>   starting with the oldest. If free space raises above a certain threshold,
>   pause the subprocess. If number of kept snapshots falls below a certain
>   threshold, exit the subprocess. When the backup job finished, it joins the
>   subprocess to wait for a pending subvolume deletion if any, then syncs the
>   filesystem and waits some grace time for uncommitted writes, then shuts
>   the system down or hibernates it.
> 
> * Wait for pending background subvolume deletion before putting the system
>   to sleep.
> 
> * Get better control of cron jobs working with subvolumes so jobs either do
>   not overlap or do not put too much parallel work on the file system.

These usecases should be covered by the 'wait' command.

> If btrfs is shut down improperly, already deleted 
> subvolume entries may reappear in their directories. The deletion is not 
> gracefully resumed. Since deletion of subvolumes/snapshots can take hours, 
> this is a problem for systems that are not guaranteed to be up all the time 
> or suffer from disconnected power, especially if multiple snapshots are 
> being deleted at once. With an option to wait, a script managing snapshot 
> deletion could more gracefully resume its job.

As mentioned down the thread, next progs release will be able to force a
transaction commit when the subvoulme is deleted.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Issue with btrfs balance

2014-02-14 Thread Austin S Hemmelgarn

On 02/14/2014 02:56 AM, Brendan Hide wrote:
> On 14/02/14 05:42, Austin S. Hemmelgarn wrote:
>>> On 2014/02/10 04:33 AM, Austin S Hemmelgarn wrote:
>> Do you happen to know which git repository and branch is
>> preferred to base patches on?  I'm getting ready to write one to
>> fix this, and would like to make it as easy as possible for the
>> developers to merge.
> A list of the "main" repositories is maintained at 
> https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories
> 
> I'd suggest David Sterba's branch as he maintains it for
> userspace-tools integration.
> 
In this case, it will need to be patched both in the userspace tools,
and in the kernel, it's the kernel itself that prevents the balance,
cause it thinks that you can't do dup profiles with multiple devices.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Possible to wait for snapshot deletion?

2014-02-14 Thread David Sterba

On Fri, Feb 14, 2014 at 01:12:58AM +0100, Kai Krakow wrote:
> Garry T. Williams  schrieb:
> 
> > On 2-13-14 20:02:43 Kai Krakow wrote:
> >> Is it technically possible to wait for a snapshot completely purged
> >> from disk? I imagine an option like "--wait" for btrfs delete
> >> subvolume.
> > 
> > This may be what you're looking for:
> > 
> > http://www.spinics.net/lists/linux-btrfs/msg29833.html
> > 
> 
> Looks fantastic. Can I vote to get it into official sources? ;-)

This has been in the development integration branch for some time and
is present in the integration branch pending for the next release.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Axelle

>Did the crashed /dev/sdb have more than 1 partitions in your raid1
>filesystem?

No, only 1 - as far as I recall.

-- Axelle.

On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee  wrote:
> On 02/14/2014 03:04 AM, Axelle wrote:
>> Hi Hugo,
>>
>> Thanks for your answer.
>> Unfortunately, I had also tried
>>
>> sudo mount -o degraded /dev/sdc1 /samples
>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>missing codepage or helper program, or other error
>>In some cases useful info is found in syslog - try
>>dmesg | tail  or so
>>
>> and dmesg says:
>> [ 1177.695773] btrfs: open_ctree failed
>> [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>> 2 transid 31105 /dev/sdc1
>> [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>> 1 transid 31105 /dev/sdc6
>> [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>> 2 transid 31105 /dev/sdc1
>> [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>> 1 transid 31105 /dev/sdc6
>> [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
>> 2 transid 31105 /dev/sdc1
>> [ 4013.408280] btrfs: allowing degraded mounts
>> [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen >> 0
>> [ 4015.600424] Btrfs: too many missing devices, writeable mount is not 
>> allowed
>> [ 4015.630841] btrfs: open_ctree failed
> Did the crashed /dev/sdb have more than 1 partitions in your raid1
> filesystem?
>>
>> Yes, I know, I'll probably be losing a lot of data, but it's not "too
>> much" my concern because I had a backup (sooo happy about that :D). If
>> I can manage to recover a little more on the btrfs volume it's bonus,
>> but in the event I do not, I'll be using my backup.
>>
>> So, how do I fix my volume? I guess there would be a solution apart
>> from scratching/deleting everything and starting again...
>>
>>
>> Regards,
>> Axelle
>>
>>
>>
>> On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills  wrote:
>>> On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
 Hi,
 I've just encountered a hard disk crash in one of my btrfs pools.

 sudo btrfs filesystem show
 failed to open /dev/sr0: No medium found
 Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
 Total devices 3 FS bytes used 112.70GB
 devid1 size 100.61GB used 89.26GB path /dev/sdc6
 devid2 size 93.13GB used 84.00GB path /dev/sdc1
 *** Some devices missing

 The device which is missing is /dev/sdb. I have replaced it with a new
 hard disk. How do I add it back to the volume and fix the device
 missing?
 The pool is expected to mount to /samples (it is not mounted yet).

 I tried this - which fails:
 sudo btrfs device add /dev/sdb /samples
 ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device

 Why isn't this working?
>>>Because it's not mounted. :)
>>>
 I also tried this:
 sudo mount -o recovery /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 same with /dev/sdc6
>>>Close, but what you want here is:
>>>
>>> mount -o degraded /dev/sdc1 /samples
>>>
>>> not "recovery". That will tell the FS that there's a missing disk, and
>>> it should mount without complaining. If your data is not RAID-1 or
>>> RAID-10, then you will almost certainly have lost some data.
>>>
>>>At that point, since you've removed the dead disk, you can do:
>>>
>>> btrfs device delete missing /samples
>>>
>>> which forcibly removes the record of the missing device.
>>>
>>>Then you can add the new device:
>>>
>>> btrfs device add /dev/sdb /samples
>>>
>>>And finally balance to repair the RAID:
>>>
>>> btrfs balance start /samples
>>>
>>>It's worth noting that even if you have RAID-1 data and metadata,
>>> losing /dev/sdc in your current configuration is likely to cause
>>> severe data loss -- probably making the whole FS unrecoverable. This
>>> is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
>>> and will happily put both copies of a piece of RAID-1 data (or
>>> metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
>>> wouldn't recommend running like that for very long.
>>>
>>>Hugo.
>>>
>>> --
>>> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>>>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>>>--- All hope abandon,  Ye who press Enter here. ---
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Daniel Lee

On 02/14/2014 03:04 AM, Axelle wrote:
> Hi Hugo,
>
> Thanks for your answer.
> Unfortunately, I had also tried
>
> sudo mount -o degraded /dev/sdc1 /samples
> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>missing codepage or helper program, or other error
>In some cases useful info is found in syslog - try
>dmesg | tail  or so
>
> and dmesg says:
> [ 1177.695773] btrfs: open_ctree failed
> [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 1 transid 31105 /dev/sdc6
> [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 1 transid 31105 /dev/sdc6
> [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 4013.408280] btrfs: allowing degraded mounts
> [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen 0
> [ 4015.600424] Btrfs: too many missing devices, writeable mount is not allowed
> [ 4015.630841] btrfs: open_ctree failed
Did the crashed /dev/sdb have more than 1 partitions in your raid1
filesystem?
>
> Yes, I know, I'll probably be losing a lot of data, but it's not "too
> much" my concern because I had a backup (sooo happy about that :D). If
> I can manage to recover a little more on the btrfs volume it's bonus,
> but in the event I do not, I'll be using my backup.
>
> So, how do I fix my volume? I guess there would be a solution apart
> from scratching/deleting everything and starting again...
>
>
> Regards,
> Axelle
>
>
>
> On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills  wrote:
>> On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
>>> Hi,
>>> I've just encountered a hard disk crash in one of my btrfs pools.
>>>
>>> sudo btrfs filesystem show
>>> failed to open /dev/sr0: No medium found
>>> Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
>>> Total devices 3 FS bytes used 112.70GB
>>> devid1 size 100.61GB used 89.26GB path /dev/sdc6
>>> devid2 size 93.13GB used 84.00GB path /dev/sdc1
>>> *** Some devices missing
>>>
>>> The device which is missing is /dev/sdb. I have replaced it with a new
>>> hard disk. How do I add it back to the volume and fix the device
>>> missing?
>>> The pool is expected to mount to /samples (it is not mounted yet).
>>>
>>> I tried this - which fails:
>>> sudo btrfs device add /dev/sdb /samples
>>> ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device
>>>
>>> Why isn't this working?
>>Because it's not mounted. :)
>>
>>> I also tried this:
>>> sudo mount -o recovery /dev/sdc1 /samples
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>>missing codepage or helper program, or other error
>>>In some cases useful info is found in syslog - try
>>>dmesg | tail  or so
>>> same with /dev/sdc6
>>Close, but what you want here is:
>>
>> mount -o degraded /dev/sdc1 /samples
>>
>> not "recovery". That will tell the FS that there's a missing disk, and
>> it should mount without complaining. If your data is not RAID-1 or
>> RAID-10, then you will almost certainly have lost some data.
>>
>>At that point, since you've removed the dead disk, you can do:
>>
>> btrfs device delete missing /samples
>>
>> which forcibly removes the record of the missing device.
>>
>>Then you can add the new device:
>>
>> btrfs device add /dev/sdb /samples
>>
>>And finally balance to repair the RAID:
>>
>> btrfs balance start /samples
>>
>>It's worth noting that even if you have RAID-1 data and metadata,
>> losing /dev/sdc in your current configuration is likely to cause
>> severe data loss -- probably making the whole FS unrecoverable. This
>> is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
>> and will happily put both copies of a piece of RAID-1 data (or
>> metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
>> wouldn't recommend running like that for very long.
>>
>>Hugo.
>>
>> --
>> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>>--- All hope abandon,  Ye who press Enter here. ---
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Instructions to release your payment.

2014-02-14 Thread Dr.William Davies

Attention

Sir,

Following an application brought, seeking the release of your due payment
through British bank, I am directed to inform you that the application has
been approved and Natwest bank of London has been mandated to make transfer
of your payment to the bank account you will nominate. Please kindly reply
for immediate release of your US$6.2 Million to you nominates account.

Sir, for the avoidance of doubts, reconfirm the following information to
me to enable us forward same to Natwest bank to contact you for your
payment.

Name:

Address:

Tel/Fax No.:

Nationality:

Occupation:

Date of birth: 

As soon as I received the above information, I will forward them to
Natwest bank to contact you for your approved payment. Please see in the
attachment, letter I wrote to Natwest bank informing them of the
transaction

Yours faithfully

Dr.William Davies
Chairman,British Banking Regulatory Board.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BTRFS with RAID1 cannot boot when removing drive

2014-02-14 Thread Saint Germain

On 11 February 2014 03:30, Saint Germain  wrote:
>> > I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with
>> > backported kernel 3.12-0.bpo.1-amd64) using a a motherboard with
>> > UEFI.
>>
>> > I have installed Debian with the following partition on the first
>> > hard drive (no BTRFS subsystem):
>> > /dev/sda1: for / (BTRFS)
>> > /dev/sda2: for /home (BTRFS)
>> > /dev/sda3: for swap
>> >
>> > Then I added another drive for a RAID1 configuration (with btrfs
>> > balance) and I installed grub on the second hard drive with
>> > "grub-install /dev/sdb".
>>
>> You should be able to mount a two-device btrfs raid1 filesystem with
>> only a single device with the degraded mount option, tho I believe
>> current kernels refuse a read-write mount in that case, so you'll
>> have read-only access until you btrfs device add a second device, so
>> it can do normal raid1 mode once again.
>>
>> Meanwhile, I don't believe it's on the wiki, but it's worth noting my
>> experience with btrfs raid1 mode in my pre-deployment tests.
>> Actually, with the (I believe) mandatory read-only mount if raid1 is
>> degraded below two devices, this problem's going to be harder to run
>> into than it was in my testing several kernels ago, but here's what I
>> found:
>>
>> But as I said, if btrfs only allows read-only mounts of filesystems
>> without enough devices to properly complete the raidlevel, that
>> shouldn't be as big an issue these days, since it should be more
>> difficult or impossible to get the two devices separately mounted
>> writable in the first place, with the consequence that the differing
>> copies issue will be difficult or impossible to trigger in the first
>> place. =:^)
>>

Hello,

With your advices and Chris ones, I have now a (clean ?) partition to
start experimenting with RAID1 (and which boot correctly in UEFI
mode):
sda1 = BIOS Boot partition
sda2 = EFI System Partition
sda3 = BTFS partition
sda4 = swap partition
For the moment I haven't created subvolumes (for "/" and for "/home"
for instance) to keep things simple.

The idea is then to create a RAID1 with a sdb drive (duplicate sda
partitioning, add/balance/convert sdb3 + grub-install on sdb, add sdb
swap UUID in /etc/fstab), shutdown and remove sda to check the
procedure to replace it.

I read the last thread on the subject "lost with degraded RAID1", but
would like to really confirm what would be the current approved
procedure and if it will be valid for future BTRFS version (especially
about the read-only mount).

So what should I do from there ?
Here are a few questions:

1) Boot in degraded mode: currently with my kernel
(3.12-0.bpo.1-amd64, from Debian wheezy-backports) it seems that I can
mount in read-write mode.
However for future kernel, it seems that I will be only able to mount
read-only ? See here:
http://www.spinics.net/lists/linux-btrfs/msg20164.html
https://bugzilla.kernel.org/show_bug.cgi?id=60594

2) If I am able to mount read-write, is this the correct procedure:
  a) place a new drive in another physical location sdc (I don't think
I can use the same sda physical location ?)
  b) boot in degraded mode on sdb
  c) use the 'replace' command to replace sda by sdc
  d) perhaps a 'balance' is necessary ?

3) Can I use also the above procedure if I am only allowed to mount read-only ?

4) If I want to use my system without RAID1 support (dangerous I
know), after booting in degraded mode with read-write, can I convert
back sdb from RAID1 to RAID0 in a safe way ?
(btrfs balance start -dconvert=raid0 -mconvert=raid0 /)

5) Perhaps a recovery procedure which includes booting on a different
rescue disk would be more appropriate ?

Thanks again,
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 0/3] __btrfs_drop_extents() BUG_ON reproducer

2014-02-14 Thread Filipe David Manana

On Fri, Feb 14, 2014 at 12:46 PM, David Disseldorp  wrote:
> Ping, any Btrfsers get a chance to look at this patch series?
> I'd like to get it into the QA tree.

If no one else gets there first, I'll take a look at it soon.
thanks

>
> On Fri,  7 Feb 2014 11:35:38 +0100, David Disseldorp wrote:
>
>> This patch-set provides a reproducer for hitting the 3.14.0-rc1 BUG_ON()
>> at:
>>  692 int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
>> ...
>>  839 /*
>>  840  *  |  range to drop - |
>>  841  *  |  extent  |
>>  842  */
>>  843 if (start <= key.offset && end < extent_end) {
>>  844 BUG_ON(extent_type == BTRFS_FILE_EXTENT_INLINE);
>>  845
>>  846 memcpy(&new_key, &key, sizeof(new_key));
>>
>> The first patch adds a small cloner binary which is used by btrfs/035 to
>> dispatch BTRFS_IOC_CLONE_RANGE requests.
>>
>> This workload resembles that of Samba's vfs_btrfs module, when a Windows
>> client restores a file from a shadow-copy (snapshot) using server-side
>> copy requests.
>>
>> Changes since V2:
>> - Remove explicit write error checks
>>
>> Changes since V1:
>> - Use strtoull instead of atoi
>> - Print error conditions in cloner
>> - Check for cloner binary before running test
>> - Continue test on failure
>> - Add cloner to .gitignore
>>
>> Feedback appreciated.
>>
>> Cheers, David
>>
>>
>>  .gitignore  |   1 +
>>  configure.ac|   1 +
>>  src/Makefile|   2 +-
>>  src/cloner.c| 192 
>> +++
>>  tests/btrfs/035 |  77 
>> +
>>  tests/btrfs/035.out |   3 +++
>>  tests/btrfs/group   |   1 +
>>  7 files changed, 276 insertions(+), 1 deletion(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Hello

2014-02-14 Thread Karty


HI,

My Name is Miss Karty. I will like to be your friend and there is something I 
want us to discuss.
Thanks
Karty


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 0/3] __btrfs_drop_extents() BUG_ON reproducer

2014-02-14 Thread David Disseldorp

Ping, any Btrfsers get a chance to look at this patch series?
I'd like to get it into the QA tree.

On Fri,  7 Feb 2014 11:35:38 +0100, David Disseldorp wrote:

> This patch-set provides a reproducer for hitting the 3.14.0-rc1 BUG_ON()
> at:
>  692 int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
> ...
>  839 /*
>  840  *  |  range to drop - |
>  841  *  |  extent  |
>  842  */
>  843 if (start <= key.offset && end < extent_end) {
>  844 BUG_ON(extent_type == BTRFS_FILE_EXTENT_INLINE);
>  845 
>  846 memcpy(&new_key, &key, sizeof(new_key));
> 
> The first patch adds a small cloner binary which is used by btrfs/035 to
> dispatch BTRFS_IOC_CLONE_RANGE requests.
> 
> This workload resembles that of Samba's vfs_btrfs module, when a Windows
> client restores a file from a shadow-copy (snapshot) using server-side
> copy requests.
> 
> Changes since V2:
> - Remove explicit write error checks
> 
> Changes since V1:
> - Use strtoull instead of atoi
> - Print error conditions in cloner
> - Check for cloner binary before running test
> - Continue test on failure
> - Add cloner to .gitignore
> 
> Feedback appreciated.
> 
> Cheers, David
> 
> 
>  .gitignore  |   1 +
>  configure.ac|   1 +
>  src/Makefile|   2 +-
>  src/cloner.c| 192 
> +++
>  tests/btrfs/035 |  77 
> +
>  tests/btrfs/035.out |   3 +++
>  tests/btrfs/group   |   1 +
>  7 files changed, 276 insertions(+), 1 deletion(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Axelle

Hi,
Some update:

>sudo mount -o degraded /dev/sdc1 /samples
>mount: wrong fs type, bad option, bad superblock on /dev/sdc1,

I am mounting it read-only, and backuping what I can still access to
another drive.

Then, what should I do? Fully erase the volume and create a new one?
Or is there a way I can use the snapshots I had?
Or somehow fix the ro volume, add the new disk to it, and re-mount rw?

Regards,
Axelle.




On Fri, Feb 14, 2014 at 12:04 PM, Axelle  wrote:
> Hi Hugo,
>
> Thanks for your answer.
> Unfortunately, I had also tried
>
> sudo mount -o degraded /dev/sdc1 /samples
> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>missing codepage or helper program, or other error
>In some cases useful info is found in syslog - try
>dmesg | tail  or so
>
> and dmesg says:
> [ 1177.695773] btrfs: open_ctree failed
> [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 1 transid 31105 /dev/sdc6
> [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 1 transid 31105 /dev/sdc6
> [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
> 2 transid 31105 /dev/sdc1
> [ 4013.408280] btrfs: allowing degraded mounts
> [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen 0
> [ 4015.600424] Btrfs: too many missing devices, writeable mount is not allowed
> [ 4015.630841] btrfs: open_ctree failed
>
> Yes, I know, I'll probably be losing a lot of data, but it's not "too
> much" my concern because I had a backup (sooo happy about that :D). If
> I can manage to recover a little more on the btrfs volume it's bonus,
> but in the event I do not, I'll be using my backup.
>
> So, how do I fix my volume? I guess there would be a solution apart
> from scratching/deleting everything and starting again...
>
>
> Regards,
> Axelle
>
>
>
> On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills  wrote:
>> On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
>>> Hi,
>>> I've just encountered a hard disk crash in one of my btrfs pools.
>>>
>>> sudo btrfs filesystem show
>>> failed to open /dev/sr0: No medium found
>>> Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
>>> Total devices 3 FS bytes used 112.70GB
>>> devid1 size 100.61GB used 89.26GB path /dev/sdc6
>>> devid2 size 93.13GB used 84.00GB path /dev/sdc1
>>> *** Some devices missing
>>>
>>> The device which is missing is /dev/sdb. I have replaced it with a new
>>> hard disk. How do I add it back to the volume and fix the device
>>> missing?
>>> The pool is expected to mount to /samples (it is not mounted yet).
>>>
>>> I tried this - which fails:
>>> sudo btrfs device add /dev/sdb /samples
>>> ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device
>>>
>>> Why isn't this working?
>>
>>Because it's not mounted. :)
>>
>>> I also tried this:
>>> sudo mount -o recovery /dev/sdc1 /samples
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>>missing codepage or helper program, or other error
>>>In some cases useful info is found in syslog - try
>>>dmesg | tail  or so
>>> same with /dev/sdc6
>>
>>Close, but what you want here is:
>>
>> mount -o degraded /dev/sdc1 /samples
>>
>> not "recovery". That will tell the FS that there's a missing disk, and
>> it should mount without complaining. If your data is not RAID-1 or
>> RAID-10, then you will almost certainly have lost some data.
>>
>>At that point, since you've removed the dead disk, you can do:
>>
>> btrfs device delete missing /samples
>>
>> which forcibly removes the record of the missing device.
>>
>>Then you can add the new device:
>>
>> btrfs device add /dev/sdb /samples
>>
>>And finally balance to repair the RAID:
>>
>> btrfs balance start /samples
>>
>>It's worth noting that even if you have RAID-1 data and metadata,
>> losing /dev/sdc in your current configuration is likely to cause
>> severe data loss -- probably making the whole FS unrecoverable. This
>> is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
>> and will happily put both copies of a piece of RAID-1 data (or
>> metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
>> wouldn't recommend running like that for very long.
>>
>>Hugo.
>>
>> --
>> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>>--- All hope abandon,  Ye who press Enter here. ---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Axelle

Hi Hugo,

Thanks for your answer.
Unfortunately, I had also tried

sudo mount -o degraded /dev/sdc1 /samples
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

and dmesg says:
[ 1177.695773] btrfs: open_ctree failed
[ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
2 transid 31105 /dev/sdc1
[ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
1 transid 31105 /dev/sdc6
[ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
2 transid 31105 /dev/sdc1
[ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
1 transid 31105 /dev/sdc6
[ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
2 transid 31105 /dev/sdc1
[ 4013.408280] btrfs: allowing degraded mounts
[ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen 0
[ 4015.600424] Btrfs: too many missing devices, writeable mount is not allowed
[ 4015.630841] btrfs: open_ctree failed

Yes, I know, I'll probably be losing a lot of data, but it's not "too
much" my concern because I had a backup (sooo happy about that :D). If
I can manage to recover a little more on the btrfs volume it's bonus,
but in the event I do not, I'll be using my backup.

So, how do I fix my volume? I guess there would be a solution apart
from scratching/deleting everything and starting again...


Regards,
Axelle



On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills  wrote:
> On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
>> Hi,
>> I've just encountered a hard disk crash in one of my btrfs pools.
>>
>> sudo btrfs filesystem show
>> failed to open /dev/sr0: No medium found
>> Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
>> Total devices 3 FS bytes used 112.70GB
>> devid1 size 100.61GB used 89.26GB path /dev/sdc6
>> devid2 size 93.13GB used 84.00GB path /dev/sdc1
>> *** Some devices missing
>>
>> The device which is missing is /dev/sdb. I have replaced it with a new
>> hard disk. How do I add it back to the volume and fix the device
>> missing?
>> The pool is expected to mount to /samples (it is not mounted yet).
>>
>> I tried this - which fails:
>> sudo btrfs device add /dev/sdb /samples
>> ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device
>>
>> Why isn't this working?
>
>Because it's not mounted. :)
>
>> I also tried this:
>> sudo mount -o recovery /dev/sdc1 /samples
>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>missing codepage or helper program, or other error
>>In some cases useful info is found in syslog - try
>>dmesg | tail  or so
>> same with /dev/sdc6
>
>Close, but what you want here is:
>
> mount -o degraded /dev/sdc1 /samples
>
> not "recovery". That will tell the FS that there's a missing disk, and
> it should mount without complaining. If your data is not RAID-1 or
> RAID-10, then you will almost certainly have lost some data.
>
>At that point, since you've removed the dead disk, you can do:
>
> btrfs device delete missing /samples
>
> which forcibly removes the record of the missing device.
>
>Then you can add the new device:
>
> btrfs device add /dev/sdb /samples
>
>And finally balance to repair the RAID:
>
> btrfs balance start /samples
>
>It's worth noting that even if you have RAID-1 data and metadata,
> losing /dev/sdc in your current configuration is likely to cause
> severe data loss -- probably making the whole FS unrecoverable. This
> is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
> and will happily put both copies of a piece of RAID-1 data (or
> metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
> wouldn't recommend running like that for very long.
>
>Hugo.
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>--- All hope abandon,  Ye who press Enter here. ---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Hugo Mills

On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
> Hi,
> I've just encountered a hard disk crash in one of my btrfs pools.
> 
> sudo btrfs filesystem show
> failed to open /dev/sr0: No medium found
> Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
> Total devices 3 FS bytes used 112.70GB
> devid1 size 100.61GB used 89.26GB path /dev/sdc6
> devid2 size 93.13GB used 84.00GB path /dev/sdc1
> *** Some devices missing
> 
> The device which is missing is /dev/sdb. I have replaced it with a new
> hard disk. How do I add it back to the volume and fix the device
> missing?
> The pool is expected to mount to /samples (it is not mounted yet).
> 
> I tried this - which fails:
> sudo btrfs device add /dev/sdb /samples
> ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device
> 
> Why isn't this working?

   Because it's not mounted. :)

> I also tried this:
> sudo mount -o recovery /dev/sdc1 /samples
> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>missing codepage or helper program, or other error
>In some cases useful info is found in syslog - try
>dmesg | tail  or so
> same with /dev/sdc6

   Close, but what you want here is: 

mount -o degraded /dev/sdc1 /samples

not "recovery". That will tell the FS that there's a missing disk, and
it should mount without complaining. If your data is not RAID-1 or
RAID-10, then you will almost certainly have lost some data.

   At that point, since you've removed the dead disk, you can do:

btrfs device delete missing /samples

which forcibly removes the record of the missing device.

   Then you can add the new device:

btrfs device add /dev/sdb /samples

   And finally balance to repair the RAID:

btrfs balance start /samples

   It's worth noting that even if you have RAID-1 data and metadata,
losing /dev/sdc in your current configuration is likely to cause
severe data loss -- probably making the whole FS unrecoverable. This
is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
and will happily put both copies of a piece of RAID-1 data (or
metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
wouldn't recommend running like that for very long.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- All hope abandon,  Ye who press Enter here. ---   

signature.asc
Description: Digital signature

Recovering from hard disk failure in a pool

2014-02-14 Thread Axelle

Hi,
I've just encountered a hard disk crash in one of my btrfs pools.

sudo btrfs filesystem show
failed to open /dev/sr0: No medium found
Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
Total devices 3 FS bytes used 112.70GB
devid1 size 100.61GB used 89.26GB path /dev/sdc6
devid2 size 93.13GB used 84.00GB path /dev/sdc1
*** Some devices missing

The device which is missing is /dev/sdb. I have replaced it with a new
hard disk. How do I add it back to the volume and fix the device
missing?
The pool is expected to mount to /samples (it is not mounted yet).

I tried this - which fails:
sudo btrfs device add /dev/sdb /samples
ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device

Why isn't this working?

I also tried this:
sudo mount -o recovery /dev/sdc1 /samples
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so
same with /dev/sdc6

I ran btrfsck --repair on /dev/sdc1 and /dev/sdc6. Apart that it
reports a device is missing (/dev/sdb) seems okay.

I also tried:
sudo btrfs filesystem df /samples
ERROR: couldn't get space info on '/samples' - Inappropriate ioctl for device

and as I'm supposed to have a snapshot, this (but I suppose it's
helpless as the volume isn't mounted)
btrfs subvolume snapshot /samples/malwareSnapshot /before
ERROR: error accessing '/samples/malwareSnapshot'

Please help me out, thanks
Axelle.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

A WARN_ON running fsstress punch hole

2014-02-14 Thread EthanLien


Hello,

I used the command "fsstress -f punch=20 -d /volume1 -n 1 -p 50"
to repeatedly stress my btrfs volume

After a few hours stress, I got a WARN_ON at fs/btrfs/file.c:553
It seems someone gave btrfs_drop_extent_cache a range to drop
where end < start

The call flow is btrfs_punch_hole , __btrfs_drop_extents , fill_holes , 
btrfs_drop_extent_cache. I found when the WARN_ON was hit,

__btrfs_drop_extents sent a range where cur_offset == drop_end
thus caused the WARN_ON

Now I think there may be some problems with __btrfs_drop_extents.
I set some logs in __btrfs_drop_extents and find it will return
*drop_end = start and the return value ret is still 0 in some situations.
In this situation, __btrfs_drop_extents intends to drop many extents.
The first extent is truncated, and the extent_end is set to start. The
following extents are entirely dropped. Finally, it loops back to the 
beginning

of the while loop and uses btrfs_next_leaf to search next leaf, but somehow
it gets the extent which is first truncated and thus return *drop_end = 
start


Has anyone ever met problem like this?

Thanks

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

47 matches

Mail list logo