BTRFS with more than two parities

2014-10-20 Thread Ronny Egner
Dear All,

I was wondering what happened to the patch posted by Andrea Mazzoleni
back in February 2014 (this thread:
http://thread.gmane.org/gmane.linux.kernel/1654735).

Why wasn't it added to the code? Is something missing or wrong?

In my opinion the posted patch is awesome and would give btrfs a unique
feature that no other file system or volume manager on Linux or other
UNIX operating systems currently has.


Cheers
Ronny

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] Btrfs for stable (mostly 3.17)

2014-10-20 Thread Greg KH
On Mon, Oct 20, 2014 at 01:22:22PM +0100, Filipe Manana wrote:
> 
> 
> On 10/20/2014 12:13 AM, Greg KH wrote:
> > On Sun, Oct 19, 2014 at 09:55:11PM +0200, Greg KH wrote:
> >> On Sun, Oct 19, 2014 at 06:01:16AM -0400, Chris Mason wrote:
> >>> Hi everyone,
> >>>
> >>> I've pulled out some of the btrfs commits from the merge window that
> >>> we'd like to see in stable.  The full list of sha's from Linus is below,
> >>> you can see 4 of them are only needed on 3.17
> >>>
> >>> 2fad4e83e12591eb3bd213875b9edc2d18e93383
> >>> 0b4699dcb65c2cff793210b07f40b98c2d423a43 # v3.17
> >>> 12b894cb288d57292b01cf158177b6d5c89a6272
> >>> 78a017a2c92df9b571db0a55a016280f9019c65e
> >>> 4d1a40c66bed0b3fa43b9da5fbd5cbe332e4eccf
> >>> e6c4efd87ab04e5ead363f24e6ac35ed3506d401 # v3.17
> >>> f6acfd50110b335c7af636cf1fc8e55319cae5fc
> >>> 1d52c78afbbf80b58299e076a159617d6b42fe3c
> >>> 75bfb9aff45e44625260f52a5fd581b92ace3e62
> >>> bbe9051441effce51c9a533d2c56440df64db2d7
> >>> 32be3a1ac6d09576c57063c6c350ca36eaebdbd3 # v3.17
> >>> 42383020beb1cfb05f5d330cc311931bc4917a97
> >>> d37973082b453ba6b89ec07eb7b84305895d35e1 # v3.17
> >>
> >> I'm confused, the others not marked with a "# v3.17" need to go on older
> >> kernels as well?
> > 
> > I've picked up the ones that apply and build for the older stable
> > kernels I maintain now, thanks for the list.
> 
> May I suggest porting the following commit to 3.14 too?
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=766b5e5ae78dd04a93a275690a49e23d7dcb1f39
> 
> It fixes a data corruption issue for an incremental send. Particularly
> important, IMHO, as the corruption happens silently (no errors returned
> to user space nor any sort of warnings/errors in syslog, etc). It
> affects only 3.14, and the change applies cleanly on 3.14.22.

Chris, any objection for me taking this?

thanks,

greg k-h


Re: strange 3.16.3 problem

2014-10-20 Thread Goffredo Baroncelli
On 10/20/2014 07:37 PM, Robert White wrote:
> On 10/18/2014 04:41 PM, Russell Coker wrote:
[...]
> Also you said that you are using a 32bit user space "copied from
> another server" under a 64bit kernel. Is the "ls" command a 32 bit
> executable then?

Could this be related to the inode overflow on 32-bit systems
(see the inode_cache option)? If so, running a 64-bit "ls -i" should
work.



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH v4 RESEND] Move BTRFS RCU string to common library

2014-10-20 Thread Omar Sandoval
On Fri, Oct 03, 2014 at 10:12:38AM -0700, Omar Sandoval wrote:
> The RCU-friendly string API used internally by BTRFS is generic enough for
> common use. This doesn't add any new functionality, but instead just moves the
> code and documents the existing API.
> 
> Signed-off-by: Omar Sandoval 
> Reviewed-by: Josh Triplett 
> Acked-by: Paul E. McKenney 
> ---
> Version 4 doesn't return anything from the printk wrappers on the assumption
> that printk will return void someday (possibly soon). This is a resubmission
> because I omitted the v4 from the subject last time so it may have gotten
> buried. It applies to v3.17-rc7 and should be good to go for 3.18 or 3.19.
> 
>  fs/btrfs/check-integrity.c |  6 ++--
>  fs/btrfs/dev-replace.c | 19 +-
>  fs/btrfs/disk-io.c |  6 ++--
>  fs/btrfs/extent_io.c   |  4 +--
>  fs/btrfs/ioctl.c   |  4 +--
>  fs/btrfs/raid56.c  |  2 +-
>  fs/btrfs/rcu-string.h  | 56 -
>  fs/btrfs/scrub.c   | 15 
>  fs/btrfs/super.c   |  2 +-
>  fs/btrfs/volumes.c | 14 
>  include/linux/rcustring.h  | 89 ++
>  11 files changed, 126 insertions(+), 91 deletions(-)
>  delete mode 100644 fs/btrfs/rcu-string.h
>  create mode 100644 include/linux/rcustring.h
> 
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index ce92ae3..4ccd7da 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -94,6 +94,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "ctree.h"
>  #include "disk-io.h"
>  #include "hash.h"
> @@ -103,7 +104,6 @@
>  #include "print-tree.h"
>  #include "locking.h"
>  #include "check-integrity.h"
> -#include "rcu-string.h"
>  
>  #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1
>  #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1
> @@ -851,8 +851,8 @@ static int btrfsic_process_superblock_dev_mirror(
>   printk_in_rcu(KERN_INFO "New initial S-block (bdev %p, %s)"
>" @%llu (%s/%llu/%d)\n",
>superblock_bdev,
> -  rcu_str_deref(device->name), dev_bytenr,
> -  dev_state->name, dev_bytenr,
> +  rcu_string_dereference(device->name),
> +  dev_bytenr, dev_state->name, dev_bytenr,
>superblock_mirror_num);
>   list_add(&superblock_tmp->all_blocks_node,
>&state->all_blocks_list);
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index eea26e1..87d10cc 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "ctree.h"
>  #include "extent_map.h"
> @@ -34,7 +35,6 @@
>  #include "volumes.h"
>  #include "async-thread.h"
>  #include "check-integrity.h"
> -#include "rcu-string.h"
>  #include "dev-replace.h"
>  #include "sysfs.h"
>  
> @@ -376,9 +376,9 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
>   printk_in_rcu(KERN_INFO
> "BTRFS: dev_replace from %s (devid %llu) to %s started\n",
> src_device->missing ? "" :
> - rcu_str_deref(src_device->name),
> +   rcu_string_dereference(src_device->name),
> src_device->devid,
> -   rcu_str_deref(tgt_device->name));
> +   rcu_string_dereference(tgt_device->name));
>  
>   tgt_device->total_bytes = src_device->total_bytes;
>   tgt_device->disk_total_bytes = src_device->disk_total_bytes;
> @@ -528,9 +528,10 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>   printk_in_rcu(KERN_ERR
> "BTRFS: btrfs_scrub_dev(%s, %llu, %s) failed %d\n",
> src_device->missing ? "" :
> - rcu_str_deref(src_device->name),
> +   rcu_string_dereference(src_device->name),
> src_device->devid,
> -   rcu_str_deref(tgt_device->name), scrub_ret);
> +   rcu_string_dereference(tgt_device->name),
> +   scrub_ret);
>   btrfs_dev_replace_unlock(dev_replace);
>   mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
>   mutex_unlock(&root->fs_info->chunk_mutex);
> @@ -544,9 +545,9 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>   printk_in_rcu(KERN_INFO
> "BTRFS: dev_replace from %s (devid %llu) to %s) finished\n",
> src_device->missing ? "" :
> - rcu_str_deref(src_device->name),
> +   rcu_string_dereference(src_device->name),
>   

Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Tomasz Torcz
On Fri, Oct 17, 2014 at 11:01:51AM -0400, Chris Murphy wrote:
> 
> On Oct 16, 2014, at 5:17 AM, Tomasz Torcz  wrote:
> > 
> >  Broken files are in /var/log/journal directory. This directory
> > is set NOCOW with chattr, all the files within too.
> > 
> > Example of broken file:
> > system@0005057fe87730cf-6d3d85ed59bd70ae.journal~
> 
> What do you get for 'journalctl --verify' ? I'm curious if any journal files 
> are considered corrupt by journalctl, and if there's parity between 
> journalctl and dd_rescue when it comes to good/bad journals.

  journalctl dies with a bus error on them.


-- 
Tomasz Torcz                ,,If you try to upissue this patchset I shall be seeking
xmpp: zdzich...@chrome.pl     an IP-routable hand grenade.'' -- Andrew Morton (LKML)



Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Tomasz Torcz
On Fri, Oct 17, 2014 at 08:53:06AM -0400, Chris Mason wrote:
> On Fri, Oct 17, 2014 at 4:54 AM, Tomasz Torcz  wrote:
> >On Fri, Oct 17, 2014 at 04:29:36PM +0800, Liu Bo wrote:
> >> On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
> >> > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> >> > > >   Recently I've observed some corruptions to systemd's journal
> >> > > > files which are somewhat puzzling. This is especially worrying
> >> > > > as this is btrfs raid1 setup and I expected auto-healing.
> >> > > > read(4, 0x1001000, 65536)   = -1 EIO (Input/output error)
> >>
> >> Well..I don't know exactly what's the cause, but as the file is NOCOW,
> >> it writes data in place, have you experienced a hard reboot or something
> >> recently?
> >
> >  Nothing like that.  Server is on an UPS, there were couple normal
> >shutdowns this year (few kernel upgrades).
> >
> >> And any message in dmesg log while getting EIO by reading the file?
> >
> >  Nothing in dmesg, no btrfs messages, no SCSI/SATA errors, nothing.
> >That's why I find those corruptions mysterious.
> >  Maybe there is some way to inspect internal btrfs state and find out
> >what causing the problems?  Or maybe this is related to patch mentioned in
> >this thread?
> 
> This sounds like the problem fixed with some patches to our extent mapping
> code  that went in with the merge window.  I've cherry picked a few for
> stable and I'm running them through tests now.  They are in my stable-3.17
> branch, and I'll send to Greg once Linus grabs the revert for the last one.
> 
> But, if you want to try that branch out, it may fix this EIO.  Otherwise
> we'll start sending you debugging.

  Good shot.  Fedora kernel maintainer was kind enough to include those patches
and build a kernel for F21.  With this kernel EIO is not showing and files
are readable.  Thanks!

-- 
Tomasz Torcz                ,,If you try to upissue this patchset I shall be seeking
xmpp: zdzich...@chrome.pl     an IP-routable hand grenade.'' -- Andrew Morton (LKML)



Re: [REGRESSION?] Used+avail gives more than size of device

2014-10-20 Thread David Sterba
On Sun, Oct 12, 2014 at 10:55:54PM +, Duncan wrote:
[...]
> It's the raid-factor.  =:^)
> 
> Btrfs in the kernel is apparently accounting for raid-factor in used 
> space in whatever function standard df is using, but not in available 
> space, even where that available space is already chunk-allocated (btrfs 
> fi show, individual devices, size vs. used, where "used" simply means 
> chunk-allocated for show) and thus the raid-factor known.

Nice analysis. There must be a bug then. The raid factor is taken into account
when calculating the usable space:

super.c:btrfs_statfs():
1827         buf->f_bavail = total_free_data;
1828         ret = btrfs_calc_avail_data_space(fs_info->tree_root, &total_free_data);
1829         if (ret) {
1830                 mutex_unlock(&fs_info->chunk_mutex);
1831                 mutex_unlock(&fs_info->fs_devices->device_list_mutex);
1832                 return ret;
1833         }
1834         buf->f_bavail += div_u64(total_free_data, factor);

but it looks like the free space is counted twice, lines 1827 and 1834. The
function btrfs_calc_avail_data_space does the guesswork "how much space could
be still allocated" in the logical units, basically simulating the allocator
logic. I'll have a look as I've touched btrfs_statfs last.
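
To make the suspected double counting concrete, here is a minimal arithmetic sketch in Python. It is not the kernel code: the function names are made up, and it assumes (as suspected above, not confirmed) that the same free space ends up being added both at line 1827 and at line 1834.

```python
def bavail_double_counted(total_free_data, factor):
    """Model of the quoted sequence, assuming the same space is added twice."""
    bavail = total_free_data              # line 1827: raw value, no raid factor
    bavail += total_free_data // factor   # line 1834: same space, scaled by factor
    return bavail


def bavail_counted_once(total_free_data, factor):
    """What one would expect if the space were counted once, scaled by factor."""
    return total_free_data // factor


# Hypothetical numbers: 100 units of raw free space on raid1 (factor 2).
print(bavail_double_counted(100, 2))
print(bavail_counted_once(100, 2))
```

With these made-up numbers the first value is three times the second, which would be consistent with used+avail exceeding the device size, but the real cause still needs to be confirmed in the code.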


Re: strange 3.16.3 problem

2014-10-20 Thread Robert White

On 10/18/2014 04:41 PM, Russell Coker wrote:
> On Sun, 19 Oct 2014, Robert White wrote:
>> On 10/17/2014 08:54 PM, Russell Coker wrote:
>>> # find . -name "*546"
>>> ./1412233213.M638209P10546
>>> # ls -l ./1412233213.M638209P10546
>>> ls: cannot access ./1412233213.M638209P10546: No such file or directory
>>>
>>> Any suggestions?
>>
>> Does "ls -l *546" show the file to exist? e.g. what happens if you use
>> the exact same wildcard in the ls command as you used in the find?
>
> # ls -l *546
> ls: cannot access 1412233213.M638209P10546: No such file or directory
>
> That gives the same result as find, the shell matches the file name but then
> ls can't view it.
>
> lstat64("1412233213.M638209P10546", 0x9fab0c8) = -1 ENOENT (No such file or
> directory)
>
> From strace, the lstat64 system call fails.


Okay, from the strace output the shell _is_ finding the file in the
directory read and expand (readdir) pass. That is, "*546" is being
expanded to the full file name text "1412233213.M638209P10546" but then
the actual operation fails because the name is apparently not associated
with anything.

So what pass of scrub or btrfsck checks directory connectedness? Does
that pass give your file system a clean bill of health?

Also you said that you are using a 32bit user space "copied from another
server" under a 64bit kernel. Is the "ls" command a 32 bit executable then?

What happens if you stop the Xen domain for the mail server and then
mount the disks into a native 64bit environment and then ls the file name?

I ask because the man page for lstat64 says it's a "wrapper" for the
underlying system call (fstatat64). It is not impossible that you might
have a case where the wrapper is failing inside glibc due to some 32/64
bit conversion taking place.

Since you copied the entire 32bit environment from another (older?)
server there may be some nonsense happening where the two interfaces meet.

I'd check the file system against a native 64bit kernel and user-space
next. Possibly from a distro CD if necessary, just to isolate the
potential file system causes from the user-space causes. If the native
64bit environment fails then it's a fs issue; if the native 64bit
operations work, then it's a userspace problem and you win the fun of
remaking the mail server from scratch.
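
For the comparison from a native 64bit user space, something like the following Python sketch could be used to list the entries that readdir returns but lstat cannot find. The function name is made up, and the 32-bit inode check is only a guess at one possible cause, not a confirmed diagnosis.

```python
import os


def unreachable_entries(directory):
    """List directory entries that readdir reports but lstat() cannot find.

    Returns (name, inode, inode_exceeds_32_bits) tuples.  The inode comes from
    the directory listing itself, so it is available even when lstat() fails.
    """
    problems = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                os.lstat(entry.path)
            except FileNotFoundError:
                inode = entry.inode()
                problems.append((entry.name, inode, inode > 0xFFFFFFFF))
    return problems


if __name__ == "__main__":
    for name, inode, too_big in unreachable_entries("."):
        print(name, inode, "inode > 32 bits" if too_big else "")
```

If the same directory shows unreachable entries from the 32bit environment but not from the 64bit one, that would point at user space rather than the file system.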





Re: Compressed size of a filesystem

2014-10-20 Thread David Sterba
On Mon, Oct 20, 2014 at 09:05:41AM -0700, Suman Chakravartula wrote:
> I'd like to calculate the compressed size of a btrfs filesystem. I read 
> the wiki and understand the backward compatibility issues mentioned. 
> Also, the "df before and after" method doesn't work for me and I have no 
> control over when end users are writing to the filesystem.
> 
> Wiki says there's a patch which is not merged. Is that referring to this 
> discussion:
> 
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/14942

I'll update the link.

> Why was the patch not merged?

Because the approach was abandoned in favor of a more generic solution
that reuses the existing interface (FIEMAP). The patchset implementing
this is

http://thread.gmane.org/gmane.comp.file-systems.btrfs/37312


Btrfs-progs release 3.17

2014-10-20 Thread David Sterba
Hi,

the version 3.17 of btrfs-progs has been released.

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git v3.17
https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/btrfs-progs-v3.17.tar.xz

Among other fixes and updates, there are many fsck improvements, most notably a
fix for the bug introduced in 3.17 regarding inconsistencies after read-only
snapshots (fix https://patchwork.kernel.org/patch/5086521/).

User visible changes:
 * check: --init-csum-tree actually does something useful, rebuilds the whole
   csum tree
 * /dev scanning for btrfs devices is gone
 * /proc/partitions scanning is gone, blkid is used exclusively
 * new subcommand 'subvolume sync'
 * filesystem df: new options to set unit format
 * convert: allow to copy label from the origin, or specify a new one


Re: Poll: time to switch skinny-metadata on by default?

2014-10-20 Thread David Sterba
On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
> I'd like to make it default with the 3.17 release of btrfs-progs.
> Please let me know if you have objections.

For the record, 3.17 will not change the defaults. The timing of the
poll was very bad to get enough feedback before the release. Let's keep
it open for now.


Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Tomasz Torcz
On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> 
> Does scrub work for you?
> 

  Scrub ended with no errors:
scrub status for a4f339d4-c129-4485-acc1-1233d29c665d
        scrub started at Fri Oct 17 10:04:24 2014 and finished after 31992 seconds
        total bytes scrubbed: 6.03TiB with 0 errors

I guess I'll have to check the patch Marc pointed out.

-- 
Tomasz Torcz               "Never underestimate the bandwidth of a station
xmpp: zdzich...@chrome.pl    wagon filled with backup tapes." -- Jim Gray



Re: [GIT PULL] Btrfs for stable (mostly 3.17)

2014-10-20 Thread Chris Mason



On Sun, Oct 19, 2014 at 7:13 PM, Greg KH wrote:
> On Sun, Oct 19, 2014 at 09:55:11PM +0200, Greg KH wrote:
>> On Sun, Oct 19, 2014 at 06:01:16AM -0400, Chris Mason wrote:
>>> Hi everyone,
>>>
>>> I've pulled out some of the btrfs commits from the merge window that
>>> we'd like to see in stable.  The full list of sha's from Linus is below,
>>> you can see 4 of them are only needed on 3.17
>>>
>>> 2fad4e83e12591eb3bd213875b9edc2d18e93383
>>> 0b4699dcb65c2cff793210b07f40b98c2d423a43 # v3.17
>>> 12b894cb288d57292b01cf158177b6d5c89a6272
>>> 78a017a2c92df9b571db0a55a016280f9019c65e
>>> 4d1a40c66bed0b3fa43b9da5fbd5cbe332e4eccf
>>> e6c4efd87ab04e5ead363f24e6ac35ed3506d401 # v3.17
>>> f6acfd50110b335c7af636cf1fc8e55319cae5fc
>>> 1d52c78afbbf80b58299e076a159617d6b42fe3c
>>> 75bfb9aff45e44625260f52a5fd581b92ace3e62
>>> bbe9051441effce51c9a533d2c56440df64db2d7
>>> 32be3a1ac6d09576c57063c6c350ca36eaebdbd3 # v3.17
>>> 42383020beb1cfb05f5d330cc311931bc4917a97
>>> d37973082b453ba6b89ec07eb7b84305895d35e1 # v3.17
>>
>> I'm confused, the others not marked with a "# v3.17" need to go on older
>> kernels as well?
>
> I've picked up the ones that apply and build for the older stable
> kernels I maintain now, thanks for the list.


Sorry I wasn't clear.  The other unmarked ones should go back to the 
older kernels as well.


-chris





Compressed size of a filesystem

2014-10-20 Thread Suman Chakravartula

Hi,

I'd like to calculate the compressed size of a btrfs filesystem. I read 
the wiki and understand the backward compatibility issues mentioned. 
Also, the "df before and after" method doesn't work for me and I have no 
control over when end users are writing to the filesystem.


Wiki says there's a patch which is not merged. Is that referring to this 
discussion:


http://comments.gmane.org/gmane.comp.file-systems.btrfs/14942

Why was the patch not merged?

--
Suman Chakravartula
Rockstor, Inc.


Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Rich Freeman
On Mon, Oct 20, 2014 at 10:04 AM, Zygo Blaxell  wrote:
> On Fri, Oct 17, 2014 at 08:17:37AM +, Hugo Mills wrote:
>> On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
>> > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
>> > > >   Recently I've observed some corruptions to systemd's journal
>> > > > files which are somewhat puzzling. This is especially worrying
>> > > > as this is btrfs raid1 setup and I expected auto-healing.
>> > > >
>> > > >   System details: 3.17.0-301.fc21.x86_64
>> > > > btrfs: raid1 over 2x dm-crypted 6TB HDDs.
>> > > > mount opts: rw,relatime,seclabel,compress=lzo,space_cache
>> > > >   Reads with cat, hexdump fails with:
>> > > > read(4, 0x1001000, 65536)   = -1 EIO (Input/output error)
>> > > >
>> > > Does scrub work for you?
>> >
>> >   As there seem to be no way to scrub individual files, I've started
>> > scrub of full volume.  It will take some hours to finish.
>> >
>> >   Meanwhile, could you satisfy my curiosity what would scrub do that
>> > wouldn't be done by just reading the whole file?
>>
>>It checks both copies. Reading the file will only read one of the
>> copies of any given block (so if that's good and the other copy is
>> bad, it won't fix anything).
>
> Really?  One of my earliest btrfs tests was to run a loop of 'sha1sum
> -c' on a gigabyte or two of files in one window while I used dd to
> write random data in random locations directly to one of the filesystem
> mirror partitions in the other.  I did this test *specifically* to
> watch the automatic checksumming and self-healing features of btrfs
> in action.  A complete 'sha1sum' verification of the filesystem contents
> passed even though the kernel log was showing checksum errors scrolling
> by faster than I could read, which strongly implies that read() normally
> does check both mirrors before returning EIO.

I think you misread the earlier post.  It sounds like the algorithm is:
1.  Receive request to read block from file.
2.  Determine which mirrored block to read it from (it sounds like
this is sub-optimal today; presumably you'd want to use the least busy
disk or the disk with the head closest to the right cylinder).
3.  Read the block.  Verify the checksum.  If it matches, return the data.
4.  If not, find another mirrored block to read it from, if one exists.
Verify the checksum.  If it matches, return the data and update all
other mirrored copies with it.
5.  Repeat step 4 until you run out of mirrored copies.  If you do,
return an error.

So, doing random reads will NOT be equivalent to scrubbing the disks,
because with a scrub you want to check that ALL copies are good, and
the algorithm above only determines that any copy is good.

When you used dd to overwrite blocks, you didn't get errors because
when the first copy failed the filesystem just read the second copy as
intended.  That isn't a scrub - it is a recovery.
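
The read path as I understand it can be sketched roughly like this in Python. This is only an illustration of the steps listed above, not how the kernel implements them: the names are invented, zlib.crc32 stands in for the real checksum (btrfs uses crc32c), and only copies that were actually read and failed get rewritten.

```python
import errno
import zlib


def checksum(block):
    # Stand-in checksum for illustration only.
    return zlib.crc32(bytes(block))


def read_block(copies, expected):
    """Return the first copy whose checksum matches `expected`.

    `copies` is a list of bytearrays, one per mirror.  Copies that were read
    and failed verification are overwritten with the good data; copies that
    were never read stay untouched and unverified.
    """
    failed = []
    for index, copy in enumerate(copies):
        if checksum(copy) == expected:
            for bad in failed:
                copies[bad][:] = copy  # repair the copies known to be bad
            return bytes(copy)
        failed.append(index)
    # Ran out of mirrored copies without finding a good one.
    raise OSError(errno.EIO, "no copy matches the checksum")
```

Note that when the first copy verifies, the second one is never looked at, which is exactly why reading a file is not a substitute for a scrub.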

An actual scrub isn't file-focused, but device focused.  It starts
reading at the start of the device, and verifies each logical unit of
data sequentially.  This can be done asynchronously since btrfs stores
checksums, as opposed to a traditional RAID where the reads need to be
synchronous since the validity of a mirror/stripe can only be
ascertained by comparing it to all the other devices in that
mirror/stripe (and then unless you're using something like RAID6+ you
couldn't determine which copy is bad without a checksum).  In theory
I'd expect a scrub with btrfs to be less detrimental to performance as
a result - a read request could halt the scrub on one device without
delaying the scrub on the other devices.  Writes in RAID1 mode
necessarily disrupt two devices, but others would not be impacted.
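
By contrast, a scrub in the sense described above could be sketched like this (again Python, invented names, zlib.crc32 as a stand-in checksum, and each device modelled as a simple list of blocks). The point is that every device is walked on its own against the stored checksums, with no comparison between devices.

```python
import zlib


def scrub_device(blocks, expected):
    """Return the indices of blocks on one device whose checksum is wrong."""
    return [i for i, block in enumerate(blocks)
            if zlib.crc32(block) != expected[i]]


def scrub(devices, expected):
    """Check every copy on every device; returns {device_index: [bad blocks]}.

    Each device is verified independently, so in principle the passes could
    run concurrently or be paused per device.
    """
    return {d: scrub_device(blocks, expected)
            for d, blocks in enumerate(devices)}
```

A block that is bad on only one mirror shows up here even if every read() of the file has been succeeding from the other mirror.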

--
Rich


Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Zygo Blaxell
On Fri, Oct 17, 2014 at 08:17:37AM +, Hugo Mills wrote:
> On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
> > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> > > >   Recently I've observed some corruptions to systemd's journal
> > > > files which are somewhat puzzling. This is especially worrying
> > > > as this is btrfs raid1 setup and I expected auto-healing.
> > > > 
> > > >   System details: 3.17.0-301.fc21.x86_64
> > > > btrfs: raid1 over 2x dm-crypted 6TB HDDs.
> > > > mount opts: rw,relatime,seclabel,compress=lzo,space_cache
> > > >   Reads with cat, hexdump fails with:
> > > > read(4, 0x1001000, 65536)   = -1 EIO (Input/output error)
> > > > 
> > > Does scrub work for you?
> > 
> >   As there seem to be no way to scrub individual files, I've started
> > scrub of full volume.  It will take some hours to finish.
> > 
> >   Meanwhile, could you satisfy my curiosity what would scrub do that
> > wouldn't be done by just reading the whole file?
> 
>It checks both copies. Reading the file will only read one of the
> copies of any given block (so if that's good and the other copy is
> bad, it won't fix anything).

Really?  One of my earliest btrfs tests was to run a loop of 'sha1sum
-c' on a gigabyte or two of files in one window while I used dd to
write random data in random locations directly to one of the filesystem
mirror partitions in the other.  I did this test *specifically* to
watch the automatic checksumming and self-healing features of btrfs
in action.  A complete 'sha1sum' verification of the filesystem contents
passed even though the kernel log was showing checksum errors scrolling
by faster than I could read, which strongly implies that read() normally
does check both mirrors before returning EIO.  This was on kernel version
3.12.21 or so, so it should be working on 3.17 too.

Tomasz reports using 'nocow', which breaks the data integrity checks.
I'd expect the read() to return success and provide garbage data, but the
observed behavior is EIO instead.  The underlying device doesn't seem
to be generating the I/O errors, so it's probably metadata corruption
of some kind.  Are there btrfs kernel messages in dmesg?





Re: strange 3.16.3 problem

2014-10-20 Thread Austin S Hemmelgarn

On 2014-10-20 09:02, Zygo Blaxell wrote:
> On Mon, Oct 20, 2014 at 04:38:28AM +, Duncan wrote:
>> Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:
>>
>>> # find . -name "*546"
>>> ./1412233213.M638209P10546 # ls -l ./1412233213.M638209P10546 ls: cannot
>>> access ./1412233213.M638209P10546: No such file or directory
>>
>> Does your mail server do a lot of renames?  Is one perhaps stuck?  If so,
>> that sounds like the same thing "Zygo Blaxell" is reporting in the
>> "3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014
>> 15:25:26 -400, Msg-ID: <20141019192525.ga29...@hungrycats.org>, as linked
>> here:
>>
>> I pointed him at this thread too.  I hadn't seen you mention a hung
>> rename, but the other symptoms sound similar.
>
> Not really.  It looks like Russell is having an NFS client-side problem,
> I'm having a server-side one (maybe).  Also, all Russell's system calls
> seem to be returning promptly, while some of mine are not.  Even if
> there were timeouts, an NFS server timeout gives a different error than
> 'No such file or directory'.  Finally, the one and only thing I _can_
> do with my bug is 'ls' on the renamed files (for me, the find would get
> stuck before returning any output).
>
> For Russell's issue...most of the stuff I can think of has been
> tried already.  I didn't see if there was any attempt to ls the
> file from the NFS server as well as the client side.  If ls is OK on
> the server but not the client, it's an NFS issue (possibly interacting
> with some btrfs-specific quirk); otherwise, it's likely a corrupted
> filesystem (mail servers seem to be unusually good at making these).
>
> Most of the I/O time on mail servers tends to land in the fsync() system
> call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
> 3.16, and not in the 3.16.x stable update for x <= 5 (the last one
> I've checked)).  That said, I'm not familiar with how fsync() translates
> over NFS, so it might not be relevant after all.
>
> If the NFS server's view of the filesystem is OK, check the NFS protocol
> version from /proc/mounts on the client.  Sometimes NFS clients will
> get some transient network error during connection and fall back to some
> earlier (and potentially buggier) NFS version.  I've seen very different
> behavior in some important corner cases from v4 and v3 clients, for
> example, and if the client is falling all the way back to v2 the bugs
> and their workarounds start to get just plain _weird_ (e.g. filenames
> which produce specific values from some hash function or that contain
> specific character sequences are unusable).  v2 is so old it may even
> have issues with 64-bit inode numbers.

Just now saw this thread, but IIRC 'No such file or directory' also gets 
returned sometimes when trying to automount a share that can't be 
enumerated by the client, and also sometimes when there is a stale NFS 
file handle.






Re: strange 3.16.3 problem

2014-10-20 Thread Zygo Blaxell
On Mon, Oct 20, 2014 at 04:38:28AM +, Duncan wrote:
> Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:
> 
> > # find . -name "*546"
> > ./1412233213.M638209P10546 # ls -l ./1412233213.M638209P10546 ls: cannot
> > access ./1412233213.M638209P10546: No such file or directory
> 
> Does your mail server do a lot of renames?  Is one perhaps stuck?  If so, 
> that sounds like the same thing "Zygo Blaxell" is reporting in the 
> "3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014 
> 15:25:26 -400, Msg-ID: <20141019192525.ga29...@hungrycats.org>, as linked 
> here:
> 
> 
> 
> I pointed him at this thread too.  I hadn't seen you mention a hung 
> rename, but the other symptoms sound similar.

Not really.  It looks like Russell is having an NFS client-side problem,
I'm having a server-side one (maybe).  Also, all Russell's system calls
seem to be returning promptly, while some of mine are not.  Even if
there were timeouts, an NFS server timeout gives a different error than
'No such file or directory'.  Finally, the one and only thing I _can_
do with my bug is 'ls' on the renamed files (for me, the find would get
stuck before returning any output).

For Russell's issue...most of the stuff I can think of has been
tried already.  I didn't see if there was any attempt to ls the
file from the NFS server as well as the client side.  If ls is OK on
the server but not the client, it's an NFS issue (possibly interacting
with some btrfs-specific quirk); otherwise, it's likely a corrupted
filesystem (mail servers seem to be unusually good at making these).

Most of the I/O time on mail servers tends to land in the fsync() system
call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
3.16, and not in the 3.16.x stable update for x <= 5 (the last one
I've checked)).  That said, I'm not familiar with how fsync() translates
over NFS, so it might not be relevant after all.

If the NFS server's view of the filesystem is OK, check the NFS protocol
version from /proc/mounts on the client.  Sometimes NFS clients will
get some transient network error during connection and fall back to some
earlier (and potentially buggier) NFS version.  I've seen very different
behavior in some important corner cases from v4 and v3 clients, for
example, and if the client is falling all the way back to v2 the bugs
and their workarounds start to get just plain _weird_ (e.g. filenames
which produce specific values from some hash function or that contain
specific character sequences are unusable).  v2 is so old it may even
have issues with 64-bit inode numbers.
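
For anyone who wants to script the /proc/mounts check, here is a minimal
sketch in Python.  It assumes the usual /proc/mounts layout (device, mount
point, fs type, options, ...) and that the negotiated version appears as a
"vers=" entry in the options; nfs_versions is a made-up helper name, not
part of any tool.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point to the value of its 'vers=' option.

    mounts_text is the content of /proc/mounts (or text in that format).
    A mount with no 'vers=' option maps to None.
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        vers = None
        for option in fields[3].split(","):
            if option.startswith("vers="):
                vers = option.split("=", 1)[1]
        found[fields[1]] = vers
    return found
```

On the client, feed it the file contents, e.g.
nfs_versions(open("/proc/mounts").read()), and compare the result with the
version you expected the mount to negotiate.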





Re: [GIT PULL] Btrfs for stable (mostly 3.17)

2014-10-20 Thread Filipe Manana


On 10/20/2014 12:13 AM, Greg KH wrote:
> On Sun, Oct 19, 2014 at 09:55:11PM +0200, Greg KH wrote:
>> On Sun, Oct 19, 2014 at 06:01:16AM -0400, Chris Mason wrote:
>>> Hi everyone,
>>>
>>> I've pulled out some of the btrfs commits from the merge window that
>>> we'd like to see in stable.  The full list of sha's from Linus is below,
>>> you can see 4 of them are only needed on 3.17
>>>
>>> 2fad4e83e12591eb3bd213875b9edc2d18e93383
>>> 0b4699dcb65c2cff793210b07f40b98c2d423a43 # v3.17
>>> 12b894cb288d57292b01cf158177b6d5c89a6272
>>> 78a017a2c92df9b571db0a55a016280f9019c65e
>>> 4d1a40c66bed0b3fa43b9da5fbd5cbe332e4eccf
>>> e6c4efd87ab04e5ead363f24e6ac35ed3506d401 # v3.17
>>> f6acfd50110b335c7af636cf1fc8e55319cae5fc
>>> 1d52c78afbbf80b58299e076a159617d6b42fe3c
>>> 75bfb9aff45e44625260f52a5fd581b92ace3e62
>>> bbe9051441effce51c9a533d2c56440df64db2d7
>>> 32be3a1ac6d09576c57063c6c350ca36eaebdbd3 # v3.17
>>> 42383020beb1cfb05f5d330cc311931bc4917a97
>>> d37973082b453ba6b89ec07eb7b84305895d35e1 # v3.17
>>
>> I'm confused, the others not marked with a "# v3.17" need to go on older
>> kernels as well?
> 
> I've picked up the ones that apply and build for the older stable
> kernels I maintain now, thanks for the list.

May I suggest porting the following commit to 3.14 too?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=766b5e5ae78dd04a93a275690a49e23d7dcb1f39

It fixes a data corruption issue for an incremental send. Particularly
important, IMHO, as the corruption happens silently (no errors returned
to user space nor any sort of warnings/errors in syslog, etc). It
affects only 3.14, and the change applies cleanly on 3.14.22.

Thanks

> 
> greg k-h


Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Chris Samuel
On Mon, 20 Oct 2014 10:01:56 AM Marc Dietrich wrote:

> so fixes would be tagged earlier this way and merged automatically.

I don't think there's much that's automatic about stable; Greg K-H merges
patches into a git tree here:

http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git

As you can see, since last night he has pulled a bunch of btrfs fixes into
it, based upon what Chris Mason emailed out yesterday.


commit 2792dbfd1e02a70a8eef7e0cc3f44cb77d6c100f
Author: Greg Kroah-Hartman 
Date:   Mon Oct 20 07:08:43 2014 +0800

3.17-stable patches

added patches:

btrfs-add-missing-compression-property-remove-in-btrfs_ioctl_setflags.patch
btrfs-cleanup-error-handling-in-build_backref_tree.patch
btrfs-don-t-do-async-reclaim-during-log-replay.patch
btrfs-don-t-go-readonly-on-existing-qgroup-items.patch
btrfs-fix-a-deadlock-in-btrfs_dev_replace_finishing.patch
btrfs-fix-and-enhance-merge_extent_mapping-to-insert-best-fitted-extent-map.patch
btrfs-fix-build_backref_tree-issue-with-multiple-shared-blocks.patch
btrfs-fix-race-in-wait_sync-ioctl.patch
btrfs-fix-the-wrong-condition-judgment-about-subset-extent-map.patch
btrfs-fix-up-bounds-checking-in-lseek.patch
btrfs-try-not-to-enospc-on-log-replay.patch
btrfs-wake-up-transaction-thread-from-sync_fs-ioctl.patch
revert-btrfs-race-free-update-of-commit-root-for-ro-snapshots.patch

(there are also a bunch going in for 3.10, 3.14 and 3.16 too)

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [PATCH v2] btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots

2014-10-20 Thread Anand Jain


Comments inline below.


On 10/17/2014 09:58 AM, Gui Hecheng wrote:

On Thu, 2014-09-04 at 20:02 +0800, Anand Jain wrote:



On 09/04/2014 05:58 PM, David Sterba wrote:

On Mon, Aug 18, 2014 at 04:38:18PM +0800, Anand Jain wrote:

The BTRFS_IOC_FS_INFO ioctl returns num_devices, which does _not_ include
seed devices, but the subsequent BTRFS_IOC_DEV_INFO ioctl counts and returns
seed disks when probed. So in userland we hit a count-slot mismatch
bug:
  get_fs_info()
  ::
  BUG_ON(ndevs >= fi_args->num_devices);
which hits this bug when we have mounted a seed device.

To fix this problem, with this patch the BTRFS_IOC_FS_INFO ioctl
will provide total_devices instead of num_devices.
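
To make the count-slot mismatch concrete, here is a toy model in Python.
It is only a sketch: the device list, fs_info_count and collect_dev_info
are hypothetical stand-ins, not the kernel or btrfs-progs code, and the
OverflowError merely plays the role of the BUG_ON above.

```python
# Toy model only: hypothetical names, not the btrfs-progs API.
DEVICES = [
    {"devid": 1, "seed": False},
    {"devid": 2, "seed": False},
    {"devid": 3, "seed": True},  # seed device under the sprout fs
]


def fs_info_count(devices, include_seed):
    """Stand-in for the device count reported by BTRFS_IOC_FS_INFO."""
    return sum(1 for d in devices if include_seed or not d["seed"])


def collect_dev_info(devices, slots):
    """Stand-in for the userland loop that probes each device."""
    found = []
    for dev in devices:  # probing finds seed devices as well
        if len(found) >= slots:
            # Plays the role of BUG_ON(ndevs >= fi_args->num_devices).
            raise OverflowError("more devices probed than slots allocated")
        found.append(dev["devid"])
    return found
```

Sizing the slots with fs_info_count(DEVICES, include_seed=False) trips the
check as soon as the seed device is probed; sizing them with
include_seed=True, which is what reporting total_devices amounts to in
this model, lets the loop finish.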


The ioctl is very unclear about what 'num_devices' actually means.


   Right. That's also true in the kernel: very messy, very confusing.
   The btrfs-devlist tool would help to understand what's going on.


   $ egrep num_device *.c | egrep "total_device"
ioctl.c:fi_args->num_devices = fs_devices->total_devices;
super.c:ret = !(fs_devices->num_devices == fs_devices->total_devices);
volumes.c:  total_devices = btrfs_super_num_devices(disk_super);


   By the way, the BTRFS_IOC_DEVICES_READY ioctl above has long been
   broken with seed/replace; I'm just waiting to get these patches
   integrated first, and will fix it afterwards.



This would fix the problem only partly, because earlier num_devices
included the replacing device but now total_devices does not include
it. Getting a count which includes a transient device is rather
inefficient/wrong anyway, because there can be a race condition: in
the time between the BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO ioctls
the replace operation might have completed. So to fix this problem
it's better that userland btrfs-progs probes the replacing device
(at devid 0) separately.

v2:
Agree with Wang's comment. It's better to show seed disks under the
sprout fs, so that the user can establish a mapping of seed to sprout
devices.

So here I am making BTRFS_IOC_FS_INFO return total_devices,
which counts the seed devices (but not the replacing device).


This is even more confusing. I think we need to add another member to
the ioctl struct to reflect the number of regular devices (num_devices)
and the true total number of devices including seeding and replaced
devices.


   That will be a better way. Thanks.


The difference should be accompanied by a flag that would say
if there's a seeding or replace in progress.

There are some backward compatibility concerns. Setting num_devices to
total_devices changes semantics of the ioctl, so I think it should stay
as is for now,


   As far as I have tested, there is no backward compatibility issue.
   But from a semantics perspective, agreed.


but the BUG_ON can be removed and replaced by code that
reallocates the buffer or allocates a few more items in advance.


We don't know how many seed devices there are for a sprout FS,
so that's not possible.

   Will review and resubmit.

Thanks for commenting.


Hi all,

Firstly, thanks for the fix, Anand. How's the new version going?



I've been testing the btrfs fi show cmd these days and found that
this patch has not been merged into Linus's tree yet.

Since the suggested way of adding a member to the ioctl struct brings
compatibility issues, it may need more discussion.


 Thanks Gui for commenting.

 Yes, discussion is needed, mainly on our long-term plan for a better
 btrfs kernel device and parameter read interface. That also means
 current bugs can wait for this new interface, instead of patching the
 old structures and bringing in a new compatibility mess.

Thanks, Anand




But since this fix really matters a lot to users, I consider merging
this version first to be a good idea.

What do you think, Chris?

Thanks,
Gui


Anand



Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Marc Dietrich
Am Samstag, 18. Oktober 2014, 18:32:49 schrieb Chris Samuel:
> On Fri, 17 Oct 2014 02:09:30 PM Rich Freeman wrote:
> > Just for clarity - when can we expect to see these in the kernel?
> 
> The stable kernel rules say:
> 
> https://www.kernel.org/doc/Documentation/stable_kernel_rules.txt
> 
> #  - It or an equivalent fix must already exist in Linus' tree (upstream).
> 
> So until Linus merges the revert into the mainline kernel it cannot go into
> a stable release, and he's not merged it yet.

It also says a few lines below:

- To have the patch automatically included in the stable tree, add the tag
 Cc: sta...@vger.kernel.org
   in the sign-off area. Once the patch is merged it will be applied to
   the stable tree without anything else needing to be done by the author
   or subsystem maintainer.

So fixes would be tagged earlier this way and merged automatically.

Marc

