Re: btrfs check help

2015-11-27 Thread Vincent Olivier

> On Nov 26, 2015, at 10:03 PM, Vincent Olivier <vinc...@up4.com> wrote:
> 
>> 
>> On Nov 25, 2015, at 8:44 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>> 
>> 
>> 
>> Vincent Olivier wrote on 2015/11/25 11:51 -0500:
>>> I should probably point out that there is 64GB of RAM on this machine and 
>>> it’s a dual Xeon processor (LGA2011-3) system. Also, there is only Btrfs 
>>> served via Samba and the kernel panic was caused by Btrfs (as per what I 
>>> remember from the log on the screen just before I rebooted) and happened in 
>>> the middle of the night when zero (0) clients were connected.
>>> 
>>> You will find below the full “btrfs check” log for each device in the order 
>>> it is listed by “btrfs fi show”.
>> 
>> There is really no need to do such a thing; as btrfs is able to manage 
>> multiple devices, calling btrfsck on any one of them is enough as long as it's 
>> not hugely damaged.
>> 
>>> 
>>> Can I get a strong confirmation that I should run with the “--repair” option 
>>> on each device? Thanks.
>> 
>> YES.
>> 
>> Inode nbytes fix is *VERY* safe as long as it's the only error.
>> 
>> Although that's not all that convincing, since the inode nbytes fix code was written 
>> by myself, and authors always tend to believe their code is good.
>> But at least some other users with more complicated problems (with inode 
>> nbytes errors) have fixed them with it.
>> 
>> The last decision is still on you anyway.
> 
> I will do it on the first device from the “fi show” output and report.


OK, this doesn’t look good. I ran --repair and then check again, and it looks even 
worse. Please help.


[root@3dcpc5 ~]# btrfs check --repair /dev/sdk
enabling repair mode
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
reset nbytes for ino 1341670 root 5
reset nbytes for ino 1341670 root 11406
warning line 3653
checking csums
checking root refs
found 19343374874998 bytes used err is 0
total csum bytes: 18863243900
total tree bytes: 27413118976
total fs tree bytes: 4455694336
total extent tree bytes: 3077373952
btree space waste bytes: 2882193883
file data blocks allocated: 19461564538880
 referenced 20155355832320





[root@3dcpc5 ~]# btrfs check /dev/sdk
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
checking free space cache
block group 53328591454208 has wrong amount of free space
failed to load free space cache for block group 53328591454208
block group 53329665196032 has wrong amount of free space
failed to load free space cache for block group 53329665196032
Wanted offset 58836887044096, found 58836887011328
Wanted offset 58836887044096, found 58836887011328
cache appears valid but isnt 58836887011328
Wanted offset 60505481887744, found 60505481805824
Wanted offset 60505481887744, found 60505481805824
cache appears valid but isnt 60505481805824
Wanted bytes 16384, found 81920 for off 60979001966592
Wanted bytes 1073725440, found 81920 for off 60979001966592
cache appears valid but isnt 60979001950208
Wanted offset 61297908056064, found 61297908006912
Wanted offset 61297908056064, found 61297908006912
cache appears valid but isnt 61297903271936
Wanted bytes 32768, found 16384 for off 61711301296128
Wanted bytes 1066319872, found 16384 for off 61711301296128
cache appears valid but isnt 61711293874176
There is no free space entry for 62691824041984-62691824058368
There is no free space entry for 62691824041984-62692693901312
cache appears valid but isnt 62691620159488
There is no free space entry for 63723505205248-63723505221632
There is no free space entry for 63723505205248-63724559794176
cache appears valid but isnt 63723486052352
Wanted bytes 32768, found 16384 for off 64746920902656
Wanted bytes 914849792, found 16384 for off 64746920902656
cache appears valid but isnt 64746762010624
There is no free space entry for 65770368401408-65770368434176
There is no free space entry for 65770368401408-6577710720
cache appears valid but isnt 65770037968896
Wanted offset 66758954270720, found 66758954221568
Wanted offset 66758954270720, found 66758954221568
cache appears valid but isnt 66758954188800
block group 70204591702016 has wrong amount of free space
failed to load free space cache for block group 70204591702016
block group 70205665443840 has wrong amount of free space
failed to load free space cache for block group 70205665443840
block group 70206739185664 has wrong amount of free space
failed to load free space cache for block group 70206739185664
Wanted offset 70216543715328, found 70216543698944
Wanted offset 70216543715328, found 70216543698944
cache appears valid b

Re: btrfs check help

2015-11-26 Thread Vincent Olivier

> On Nov 25, 2015, at 8:44 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
> 
> 
> 
> Vincent Olivier wrote on 2015/11/25 11:51 -0500:
>> I should probably point out that there is 64GB of RAM on this machine and 
>> it’s a dual Xeon processor (LGA2011-3) system. Also, there is only Btrfs 
>> served via Samba and the kernel panic was caused by Btrfs (as per what I 
>> remember from the log on the screen just before I rebooted) and happened in 
>> the middle of the night when zero (0) clients were connected.
>> 
>> You will find below the full “btrfs check” log for each device in the order 
>> it is listed by “btrfs fi show”.
> 
> There is really no need to do such a thing; as btrfs is able to manage multiple 
> devices, calling btrfsck on any one of them is enough as long as it's not hugely 
> damaged.
> 
>> 
>> Can I get a strong confirmation that I should run with the “--repair” option 
>> on each device? Thanks.
> 
> YES.
> 
> Inode nbytes fix is *VERY* safe as long as it's the only error.
> 
> Although that's not all that convincing, since the inode nbytes fix code was written 
> by myself, and authors always tend to believe their code is good.
> But at least some other users with more complicated problems (with inode 
> nbytes errors) have fixed them with it.
> 
> The last decision is still on you anyway.

I will do it on the first device from the “fi show” output and report.

Thanks,

Vincent



Re: btrfs check help

2015-11-25 Thread Vincent Olivier
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdn
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdl
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdc
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdr
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdf
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sde
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [.]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdd
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdb
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680



> On Nov 24, 2015, at 3:32 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> 
> On Tue, Nov 24, 2015 at 03:28:28PM -0500, Austin S Hemmelgarn wrote:
>> On 2015-11-24 12:06, Vincent Olivier wrote:
>>> Hi,
>>> 
>>> Woke up this morning with a kernel panic (for which I do not have details). 
>>> Please find below the output for btrfs check. Is this normal ? What should 
>>> I do ? Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.
>> You get bonus points for being on a reasonably up-to-date kernel and
>> userspace :)
>> 
>> This is actually a pretty tame check result for a filesystem that's
>> been through a kernel panic. I think everything listed here is safe
>> for check to fix, but I would suggest waiting until the devs provide
>> opinions before actually running with --repair.  I would also
>> suggest comparing resul

btrfs check help

2015-11-24 Thread Vincent Olivier
Hi,

Woke up this morning with a kernel panic (for which I do not have details). 
Please find below the output for btrfs check. Is this normal ? What should I do ?
Arch Linux 4.2.5. Btrfs-utils 4.3.1. 17x4TB RAID10.

Regards,

Vincent

[root@3dcpc5 ~]# btrfs check /dev/sdk
Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents
checking free space cache
checking fs roots
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong
found 19328809638262 bytes used err is 1
total csum bytes: 18849042724
total tree bytes: 27389886464
total fs tree bytes: 4449746944
total extent tree bytes: 3075457024
btree space waste bytes: 2880474254
file data blocks allocated: 19430708535296
referenced 20123773407232


Re: FYIO: A rant about btrfs

2015-09-16 Thread Vincent Olivier
Hi,


> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn  
> wrote:
> 
> On 2015-09-16 10:43, M G Berberich wrote:
>> Hello,
>> 
>> just for information. I stumbled about a rant about btrfs-performance:
>> 
>> http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp



I read it too.


> It is worth noting a few things that were done incorrectly in this testing:
> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly 
> breaks the data integrity guarantees of _ALL_ filesystems, but especially so 
> on COW filesystems like BTRFS.  With this off, you will have a much higher 
> chance that a power loss will cause data loss.  It shouldn't be turned off 
> unless you are also turning off write-caching in the hardware or know for 
> certain that no write-reordering is done by the hardware (and almost all 
> modern hardware does write-reordering for performance reasons).


But can the “nobarrier” mount option affect performance negatively for Btrfs 
(and not only data integrity)?


> 2. He provides no comparison of any other filesystem with TRIM support turned 
> on (it is very likely that all filesystems will demonstrate such performance 
> drops.  Based on that graph, it looks like the device doesn't support 
> asynchronous trim commands).


Judging by the text surrounding the only graph that mentions TRIM, I think he means 
that this exact same test on the other filesystems he benchmarked yielded much better 
results.


> 3. He's testing it with a workload that is a known and documented problem for 
> BTRFS, and claiming that that means that it isn't worth considering as a 
> general usage filesystem.  Most people don't run RDBMS servers on their 
> systems, and as such, such a workload is not worth considering for most 
> people.



Apparently, RDBMS being a problem on Btrfs is neither known nor documented 
widely enough (he’s right about the contrast with claiming publicly that Btrfs is 
indeed production-ready).

My view on this is that having one filesystem to rule them all (all storage 
technologies, all use cases) is unrealistic. Also the time when you could put 
your NAS on an old i386 with 3MB of RAM is over. Compression, checksumming, 
COW, snapshotting, quotas, etc. are all computationally intensive features. In 
2015 block storage has become computationally intensive. How about saying 
non-root Btrfs RAID10 is the best choice for a Samba NAS on rotational-HDDs 
with no SMR (my use case)? For root and RDBMS, I use ext4 on an M.2 SSD with 
a sane initramfs and the most recent stable kernel. I am happy with the 
performance and delighted with the features Btrfs provides. I think it is much 
more productive to document and compare the most successful Btrfs deployments 
rather than trying to find bugs and bottlenecks for use cases that are the 
development focus of other filesystems. I don’t know, I might not make a lot of 
sense here but on top of refactoring the Gotchas, I would be happy to start a 
successful deployment story section on the wiki and maybe improve my usage of 
Btrfs along the way (who else here is using Btrfs in a similar fashion?).



> His points about the degree of performance jitter are valid however, as are 
> the complaints of apparent CPU intensive stalls in the BTRFS code, and I 
> occasionally see both on my own systems.


Me too. My two cents is that focusing on improving performance for 
Btrfs-optimal use cases is much more interesting than bringing new features 
like automatically turning COW off for RDBMS usage or debugging TRIM support.


Vincent




Re: FYIO: A rant about btrfs

2015-09-16 Thread Vincent Olivier

> On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn <ahferro...@gmail.com> wrote:
> 
> On 2015-09-16 12:51, Vincent Olivier wrote:
>> Hi,
>> 
>> 
>>> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn <ahferro...@gmail.com> 
>>> wrote:
>>> 
>>> On 2015-09-16 10:43, M G Berberich wrote:
>>>> Hello,
>>>> 
>>>> just for information. I stumbled about a rant about btrfs-performance:
>>>> 
>>>>  http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp
>> I read it too.
>>> It is worth noting a few things that were done incorrectly in this testing:
>>> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so 
>>> subtly breaks the data integrity guarantees of _ALL_ filesystems, but 
>>> especially so on COW filesystems like BTRFS.  With this off, you will have 
>>> a much higher chance that a power loss will cause data loss.  It shouldn't 
>>> be turned off unless you are also turning off write-caching in the hardware 
>>> or know for certain that no write-reordering is done by the hardware (and 
>>> almost all modern hardware does write-reordering for performance reasons).
>> But can the “nobarrier” mount option affect performance negatively for 
>> Btrfs (and not only data integrity)?
> Using it improves performance for every filesystem on Linux that supports it. 
>  This does not mean that it is _EVER_ a good idea to do so.  This mount 
> option is one of the few things on my list of things that I will _NEVER_ 
> personally provide support to people for, because it almost guarantees that 
> you will lose data if the system dies unexpectedly (even if it's for a reason 
> other than power loss).



OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute 
no go. Case closed.



>>> 2. He provides no comparison of any other filesystem with TRIM support 
>>> turned on (it is very likely that all filesystems will demonstrate such 
>>> performance drops.  Based on that graph, it looks like the device doesn't 
>>> support asynchronous trim commands).
>> Judging by the text surrounding the only graph that mentions TRIM, I think he 
>> means that this exact same test on the other filesystems he benchmarked yielded 
>> much better results.
> Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0.  
> And his claim is still baseless unless he actually provides reference for it.



Same as above: TRIM/DISCARD officially not recommended in production until 
further notice?



>>> 3. He's testing it with a workload that is a known and documented problem for 
>>> BTRFS, and claiming that that means that it isn't worth considering as a 
>>> general usage filesystem.  Most people don't run RDBMS servers on their 
>>> systems, and as such, such a workload is not worth considering for most 
>>> people.
>> Apparently, RDBMS being a problem on Btrfs is neither known nor documented 
>> widely enough (he’s right about the contrast with claiming publicly that Btrfs is 
>> indeed production-ready).
> OK, maybe not documented, but RDBMS falls under 'Large files with highly 
> random access patterns and heavy RMW usage', which is a known issue for 
> BTRFS, and also applies to VM images.


This guy is no idiot. If it wasn’t clear enough for him, it’s not clear enough, 
period.


>>> His points about the degree of performance jitter are valid however, as are 
>>> the complaints of apparent CPU intensive stalls in the BTRFS code, and I 
>>> occasionally see both on my own systems.
>> Me too. My two cents is that focusing on improving performance for 
>> Btrfs-optimal use cases is much more interesting than bringing new features 
>> like automatically turning COW off for RDBMS usage or debugging TRIM support.
> It depends, BTRFS is still not feature complete with the overall intent when 
> it was started (raid56 and qgroups being the two big issues at the moment), 
> and attempting to optimize things tends to introduce bugs, which we have 
> quite enough of already without people adding more (and they still seem to be 
> breeding like rabbits).



I would just like a clear statement from a dev-lead saying : until we are 
feature-complete (with a finite list of features to complete) the focus will be 
on feature-completion and not optimizing already-implemented features. Ideally 
with an ETA on when optimization will be more of a priority than it is today.



> That said, my systems (which are usually doing mostly CPU or memory bound 
> tasks, and not I/O bound like the aforementioned benchmarks were testing) run 
> no slower than they did with ext4 as t

Re: errors while "btrfs receive"

2015-09-01 Thread Vincent Olivier
It's ~900GiB. Sorry.

I'm on 4.1.6 Arch Linux.

-Original Message-
From: "Duncan" <1i5t5.dun...@cox.net>
Sent: Tuesday, September 1, 2015 03:30
To: linux-btrfs@vger.kernel.org
Subject: Re: errors while "btrfs receive"

Vincent Olivier posted on Mon, 31 Aug 2015 14:34:02 -0400 as excerpted:

> i'm doing a ~900TiB receive on a 6x4TB RAID0
> 
> "fi show", "device scan" all fail and report "unable to connect to
> /dev/sdX"
> 
> is it normal ?

Can't answer your direct question as my use-case doesn't include send/
receive, but...

~900TiB receive on a 24TB (6x4TB) raid0?  

Did you mean ~900GiB, or did you miss the decimal point, or   I mean, 
yeah, btrfs does have the compress= option, but I don't think it's going 
to compress 900 TiB to fit in under 24!  That's likely to be some 
seriously lossy compression!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



errors while "btrfs receive"

2015-08-31 Thread Vincent Olivier
hi,

i'm doing a ~900TiB receive on a 6x4TB RAID0

"fi show", "device scan" all fail and report "unable to connect to /dev/sdX"

is it normal ?

thanks,

Vincent



Re: Response to Bcachefs Claims

2015-08-26 Thread Vincent Olivier
I'm still parsing through the multi-device advice. I will be back on this when 
I'm done. And I'll probably switch distro to Arch Linux, which seems to be the way 
to go if one is using cutting-edge kernel features like Btrfs.

As for the work on Gotchas/Known Issues on the Btrfs wiki, I also think that 
the best way is to start with the Gotchas page and put a more prominent link 
for it on the home page.

I would restructure the Gotchas page in the following ways :

* Add the mention “Known Issues” close to the title;

* Only keep current issues/gotchas (current stable kernel, current userspace 
utilities release), archive all others on a separate page;

* Group by smallest feature encompassing the issue : multi-device, quotas, 
subvolumes, compression, conversion from extX, interactions with other things 
like LVM, MD, encryption, etc.;

* Something that is new and not as thoroughly tested as other features should 
be listed there as well (I think) until there is a consensus (to be defined) on 
it being reliable enough to be taken out of the list. Or maybe that should go 
on another page ?

* Provide links to HOWTOs or best practices for the features discussed 
(multi-device GOTCHAS should link to multi-device HOWTO).

I will be thinking about it more before doing anything and still welcoming 
ideas.

Thanks !

Vincent



Response to Bcachefs Claims

2015-08-25 Thread Vincent Olivier
Hi,

I have been using Btrfs for almost a year now with a 16x4TB RAID10 and its 
8x4TB RAID0 backup (using incremental snapshot diffs). I have always tried to 
stay on the latest stable kernel (currently 4.1.6). But I might be moving to 
Fedora 22 because CentOS 7 has significant incompatibilities with the 4.1.x 
kernel series.

I have seen the news about Bcachefs aiming to be Btrfs-complete while being 
extX-stable.

What are the chances Bcachefs beats Btrfs at being the Linux kernel's next 
official file system ? I chose Btrfs over ZFS because it seemed like the only 
next-gen heir to ext4/xfs.

I have been having a few problems with Btrfs myself. I have only one that 
remains unresolved : I haven't found the best way to mount Btrfs at boot time. 
LABEL= won't work for known reasons (I don't understand however why a mount 
can't do its own device scan transparently). UUID= won't work for unknown 
reasons (haven't got a reply on this, maybe it's the same as LABEL=). And I 
will use /dev/* in fstab for stability reasons. Right now I'm mounting the fs 
manually after a device scan and picking up the first device that shows up in 
the fi show run. I can live with that but I suppose that things like this 
contribute to the feeling that Btrfs is actually still experimental, contrary 
to claims that it is production-ready.

For my own sake and other's I would like to maintain (if nobody is already 
working on that nor needs any help) a centralized human-readable digest of 
known issues that would be featured prominently on top of the Btrfs wiki. I 
would merge the Gotchas page and the various known issues pages (including the 
various multi-device mount gotchas here and there).

Answers ? Comments ? Help ?

Thanks,

Vincent



Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-18 Thread Vincent Olivier
it appears that it might be related to label/uuid fstab boot mounting instead

when I mount manually without the “noauto,x-systemd.automount” options and use 
the first device I get from btrfs fi show after a btrfs device scan I never 
get the problem

does this sound familiar ? I thought I was safe with uuid mount in fstab…

I can (temporarily) live with manually mounting this filesystem but I would 
appreciate being able to mount it at boot time via fstab…
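For reference, the manual sequence described above looks roughly like this (device name, mount point, and the exact option set are illustrative, not taken verbatim from the thread):

btrfs device scan
btrfs filesystem show            # note the first device listed for this filesystem
mount -o noatime,compress=zlib /dev/sdb /mnt/raid10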

thanks

vincent

-Original Message-
From: Vincent Olivier vinc...@up4.com
Sent: Thursday, August 13, 2015 22:42
To: Duncan 1i5t5.dun...@cox.net
Cc: linux-btrfs@vger.kernel.org
Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

I'll try without autodefrag anyways tomorrow just to make sure.

And then file a bug report too with however it decides to behave.

Vincent

-Original Message-
From: Duncan 1i5t5.dun...@cox.net
Sent: Thursday, August 13, 2015 20:30
To: linux-btrfs@vger.kernel.org
Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

Chris Murphy posted on Thu, 13 Aug 2015 17:19:41 -0600 as excerpted:

 Well I think others have suggested 3000 snapshots and quite a few things
 will get very slow. But then also you have autodefrag and I forget the
 interaction of this with many snapshots since the snapshot aware defrag
 code was removed.

Autodefrag shouldn't have any snapshots mount-time-related interaction, 
with snapshot-aware-defrag disabled.  The interaction between defrag 
(auto or not) and snapshots will be additional data space usage, since 
with snapshot-aware disabled, defrag only works with the current copy, 
thus forcing it to COW the extents elsewhere while not freeing the old 
extents as they're still referenced by the snapshots, but it shouldn't 
affect mount-time.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-13 Thread Vincent Olivier
Hi,

I think I might be having this problem too. 12 x 4TB RAID10 (original mkfs, 
not converted from ext or whatnot). Says it has ~6TiB left. CentOS 7. Dual Xeon 
CPU. 32GB RAM. ELRepo Kernel 4.1.5. Fstab options: 
noatime,autodefrag,compress=zlib,space_cache,nossd,noauto,x-systemd.automount

Sometimes (not all the time) when I cd or ls the mount point, it will not return 
within 5 minutes (I never let it run more than 5 minutes before rebooting); after 
I reboot, it then takes between 10-30s. Well, as I'm writing this, it's 
already been more than 10 minutes.
 
I don't have the problem when I mount manually without the 
noauto,x-systemd.automount options.
 
Can anyone help ?
 
Thanks.

Vincent

-Original Message-
From: Austin S Hemmelgarn ahferro...@gmail.com
Sent: Wednesday, August 5, 2015 07:30
To: John Ettedgui john.etted...@gmail.com
Cc: Qu Wenruo quwen...@cn.fujitsu.com, btrfs linux-btrfs@vger.kernel.org
Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

On 2015-08-04 13:36, John Ettedgui wrote:
 On Tue, Aug 4, 2015 at 4:28 AM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 2015-08-04 00:58, John Ettedgui wrote:

 On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Although the best practice is staying away from such converted fs, either
 using pure, newly created btrfs, or convert back to ext* before any
 balance.

 Unfortunately I don't have enough hard drive space to do a clean
 btrfs, so my only way to use btrfs for that partition was a
 conversion.

 If you could get your hands on a decent sized flash drive (32G or more), you
 could do an incremental conversion offline.  The steps would look something
 like this:

 1. Boot the system into a LiveCD or something similar that doesn't need to
 run from your regular root partition (SystemRescueCD would be my personal
 recommendation, although if you go that way, make sure to boot the
 alternative kernel, as it's a lot newer than the standard ones).
 2. Plug in the flash drive, format it as BTRFS.
 3. Mount both your old partition and the flash drive somewhere.
 4. Start copying files from the old partition to the flash drive.
 5. When you hit ENOSPC on the flash drive, unmount the old partition, shrink
 it down to the minimum size possible, and create a new partition in the free
 space produced by doing so.
 6. Add the new partition to the BTRFS filesystem on the flash drive.
 7. Repeat steps 4-6 until you have copied everything.
 8. Wipe the old partition, and add it to the BTRFS filesystem.
 9. Run a full balance on the new BTRFS filesystem.
 10. Delete the partition from step 5 that is closest to the old partition
 (via btrfs device delete), then resize the old partition to fill the space
 that the deleted partition took up.
 11. Repeat steps 9-10 until the only remaining partitions in the new BTRFS
 filesystem are the old one and the flash drive.
 12. Delete the flash drive from the BTRFS filesystem.

 This takes some time and coordination, but it does work reliably as long as
 you are careful (I've done it before on multiple systems).
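A rough sketch of one pass through steps 4-6 above, assuming the old filesystem is ext4 on /dev/sda2, the new btrfs filesystem is mounted at /mnt/new, and /dev/sda3 is the partition created in the freed space (all names hypothetical):

rsync -aHAX /mnt/old/ /mnt/new/                  # step 4: copy until the target fills up
umount /mnt/old
e2fsck -f /dev/sda2 && resize2fs -M /dev/sda2    # step 5: shrink the ext4 fs to its minimum
# shrink the partition to match and create /dev/sda3 in the freed space (e.g. with parted), then:
btrfs device add /dev/sda3 /mnt/new              # step 6: grow the btrfs fs into the new partition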


 I suppose I could do that even without the flash as I have some free
 space anyway, but moving Tbs of data with Gbs of free space will take
 days, plus the repartitioning. It'd probably be easier to start with a
 1Tb drive or something.
 Is this currently my best bet as conversion is not as good as I thought?

 I believe my other 2 partitions also come from conversion, though I
 may have rebuilt them later from scratch.

 Thank you!
 John

Yeah, you're probably better off getting a TB disk and starting with 
that.  In theory it is possible to automate the process, but I would 
advise against that if at all possible; it's a lot easier to recover 
from an error if you're doing it manually.




Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-13 Thread Vincent Olivier
I'll try without autodefrag anyways tomorrow just to make sure.

And then file a bug report too with however it decides to behave.

Vincent

-Original Message-
From: Duncan 1i5t5.dun...@cox.net
Sent: Thursday, August 13, 2015 20:30
To: linux-btrfs@vger.kernel.org
Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

Chris Murphy posted on Thu, 13 Aug 2015 17:19:41 -0600 as excerpted:

 Well I think others have suggested 3000 snapshots and quite a few things
 will get very slow. But then also you have autodefrag and I forget the
 interaction of this with many snapshots since the snapshot aware defrag
 code was removed.

Autodefrag shouldn't have any snapshots mount-time-related interaction, 
with snapshot-aware-defrag disabled.  The interaction between defrag 
(auto or not) and snapshots will be additional data space usage, since 
with snapshot-aware disabled, defrag only works with the current copy, 
thus forcing it to COW the extents elsewhere while not freeing the old 
extents as they're still referenced by the snapshots, but it shouldn't 
affect mount-time.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-13 Thread Vincent Olivier
I have 2 snapshots a few days apart for incrementally backing up the volume but 
that's it.

I'll try without autodefrag tomorrow.

Vincent

-Original Message-
From: Chris Murphy li...@colorremedies.com
Sent: Thursday, August 13, 2015 19:19
To: Btrfs BTRFS linux-btrfs@vger.kernel.org
Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

On Thu, Aug 13, 2015 at 4:38 PM, Vincent Olivier vinc...@up4.com wrote:
 Hi,

 I think I might be having this problem too. 12 x 4TB RAID10 (original mkfs, 
 not converted from ext or whatnot). Says it has ~6TiB left. CentOS 7. Dual 
 Xeon CPU. 32GB RAM. ELRepo Kernel 4.1.5. Fstab options: 
 noatime,autodefrag,compress=zlib,space_cache,nossd,noauto,x-systemd.automount

Well I think others have suggested 3000 snapshots and quite a few
things will get very slow. But then also you have autodefrag and I
forget the interaction of this with many snapshots since the snapshot
aware defrag code was removed.

I'd say file a bug with the full details of the hardware from the
ground up to the Btrfs file system. And include, as an attachment,
dmesg with sysrq+t during this hang. Usually I see 't' asked for if
there's just slowness/delays, and 'w' if there's already a kernel
message saying there's a blocked task for 120 seconds.
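A minimal sketch of capturing that while the mount point is hung, run from a second shell that is still responsive (the output file name is arbitrary):

echo 1 > /proc/sys/kernel/sysrq      # make sure sysrq is enabled
echo t > /proc/sysrq-trigger         # dump the state of every task to the kernel log
echo w > /proc/sysrq-trigger         # dump only blocked (uninterruptible) tasks
dmesg > /tmp/sysrq-during-hang.txt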


-- 
Chris Murphy


systemd : Timed out waiting for device dev-disk-by…

2015-07-24 Thread Vincent Olivier
Hi,

(Sorry if this gets sent twice : one of my mail relay is misbehaving today)

50% of the time when booting, the system goes into safe mode because my 12x 4TB 
RAID10 btrfs is taking too long to mount from fstab.

When I comment it out from fstab and mount it manually, it’s all good.

I don’t like that. Is there a way to increase the timer or something ?
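One thing worth trying (a sketch only, not something suggested in the thread): lengthen the device timeout on the fstab entry so systemd waits longer before giving up. x-systemd.device-timeout= is documented in systemd.mount(5); the UUID and mount point below are placeholders.

UUID=<your-fs-uuid>  /mnt/raid10  btrfs  noatime,compress=zlib,space_cache,x-systemd.device-timeout=15min  0  0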

Thanks,

Vincent



Re: Send/Receive Use Case

2015-07-10 Thread Vincent Olivier
Actually I have another question : is it possible for a RAID0 fs to “receive” 
from a “sending” RAID10 ? Or do they need to be of the same replication scheme 
too ?



 On Jun 27, 2015, at 10:34 AM, Vincent Olivier vinc...@up4.com wrote:
 
 ok i’ll go home and rethink my life then ;)
 
 On Jun 27, 2015, at 10:21 AM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Sat, Jun 27, 2015 at 10:04:28AM -0400, Vincent Olivier wrote:
 Hi,
 
 There are 4 things I’m not sure about re:send/receive.
 
 1) Is it possible to first copy things on a file system using rsync and 
 then use send-receive ? And to subsequently mix rsync and send-receive ? 
 Provided that snapshots are made accordingly.
 
  Probably. It depends on exactly how you want to use them.
 
  2) Is it possible to “send” a snapshot diff to disk and then “receive” it from 
  the said disk into a remote filesystem ? I have two very large and 
  physically distant btrfs filesystems. It would be more economical to just 
  dump snapshot diffs to disk for transport instead of the network.
 
  Yes, that's perfectly possible.
 
 3) How are “conflicts” handled by send-receive if at all ?
 
  There are no conflicts possible, due to the requirement of all the
 subvolumes involved in the send/receive process being read-only.
 (Actually, that's not quite true -- you can make a subvolume
 read/write, and then read-only again. In that case, the receive will
 probably fail, leaving the received subvolume in a partially-created
 state).
 
  4) If a file is created, modified and then deleted in-between two
  snapshots, is it ignored by send/receive, or does send/receive
  “re-enact” the journal exactly ?
 
  It'll be ignored. The FS doesn't keep track of how it reached a
 particular state -- only what that state is.
 
  Hugo.
 
 -- 
 Hugo Mills | IMPROVE YOUR ORGANISMS!!
 hugo@... carfax.org.uk |
 http://carfax.org.uk/  |
 PGP: E2AB1DE4  |Subject line of spam 
 email
 



Re: Send/Receive Use Case

2015-07-10 Thread Vincent Olivier
This is GOOD news. Thanks!

 On Jul 10, 2015, at 3:11 PM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Fri, Jul 10, 2015 at 03:03:27PM -0400, Vincent Olivier wrote:
  actually I have another question : is it possible for a RAID0 fs to 
 “receive” from a “sending” RAID10 ?
 
   Yes, definitely. I do my backups from RAID-1 to single. The send
 stream format is based on files, not on the underlying raw storage.
 
   Hugo.
 
 or do they need to be of the same replication scheme too ?
 
 
 
 On Jun 27, 2015, at 10:34 AM, Vincent Olivier vinc...@up4.com wrote:
 
 ok i’ll go home and rethink my life then ;)
 
 On Jun 27, 2015, at 10:21 AM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Sat, Jun 27, 2015 at 10:04:28AM -0400, Vincent Olivier wrote:
 Hi,
 
 There are 4 things I’m not sure about re:send/receive.
 
 1) Is it possible to first copy things on a file system using rsync and 
 then use send-receive ? And to subsequently mix rsync and send-receive ? 
 Provided that snapshots are made accordingly.
 
  Probably. It depends on exactly how you want to use them.
  
  2) Is it possible to “send” a snapshot diff to disk and then “receive” it 
  from the said disk into a remote filesystem ? I have two very large and 
  physically distant btrfs filesystems. It would be more economical to 
  just dump snapshot diffs to disk for transport instead of the network.
 
 Yes, that's perfectly possible.
 
 3) How are “conflicts” handled by send-receive if at all ?
 
 There are no conflicts possible, due to the requirement of all the
 subvolumes involved in the send/receive process being read-only.
 (Actually, that's not quite true -- you can make a subvolume
 read/write, and then read-only again. In that case, the receive will
 probably fail, leaving the received subvolume in a partially-created
 state).
 
  4) If a file is created, modified and then deleted in-between two
  snapshots, is it ignored by send/receive, or does send/receive
  “re-enact” the journal exactly ?
 
 It'll be ignored. The FS doesn't keep track of how it reached a
 particular state -- only what that state is.
 
 Hugo.
 
 
 --
 Hugo Mills | UNIX: Japanese brand of food containers
 hugo@... carfax.org.uk |
 http://carfax.org.uk/  |
 PGP: E2AB1DE4  |





Send/Receive Use Case

2015-06-27 Thread Vincent Olivier
Hi,

There are 4 things I’m not sure about re:send/receive.

1) Is it possible to first copy things on a file system using rsync and then 
use send-receive ? And to subsequently mix rsync and send-receive ? Provided 
that snapshots are made accordingly.

2) Is it possible to “send” a snapshot diff to disk and then “receive” it from the 
said disk into a remote filesystem ? I have two very large and physically 
distant btrfs filesystems. It would be more economical to just dump snapshot 
diffs to disk for transport instead of the network.

3) How are “conflicts” handled by send-receive if at all ?

4) If a file is created, modified and then deleted in-between two snapshots, is 
it ignored by send/receive, or does send/receive “re-enact” the journal exactly ?

Thanks,

Vincent


Re: Send/Receive Use Case

2015-06-27 Thread Vincent Olivier
ok i’ll go home and rethink my life then ;)

 On Jun 27, 2015, at 10:21 AM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Sat, Jun 27, 2015 at 10:04:28AM -0400, Vincent Olivier wrote:
 Hi,
 
 There are 4 things I’m not sure about re:send/receive.
 
 1) Is it possible to first copy things on a file system using rsync and then 
 use send-receive ? And to subsequently mix rsync and send-receive ? Provided 
 that snapshots are made accordingly.
 
    Probably. It depends on exactly how you want to use them.
  
  2) Is it possible to “send” a snapshot diff to disk and then “receive” it from 
  the said disk into a remote filesystem ? I have two very large and 
  physically distant btrfs filesystems. It would be more economical to 
  just dump snapshot diffs to disk for transport instead of the network.
 
   Yes, that's perfectly possible.
 
 3) How are “conflicts” handled by send-receive if at all ?
 
   There are no conflicts possible, due to the requirement of all the
 subvolumes involved in the send/receive process being read-only.
 (Actually, that's not quite true -- you can make a subvolume
 read/write, and then read-only again. In that case, the receive will
 probably fail, leaving the received subvolume in a partially-created
 state).
 
  4) If a file is created, modified and then deleted in-between two
  snapshots, is it ignored by send/receive, or does send/receive
  “re-enact” the journal exactly ?
 
   It'll be ignored. The FS doesn't keep track of how it reached a
 particular state -- only what that state is.
 
   Hugo.
 
 -- 
 Hugo Mills | IMPROVE YOUR ORGANISMS!!
 hugo@... carfax.org.uk |
 http://carfax.org.uk/  |
 PGP: E2AB1DE4  |Subject line of spam email
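For question 2 in the exchange above, a minimal sketch of dumping an incremental send stream to a transport disk and replaying it on the remote filesystem (snapshot and path names are hypothetical; both snapshots must be read-only):

# on the source machine
btrfs send -p /pool/snaps/2015-06-20 /pool/snaps/2015-06-27 > /mnt/usb/diff-0627.stream
# on the remote machine, after moving the disk
btrfs receive -f /mnt/usb/diff-0627.stream /backup/snaps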



Re: RAID10 Balancing Request for Comments and Advices

2015-06-17 Thread Vincent Olivier

 On Jun 17, 2015, at 9:27 AM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Wed, Jun 17, 2015 at 09:13:08AM -0400, Vincent Olivier wrote:
 
 On Jun 16, 2015, at 8:14 PM, Chris Murphy li...@colorremedies.com wrote:
 
 On Tue, Jun 16, 2015 at 5:58 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 On a current kernel unlike older ones, btrfs actually automates entirely
 empty chunk reclaim, so this problem doesn't occur anything close to near
 as often as it used to.  However, it's still possible to have mostly but
 not entirely empty chunks that btrfs won't automatically reclaim.  A
 balance can be used to rewrite and combine these mostly empty chunks,
 reclaiming the space saved.  This is what Hugo was recommending.
 
 Yes, as little as a -dusage=5 (data chunks that are 5% or less full)
 can clear the problem and is very fast, seconds. Possibly a bit
  longer, many seconds to single-digit minutes with -dusage=15. I haven't
 done a full balance in forever.
 
 
  Yes, on this 80% full 6x4TB RAID10 -dusage=15 took 2 seconds and relocated 
  “0 out of 3026 chunks”.
 
 Out of curiosity, I had to use -dusage=90 to have it relocate only 1 chunk 
  and it took less than 30 seconds.
 
 So I put a -dusage=25 in the weekly cron just before the scrub.
 
   In most cases, all you need to do is clean up one data chunk to
 give the metadata enough space to work in. Instead of manually
 iterating through several values of usage= until you get a useful
 response, you can use limit=n to stop after n successful block
 group relocations.


Nice! Will do that instead! Thanks.
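A minimal sketch of the limit=n filter Hugo describes, with a hypothetical mount point:

btrfs balance start -dlimit=1 /mnt/raid10     # relocate at most one data block group, then stop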




Re: RAID10 Balancing Request for Comments and Advices

2015-06-17 Thread Vincent Olivier

 On Jun 16, 2015, at 8:14 PM, Chris Murphy li...@colorremedies.com wrote:
 
 On Tue, Jun 16, 2015 at 5:58 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 On a current kernel unlike older ones, btrfs actually automates entirely
 empty chunk reclaim, so this problem doesn't occur anything close to near
 as often as it used to.  However, it's still possible to have mostly but
 not entirely empty chunks that btrfs won't automatically reclaim.  A
 balance can be used to rewrite and combine these mostly empty chunks,
 reclaiming the space saved.  This is what Hugo was recommending.
 
 Yes, as little as a -dusage=5 (data chunks that are 5% or less full)
 can clear the problem and is very fast, seconds. Possibly a bit
  longer, many seconds to single-digit minutes with -dusage=15. I haven't
 done a full balance in forever.


Yes, on this 80% full 6x4TB RAID10 -dusage=15 took 2 seconds and relocated 
“0 out of 3026 chunks”.

Out of curiosity, I had to use -dusage=90 to have it relocate only 1 chunk and 
it took less than 30 seconds.

So I put a -dusage=25 in the weekly cron just before the scrub.

FYI.
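A sketch of what that weekly cron pair could look like (file location, schedule, and mount point are assumptions, not from the thread):

# /etc/cron.d/btrfs-weekly
0 2 * * 6  root  btrfs balance start -dusage=25 /mnt/raid10
0 3 * * 6  root  btrfs scrub start -B /mnt/raid10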

Thanks for your help.


Re: RAID10 Balancing Request for Comments and Advices

2015-06-17 Thread Vincent Olivier

 On Jun 16, 2015, at 7:58 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Vincent Olivier posted on Tue, 16 Jun 2015 09:34:29 -0400 as excerpted:
 
 
 On Jun 16, 2015, at 8:25 AM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Tue, Jun 16, 2015 at 08:09:17AM -0400, Vincent Olivier wrote:
 
 My first question is this : is it normal to have “single” blocks ?
 Why not only RAID10? I don’t remember the exact mkfs options I used
 but I certainly didn’t ask for “single” so this is unexpected.
 
 Yes. It's an artefact of the way that mkfs works. If you run a
 balance on those chunks, they'll go away. (btrfs balance start
 -dusage=0 -musage=0 /mountpoint)
 
  Thanks! I did and it did go away, except for the “GlobalReserve, single: 
  total=512.00MiB, used=0.00B”. But I suppose this is a permanent fixture,
 right?
 
 Yes.  GlobalReserve is for short-term btrfs-internal use, reserved for
  times when btrfs needs to (temporarily) allocate some space in order to
 free space, etc.  It's always single, and you'll rarely see anything but
 0 used except perhaps in the middle of a balance or something.


Got it. Thanks.

Is there any way to put that on another device, say, an SSD? I am thinking of 
backing up this RAID10 on a 2x8TB device-managed SMR RAID1 and I want to 
minimize random write operations (noatime et al.). I will start a new thread for 
that maybe, but first, is there something substantial I can read about 
btrfs+SMR? Or should I avoid SMR+btrfs ?


 
 For maintenance, I would suggest running a scrub regularly, to
 check for various forms of bitrot. Typical frequencies for a scrub are
 once a week or once a month -- opinions vary (as do runtimes).
 
 
 Yes. I cronned it weekly for now. Takes about 5 hours. Is it
  automatically corrected on RAID10 since a copy of it exists within the
 filesystem ? What happens for RAID0 ?
 
 For raid10 (and the raid1 I use), yes, it's corrected, from the other
 existing copy, assuming it's good, tho if there are metadata checksum
 errors, there may be corresponding unverified checksums as well, where
 the verification couldn't be done because the metadata containing the
 checksums was bad.  Thus, if there are errors found and corrected, and
 you see unverified errors as well, rerun the scrub, so the newly
 corrected metadata can now be used to verify the previously unverified
 errors.


OK then, rule of thumb: re-run the scrub on “unverified checksum error(s)”. 
I have yet to see any checksum errors but will keep it in mind.

 
 I'm presently getting a lot of experience with this as one of the ssds in
 my raid1 is gradually failing and rewriting sectors.  Generally what
 happens is that the ssd will take too long, triggering a SATA reset (30
 second timeout), and btrfs will call that an error.  The scrub then
 rewrites the bad copy on the unreliable device with the good copy from
 the more reliable device, with the write triggering a sector relocation
 on the bad device.  The newly written copy then checks out good, but if
 it was metadata, it very likely contained checksums for several other
 blocks, which couldn't be verified because the block containing their
 checksums was itself bad.  Typically I'll see dozens to a couple hundred
 unverified errors for every bad metadata block rewritten in this way.
 Rerunning the scrub then either verifies or fixes the previously
 unverified blocks, tho sometimes one of those in turn ends up bad and if
 it's a metadata block, I may end up rerunning the scrub another time or
 two, until everything checks out.
 
 FWIW, on the bad device, smartctl -A reports (excerpted):
 
  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    5 Reallocated_Sector_Ct   0x0032   098   098   036    Old_age   Always       -       259
  182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       132
 
 While on the paired good device:
 
    5 Reallocated_Sector_Ct   0x0032   253   253   036    Old_age   Always       -       0
  182 Erase_Fail_Count_Total  0x0032   253   253   000    Old_age   Always       -       0
 
 Meanwhile, smartctl -H has already warned once that the device is
 failing, tho it went back to passing status again, but as of now it's
 saying failing, again.  The attribute that actually registers as failing,
 again from the bad device, followed by the good, is:
 
    1 Raw_Read_Error_Rate     0x000f   001   001   006    Pre-fail  Always   FAILING_NOW  3081
  
    1 Raw_Read_Error_Rate     0x000f   160   159   006    Pre-fail  Always       -        41
 
 When it's not actually reporting failing, the FAILING_NOW status is
 replaced with IN_THE_PAST.
 
 250 Read_Error_Retry_Rate is the other attribute of interest, with values
 of 100 current and worst for both devices, threshold 0, but a raw value
 of 2488 for the good device and over 17,000,000 for the failing device.
 But with the cooked value never moving from 100 and with no real
 guidance on how to interpret the raw values, while

RAID10 Balancing Request for Comments and Advices

2015-06-16 Thread Vincent Olivier
Hello,

I have a CentOS 7 machine with the latest ELRepo kernel-ml (4.0.5) and a 6-disk 
4TB HGST RAID10 btrfs volume, with the following mount options :

noatime,compress=zlib,space_cache 0 2


“btrfs filesystem df” gives :


Data, RAID10: total=7.08TiB, used=7.02TiB
Data, single: total=8.00MiB, used=0.00B
System, RAID10: total=7.88MiB, used=656.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID10: total=9.19GiB, used=7.56GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B


My first question is this : is it normal to have “single” blocks ? Why not only 
RAID10? I don’t remember the exact mkfs options I used but I certainly didn’t 
ask for “single” so this is unexpected.

My second question is : what is the best device add / balance sequence to use 
if I want to add 2 more disks to this RAID10 volume? Also is a balance 
necessary at all since I’m adding a pair?

My third question is: given that this file system is an offline backup for 
another RAID0 volume with SMB sharing, what is the best maintenance schedule as 
long as it is offline? For now, I only have a weekly cron scrub, but I 
think that the priority is to have it balanced after a send-receive or rsync to 
optimize storage space availability (over performance). Is there a “light” 
balancing method recommended in this case?

My fourth question, still within the same context: are there best practices 
when using smartctl for periodically testing (long test, short test) btrfs RAID 
devices?
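For reference, the “long test, short test” mentioned here map to the standard smartctl self-tests (device name is a placeholder; scheduling is a matter of taste, as with scrub frequency):

smartctl -t short /dev/sdb     # quick self-test, e.g. weekly
smartctl -t long /dev/sdb      # full surface self-test, e.g. monthly
smartctl -H -A /dev/sdb        # afterwards, check overall health and attributes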

Thanks!

Vincent



Re: RAID10 Balancing Request for Comments and Advices

2015-06-16 Thread Vincent Olivier
 
 On Jun 16, 2015, at 8:25 AM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Tue, Jun 16, 2015 at 08:09:17AM -0400, Vincent Olivier wrote:
 
  “btrfs filesystem df” gives :
 
 
 Data, RAID10: total=7.08TiB, used=7.02TiB
 Data, single: total=8.00MiB, used=0.00B
 System, RAID10: total=7.88MiB, used=656.00KiB
 System, single: total=4.00MiB, used=0.00B
 Metadata, RAID10: total=9.19GiB, used=7.56GiB
 Metadata, single: total=8.00MiB, used=0.00B
 GlobalReserve, single: total=512.00MiB, used=0.00B
 
 My first question is this : is it normal to have “single” blocks ?
 Why not only RAID10? I don’t remember the exact mkfs options I used
 but I certainly didn’t ask for “single” so this is unexpected.
 
  Yes. It's an artefact of the way that mkfs works. If you run a
 balance on those chunks, they'll go away. (btrfs balance start
 -dusage=0 -musage=0 /mountpoint)



Thanks! I did and it did go away, except for the “GlobalReserve, single: 
total=512.00MiB, used=0.00B”. But I suppose this is a permanent fixture, right?



 
 My second question is : what is the best device add / balance sequence to 
 use if I want to add 2 more disks to this RAID10 volume? Also is a balance 
 necessary at all since I’m adding a pair?
 
  Add both devices first, then balance.
 
  For a RAID-1 filesystem, adding two devices wouldn't need a balance
 to get full usage out of the new devices. However, you've got RAID-10,
 so the most you'd be able to get on the FS without a balance is four
 times the remaining space on one of the existing disks.
 
  The chunk allocator for RAID-10 will allocate as many chunks as it
 can in an even number across all the devices, omitting the device with
 the smallest free space if there's an odd number of devices. It must
 have space on at least four devices, so adding two devices means that
 it'll have to have free space on at least two of the existing ones
 (and will try to use all of them).
 
  So yes, unless you're adding four devices, a rebalance is required
 here.


It is perfectly clear and logical that 1+0 works on four devices at a time.
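A minimal sketch of the add-then-balance sequence Hugo describes, with hypothetical device names and mount point:

btrfs device add /dev/sdq /dev/sdr /mnt/raid10
btrfs balance start /mnt/raid10      # full rebalance so raid10 chunks spread across all devices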


 My third question is: given that this file system is an offline
 backup for another RAID0 volume with SMB sharing, what is the best
 maintenance schedule as long as it is offline? For now, I only have
 a weekly cron scrub now, but I think that the priority is to have it
 balanced after a send-receive or rsync to optimize storage space
 availability (over performance). Is there a “light” balancing method
 recommended in this case?
 
  You don't need to balance after send/receive or rsync. If you find
 that you have lots of data space allocated but not used (the first
 line in btrfs fi df, above), *and* metadata close to usage (within,
 say, 700 MiB), *and* no unallocated space (btrfs fi show), then it's
 worth running a filtered balance with -dlimit=3 or some similar small
 value to free up some space that the metadata can expand into. Other
 than that, it's pretty much entirely pointless.


Ok thanks. Is there a btrfs-utils way of automating the “if less than 1GiB free, 
do balance -dlimit=3” check?
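As far as I know there is no built-in trigger for this in btrfs-progs, but a small wrapper can approximate it. A sketch only; the mount point is hypothetical and the output parsing may need adjusting for your btrfs-progs version:

#!/bin/sh
# run a small filtered balance when unallocated space drops below ~1GiB
MNT=/mnt/raid10
UNALLOC=$(btrfs filesystem usage -b "$MNT" | awk '/unallocated:/ {print $3; exit}')
if [ "$UNALLOC" -lt $((1024*1024*1024)) ]; then
    btrfs balance start -dlimit=3 "$MNT"
fi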


  For maintenance, I would suggest running a scrub regularly, to
 check for various forms of bitrot. Typical frequencies for a scrub
 are once a week or once a month -- opinions vary (as do runtimes).


Yes. I cronned it weekly for now. Takes about 5 hours. Is it automatically 
corrected on RAID10 since a copy of it exists within the filesystem ? What 
happens for RAID0 ?

Thanks!

V