Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
On Fri, Nov 25, 2016 at 3:31 PM, Zygo Blaxell wrote:
> This risk mitigation measure does rely on admins taking a machine in this
> state down immediately, and also somehow knowing not to start a scrub
> while their RAM is failing...which is kind of an annoying requirement
> for the admin.

Attempting to detect if RAM is bad when scrub starts is both time consuming and not very reliable, right?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Drive Replacement
I've got a BTRFS array made up of mixed-size disks:

2x750G
3x1.5T
3x3T

It's getting fuller than I'd like. The problem is that adding disks is harder than one would like, as the computer only has 8 SATA ports. Is it viable to do the following to upgrade one of the disks?

A) Take array offline
B) DD the contents of one of the 750G drives to a new 3T drive
C) Remove the 750G from the system
D) btrfs scan
E) Mount array
F) Run a balance

I know that not physically removing the old copy of the drive will cause massive issues, but if I do remove it, everything should be fine, right?

--
Gareth Pye - chatterofjudges.mtgmelb.com
Level 2 MTG Judge, Melbourne, Australia
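The steps A to F above can be written out as a dry-run plan. This is a minimal sketch, assuming hypothetical device paths, a hypothetical mount point and a hypothetical devid; it only builds the command lines and runs nothing. The `btrfs filesystem resize <devid>:max` step is not part of the list in the message: it is added here on the assumption that a filesystem cloned with dd still records the old 750G device size until it is told to grow.

```python
def replacement_plan(old_dev, new_dev, mountpoint, devid):
    """Return the commands for the dd-based swap, in order, without running them.

    All arguments are placeholders chosen by the caller; nothing here is
    taken from the original message.
    """
    return [
        # A) take the array offline
        ["umount", mountpoint],
        # B) clone the old 750G drive onto the new 3T drive
        ["dd", f"if={old_dev}", f"of={new_dev}", "bs=1M"],
        # C) physically remove the old drive (manual step, no command)
        # D) let btrfs rediscover its member devices
        ["btrfs", "device", "scan"],
        # E) mount the array again
        ["mount", new_dev, mountpoint],
        # Assumption, not in the original list: grow the cloned device
        # to the full size of the new disk
        ["btrfs", "filesystem", "resize", f"{devid}:max", mountpoint],
        # F) run a balance
        ["btrfs", "balance", "start", mountpoint],
    ]


# Hypothetical names, for illustration only
plan = replacement_plan("/dev/sdOLD", "/dev/sdNEW", "/mnt/array", 1)
for command in plan:
    print(" ".join(command))
```

Printing the plan first makes it easy to review the order before anything touches the disks.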
Re: Recommendation on raid5 drive error resolution
Things have been copying off really well. I'm starting to suspect the issue was the PSU, which I've swapped out.

What is the line I should see in dmesg if the degraded option was actually used when mounting the file system?

On Thu, Sep 1, 2016 at 9:25 PM, Austin S. Hemmelgarn <ahferro...@gmail.com> wrote:
> On 2016-08-31 19:04, Gareth Pye wrote:
>> ro,degraded has mounted it nicely and my rsync of the more useful data
>> is progressing at the speed of WiFi.
>>
>> There are repeated read errors from one drive still but the rsync
>> hasn't bailed yet, which I think means there aren't any overlapping
>> errors in any of the files it has touched thus far. Am I right or is
>> there likely to be corrupt data in the files I've synced off?
>
> Unless you've been running with nocow or nodatasum in your mount options,
> then what you've concluded should be correct. I would still suggest
> verifying the data by some external means if possible, this type of
> situation is not something that's well tested, and TBH I'm amazed that
> things are working to the degree that they are.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
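A minimal sketch of an alternative check, not the dmesg line asked about: look at the mount options the kernel reports for the filesystem. It assumes the usual `/proc/mounts` field layout (device, mount point, type, options) and that btrfs lists `degraded` among the active options; the function name and the sample line are made up for illustration.

```python
def mounted_degraded(mounts_text, mountpoint):
    """Return True if the btrfs mount at `mountpoint` lists the `degraded` option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, target, fstype, options = fields[:4]
        if target == mountpoint and fstype == "btrfs":
            return "degraded" in options.split(",")
    return False


# Made-up sample in /proc/mounts format
sample = "/dev/sdb /data btrfs ro,relatime,degraded,space_cache 0 0\n"
print(mounted_degraded(sample, "/data"))
```

On a live system the text would come from reading `/proc/mounts` instead of the sample string.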
Re: Recommendation on raid5 drive error resolution
ro,degraded has mounted it nicely and my rsync of the more useful data is progressing at the speed of WiFi.

There are repeated read errors from one drive still but the rsync hasn't bailed yet, which I think means there aren't any overlapping errors in any of the files it has touched thus far. Am I right or is there likely to be corrupt data in the files I've synced off?

On Wed, Aug 31, 2016 at 7:46 AM, Gareth Pye <gar...@cerberos.id.au> wrote:
> Or I could just once again select the right boot device in the bios. I
> think I want some new hardware :)

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Recommendation on raid5 drive error resolution
Or I could just once again select the right boot device in the BIOS. I think I want some new hardware :)

On Wed, Aug 31, 2016 at 7:23 AM, Gareth Pye <gar...@cerberos.id.au> wrote:
> On Wed, Aug 31, 2016 at 4:28 AM, Chris Murphy <li...@colorremedies.com> wrote:
>> But I'd try a newer kernel before you
>> give up on it.
>
> Any recommendations on liveCDs that have recent kernels & btrfs tools?
> For no apparent reason system isn't booting normally either, and I'm
> reluctant to fix that before at least confirming the things I at least
> partially care about have a recent backup.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Recommendation on raid5 drive error resolution
On Wed, Aug 31, 2016 at 4:28 AM, Chris Murphy <li...@colorremedies.com> wrote:
> But I'd try a newer kernel before you
> give up on it.

Any recommendations on liveCDs that have recent kernels & btrfs tools? For no apparent reason the system isn't booting normally either, and I'm reluctant to fix that before at least confirming the things I at least partially care about have a recent backup.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Recommendation on raid5 drive error resolution
Okay, things aren't looking good. The FS won't mount for me: http://pastebin.com/sEEdRxsN

On Tue, Aug 30, 2016 at 9:01 AM, Gareth Pye <gar...@cerberos.id.au> wrote:
> When I can get this stupid box to boot from an external drive I'll
> have some idea of what is going on

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Recommendation on raid5 drive error resolution
When I can get this stupid box to boot from an external drive I'll have some idea of what is going on.
Re: Recommendation on raid5 drive error resolution
Am I right that the wr: 0 means that the disks should at least be in a nice consistent state? I know that overlapping read fails can still cause everything to fail.
Re: Recommendation on raid5 drive error resolution
Current status: Knowing things were bad I did set the scterc values sanely, but the box was getting less stable so I thought a reboot was a good idea. That reboot failed to mount the partition at all and everything triggered my 'is this a PSU issue' sense, so I've left the box off till I've got time to check if a PSU replacement makes anything happier. That might happen tonight or tomorrow. I'll update the thread when I do that.
Recommendation on raid5 drive error resolution
So I've been living on the reckless side (meta RAID6, data RAID5) and I have a drive or two that isn't playing nicely any more. dmesg of the system running for a few minutes: http://pastebin.com/9pHBRQVe

Everything of value is backed up, but I'd rather keep data than download it all again. When I only saw one disk having troubles I was concerned. Now that I notice both sda and sdc having issues, I'm thinking I might be about to have a bad time.

What else should I provide?

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: checksum error in metadata node - best way to move root fs to new drive?
Is there some simple muddling of metadata that could be done to force dup metadata on deduping SSDs? Like a simple 'random' byte repeated often enough that it would defeat any sane dedup? I know it would waste data, but clearly that is considered worth it with dup metadata (what is the difference between 50% metadata efficiency and 45%?)

On Thu, Aug 11, 2016 at 2:50 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Dave T posted on Wed, 10 Aug 2016 18:01:44 -0400 as excerpted:
>
>> Does anyone have any thoughts about using dup mode for metadata on a
>> Samsung 950 Pro (or any NVMe drive)?
>
> The biggest problem with dup on ssds is that some ssds (particularly the
> ones with the sandforce controllers) do dedup, so you'd be having btrfs
> do dup while the filesystem dedups, to no effect except more cpu and
> device processing!
>
> (The other argument for single on ssd that I've seen is that because the
> FTL ultimately places the data, and because both copies are written at
> the same time, there's a good chance that the FTL will write them into
> the same erase block and area, and a defect in one will likely be a
> defect in the other as well. That may or may not be, I'm not qualified
> to say, but as explained below, I do choose to take my chances on that
> and thus do run dup on ssd.)
>
> So as long as the SSD doesn't have a deduping FTL, I'd suggest dup for
> metadata on ssd does make sense. Data... not so sure on, but certainly
> metadata, because one bad block of metadata can be many messed up files.
>
> On my ssds here, which I know don't do dedup, most of my btrfs are raid1
> on the pair of ssds. However, /boot is different since I can't really
> point grub at two different /boots, so I have my working /boot on one
> device, with the backup /boot on the other, and the grub on each one
> pointed at its respective /boot, so I can select working or backup /boot
> from the BIOS and it'll just work. Since /boot is so small, it's mixed-
> mode chunks, meaning data and metadata are mixed together and the
> redundancy mode applies to both at once instead of each separately. And
> I chose dup, so it's dup for both data and metadata.
>
> Works fine, dup for both data and metadata on non-deduping ssds, but of
> course that means data takes double the space since there's two copies of
> it, and that gets kind of expensive on ssd, if it's more than the
> fraction of a GiB that's /boot.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Question: raid1 behaviour on failure
PDF doc info dates it at 23/1/2013, which is the best guess that can easily be found.
Re: Scrub priority, am I using it wrong?
Yeah, RAID5. I'm now doing pause and resume on it to let it take multiple nights; the idle class should let other processes complete in reasonable time.

On Wed, Apr 6, 2016 at 3:34 AM, Henk Slager <eye...@gmail.com> wrote:
> On Tue, Apr 5, 2016 at 4:37 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Gareth Pye posted on Tue, 05 Apr 2016 09:36:48 +1000 as excerpted:
>>
>>> I've got a btrfs file system set up on 6 drbd disks running on 2Tb
>>> spinning disks. [...]
>>>
>>> The command that is run is:
>>> /usr/local/bin/btrfs scrub start -Bd -c idle /data
>>>
>>> [...]
>>>
>>> Which doesn't look like an idle priority to me. And the system sure
>>> feels like a system with a lot of heavy io going on. Is there something
>>> I'm doing wrong?
>
> When I see the throughput numbers, it lets me think that the
> filesystem is raid5 or raid6. On single or raid1 or raid10 one easily
> gets around 100M/s without the notice/feeling of heavy IO ongoing,
> mostly independent of scrub options.
>
>> Two points:
>>
>> 1) It appears btrfs scrub start's -c option only takes numeric class, so
>> try -c3 instead of -c idle.
>
> Thanks to Duncan for pointing this out. I don't remember exactly, but
> I think I also had issues with this in the past, but did not realize
> or have a further look at it.
>
>> Works for me with the numeric class (same results as you with spelled out
>> class), tho I'm on ssd with multiple independent btrfs on partitions, the
>> biggest of which is 24 GiB, 18.something GiB used, which scrubs in all of
>> 20 seconds, so I don't need and hadn't tried the -c option at all until
>> now.
>>
>> 2) What a difference an ssd makes!
>>
>> $$ sudo btrfs scrub start -c3 /p
>> scrub started on /p, [...]
>>
>> $$ sudo iotop -obn1
>> Total DISK READ : 626.53 M/s | Total DISK WRITE : 0.00 B/s
>> Actual DISK READ: 596.93 M/s | Actual DISK WRITE: 0.00 B/s
>>   TID  PRIO  USER  DISK READ   DISK WRITE  SWAPIN  IO>     COMMAND
>>   872  idle  root  268.40 M/s  0.00 B/s    0.00 %  0.00 %  btrfs scrub start -c3 /p
>>   873  idle  root  358.13 M/s  0.00 B/s    0.00 %  0.00 %  btrfs scrub start -c3 /p
>>
>> CPU bound, 0% IOWait even at idle IO priority, in addition to the
>> hundreds of M/s values per thread/device, here. You OTOH are showing
>> under 20 M/s per thread/device on spinning rust, with an IOWait near 90%,
>> thus making it IO bound.
>
> This low M/s and high IOWait is the kind of behavior I noticed with 3x
> 2TB raid5 when scrubbing or balancing (no bcache activated, kernel
> 4.3.3).

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Scrub priority, am I using it wrong?
On Tue, Apr 5, 2016 at 12:37 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> CPU bound, 0% IOWait even at idle IO priority, in addition to the
> hundreds of M/s values per thread/device, here. You OTOH are showing
> under 20 M/s per thread/device on spinning rust, with an IOWait near 90%,
> thus making it IO bound.

And yes, I'd love to switch to SSD, but 12 2TB drives is a bit pricey still.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: Scrub priority, am I using it wrong?
On Tue, Apr 5, 2016 at 12:37 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> 1) It appears btrfs scrub start's -c option only takes numeric class, so
> try -c3 instead of -c idle.

Does it count as a bug if it silently accepts the way I was doing it?

I've switched to -c3 and at least now the priority class listed in iotop is idle, so I hope that means it will be more friendly to other processes.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
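The fix discussed in this thread, passing the numeric class to `-c` instead of a name, can be sketched as a small wrapper. This is a minimal sketch that only builds the argument list; the class numbers are the standard Linux I/O priority classes (1 realtime, 2 best-effort, 3 idle), while the dictionary and function names are made up for illustration.

```python
# Linux I/O scheduling classes, as numbered by ionice(1)
IOPRIO_CLASSES = {"realtime": 1, "best-effort": 2, "idle": 3}


def scrub_argv(mountpoint, ioclass="idle"):
    """Build a `btrfs scrub start` command line using a numeric -c value.

    An unknown class name raises KeyError instead of being passed through,
    so a typo cannot be silently accepted.
    """
    class_number = IOPRIO_CLASSES[ioclass]
    return ["btrfs", "scrub", "start", "-Bd", "-c", str(class_number), mountpoint]


print(" ".join(scrub_argv("/data")))
```

Translating the name before calling btrfs keeps the readable spelling in scripts while still giving the tool the number it expects.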
Scrub priority, am I using it wrong?
I've got a btrfs file system set up on 6 drbd disks running on 2TB spinning disks. The server is moderately loaded with various regular tasks that use a fair bit of disk IO, but I've scheduled my weekly btrfs scrub for the best quiet time in the week.

The command that is run is:
/usr/local/bin/btrfs scrub start -Bd -c idle /data

Which is my best attempt to get it to have a low impact on user operations.

But iotop shows me:

 1765 be/4 root   14.84 M/s    0.00 B/s  0.00 % 96.65 % btrfs scrub start -Bd -c idle /data
 1767 be/4 root   14.70 M/s    0.00 B/s  0.00 % 95.35 % btrfs scrub start -Bd -c idle /data
 1768 be/4 root   13.47 M/s    0.00 B/s  0.00 % 92.59 % btrfs scrub start -Bd -c idle /data
 1764 be/4 root   12.61 M/s    0.00 B/s  0.00 % 88.77 % btrfs scrub start -Bd -c idle /data
 1766 be/4 root   11.24 M/s    0.00 B/s  0.00 % 85.18 % btrfs scrub start -Bd -c idle /data
 1763 be/4 root    7.79 M/s    0.00 B/s  0.00 % 63.30 % btrfs scrub start -Bd -c idle /data
28858 be/4 root    0.00 B/s  810.50 B/s  0.00 % 61.32 % [kworker/u16:25]

Which doesn't look like an idle priority to me. And the system sure feels like a system with a lot of heavy IO going on. Is there something I'm doing wrong?

System details:

# uname -a
Linux emile 4.4.3-040403-generic #201602251634 SMP Thu Feb 25 21:36:25 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# /usr/local/bin/btrfs --version
btrfs-progs v4.4.1

I'm waiting on the ppa version of 4.5.1 before upgrading, that is my usual kernel update strategy.

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"

Any other details that people would like to see that are relevant to this question?

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: RAID 6 full, but there is still space left on some devices
>>>> ly needs one disk
>>>> with unallocated space.
>>>> And btrfs chunk allocator will allocate chunk to device with most
>>>> unallocated space.
>>>>
>>>> So after 1) and 2) you should find that chunk allocation is almost
>>>> perfectly balanced across all devices, as long as they are in same size.
>>>>
>>>> Now you have a balance base layout for RAID6 allocation. Should make
>>>> things
>>>> go quite smooth and result a balanced RAID6 chunk layout.
>>>
>>> This is a good trick to get out of 'the RAID6 full' situation. I have
>>> done some RAID5 tests on 100G VM disks with kernel/tools 4.5-rcX/v4.4,
>>> and various balancing starts, cancels, profile converts etc, worked
>>> surprisingly well, compared to my experience a year back with RAID5
>>> (hitting bugs, crashes).
>>>
>>> A RAID6 full balance with this setup might be very slow, even if the
>>> fs would be not so full. The VMs I use are on a mixed SSD/HDD
>>> (bcache'd) array so balancing within the last GB(s), so almost no
>>> workspace, still makes progress. But on HDD only, things can take very
>>> long. The 'Unallocated' space on devid 1 should be at least a few GiB,
>>> otherwise rebalancing will be very slow or just not work.
>>
>> That's true the rebalance of all chunks will be quite slow.
>> I just hope OP won't encounter super slow
>>
>> BTW, the 'unallocated' space can be on any device, as btrfs will choose devices
>> by the order of unallocated space, to alloc new chunk.
>> In the case of OP, balance itself should continue without much problem as
>> several devices have a lot of unallocated space.
>>
>>> The way from RAID6 -> single/RAID1 -> RAID6 might also be more
>>> acceptable w.r.t. speed in total. Just watch progress I would say.
>>> Maybe it's not needed to do a full convert, just make sure you will
>>> have enough workspace before starting a convert from single/RAID1 to
>>> RAID6 again.
>>>
>>> With v4.4 tools, you can do filtered balance based on stripe-width, so
>>> it avoids complete balance again of block groups that are already
>>> allocated across the right amount of devices.
>>>
>>> In this case, avoiding the re-balance of the '320.00KiB group' (in the
>>> meantime could be much larger) you could do this:
>>> btrfs balance start -v -dstripes=1..6 /mnt/data
>>
>> Super brilliant idea!!!
>>
>> I didn't realize that's the silver bullet for such use case.
>>
>> BTW, can stripes option be used with convert?
>> IMHO we still need to use single as a temporary state for those not fully
>> allocated RAID6 chunks.
>> Or we won't be able to alloc new RAID6 chunk with full stripes.
>>
>> Thanks,
>> Qu

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
Re: utils version and convert crash
Just noting that I left things till I put a 4.4 kernel on (4.4.3 as it turns out) and now the convert is going much more nicely.

Well, it still has a silly behaviour where the newly allocated blocks are mostly empty. It appears that the convert likes to take a 1Gig RAID1 block and write it to a new RAID5 block (6x1Gig = 5Gig capacity), meaning that the disks are likely to hit full during the convert. To avoid that I'm looping a convert with a block limit, followed by a balance that targets those blocks. The balance is pretty quick, but it does slow the process down.

On Thu, Dec 3, 2015 at 9:14 AM, Gareth Pye <gar...@cerberos.id.au> wrote:
> Yeah having a scrub take 9 hours instead of 24 (+ latency of human
> involvement) would be really nice.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
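The arithmetic behind the "mostly empty" RAID5 blocks described at the top of this message can be worked through in a few lines. This is a minimal sketch, assuming the behaviour as reported: each 1Gig RAID1 block is rewritten into its own new RAID5 block striped over 6 devices, 1Gig per device, with one stripe's worth used for parity. The function names are made up for illustration.

```python
def raid5_usable_gib(devices, stripe_gib=1):
    """Usable data capacity of one RAID5 block: one stripe per device, minus parity."""
    return (devices - 1) * stripe_gib


def fill_after_convert(devices, source_gib=1, stripe_gib=1):
    """Fraction of the new RAID5 block that holds data after converting one source block."""
    return source_gib / raid5_usable_gib(devices, stripe_gib)


usable = raid5_usable_gib(6)        # 6 x 1Gig allocated, 5Gig usable
fill = fill_after_convert(6)        # 1Gig of data in 5Gig of capacity
raw_allocated = 6 * 1               # raw space tied up per converted block

print(usable, fill, raw_allocated)
```

With 6 devices each converted block is only one fifth full, which is why a follow-up balance that repacks those blocks frees so much space.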
Re: Missing half of available space (resend)
Sorry, his message 4 hours ago mentioned 14.10.

On Thu, Dec 10, 2015 at 8:41 AM, David Hampton <mailingli...@dhampton.net> wrote:
> Ubuntu 14.04 actually ships with the 3.13 kernel. I had already
> upgraded it to 3.19 from the Ubuntu 15.04 release.
>
> I'm pretty sure I created the btrfs partition, not the MythBuntu
> installer. I don't remember if that was even an option.
>
> David
>
> On Wed, 2015-12-09 at 14:28 -0700, Chris Murphy wrote:
>> On Wed, Dec 9, 2015 at 12:56 PM, Gareth Pye <gar...@cerberos.id.au> wrote:
>> > I wouldn't blame Ubuntu too much, 14.10 went out of support months ago
>>
>> OP reported 3.19.0-32-generic #37~14.04.1-Ubuntu. And 14.04 is LTS
>> supported until 2019. I think it should have something newer for both
>> kernel and progs, if it's going to offer btrfs as an install time
>> option. It's really easy to just have the LTS installer not offer
>> Btrfs, and not install btrfs-progs.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: Missing half of available space (resend)
I wouldn't blame Ubuntu too much, 14.10 went out of support months ago (which counts as a long time when it's only for people happy to upgrade every 6 months).

The kernel ppa's builds tend to run fine on the latest LTS & regular releases, although they can cause issues (I've had some fun with nvidia drivers at times). That ppa will get you to 4.3 or 4.4rc4.

On Thu, Dec 10, 2015 at 6:39 AM, Chris Murphy <li...@colorremedies.com> wrote:
> On Wed, Dec 9, 2015 at 10:28 AM, David Hampton
> <mailingli...@dhampton.net> wrote:
>> On Wed, 2015-12-09 at 16:48 +, Duncan wrote:
>>> David Hampton posted on Wed, 09 Dec 2015 01:30:09 -0500 as excerpted:
>>>
>>> > Seems I need to upgrade my tools. That command was added in 3.18 and I
>>> > only have the 3.12 tools.
>>>
>>> Definitely so, especially because you're running raid6, which wasn't
>>> stable until 4.1 for both kernel and userspace. 3.12? I guess it did
>>> have the very basic raid56 support, but it's definitely nothing I'd
>>> trust, at that old not for btrfs in general, but FOR SURE not raid56.
>>
>> I've upgraded to the 4.2.0 kernel and the 4.0 btrfs-tools package.
>
> I think btrfs-progs 4.0 has a mkfs bug in it (or was that 4.0.1?)
> Anyway, even that is still old in Btrfs terms. I think Ubuntu needs to
> do better than this, or just acknowledge Btrfs is not supported, don't
> include btrfs-progs at all by default, and stop making it an install
> time option.
>
>> These are the latest that Ubuntu has packaged for 15.10, and I've pulled
>> them into my 14.10 based release. Is this recent enough, or do I need
>> to try the 4.3 kernel/tools build from the active development tree (that
>> will eventually become 16.04)?
>
> It's probably fine day to day, but if you ever were to need btrfs
> check or repair, you'd want the current version no matter what. There
> are just too many bug fixes and enhancements happening to not make
> that effort. You kinda have to understand that you're effectively
> testing Btrfs by using raid56. It is stabilizing, but it can hardly be
> called stable or even feature complete seeing as there are all sorts
> of missing failure notifications.
>
> More than anything else you need to be willing to lose everything on
> this volume, without further notice, i.e. you need a backup strategy
> that you're prepared to use without undue stress. If you can't do
> that, you need to look at another arrangement. Both LVM and mdadm
> raid6 + XFS are more stable.
>
> --
> Chris Murphy

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: utils version and convert crash
Yeah having a scrub take 9 hours instead of 24 (+ latency of human involvement) would be really nice. On Thu, Dec 3, 2015 at 1:32 AM, Austin S Hemmelgarn <ahferro...@gmail.com> wrote: > On 2015-12-02 08:45, Duncan wrote: >> >> Austin S Hemmelgarn posted on Wed, 02 Dec 2015 07:25:13 -0500 as >> excerpted: >> >>> On 2015-12-02 05:01, Duncan wrote: >> >> >> [on unverified errors returned by scrub] >>>> >>>> >>>> Unverified errors are, I believe[1], errors where a metadata block >>>> holding checksums itself has an error, so the blocks its checksums in >>>> turn covered are not checksum-verified. >>>> >>>> What that means in practice is that once the first metadata block error >>>> has been corrected in a first scrub run, a second scrub run can now >>>> check the blocks that were recorded as unverified errors in the first >>>> run, potentially finding and hopefully fixing additional errors[.] >> >> >>>> --- >>>> [1] I'm not a dev and am not absolutely sure of the technical accuracy >>>> of this description, but from an admin's viewpoint it seems to be >>>> correct at least in practice, based on the fact that further scrubs as >>>> long as there were unverified errors often did find additional errors, >>>> while once the unverified count dropped to zero and the last read >>>> errors were corrected, further scrubs turned up no further errors. >>>> >>> AFAICT from reading the code, that is a correct assessment. It would be >>> kind of nice though if there was some way to tell scrub to recheck up to >>> X many times if there are unverified errors... >> >> >> Yes. 
For me as explained it wasn't that big a deal as another scrub was >> another minute or less, but definitely on terabyte-scale filesystems on >> spinning rust, where scrubs take hours, having scrub be able to >> automatically track just the corrected errors along with their >> unverifieds, and rescan just those, should only take a matter of a few >> minutes more, while a full rescan of /everything/ would take the same >> number of hours yet again... and again if there's a third scan required, >> etc. >> >> I'd say just make it automatic on corrected metadata errors as I can't >> think of a reason people wouldn't want it, given the time it would save >> over rerunning a full scrub over and over again, but making it an option >> would be fine with me too. >> > I was thinking an option to do a full re-scrub, but having an automatic > reparse of the metadata in a fixed metadata block would be a lot more > efficient that what I was thinking :) > -- Gareth Pye - blog.cerberos.id.au Level 2 MTG Judge, Melbourne, Australia "Dear God, I would like to file a bug report" -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: utils version and convert crash
Thanks for that info, ram appears to be checking out fine and smartctl reported that the drives are old but one had some form of elevated error. Looks like I might be buying a new drive. On Wed, Dec 2, 2015 at 9:01 PM, Duncan <1i5t5.dun...@cox.net> wrote: > Gareth Pye posted on Wed, 02 Dec 2015 18:07:48 +1100 as excerpted: > >> Output from scrub: >> sudo btrfs scrub start -Bd /data > > [Omitted no-error device reports.] > >> scrub device /dev/sdh (id 6) done >>scrub started at Wed Dec 2 07:04:08 2015 and finished after 06:47:22 >>total bytes scrubbed: 1.09TiB with 2 errors >>error details: read=2 >>corrected errors: 2, uncorrectable errors: 0, unverified errors: 30 > > Also note those unverified errors... > > I have quite a bit of experience with btrfs scrub as I ran with a failing > ssd for awhile, using btrfs scrub on the multiple btrfs raid1 filesystems > on parallel partitions on the failing ssd and another good one to correct > the errors and continue operations. > > Unverified errors are, I believe[1], errors where a metadata block > holding checksums itself has an error, so the blocks its checksums in > turn covered are not checksum-verified. > > What that means in practice is that once the first metadata block error > has been corrected in a first scrub run, a second scrub run can now check > the blocks that were recorded as unverified errors in the first run, > potentially finding and hopefully fixing additional errors, tho unless > the problem's extreme, most of the unverifieds should end up being > correct once they can be verified, with only a few possible further > errors found. > > Of course if some of these previously unverified blocks are themselves > metadata blocks with further checksums, yet another run may be required. 
> > Fortunately, these trees are quite wide (121 items according to an old > post from Hugo I found myself rereading a few hours ago) and thus don't > tend to be very deep -- I think I ended up rerunning scrub four times at > one point, before both read and unverified errors went to zero, tho > that's on relatively small partitioned-up ssd filesystems of under 50 gig > usable capacity (pair-raid1, 50 gig per device), so I could see terabyte- > scale filesystems going to 6-7 levels. > > And, again on a btrfs raid1 with a known failing device -- several > thousand redirected sectors by the time I gave up and btrfs replaced -- > generally each successive scrub run would return an order of magnitude or > so fewer errors (corrected and unverified both) than the previous run, > tho occasionally I'd hit a bad spot and the number would go up a bit in > one run, before dropping an order of magnitude or so again on the next > run. > > So with only two corrected read-errors and 30 unverified, I'd expect > maybe another one or two corrected read-errors on a second run, and > probably no unverifieds, in which case a third run shouldn't be necessary > unless you just want the peace of mind of seeing that no errors found > message. Tho of course if you're unlucky, one of those 30 will turn out > to be a a read error on a full 121-item metadata block, so your > unverifieds will go up for that run, before going down again in > subsequent runs. > > Of course with filesystems of under 50 gig capacity on fast ssds, a > typical scrub ran in under a minute, so repeated scrubs to find and > correct all errors wasn't a big deal, generally under 10 minutes > including human response time. On terabyte-scale spinning rust with > scrubs taking hours, multiple scrubs could easily take a full 24-hour day > or more! 
=:^( > > So now that you did one scrub and did find errors, you do probably want > to trace them down and correct the problem if possible, before running > further scrubs to find and exterminate any errors still hiding behind > unverified in the first run. But once you're reasonably confident you're > running a reliable system again, you probably do want to run further > scrubs until that unverified count goes to zero (assuming no > uncorrectable errors in the mean time). > > --- > [1] I'm not a dev and am not absolutely sure of the technical accuracy of > this description, but from an admin's viewpoint it seems to be correct at > least in practice, based on the fact that further scrubs as long as there > were unverified errors often did find additional errors, while once the > unverified count dropped to zero and the last read errors were corrected, > further scrubs turned up no further errors. > > -- > Duncan - List replies preferred. No HTML msgs. > "Every nonfree program has a lord, a master -- > and if you use the program, he is your master." Richard Stallman >
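The rule of thumb described above (rerun scrub until the unverified count reaches zero) can be sketched in a few lines. This is only an illustration, assuming the summary line has the same shape as the `corrected errors: ..., uncorrectable errors: ..., unverified errors: ...` line quoted in this thread; the function names are hypothetical and nothing here calls btrfs itself.

```python
import re

# Matches the per-device summary line as it appears in the scrub output quoted in this thread.
SUMMARY_RE = re.compile(
    r"corrected errors:\s*(\d+),\s*uncorrectable errors:\s*(\d+),\s*unverified errors:\s*(\d+)"
)


def parse_scrub_summary(text):
    """Return (corrected, uncorrectable, unverified) summed over all devices in the text."""
    totals = [0, 0, 0]
    for match in SUMMARY_RE.finditer(text):
        for index, value in enumerate(match.groups()):
            totals[index] += int(value)
    return tuple(totals)


def needs_another_scrub(text):
    """Rule of thumb from the thread: scrub again while unverified errors remain."""
    _corrected, _uncorrectable, unverified = parse_scrub_summary(text)
    return unverified > 0


# Summary line taken from the /dev/sdh report quoted above.
sample = "corrected errors: 2, uncorrectable errors: 0, unverified errors: 30"
print(parse_scrub_summary(sample), needs_another_scrub(sample))
```

Devices that report no errors print no summary line at all in the quoted output, so they simply contribute nothing to the totals.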
Re: utils version and convert crash
On Wed, Dec 2, 2015 at 2:14 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> So if you're running into the same problem gentoo's live-git build did,
> it's likely because you're building the devel branch cloned from
> kernel.org, which is no longer updated.

Woah, kernel.org is showing a log that looks like it's up to date but isn't. That's awkward :(

Building now from the github you mentioned.

Also running a scrub, but I'm starting to suspect something else is responsible. It ran fine overnight but crashed in less than a minute after I logged back in on ssh this morning...

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: utils version and convert crash
Output from scrub:

sudo btrfs scrub start -Bd /data
scrub device /dev/sdd (id 2) done
        scrub started at Wed Dec 2 07:04:08 2015 and finished after 08:53:53
        total bytes scrubbed: 2.47TiB with 0 errors
scrub device /dev/sde (id 3) done
        scrub started at Wed Dec 2 07:04:08 2015 and finished after 07:22:02
        total bytes scrubbed: 2.47TiB with 0 errors
scrub device /dev/sdf (id 4) done
        scrub started at Wed Dec 2 07:04:08 2015 and finished after 06:48:31
        total bytes scrubbed: 1.09TiB with 0 errors
scrub device /dev/sdg (id 5) done
        scrub started at Wed Dec 2 07:04:08 2015 and finished after 06:52:07
        total bytes scrubbed: 1.10TiB with 0 errors
scrub device /dev/sdh (id 6) done
        scrub started at Wed Dec 2 07:04:08 2015 and finished after 06:47:22
        total bytes scrubbed: 1.09TiB with 2 errors
        error details: read=2
        corrected errors: 2, uncorrectable errors: 0, unverified errors: 30
scrub device /dev/sdc (id 7) done
        scrub started at Wed Dec 2 07:04:08 2015 and finished after 04:11:59
        total bytes scrubbed: 430.52GiB with 0 errors
WARNING: errors detected during scrubbing, corrected

Looks like I have some issues. Going to confirm cables are all secure and run a memtest.

On Wed, Dec 2, 2015 at 9:22 AM, Gareth Pye <gar...@cerberos.id.au> wrote:
> Will do that once the scrub finishes/I get home from work.
>
> On Wed, Dec 2, 2015 at 7:30 AM, Austin S Hemmelgarn
> <ahferro...@gmail.com> wrote:
>> On 2015-12-01 15:12, Gareth Pye wrote:
>>>
>>> On Wed, Dec 2, 2015 at 2:14 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>>>
>>>> So if you're running into the same problem gentoo's live-git build did,
>>>> it's likely because you're building the devel branch cloned from
>>>> kernel.org, which is no longer updated.
>>>
>>> Woah, kernel.org is making a log that looks like it's up to date but
>>> isn't that's awkward :(
>>>
>>> Building now from the github you mentioned.
>>>
>>> Also running a scrub, but I'm starting to suspect something else is
>>> responsible. It ran fine overnight but crashed in less than a minute
>>> after I logged back in on ssh this morning . . .
>>>
>> Hmm, the fact that it's intermittent is the most concerning part IMHO. It
>> means it's a lot harder to track down. If your hard drives aren't any
>> noisier than normal (most traditional hard disks get noticeably noisier when
>> they're failing), then I'd suggest running something like memtest86+ for at
>> least a full cycle with default options to verify if your RAM is working
>> correctly. Usually, when I see intermittent crashes like this it's either a
>> race condition in software somewhere, or bad RAM, and it's a lot easier to
>> test for bad RAM than it is to test for race conditions.
>>
>
> --
> Gareth Pye - blog.cerberos.id.au
> Level 2 MTG Judge, Melbourne, Australia
> "Dear God, I would like to file a bug report"

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: utils version and convert crash
Poking around I just noticed that btrfs de stats /data points out that 3 of my drives have some read_io_errors. I'm guessing that is a bad thing. I assume this would indicate bad hardware and would be a likely cause of system crashes. :(

On Tue, Dec 1, 2015 at 11:38 PM, Gareth Pye <gar...@cerberos.id.au> wrote:
> I'm getting some crashes when converting from RAID1 to RAID5. I know
> that there were some issues recently but was led to believe that there
> were fixes that should have been in 4.3. (I'm using the Ubuntu kernel
> PPA team's 4.3 kernel)
>
> One of the first things I checked was that I was using an up to date
> btrfs util and it keeps reporting that I'm using version 4.0 (btrfs
> --version). This is after confirming that my git clone is up to date,
> the last commit to my master is talking about v4.3.1 and the
> version.sh tells me it is 4.3.1 as well. Why the different reported
> versions?
>
> I'm converting in small chunks so I can then do a balance to recover
> the partial blocks. The following command locks the PC up
> occasionally. Mostly the balance completes happily on next boot but
> I've seen it cause a reboot once.
>
> btrfs fi balance start -dprofiles=raid1,convert=raid5,limit=1 /data
>
> After running that several times I'm using the following to clean up
> the partial RAID5 chunks:
>
> btrfs fi balance start -dprofiles=raid5,usage=21 /data
>
> I haven't noticed that clean up balance ever crashing. Is this a
> known bug or should I try and discover the cause better? Dmesg and
> syslog don't have anything obvious in them.
>
> --
> Gareth Pye - blog.cerberos.id.au
> Level 2 MTG Judge, Melbourne, Australia
> "Dear God, I would like to file a bug report"

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
utils version and convert crash
I'm getting some crashes when converting from RAID1 to RAID5. I know that there were some issues recently but was led to believe that there were fixes that should have been in 4.3. (I'm using the Ubuntu kernel PPA team's 4.3 kernel)

One of the first things I checked was that I was using an up to date btrfs util and it keeps reporting that I'm using version 4.0 (btrfs --version). This is after confirming that my git clone is up to date, the last commit to my master is talking about v4.3.1 and the version.sh tells me it is 4.3.1 as well. Why the different reported versions?

I'm converting in small chunks so I can then do a balance to recover the partial blocks. The following command locks the PC up occasionally. Mostly the balance completes happily on next boot but I've seen it cause a reboot once.

btrfs fi balance start -dprofiles=raid1,convert=raid5,limit=1 /data

After running that several times I'm using the following to clean up the partial RAID5 chunks:

btrfs fi balance start -dprofiles=raid5,usage=21 /data

I haven't noticed that clean up balance ever crashing. Is this a known bug or should I try and discover the cause better? Dmesg and syslog don't have anything obvious in them.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
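The convert-a-little, then-clean-up routine described in this message could be scripted roughly as below. It is a minimal sketch that only builds the two command lines quoted in the message, using the same filters; the helper names and the `steps` parameter are hypothetical, and nothing is executed against a filesystem.

```python
def convert_step_cmd(mount, src="raid1", dst="raid5", limit=1):
    """Command line that converts at most `limit` data chunks from src to dst."""
    return ["btrfs", "fi", "balance", "start",
            f"-dprofiles={src},convert={dst},limit={limit}", mount]


def cleanup_cmd(mount, profile="raid5", usage=21):
    """Command line that rebalances data chunks of `profile` at or below `usage` percent full."""
    return ["btrfs", "fi", "balance", "start",
            f"-dprofiles={profile},usage={usage}", mount]


def plan(mount, steps):
    """Several small convert passes followed by one clean-up pass."""
    commands = [convert_step_cmd(mount) for _ in range(steps)]
    commands.append(cleanup_cmd(mount))
    return commands


for command in plan("/data", steps=3):
    print(" ".join(command))
```

Keeping each convert pass to a single chunk means a lock-up costs at most one chunk's worth of work, which seems to be the motivation in the message.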
Re: BTRFS raid 5/6 status
On Thu, Oct 15, 2015 at 3:11 PM, audio muze <audiom...@gmail.com> wrote:
> Rebuilds and/or expanding the array should be pretty quick given only
> actual data blocks are written on rebuild or expansion as opposed to
> traditional raid systems that write out the entire array.

While that might be the intended final functionality, I don't think balances are anywhere near that optimised currently. Starting with a relatively green format isn't a great option for a file system you intend to use forever.

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: RAID1 storage server won't boot with one disk missing
I think you have stated that in a very polite and friendly way. I'm pretty sure I'd phrase it less politely :) Following mdadm's example, there should be an easy option to allow degraded mounting, but that shouldn't be the default. Anyone with the expertise to set that option can be expected to implement a way of knowing that the mount is degraded. People tend to be looking at BTRFS for a guarantee that data doesn't die when hardware does. Defaults that defeat that shouldn't be used. On Fri, Sep 18, 2015 at 11:36 AM, Duncan <1i5t5.dun...@cox.net> wrote: > Anand Jain posted on Thu, 17 Sep 2015 23:18:36 +0800 as excerpted: > >>> What I expected to happen: >>> I expected that the [btrfs raid1 data/metadata] system would either >>> start as if nothing were wrong, or would warn me that one half of the >>> mirror was missing and ask if I really wanted to start the system with >>> the root array in a degraded state. >> >> as of now it would/should start normally only when there is an entry >> -o degraded >> >> it looks like -o degraded is going to be a very obvious feature, >> I have plans of making it a default feature, and provide -o nodegraded >> feature instead. Thanks for comments if any. > > As Chris Murphy, I have my doubts about this, and think it's likely to > cause as many unhappy users as it prevents. > > I'd definitely put -o nodegraded in my default options here, so it's not > about me, but about all those others that would end up running a silently > degraded system and have no idea until it's too late, as further devices > have failed or the one single other available copy of something important > (remember, still raid1 without N-mirrors option, unfortunately, so if a > device drops out, that's now data/metadata with only a single valid copy > regardless of the number of devices, and if it goes invalid...) fails > checksum for whatever reason. 
> > And since it only /allows/ degraded, not forcing it, if admins or distros > want it as the default, -o degraded can be added now. Nothing's stopping > them except lack of knowledge of the option, the *same* lack of knowledge > that would potentially cause so much harm if the default were switched. > > Put it this way. With the current default, if it fails and people have > to ask about the unexpected failure here, no harm to existing data done, > just add -o degraded and get on with things. If -o degraded were made > the default, failure mode would be *MUCH* worse, potential loss of the > entire filesystem due to silent and thus uncorrected device loss and > degraded mounting. > > So despite the inconvenience of less knowledgeable people losing the > availability of the filesystem until they can read the wiki or ask about > it here, I don't believe changing the default to -o degraded is wise, at > all. > > -- > Duncan - List replies preferred. No HTML msgs. > "Every nonfree program has a lord, a master -- > and if you use the program, he is your master." Richard Stallman -- Gareth Pye - blog.cerberos.id.au Level 2 MTG Judge, Melbourne, Australia "Dear God, I would like to file a bug report"
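One way an admin who opts in to -o degraded might notice that a filesystem is mounted that way is to look at the mount options. The sketch below parses text in the /proc/mounts format and assumes the kernel lists `degraded` among the options when it was requested; the function name and the sample lines are made up for illustration. Note that this only shows the option was passed, not that a device is actually missing.

```python
def degraded_btrfs_mounts(mounts_text):
    """Return mount points of btrfs filesystems whose option list contains 'degraded'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts line
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "btrfs" and "degraded" in options.split(","):
            found.append(mount_point)
    return found


# Hypothetical sample in /proc/mounts format; on a real system read the file instead.
sample = (
    "/dev/sda1 / btrfs rw,relatime,degraded,space_cache 0 0\n"
    "/dev/sdb1 /data btrfs rw,relatime,space_cache 0 0\n"
    "tmpfs /tmp tmpfs rw 0 0\n"
)
print(degraded_btrfs_mounts(sample))
```

A cron job or monitoring check could run this against /proc/mounts and alert when the list is non-empty.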
Re: RAID0 wrong (raw) device?
On Thu, Aug 13, 2015 at 9:44 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
> 3. See the warnings about doing block level copies and LVM snapshots of
> BTRFS volumes, the same applies to using it on DRBD currently as well (with
> the possible exception of remote DRBD nodes (ie, ones without a local copy
> of the backing store) (in this case, we need to blacklist backing devices
> for stacked storage (I think the same issue may be present with BTRFS on a
> MD based RAID1 set).

I've been using BTRFS on top of DRBD for several years now, what specifically am I meant to avoid?

I have 6 drives mirrored across a local network, this is done with DRBD. At any one time only a single server has the 6 drives mounted with btrfs. Is this a ticking time bomb?

--
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: RAID0 wrong (raw) device?
I would have been surprised if any generic file system copes well with being mounted in several locations at once, DRBD appears to fight really hard to avoid that happening :) And yeah I'm doing the second thing, I've successfully switched which of the servers is active a few times with no ill effect (I would expect scrub to give me some significant warnings if one of the disks was a couple of months out of date) so I'm presuming that DRBD copes reasonably well or I've been very lucky. Either that luck is very deterministic, DRBD copes correctly, or I've been very very lucky. Very very lucky doesn't sound likely. On Fri, Aug 14, 2015 at 8:54 AM, Hugo Mills h...@carfax.org.uk wrote: On Fri, Aug 14, 2015 at 08:32:46AM +1000, Gareth Pye wrote: On Thu, Aug 13, 2015 at 9:44 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: 3. See the warnings about doing block level copies and LVM snapshots of BTRFS volumes, the same applies to using it on DRBD currently as well (with the possible exception of remote DRBD nodes (ie, ones without a local copy of the backing store) (in this case, we need to blacklist backing devices for stacked storage (I think the same issue may be present with BTRFS on a MD based RAID1 set). I've been using BTRFS on top of DRBD for several years now, what specifically am I meant to avoid? I have 6 drives mirrored across a local network, this is done with DRBD. At any one time only a single server has the 6 drives mounted with btrfs. Is this a ticking time bomb? There are two things which are potentially worrisome here: - Having the same filesystem mounted on more than one machine at a time (which you're not doing). - Having one or more of the DRBD backing store devices present on the same machine that the DRBD filesystem is mounted on (which you may be doing). Of these, the first is definitely going to be dangerous. 
The second may or may not be, depending on how well DRBD copes with direct writes to its backing store, and how lucky you are about the kernel identifying the right devices to use for the FS. Hugo. -- Hugo Mills | hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4 | "Big data doesn't just mean increasing the font size." -- Gareth Pye - blog.cerberos.id.au Level 2 MTG Judge, Melbourne, Australia "Dear God, I would like to file a bug report"
Re: raid 1 to 10 conversion
btrfs has a small bug at the moment where balance can't convert raid levels (it just does nothing), it is meant to be fixed with the next kernel release. On Wed, Jun 10, 2015 at 3:28 PM, Guilherme Gonçalves agamen...@gmail.com wrote: Hello!, i think i made a mistake i had two 3tb drivre on a raid 1 setup, i bought two aditional 3tb drives to make my raid 10 array i used this commands btrfs -f device add /dev/sdc /mnt/nas/(i used -f because i formatted my new drives using gpt) btrfs -f device add /dev/sdf /mnt/nas/ finally: btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/nas/ after a couple of hours i ran: btrfs filesystem df /mnt/nas/ Data, RAID1: total=963.00GiB, used=962.69GiB System, RAID1: total=32.00MiB, used=176.00KiB Metadata, RAID1: total=6.00GiB, used=4.59GiB GlobalReserve, single: total=512.00MiB, used=0.00B should that not read raid 10 ? output for btrfs fi usage /mnt/nas Overall: Device size: 10.92TiB Device allocated: 1.89TiB Device unallocated: 9.02TiB Device missing: 0.00B Used: 1.89TiB Free (estimated): 4.51TiB (min: 4.51TiB) Data ratio: 2.00 Metadata ratio: 2.00 Global reserve: 512.00MiB (used: 0.00B) Data,RAID1: Size:963.00GiB, Used:962.69GiB /dev/sdc 481.00GiB /dev/sdd1 482.00GiB /dev/sde1 482.00GiB /dev/sdf 481.00GiB Metadata,RAID1: Size:6.00GiB, Used:4.59GiB /dev/sdc 4.00GiB /dev/sdd1 2.00GiB /dev/sde1 2.00GiB /dev/sdf 4.00GiB System,RAID1: Size:32.00MiB, Used:176.00KiB /dev/sdd1 32.00MiB /dev/sde1 32.00MiB Unallocated: /dev/sdc 2.25TiB /dev/sdd1 2.26TiB /dev/sde1 2.26TiB /dev/sdf 2.25TiB I think i made a mess here... why is system only on two drives? why is it not showing raid 10? If i actually failed how do i acheive this? i want all four drives in a raid 10 setup. 
Thanks in advance -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
Re: Did convert do anything?
After a full balance that is likely to change to 4.41TiB used of 4.41TiB total. Is that going to help anything? Peter is saying it's a known bug that convert can't do anything currently. On Tue, May 26, 2015 at 2:36 AM, Anthony Plack t...@plack.net wrote: On May 24, 2015, at 11:00 PM, Gareth Pye gar...@cerberos.id.au wrote: 2. 18% or 1.1T spare currently. That isn't what I'd call tiny free space. -- # btrfs fi df /data Data, RAID1: total=4.43TiB, used=4.41TiB Of 4.43TiB, btrfs believes you have used 4.41TiB. Chris's fix to this has to deal with the ENOSPC error "No space left on device", not because you don't have free space but because your extents are marked as used and you need a balance operation to clean it up. https://patchwork.kernel.org/patch/6238111/ Before the reverted commit, this test case failed with ENOSPC because all chunks on the freshly converted filesystem were allocated, although many were empty. The reverted commit removed an allocation attempt in btrfs_set_block_group_ro(), but that fix wasn't right. After the reverted commit, the balance succeeds, but the data/metadata profiles aren't actually updated: -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
Did convert do anything?
Just attempted to change the metadata of my 6 drive array to RAID6 with:

# btrfs balance start -mconvert=raid6 /data
Done, had to relocate 12 out of 4548 chunks

Which looks pretty good. But:

# btrfs fi df /data
Data, RAID1: total=4.43TiB, used=4.41TiB
System, RAID1: total=32.00MiB, used=772.00KiB
Metadata, RAID1: total=11.00GiB, used=9.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

And some context:

# btrfs --version
btrfs-progs v4.0
# uname -a
Linux emile 4.0.1-040001-generic #201504290935 SMP Wed Apr 29 09:36:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

What am I missing here?

--
Gareth Pye
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: Did convert do anything?
1.
# btrfs fi sh /data
Label: none  uuid: b2986e1a-0891-4779-960c-e01f7534c6eb
        Total devices 6 FS bytes used 4.41TiB
        devid    1 size 1.81TiB used 1.48TiB path /dev/drbd0
        devid    2 size 1.81TiB used 1.48TiB path /dev/drbd1
        devid    3 size 1.81TiB used 1.48TiB path /dev/drbd2
        devid    4 size 1.81TiB used 1.48TiB path /dev/drbd3
        devid    5 size 1.81TiB used 1.48TiB path /dev/drbd4
        devid    6 size 1.81TiB used 1.48TiB path /dev/drbd5

2. 18% or 1.1T spare currently. That isn't what I'd call tiny free space.
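For readers checking the numbers, here is one reading that reproduces both figures from the `btrfs fi sh` output above. It is an assumption, not something the message states: the 18% is taken as raw unallocated space over raw capacity, and the 1.1T as that unallocated space halved for RAID1 and expressed in decimal terabytes.

```python
TIB_IN_TB = 2**40 / 10**12  # one TiB expressed in decimal terabytes


def spare_space(devices, size_tib, used_tib, copies=2):
    """Return (spare percent of raw capacity, usable spare in TB) for equal-sized devices."""
    raw_total = devices * size_tib
    raw_unallocated = raw_total - devices * used_tib
    spare_percent = 100 * raw_unallocated / raw_total
    usable_spare_tb = raw_unallocated / copies * TIB_IN_TB
    return spare_percent, usable_spare_tb


# Six devices of 1.81TiB with 1.48TiB allocated on each, as listed in the output above.
percent, usable_tb = spare_space(devices=6, size_tib=1.81, used_tib=1.48)
print(f"{percent:.0f}% raw spare, about {usable_tb:.1f} TB usable with two copies")
```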
Re: Did convert do anything?
I guess it might be relevant that this array was originally created as raid5 back in the early days of raid5 and converted to raid1 over a year ago.

On Mon, May 25, 2015 at 2:00 PM, Gareth Pye gar...@cerberos.id.au wrote:
> 1.
> # btrfs fi sh /data
> Label: none  uuid: b2986e1a-0891-4779-960c-e01f7534c6eb
>         Total devices 6 FS bytes used 4.41TiB
>         devid    1 size 1.81TiB used 1.48TiB path /dev/drbd0
>         devid    2 size 1.81TiB used 1.48TiB path /dev/drbd1
>         devid    3 size 1.81TiB used 1.48TiB path /dev/drbd2
>         devid    4 size 1.81TiB used 1.48TiB path /dev/drbd3
>         devid    5 size 1.81TiB used 1.48TiB path /dev/drbd4
>         devid    6 size 1.81TiB used 1.48TiB path /dev/drbd5
>
> 2. 18% or 1.1T spare currently. That isn't what I'd call tiny free space.

--
Gareth Pye
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: Did convert do anything?
Cool, had heard of that, hadn't actually understood what it meant :)

On Mon, May 25, 2015 at 3:04 PM, Peter Marheine pe...@taricorp.net wrote:
> It's a known regression in 4.0. See [1], I don't think Chris's fix has
> landed yet.
>
> [1] https://patchwork.kernel.org/patch/6238111/
>
> --
> Peter Marheine
> Don't Panic

--
Gareth Pye
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"
Re: BTRFS crash after flac tag writing
usbcore snd_hda_intel snd_hda_controller snd_hda_codec kvm_amd kvm snd_hwdep crct10dif_pclmul snd_pcm crc32_pclmul crc32c_intel ppdev ghash_clmulni_intel aesni_intel aes_x86_64 lrw snd_timer e1000 gf128mul glue_helper r8169 ablk_helper snd parport_pc mii cryptd serio_raw parport shpchp pcspkr edac_core sp5100_tco usb_common button fam15h_power wmi i2c_piix4 edac_mce_amd soundcore [372668.322610] k10temp asus_atk0110 acpi_cpufreq processor sch_fq_codel [372668.322652] CPU: 1 PID: 23395 Comm: kworker/u16:3 Tainted: GW 4.0.2-gentoo-2-default #1 [372668.322721] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 200109/11/2014 [372668.322803] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [372668.322840] a04bfe32 8801e8937c08 816edc91 0007 [372668.322911] 8801e8937c58 8801e8937c48 81056675 8801e8937ce8 [372668.322986] 8803ad83fd28 ffa1 880147d2a800 a04bea60 [372668.323058] Call Trace: [372668.323097] [816edc91] dump_stack+0x45/0x57 [372668.323134] [81056675] warn_slowpath_common+0x95/0xe0 [372668.323171] [81056706] warn_slowpath_fmt+0x46/0x50 [372668.323219] [a0454e65] ? try_merge_map+0x45/0x150 [btrfs] [372668.323261] [a040f46f] __btrfs_abort_transaction+0x5f/0x130 [btrfs] [372668.323339] [a0447b82] btrfs_finish_ordered_io+0x552/0x5e0 [btrfs] [372668.323418] [a0447e85] finish_ordered_fn+0x15/0x20 [btrfs] [372668.323466] [a046f168] normal_work_helper+0xb8/0x2a0 [btrfs] [372668.323515] [a046f4e2] btrfs_endio_write_helper+0x12/0x20 [btrfs] [372668.323552] [8106ed63] process_one_work+0x153/0x410 [372668.323589] [8106f7bb] worker_thread+0x6b/0x480 [372668.323693] [8106f750] ? rescuer_thread+0x300/0x300 [372668.323730] [8107473b] kthread+0xdb/0x100 [372668.323767] [81074660] ? kthread_create_on_node+0x190/0x190 [372668.323805] [816f4218] ret_from_fork+0x58/0x90 [372668.323845] [81074660] ? 
kthread_create_on_node+0x190/0x190 [372668.323885] ---[ end trace c0894b8b0ebe359e ]--- [372668.323925] BTRFS: error (device sdd) in btrfs_finish_ordered_io:2896: errno=-95 unknown The abort-transaction warning itself doesn't really help, but btrfs_finish_ordered_io and its errno seem interesting. Especially the errno: 95 is EOPNOTSUPP. A quick search leads me to the part of the code that changes an inline extent into a regular extent. You also mentioned cue tagging, so I'm curious about the size of your music files. Are there any files of 4K or smaller? If you only listen to lossless, then my guess would be wrong. :( BTW, it would be perfect if you could find a consistent way to trigger the bug... Thanks, Qu -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
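Qu's question about files of 4K or smaller can be answered with a quick scan of the music directory. What follows is only a minimal sketch and not something posted in the thread: the helper name list_small_files and the 4096-byte default are assumptions for illustration, and it relies on find's -size test with the c (bytes) suffix.

```shell
# List regular files of at most MAX bytes (default 4096) under DIR.
# list_small_files is a hypothetical helper name, not a btrfs tool.
list_small_files() {
    dir=$1
    max=${2:-4096}
    # "-size -Nc" matches files strictly smaller than N bytes,
    # so N = max + 1 selects sizes <= max.
    find "$dir" -type f -size "-$((max + 1))c"
}
```

Something like list_small_files /path/to/music would then show whether any candidates for inline extents exist; an empty result would suggest the inline-extent guess does not apply.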
Re: btrfs convert running out of space
Yeah, copying them all back on has gone without incident; I'm now running some balance passes to clear up the 310G of slack allocation. I did realize that something I'd claimed earlier wasn't true: there were 3 files larger than a gig in the apt mirror snapshots, so large files in snapshots could have been contributing to the issue. (If I'd realized that before moving all the large files off I'd have tried moving just those few off.) Thank you all for your help. This file system's possibilities do excite me; the future is bright. On Tue, Jan 27, 2015 at 5:20 PM, Duncan 1i5t5.dun...@cox.net wrote: Gareth Pye posted on Tue, 27 Jan 2015 14:24:03 +1100 as excerpted: Have gone with the move stuff off then finish convert plan. Convert has now finished and I'm 60% of the way through moving all the big files back on. Thanks for the help guys. Glad the big-file-move-off seems to have worked for you, and thanks for confirming that moving them off did indeed solve your conversion blockers. Evidently btrfs still has a few rough spots to iron out when it comes to those big files. =:^( Please confirm when all big files are moved back on, too, just to be sure there's nothing unexpected on that side, but based on the conversion-from-ext* reports, I expect it to be smooth sailing. Btrfs really does have a problem with large files existing in large enough extents that it can't really handle them properly, and once they are off the filesystem temporarily, generally even moved off and back on, it does seem to break up the log jam (well, in this case huge-extent-jam =:^) and you're back in business! =:^) -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
Re: btrfs convert running out of space
Have gone with the move stuff off then finish convert plan. Convert has now finished and I'm 60% of the way through moving all the big files back on. Thanks for the help guys. On Mon, Jan 26, 2015 at 2:23 AM, Marc Joliet mar...@gmx.de wrote: Am Fri, 23 Jan 2015 08:46:23 + (UTC) schrieb Duncan 1i5t5.dun...@cox.net: Marc Joliet posted on Fri, 23 Jan 2015 08:54:41 +0100 as excerpted: Am Fri, 23 Jan 2015 04:34:19 + (UTC) schrieb Duncan 1i5t5.dun...@cox.net: Gareth Pye posted on Fri, 23 Jan 2015 08:58:08 +1100 as excerpted: What are the chances that splitting all the large files up into sub gig pieces, finish convert, then recombine them all will work? [...] Option 2: Since new files should be created using the desired target mode (raid1 IIRC), you may actually be able to move them off and immediately back on, so they appear as new files and thus get created in the desired mode. With current coreutils, wouldn't that also work if he moves the files to another (temporary) subvolume? (And with future coreutils, by copying the files without using reflinks and then removing the originals.) If done correctly, yes. However, off the filesystem is far simpler to explain over email or the like, and is much less ambiguous in terms of OK, but did you do it 'correctly' if it doesn't end up helping. If it doesn't work, it doesn't work. If move to a different subvolume under specific conditions in terms of reflinking and the like doesn't work, there's always the question of whether it /really/ didn't work, or if somehow the instructions weren't clear enough and thus failure was simply the result of a failure to fully meet the technical requirements. Of course if I was doing it myself, and if I was absolutely sure of the technical details in terms of what command I had to use to be /sure/ it didn't simply reflink and thus defeat the whole exercise, I'd likely use the shortcut. 
But in reality, if it didn't work I'd be second-guessing myself and would probably move everything entirely off and back on to be sure, and knowing that, I'd probably do it the /sure/ way in the first place, avoiding the chance of having to redo it to prove to myself that I'd done it correctly. Of course, having demonstrated to myself that it worked, if I ever had the problem again, I might try the shortcut, just to demonstrate to my own satisfaction the full theory that the effect of the shortcut was the same as the effect of doing it the longer and more fool-proof way. But of course I'd rather not have the opportunity to try that second-half proof. =:^) Make sense? =:^) I was going to argue that my suggestion was hardly difficult to get right, but then I read that cp defaults to --reflink=always and that it is not possible to turn off reflinks (i.e., there is no --reflink=never). So then one would have to consider alternatives like dd, and, well, you are right, I suppose :) . (Of course, with the *current* version of coreutils, the simple mv somefile tmp_subvol/; mv tmp_subvol/somefile . will still work.) -- Marc Joliet -- People who think they know everything really annoy those of us who know we don't - Bjarne Stroustrup -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
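For reference, the "move it off the filesystem and back on" approach that Duncan calls the /sure/ way could be scripted roughly as below. This is a hedged sketch only: rewrite_off_and_back is a hypothetical helper name, and it assumes the scratch directory sits on a different filesystem with enough room, so that mv has to copy the data instead of renaming it.

```shell
# Move FILE to SCRATCH_DIR and back again.
# Assumption: SCRATCH_DIR is on a different filesystem, so each mv is a
# copy + delete and the file comes back as freshly written extents.
# rewrite_off_and_back is a hypothetical helper name.
rewrite_off_and_back() {
    file=$1
    scratch_dir=$2
    name=$(basename "$file")
    dest_dir=$(dirname "$file")
    # Stop if the first move fails, so the file is never left half-handled.
    mv "$file" "$scratch_dir/$name" || return 1
    mv "$scratch_dir/$name" "$dest_dir/$name"
}
```

Called once per large file, for example rewrite_off_and_back /data/big.img /mnt/scratch, it keeps only one file off the array at a time, which matters when spare space is tight.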
Re: btrfs convert running out of space
What are the chances that splitting all the large files up into sub-gig pieces, finishing the convert, then recombining them all will work? On Wed, Jan 21, 2015 at 3:03 PM, Chris Murphy li...@colorremedies.com wrote: On Tue, Jan 20, 2015 at 4:04 PM, Gareth Pye gar...@cerberos.id.au wrote: Yeah, we don't have that much space spare :( File system has been going strong from when it was created with early RAID5 code, then converted to RAID10 with kernel 3.12. There aren't any nocow files to my knowledge but there are plenty of files larger than a gig on the file system. The first few results from logical-resolve have been for files in the 1G~2G range, so that could be some sticky spaghetti. Are any of those big files in a snapshot? The snapshotting may be pinning a bunch of large extents, so even if it seems like the volume has enough space, it might actually be running out of space. All I can think of is progressively removing the files that are implicated in the conversion failure. That could mean just deleting older snapshots that you probably don't need, progressively getting to the point where you migrate those files off this fs to another one, and then delete them (all instances in all subvol/snapshots) and just keep trying. Is a btrfs check happy? Or does it complain about anything? I've had quite good luck just adding a drive (two drives for raid1/10 volumes) to an existing btrfs volume; they don't have to be DRBD, they can be local block devices, either physical drives or LVs. I've even done this with flash drives (kinda scary and slow but it worked). I'd still suggest contingency planning in case this volume becomes temperamental and you have no choice but to migrate it elsewhere. Better to do it on your timetable than the filesystem's. -- Chris Murphy -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
Re: btrfs convert running out of space
PS: the only snapshots are of apt-mirror, which doesn't have large files. On Fri, Jan 23, 2015 at 8:58 AM, Gareth Pye gar...@cerberos.id.au wrote: What are the chances that splitting all the large files up into sub-gig pieces, finishing the convert, then recombining them all will work? On Wed, Jan 21, 2015 at 3:03 PM, Chris Murphy li...@colorremedies.com wrote: On Tue, Jan 20, 2015 at 4:04 PM, Gareth Pye gar...@cerberos.id.au wrote: Yeah, we don't have that much space spare :( File system has been going strong from when it was created with early RAID5 code, then converted to RAID10 with kernel 3.12. There aren't any nocow files to my knowledge but there are plenty of files larger than a gig on the file system. The first few results from logical-resolve have been for files in the 1G~2G range, so that could be some sticky spaghetti. Are any of those big files in a snapshot? The snapshotting may be pinning a bunch of large extents, so even if it seems like the volume has enough space, it might actually be running out of space. All I can think of is progressively removing the files that are implicated in the conversion failure. That could mean just deleting older snapshots that you probably don't need, progressively getting to the point where you migrate those files off this fs to another one, and then delete them (all instances in all subvol/snapshots) and just keep trying. Is a btrfs check happy? Or does it complain about anything? I've had quite good luck just adding a drive (two drives for raid1/10 volumes) to an existing btrfs volume; they don't have to be DRBD, they can be local block devices, either physical drives or LVs. I've even done this with flash drives (kinda scary and slow but it worked). I'd still suggest contingency planning in case this volume becomes temperamental and you have no choice but to migrate it elsewhere. Better to do it on your timetable than the filesystem's. -- Chris Murphy -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
Re: btrfs convert running out of space
I just tried from a slightly different tack, after doing another -dusage=2 pass I did the following:
# btrfs balance start -v -dconvert=raid1 -dsoft -dusage=96 /data
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x302): converting, target=16, soft is on, usage=96
Done, had to relocate 0 out of 3763 chunks
# btrfs balance start -v -dconvert=raid1 -dsoft -dusage=99 /data
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x302): converting, target=16, soft is on, usage=99
ERROR: error during balancing '/data' - No space left on device
There may be more info in syslog - try dmesg | tail
# dmesg | tail
[1301598.556845] BTRFS info (device drbd5): relocating block group 19366003343360 flags 17
[1301601.300990] BTRFS info (device drbd5): relocating block group 19364929601536 flags 17
[1301606.043675] BTRFS info (device drbd5): relocating block group 19363855859712 flags 17
[1301609.564754] BTRFS info (device drbd5): relocating block group 19362782117888 flags 17
[1301612.453678] BTRFS info (device drbd5): relocating block group 19361708376064 flags 17
[1301616.911777] BTRFS info (device drbd5): relocating block group 19360634634240 flags 17
[1301901.823345] BTRFS info (device drbd5): relocating block group 15298300215296 flags 65
[1301904.206732] BTRFS info (device drbd5): relocating block group 15285415313408 flags 65
[1301904.675298] BTRFS info (device drbd5): relocating block group 14946985312256 flags 65
[1301954.658780] BTRFS info (device drbd5): 3 enospc errors during balance
# btrfs balance start -v -dusage=2 /data
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing, usage=2
Done, had to relocate 9 out of 3772 chunks
That looks to me like when converting 3 blocks it wrote 9 blocks with less than 2% usage (Plus presumably 3 mostly full blocks). That sounds like a bug.
On Tue, Jan 20, 2015 at 10:45 AM, Gareth Pye gar...@cerberos.id.au wrote: Hi, I'm attempting to convert a btrfs filesystem from raid10 to raid1. Things had been going well through a couple of pauses and resumes, but last night it errored with:
ERROR: error during balancing '/data' - No space left on device
Which is strange because there is around 1.4T spare on the drives.
df:
/dev/drbd0 5.5T 4.6T 1.4T 77% /data
btrfs fi df:
Data, RAID10: total=1.34TiB, used=1.34TiB
Data, RAID1: total=3.80TiB, used=3.20TiB
System, RAID1: total=32.00MiB, used=720.00KiB
Metadata, RAID1: total=13.00GiB, used=9.70GiB
GlobalReserve, single: total=512.00MiB, used=204.00KiB
btrfs fi show:
Label: none uuid: b2986e1a-0891-4779-960c-e01f7534c6eb
Total devices 6 FS bytes used 4.55TiB
devid 1 size 1.81TiB used 1.72TiB path /dev/drbd0
devid 2 size 1.81TiB used 1.72TiB path /dev/drbd1
devid 3 size 1.81TiB used 1.72TiB path /dev/drbd2
devid 4 size 1.81TiB used 1.72TiB path /dev/drbd3
devid 5 size 1.81TiB used 1.72TiB path /dev/drbd4
devid 6 size 1.81TiB used 1.72TiB path /dev/drbd5
The above numbers are from after a quick bit of testing. When the error occurred the RAID1 total number was much larger and the device used totals were 1.81TiB. So I ran a balance with -dusage=2 and all the numbers went back to where I expected them to be: a RAID1 total of 3.21TiB and appropriate device usage numbers. With the system looking healthy again I checked my btrfs tools version (3.12), updated that to the current git (3.18.1, matching my kernel version) and tried the convert to raid1 again (this time with the dsoft option), but that quickly got to the above 600G empty allocation, where I canceled it.
# uname -a Linux emile 3.18.1-031801-generic #201412170637 SMP Wed Dec 17 11:38:50 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version Btrfs v3.12 dmesg doesn't tell me much, this is the end of it: [1295952.558506] BTRFS info (device drbd5): relocating block group 13734193922048 flags 65 [1295971.813271] BTRFS info (device drbd5): relocating block group 13716980498432 flags 65 [1295976.492826] BTRFS info (device drbd5): relocating block group 13713759272960 flags 65 [1295976.921302] BTRFS info (device drbd5): relocating block group 13710538047488 flags 65 [1295977.593500] BTRFS info (device drbd5): relocating block group 13707316822016 flags 65 [1295988.490751] BTRFS info (device drbd5): relocating block group 13704095596544 flags 65 [1295999.193131] BTRFS info (device drbd5): relocating block group 13613800620032 flags 65 [1296003.036323] BTRFS info (device drbd5): relocating block group 13578367139840 flags 65 [1296009.333859] BTRFS info (device drbd5): relocating block group 13539712434176 flags 65 [1296041.246938] BTRFS info (device drbd5): relocating block group 13513942630400 flags 65 [1296056.891600] BTRFS info (device drbd5): relocating block group 13488172826624 flags 65 [1296071.386463] BTRFS info (device drbd5): relocating block group 13472066699264 flags 65
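The figures quoted in the report above can be turned into a rough picture of where the space sits. This is only a back-of-the-envelope sketch, not a diagnosis of the ENOSPC: space_summary is a hypothetical helper name, the numbers are copied from the quoted btrfs fi df and btrfs fi show output, and it assumes that both RAID10 and RAID1 chunks keep two copies of everything, so raw usage is twice the logical size.

```shell
# Summarise where the space sits, using the figures quoted in the report.
# Assumption: RAID10 and RAID1 both store two copies, so raw = 2 x logical.
# space_summary is a hypothetical helper name.
space_summary() {
    awk 'BEGIN {
        devices = 6; dev_size = 1.81; dev_used = 1.72   # TiB per device, from btrfs fi show
        raid10 = 1.34                                   # TiB of data still in RAID10 chunks
        raid1_total = 3.80; raid1_used = 3.20           # TiB, RAID1 data chunks
        meta = 13 / 1024; sys = 32 / (1024 * 1024)      # GiB and MiB converted to TiB
        raw_alloc = 2 * (raid10 + raid1_total + meta + sys)
        unalloc = devices * (dev_size - dev_used)
        printf "raw allocated (from fi df):   %.2f TiB\n", raw_alloc
        printf "raw allocated (from fi show): %.2f TiB\n", devices * dev_used
        printf "raw unallocated:              %.2f TiB\n", unalloc
        printf "room for new mirrored chunks: %.2f TiB\n", unalloc / 2
        printf "slack inside RAID1 chunks:    %.2f TiB\n", raid1_total - raid1_used
        printf "data still to convert:        %.2f TiB\n", raid10
    }'
}
space_summary
```

Under those assumptions the slack inside already-allocated RAID1 chunks corresponds to the 600G of empty allocation mentioned in the report, and the unallocated room for new mirrored chunks is much smaller than the RAID10 data still waiting to be converted, so the conversion would have to reuse that slack rather than allocate fresh chunks.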
btrfs convert running out of space
[1296163.842627] BTRFS info (device drbd5): 203 enospc errors during balance Before that there is just lots of not particularly different relocating block group messages. Any ideas on what is going on here? -- Gareth Pye Level 2 MTG Judge, Melbourne, Australia Dear God, I would like to file a bug report
Re: unfinished convert to raid6, enospc
So am I reading "* Progs support for parity rebuild. Missing drives upset the progs today, but the kernel does rebuild parity properly." wrong? That sounds to me like the programs will bork but it can be mounted and it'll rebuild. On Thu, Jul 18, 2013 at 6:53 AM, Stefan Behrens sbehr...@giantdisaster.de wrote: On 07/17/2013 21:56, Dan van der Ster wrote: Well, I'm trying a balance again with -dconvert=raid6 -dusage=5 this time. Will report back... On Wed, Jul 17, 2013 at 3:34 PM, Dan van der Ster d...@vanderster.com wrote: Hi, Two days ago I decided to throw caution to the wind and convert my raid1 array to raid6, for the space and redundancy benefits. I did
# btrfs fi balance start -dconvert=raid6 /media/btrfs
Eventually today the balance finished, but the conversion to raid6 was incomplete:
# btrfs fi df /media/btrfs
Data, RAID1: total=693.00GB, used=690.47GB
Data, RAID6: total=6.36TB, used=4.35TB
System, RAID1: total=32.00MB, used=1008.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=8.00GB, used=6.04GB
A recent btrfs balance status (before finishing) said:
# btrfs balance status /media/btrfs
Balance on '/media/btrfs' is running
4289 out of about 5208 chunks balanced (4988 considered), 18% left
and at the end I have:
[164935.053643] btrfs: 693 enospc errors during balance
Here is the array:
# btrfs fi show /dev/sdb
Label: none uuid: 743135d0-d1f5-4695-9f32-e682537749cf
Total devices 7 FS bytes used 5.04TB
devid 2 size 2.73TB used 2.73TB path /dev/sdh
devid 1 size 2.73TB used 2.73TB path /dev/sdg
devid 5 size 1.36TB used 1.31TB path /dev/sde
devid 6 size 1.36TB used 1.31TB path /dev/sdf
devid 4 size 1.82TB used 1.82TB path /dev/sdd
devid 3 size 1.82TB used 1.82TB path /dev/sdc
devid 7 size 1.82TB used 1.82TB path /dev/sdb
I'm running latest stable, plus the patch "free csums when we're done scrubbing an extent" (otherwise I get OOM when scrubbing).
# uname -a
Linux dvanders-webserver 3.10.1+ #1 SMP Mon Jul 15 17:07:19 CEST 2013 x86_64 x86_64 x86_64 GNU/Linux
I still have plenty of free space:
# df -h /media/btrfs
Filesystem Size Used Avail Use% Mounted on
/dev/sdd 14T 5.8T 2.2T 74% /media/btrfs
Any idea how I can get out of this? Thanks! You know the limitations of the current Btrfs RAID5/6 implementation, don't you? No protection against power loss or disk failures. No support for scrub. These limits are explained very explicitly in the commit message: http://lwn.net/Articles/536038/ I'd recommend Btrfs RAID1 for the time being. -- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report
Re: Impossible or Possible to Securely Erase File on Btrfs?
Would it make sense for btrfs to support a write zeros to empty space erase? I know it would be slow as it would have to write to all the free space in the file system but it would be useful. It's probably pretty far down the priority list for development though I expect. On Tue, Mar 19, 2013 at 6:18 AM, Chris Mason chris.ma...@fusionio.com wrote: Quoting Kyle (2013-03-18 14:15:17) Hi, After reading through the btrfs documentation I'm curious to know if it's possible to ever securely erase a file from a btrfs filesystem (or ZFS for that matter). On non-COW filesystems atop regular HDDs one can simply overwrite the file with zeros or random data using dd or some other tool and rest assured that the blocks which contained the sensitive information have been wiped. However on btrfs it would seem any such attempt would write the zeros/random data to a new location, leaving the old blocks with the sensitive data intact. Further, since specifying NOCOW is only possible for newly created files, there seems to be no way to overwrite the appropriate blocks short of deleting the associated file and then filling the entire free filesystem space with zeros/random data such that the old blocks are eventually overwritten. What's the verdict on this? We don't do this now for other reasons mentioned in the thread. The best path to get here is to use trim, and to find a device that supports a secure erase trim (I don't know if this even exists, sorry). Outside of that, the only way to do this securely is to delete the files, rotate in new drives, rotate out the old drives, and run a full drive secure delete (not very practical). 
-chris -- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report
Re: Hybrid Storage proposal
On Thu, Feb 21, 2013 at 3:46 AM, Matias Bjorling m...@itu.dk wrote: Recent development in Linux SSD caches uses a block IO approach to solve caching. The approach assumes that data is stable on disk and evicts data based on LRU, temperature, etc. This is great for read-only IO patterns and in-place writes. However, btrfs uses a copy-on-write approach, which reduces the benefits of block IO caching. The block caches are unable to track updates (that would require extensive hints back and forth with the cache layer). Additionally, data and metadata look the same to the block layer. Another great reason for this to be implemented in btrfs is that knowledge of redundancy can be applied. Chunks that are mirrored should be unlikely to need both copies on SSD devices unless they are very highly used (probably true for some of the metadata but not for data). Conversely, there is little benefit to putting one stripe of a raid0/5/6 into the SSD device without the rest of that data reaching the same level. Not that additional reasons to do this work in btrfs were needed, but how this implementation interacts with those features does need to be thought about. -- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report
Re: experimental raid5/6 code in git
I felt like having a small play with this stuff, as I've been wanting it for so long :) But apparently I've made some incredibly newb error. I used the following two lines to check out the code:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git raid56-experimental
git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git raid56-experimental-progs
Then I did not very much to compile both of them (installed lots and lots of packages that various places told me would be needed so they'd both compile), finishing up with a sudo make install for both the kernel and the tools. After rebooting, it miraculously came up with the new kernel, and uname -a assures me that I have a new kernel running:
btrfs@ubuntu:/kernel/raid56-experimental$ uname -a
Linux ubuntu 3.6.0+ #1 SMP Tue Feb 5 12:26:03 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
3.6.0 sounds rather low, but it is newer than Ubuntu 12.10's 3.5, so I believe I am running the kernel I just compiled. Where things fail is that I can't figure out how to make a raid5 btrfs. I'm certain I'm using the mkfs.btrfs that I just compiled (by explicitly calling it in the make folder) but it won't recognise what I assume the parameter to be:
btrfs@ubuntu:/kernel/raid56-experimental-progs$ ./mkfs.btrfs -m raid5 -d raid5 /dev/sd[bcdef]
Unknown profile raid5
Which flavour of newb am I today? PS: I use newb in a very friendly way, I feel no shame over that term :) On Tue, Feb 5, 2013 at 1:26 PM, H. Peter Anvin h...@zytor.com wrote: Also, a 2-member raid5 or 3-member raid6 are a raid1 and can be treated as such. Chris Mason chris.ma...@fusionio.com wrote: On Mon, Feb 04, 2013 at 02:42:24PM -0700, H. Peter Anvin wrote:
@@ -1389,6 +1392,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	}
 	btrfs_dev_replace_unlock(&root->fs_info->dev_replace);
 
+	if ((all_avail & (BTRFS_BLOCK_GROUP_RAID5 |
+			  BTRFS_BLOCK_GROUP_RAID6) && num_devices <= 3)) {
+		printk(KERN_ERR "btrfs: unable to go below three devices "
+		       "on raid5 or raid6\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices <= 4) {
 		printk(KERN_ERR "btrfs: unable to go below four devices "
 		       "on raid10\n");
@@ -1403,6 +1414,21 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 		goto out;
 	}
 
+	if ((all_avail & BTRFS_BLOCK_GROUP_RAID5) &&
+	    root->fs_info->fs_devices->rw_devices <= 2) {
+		printk(KERN_ERR "btrfs: unable to go below two "
+		       "devices on raid5\n");
+		ret = -EINVAL;
+		goto out;
+	}
+	if ((all_avail & BTRFS_BLOCK_GROUP_RAID6) &&
+	    root->fs_info->fs_devices->rw_devices <= 3) {
+		printk(KERN_ERR "btrfs: unable to go below three "
+		       "devices on raid6\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if (strcmp(device_path, "missing") == 0) {
 		struct list_head *devices;
 		struct btrfs_device *tmp;
This seems inconsistent? Whoops, missed that one. Thanks! -chris -- Sent from my mobile phone. Please excuse brevity and lack of formatting. -- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report
Re: experimental raid5/6 code in git
Thank you, that makes a lot of sense :) It's been a good day, I've learnt something :) On Tue, Feb 5, 2013 at 4:29 PM, Chester somethingsome2...@gmail.com wrote: The last argument should be the directory you want to clone into. Use '-b branch' to specify the branch you want to clone. I'm pretty sure you've compiled just the master branch of both linux-btrfs and btrfs-progs. On Mon, Feb 4, 2013 at 8:59 PM, Gareth Pye gar...@cerberos.id.au wrote: I felt like having a small play with this stuff, as I've been wanting it for so long :) But apparently I've made some incredibly newb error. I used the following two lines to check out the code:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git raid56-experimental
git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git raid56-experimental-progs
Then I did not very much to compile both of them (installed lots and lots of packages that various places told me would be needed so they'd both compile), finishing up with a sudo make install for both the kernel and the tools. After rebooting, it miraculously came up with the new kernel, and uname -a assures me that I have a new kernel running:
btrfs@ubuntu:/kernel/raid56-experimental$ uname -a
Linux ubuntu 3.6.0+ #1 SMP Tue Feb 5 12:26:03 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
3.6.0 sounds rather low, but it is newer than Ubuntu 12.10's 3.5, so I believe I am running the kernel I just compiled. Where things fail is that I can't figure out how to make a raid5 btrfs. I'm certain I'm using the mkfs.btrfs that I just compiled (by explicitly calling it in the make folder) but it won't recognise what I assume the parameter to be:
btrfs@ubuntu:/kernel/raid56-experimental-progs$ ./mkfs.btrfs -m raid5 -d raid5 /dev/sd[bcdef]
Unknown profile raid5
Which flavour of newb am I today? PS: I use newb in a very friendly way, I feel no shame over that term :) On Tue, Feb 5, 2013 at 1:26 PM, H. Peter Anvin h...@zytor.com wrote: Also, a 2-member raid5 or 3-member raid6 are a raid1 and can be treated as such. Chris Mason chris.ma...@fusionio.com wrote: On Mon, Feb 04, 2013 at 02:42:24PM -0700, H. Peter Anvin wrote:
@@ -1389,6 +1392,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	}
 	btrfs_dev_replace_unlock(&root->fs_info->dev_replace);
 
+	if ((all_avail & (BTRFS_BLOCK_GROUP_RAID5 |
+			  BTRFS_BLOCK_GROUP_RAID6) && num_devices <= 3)) {
+		printk(KERN_ERR "btrfs: unable to go below three devices "
+		       "on raid5 or raid6\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices <= 4) {
 		printk(KERN_ERR "btrfs: unable to go below four devices "
 		       "on raid10\n");
@@ -1403,6 +1414,21 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 		goto out;
 	}
 
+	if ((all_avail & BTRFS_BLOCK_GROUP_RAID5) &&
+	    root->fs_info->fs_devices->rw_devices <= 2) {
+		printk(KERN_ERR "btrfs: unable to go below two "
+		       "devices on raid5\n");
+		ret = -EINVAL;
+		goto out;
+	}
+	if ((all_avail & BTRFS_BLOCK_GROUP_RAID6) &&
+	    root->fs_info->fs_devices->rw_devices <= 3) {
+		printk(KERN_ERR "btrfs: unable to go below three "
+		       "devices on raid6\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if (strcmp(device_path, "missing") == 0) {
 		struct list_head *devices;
 		struct btrfs_device *tmp;
This seems inconsistent? Whoops, missed that one. Thanks! -chris -- Sent from my mobile phone. Please excuse brevity and lack of formatting. -- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report
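Chester's correction can be sketched as a small wrapper. This is a hedged illustration only: clone_branch is a hypothetical helper name, and it assumes that raid56-experimental, which the original commands passed as the target directory, was meant to be the branch name; the thread does not confirm that this branch exists in both repositories.

```shell
# Clone a single named branch of a repository into a chosen directory.
# git clone takes the branch through -b; the last argument is only the
# directory to clone into. clone_branch is a hypothetical helper name.
clone_branch() {
    url=$1
    branch=$2
    dir=$3
    git clone -b "$branch" "$url" "$dir"
}

# Intended usage (not run here; needs network access, branch name assumed):
# clone_branch git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git raid56-experimental linux-btrfs-raid56
# clone_branch git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git raid56-experimental btrfs-progs-raid56
```

Running git branch inside the resulting directory before building shows which branch was actually checked out, which would have caught the master-branch build described above.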
Re: RAID-[56]?
More testing usually means more bugs found etc… Yes, but releasing code before it's somewhat polished just generates a mountain of bug reports. Back in 2010 when I set up a server at work I was eagerly awaiting the RAID5 implementation that was just a couple of months away. Don't worry, it does appear to be coming: over 2 years, "a couple of months" has shortened to "end of this week", just before Christmas. PS: This email is intended to be helpful, and while slightly snarky I am very grateful for the work that has been done and I eagerly await the RAID56 features.
Upgrading from 2.6.38, how?
Firstly, I know what I've been doing has been less than 100% safe, but I've been prepared to live with it. For about 2 years now (you know, from around the time it looked like RAID5/6 for btrfs was just around the corner) I've had a server with a 5 disk RAID10 btrfs array. I realise there has been quite some change to the btrfs implementation since 2.6.38, but I'm hoping there is nothing blocking me from moving to a much more modern kernel.

My proposed upgrade method is:

Boot from a live CD with the latest kernel I can find so I can do a few tests:
A - run the fsck in read-only mode to confirm things look good
B - mount read-only, confirm that I can read files well
C - mount read-write, confirm working

Install latest OS, upgrade to latest kernel, then repeat above steps.

Any likely hiccups with the above procedure, and any suggested alternatives?

-- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report
Re: Feature request: true RAID-1 mode
On Tue, Jun 26, 2012 at 8:37 AM, H. Peter Anvin h...@zytor.com wrote:

They do? E.g. mdadm doesn't make them...

Hrm, you are right. It is something I always confirm is happening though. Without an M=N mode there would need to be two balances, as the first balance would be doing it wrong :(

-- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report
Re: COW a file from snapshot
Chris, I recommend reading the previously linked thread. The supplied (and reportedly working) patch was nacked because it violates some principle or another of file systems (although, from my limited understanding, it only does so in the same way that btrfs snapshots already do).

On Thu, Dec 22, 2011 at 10:35 PM, Chris Samuel ch...@csamuel.org wrote:

On Thu, 22 Dec 2011 07:12:13 PM Roman Kapusta wrote:

I've been using btrfs for about two years and this is the key feature I'm missing all the time. Why is it not part of mainline btrfs already?

Because nobody has written the code to do it yet? I'm sure the developers would welcome patches for this with open arms!

cheers, Chris

-- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC This email may come with a PGP signature as a file. Do not panic. For more info see: http://en.wikipedia.org/wiki/OpenPGP

-- Gareth Pye Level 2 Judge, Melbourne, Australia Australian MTG Forum: mtgau.com gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com Dear God, I would like to file a bug report