RE: how to best segment a big block device in resizeable btrfs filesystems?
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org <linux-btrfs-ow...@vger.kernel.org> On Behalf Of Marc MERLIN
> Sent: Tuesday, 3 July 2018 2:16 PM
> To: Qu Wenruo
> Cc: Su Yue; linux-btrfs@vger.kernel.org
> Subject: Re: how to best segment a big block device in resizeable btrfs filesystems?
>
> On Tue, Jul 03, 2018 at 09:37:47AM +0800, Qu Wenruo wrote:
> > > If I do this, I would have
> > > software raid 5 < dmcrypt < bcache < lvm < btrfs
> > > That's a lot of layers, and that's also starting to make me nervous :)
> >
> > If you could keep the number of snapshots to minimal (less than 10)
> > for each btrfs (and the number of send source is less than 5), one big
> > btrfs may work in that case.
>
> Well, we kind of discussed this already. If btrfs falls over if you reach
> 100 snapshots or so, and it sure seems to in my case, I won't be much
> better off.
> Having btrfs check --repair fail because 32GB of RAM is not enough, and
> it's unable to use swap, is a big deal in my case. You also confirmed
> that btrfs check lowmem does not scale to filesystems like mine, so this
> translates into "if regular btrfs check repair can't fit in 32GB, I am
> completely out of luck if anything happens to the filesystem"

Just out of curiosity I had a look at my backup filesystem.
vm-server /media/backup # btrfs fi us /media/backup/
Overall:
    Device size:          5.46TiB
    Device allocated:     3.42TiB
    Device unallocated:   2.04TiB
    Device missing:         0.00B
    Used:                 1.80TiB
    Free (estimated):     1.83TiB  (min: 1.83TiB)
    Data ratio:              2.00
    Metadata ratio:          2.00
    Global reserve:     512.00MiB  (used: 0.00B)

Data,RAID1: Size:1.69TiB, Used:906.26GiB
   /dev/mapper/a-backup--a   1.69TiB
   /dev/mapper/b-backup--b   1.69TiB

Metadata,RAID1: Size:19.00GiB, Used:16.90GiB
   /dev/mapper/a-backup--a  19.00GiB
   /dev/mapper/b-backup--b  19.00GiB

System,RAID1: Size:64.00MiB, Used:336.00KiB
   /dev/mapper/a-backup--a  64.00MiB
   /dev/mapper/b-backup--b  64.00MiB

Unallocated:
   /dev/mapper/a-backup--a   1.02TiB
   /dev/mapper/b-backup--b   1.02TiB

compress=zstd,space_cache=v2
202 snapshots, heavily de-duplicated
551G / 361,000 files in latest snapshot

Btrfs check normal mode took 12 mins and 11.5G ram
Lowmem mode I stopped after 4 hours, max memory usage was around 3.9G

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: how to best segment a big block device in resizeable btrfs filesystems?
> -----Original Message-----
> From: Marc MERLIN
> Sent: Tuesday, 3 July 2018 2:07 PM
> To: Paul Jones
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: how to best segment a big block device in resizeable btrfs filesystems?
>
> On Tue, Jul 03, 2018 at 12:51:30AM +, Paul Jones wrote:
> > You could combine bcache and lvm if you are happy to use dm-cache
> > instead (which lvm uses).
> > I use it myself (but without thin provisioning) and it works well.
>
> Interesting point. So, I used to use lvm and then lvm2 many years ago,
> until I got tired of its performance, especially as soon as I took even
> a single snapshot.
> But that was a long time ago now, just saying that I'm a bit rusty on
> LVM itself.
>
> That being said, if I have
> raid5
> dm-cache
> dm-crypt
> dm-thin
>
> That's still 4 block layers under btrfs.
> Am I any better off using dm-cache instead of bcache? My understanding
> is that it only replaces one block layer with another one and one
> codebase with another.

True, I didn't think of it like that.

> Mmmh, a bit of reading shows that dm-cache is now used as lvmcache,
> which might change things, or not.
> I'll admit that setting up and maintaining bcache is a bit of a pain, I
> only used it at the time because it seemed more ready then, but we're a
> few years later now.
>
> So, what do you recommend nowadays, assuming you've used both?
> (given that it's literally going to take days to recreate my array, I'd
> rather do it once and the right way the first time :) )

I don't have any experience with this, but since it's the internet, let me
tell you how I'd do it anyway:

raid5
dm-crypt
lvm (using thin provisioning + cache)
btrfs

The cache mode on lvm requires you to set up all your volumes first, then
add caching to those volumes last. If you need to modify a volume you have
to remove the cache, make your changes, then re-add the cache. It sounds
like a pain, but having the cache separate from the data is quite handy.
Given you are running a backup server I don't think the cache would really
do much unless you enable writeback mode. If you can split up your
filesystem a bit, to the point that btrfs check doesn't OOM, that will
seriously help performance as well. Rsync might be feasible again.

Paul.
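The attach/detach cycle described above can be sketched with lvm2 commands. This is an untested illustration, not a recipe: the VG name `vg0`, the devices `/dev/md0` and `/dev/ssd`, and the sizes are all hypothetical.

```shell
# Create the data LV on the raid5 device, then cache it from the SSD:
lvcreate -L 100G -n backup vg0 /dev/md0
lvcreate --type cache-pool -L 40G -n fastpool vg0 /dev/ssd
lvconvert --type cache --cachepool vg0/fastpool vg0/backup

# To reshape the volume later: split the cache off, resize, re-attach.
lvconvert --splitcache vg0/backup
lvextend -L +50G vg0/backup
lvconvert --type cache --cachepool vg0/fastpool vg0/backup
```

The `--splitcache` step is what makes the "cache separate from the data" property handy: the data LV stays intact and usable while uncached.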
RE: how to best segment a big block device in resizeable btrfs filesystems?
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org <linux-btrfs-ow...@vger.kernel.org> On Behalf Of Marc MERLIN
> Sent: Tuesday, 3 July 2018 1:19 AM
> To: Qu Wenruo
> Cc: Su Yue; linux-btrfs@vger.kernel.org
> Subject: Re: how to best segment a big block device in resizeable btrfs filesystems?
>
> Hi Qu,
>
> I'll split this part into a new thread:
>
> > 2) Don't keep unrelated snapshots in one btrfs.
> >    I totally understand that maintaining different btrfs would hugely
> >    add maintenance pressure, but as explained, all snapshots share one
> >    fragile extent tree.
>
> Yes, I understand that this is what I should do given what you explained.
> My main problem is knowing how to segment things so I don't end up with
> filesystems that are full while others are almost empty :)
>
> Am I supposed to put LVM thin volumes underneath so that I can share the
> same single 10TB raid5?
>
> If I do this, I would have
> software raid 5 < dmcrypt < bcache < lvm < btrfs
> That's a lot of layers, and that's also starting to make me nervous :)

You could combine bcache and lvm if you are happy to use dm-cache instead
(which lvm uses). I use it myself (but without thin provisioning) and it
works well.

> Is there any other way that does not involve me creating smaller block
> devices for multiple btrfs filesystems and hoping that they are the
> right size, because I won't be able to change it later?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
RE: [PATCH RFC] btrfs: Do extra device generation check at mount time
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org <linux-btrfs-ow...@vger.kernel.org> On Behalf Of Qu Wenruo
> Sent: Thursday, 28 June 2018 5:16 PM
> To: Nikolay Borisov; Qu Wenruo; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH RFC] btrfs: Do extra device generation check at mount time
>
> On 2018年06月28日 15:06, Nikolay Borisov wrote:
> >
> > On 28.06.2018 10:04, Qu Wenruo wrote:
> >> There is a reporter considering btrfs raid1 has a major design flaw
> >> which can't handle nodatasum files.
> >>
> >> Despite his incorrect expectation, btrfs indeed doesn't handle device
> >> generation mismatch well.
> >>
> >> This means if one device goes missing and re-appears, even if its
> >> generation no longer matches the rest of the device pool, btrfs does
> >> nothing about it, but treats it as a normal good device.
> >>
> >> At least let's detect such generation mismatch and avoid mounting the
> >> fs.
> >> Currently there is no automatic rebuild yet, which means if users
> >> find a device generation mismatch error message, they can only mount
> >> the fs using the "device" and "degraded" mount options (if possible),
> >> then replace the offending device to manually "rebuild" the fs.
> >>
> >> Signed-off-by: Qu Wenruo
> >
> > I think a testcase of this functionality is important as well.
>
> It's currently an RFC patch, a test case would come along with the final
> version.
>
> I'd like to make sure everyone, including developers and end-users, is
> fine with the strict error-out behaviour.

I've been bitten by this before and was most surprised the first time it
happened. I had assumed that of course btrfs would check such a thing
before mounting. Refusing to mount is a great first step, auto scrub is
even better, and only "scrubbing" files with an incorrect generation is
better yet.

Paul.
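For illustration only (the function and its names are hypothetical, not the kernel's code): the check being proposed amounts to comparing each device's superblock generation against the newest generation in the pool, and refusing the mount when any device lags behind.

```python
def find_stale_devices(generations):
    """generations: dict mapping device name -> superblock generation.

    A device that dropped out and later reappeared still carries the
    generation from before it vanished, so it lags behind the rest of
    the pool.  Mount should be refused (or a rebuild triggered) when
    the returned list is non-empty.
    """
    newest = max(generations.values())
    return sorted(dev for dev, gen in generations.items() if gen < newest)

# The reappeared device is detected as stale:
stale = find_stale_devices({"/dev/sda": 62977, "/dev/sdb": 62893})
print(stale)  # ['/dev/sdb']
```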
RE: About more loose parameter sequence requirement
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org <linux-btrfs-ow...@vger.kernel.org> On Behalf Of Hugo Mills
> Sent: Monday, 18 June 2018 9:44 PM
> To: dste...@suse.cz; Qu Wenruo; linux-btrfs@vger.kernel.org
> Subject: Re: About more loose parameter sequence requirement
>
> On Mon, Jun 18, 2018 at 01:34:32PM +0200, David Sterba wrote:
> > On Thu, Jun 14, 2018 at 03:17:45PM +0800, Qu Wenruo wrote:
> > > I understand that btrfs-progs introduced a strict parameter/option
> > > order to distinguish global and sub-command parameters/options.
> > >
> > > However it's really annoying if one just wants to append some new
> > > options to a previous command:
> > >
> > > E.g.
> > > # btrfs check /dev/data/btrfs
> > > # !! --check-data-csum
> > >
> > > The last command will fail as current btrfs-progs doesn't allow any
> > > option after the parameter.
> > >
> > > Despite the requirement to distinguish global and subcommand
> > > options/parameters, is there any other requirement for such a strict
> > > option-first-parameter-last policy?
> >
> > I'd say that it's a common and recommended pattern. Getopt is able to
> > reorder the parameters so mixed options and non-options are accepted,
> > unless POSIXLY_CORRECT (see man getopt(3)) is set. With the more
> > strict requirement, the 'btrfs' option parser works the same
> > regardless of that.
>
> I got bitten by this the other day. I put an option flag at the end of
> the line, after the mountpoint, and it refused to work.
>
> I would definitely prefer it if it parsed options in any position. (Or
> at least, any position after the group/command parameters).

Same here - I do it all the time. I type the arguments as I think of them,
which is usually back-to-front of what is required. E.g. btrfs check this
mountpoint, oh yeah, and use this specific option = fail. Arrow arrow
arrow arrow arrow.

Paul.
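The behaviour being asked for, options accepted after positional arguments, is exactly GNU getopt's argument permutation. As a quick illustration of the same idea in Python (this is obviously not btrfs-progs code; the flag name just mirrors the btrfs one):

```python
import argparse

parser = argparse.ArgumentParser(prog="btrfs-check-demo")
parser.add_argument("device")
parser.add_argument("--check-data-csum", action="store_true")

# argparse permutes trailing options past positionals, so appending an
# option to a previous command line just works:
args = parser.parse_args(["/dev/data/btrfs", "--check-data-csum"])
print(args.device, args.check_data_csum)  # /dev/data/btrfs True
```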
RE: Help with leaf parent key incorrect
> -----Original Message-----
> From: Anand Jain [mailto:anand.j...@oracle.com]
> Sent: Monday, 26 February 2018 7:27 PM
> To: Paul Jones <p...@pauljones.id.au>; linux-btrfs@vger.kernel.org
> Subject: Re: Help with leaf parent key incorrect
>
> > There is one io error in the log below,
>
> Apparently, that's not a real EIO. We need to fix it.
> But can't be the root cause we are looking for here.
>
> > Feb 24 22:41:59 home kernel: BTRFS: error (device dm-6) in
> > btrfs_run_delayed_refs:3076: errno=-5 IO failure
> > Feb 24 22:41:59 home kernel: BTRFS info (device dm-6): forced readonly
>
> static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
>                                  struct btrfs_fs_info *fs_info,
>                                  struct btrfs_delayed_ref_head *head,
>                                  struct btrfs_delayed_extent_op *extent_op)
> {
> ::
>         } else {
>                 err = -EIO;
>                 goto out;
>         }
> }
>
> > but other than that I have never had io errors before, or any other
> > troubles.
>
> Hm. btrfs dev stat shows real disk IO errors.
> As this FS isn't mountable, pls try
>    btrfs dev stat > file
> search for 'device stats', there will be one for each disk.
> Or it reports in the syslog when it happens, not necessarily
> during dedupe.
vm-server ~ # btrfs dev stat /media/storage/
[/dev/mapper/b-storage--b].write_io_errs    0
[/dev/mapper/b-storage--b].read_io_errs     0
[/dev/mapper/b-storage--b].flush_io_errs    0
[/dev/mapper/b-storage--b].corruption_errs  0
[/dev/mapper/b-storage--b].generation_errs  0
[/dev/mapper/a-storage--a].write_io_errs    0
[/dev/mapper/a-storage--a].read_io_errs     0
[/dev/mapper/a-storage--a].flush_io_errs    0
[/dev/mapper/a-storage--a].corruption_errs  0
[/dev/mapper/a-storage--a].generation_errs  0
vm-server ~ # btrfs dev stat /
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sda1].write_io_errs    0
[/dev/sda1].read_io_errs     0
[/dev/sda1].flush_io_errs    0
[/dev/sda1].corruption_errs  0
[/dev/sda1].generation_errs  0
vm-server ~ # btrfs dev stat /dev/mapper/a-backup--a
ERROR: '/dev/mapper/a-backup--a' is not a mounted btrfs device

I check syslog regularly and I haven't seen any errors on any drives for
over a year.

> > One of my other filesystems shares the same two discs and it is still
> > fine, so I think the hardware is probably ok.
>
> Right. I guess that too. A confirmation will be better.
>
> > I've copied the beginning of the errors below.
>
> At my end, finding the root cause of 'parent transid verify failed'
> during/after dedupe is kind of fading, as the disk seems to have had
> no issues, which is what I had in mind.
>
> Also, there wasn't an abrupt power-recycle here? I presume.

No, although now that I think about it I just realised it happened right
after I upgraded from 4.15.4 to 4.15.5, and I didn't quit bees before
rebooting, I let the system do it. Not sure if it's relevant or not. I
also just noticed that the kernel has spawned hundreds of kworkers - the
highest number I can see is 516.

> It's better to save the output disk1-log and disk2-log as below
> before further efforts to recovery. Just in case something
> pops out.
> btrfs in dump-super -fa disk1 > disk1-log
> btrfs in dump-tree --degraded disk1 >> disk1-log [1]

I applied the patch and started dumping the tree, but I stopped it after
about 10 mins and 9GB. Because I use zstd and the free space tree, the
recovery tools wouldn't do anything in RW mode, so I've decided to just
blow it away and restore from a backup. I made a block level copy of both
discs in case I need anything.

Thanks for your help anyway.

Regards,
Paul.
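For readers wondering what the error in this thread actually means: each pointer in a btrfs tree node records the transaction generation ("transid") the child block was last written in, and "parent transid verify failed" fires when the block read from disk carries a different, usually older, generation. A toy illustration of that check (not the kernel code):

```python
def verify_parent_transid(bytenr, wanted, found):
    """Mimic the shape of btrfs' check: 'wanted' comes from the parent
    node's pointer, 'found' from the header of the block just read.
    Returns None on success, or the familiar error string."""
    if wanted == found:
        return None
    return ("parent transid verify failed on %d wanted %d found %d"
            % (bytenr, wanted, found))

# Reproduce the message from the original report:
msg = verify_parent_transid(2371034071040, 62977, 62893)
print(msg)
```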
Help with leaf parent key incorrect
Hi all,

I was running dedupe on my filesystem and something went wrong overnight;
by the time I noticed, the fs was readonly. When trying to check it this
is what I get:

vm-server ~ # btrfs check /dev/mapper/a-backup--a
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
Ignoring transid failure
leaf parent key incorrect 2371034071040
ERROR: cannot open file system

Is there a way to fix this? I'm using kernel 4.15.5.

This is the last part of dmesg:

[ +0.02] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[ +1.107963] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[ +0.05] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[ +1.473598] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[ +0.001927] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[ +0.03] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[ +0.60] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[ +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[ +2.676048] verify_parent_transid: 10362 callbacks suppressed
[ +0.02] BTRFS error (device dm-6): parent transid verify failed on 2373991677952 wanted 63210 found 63208
[ +0.03] BTRFS error (device dm-6): parent transid verify failed on 2373991677952 wanted 63210 found 63208
[ +0.078432] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[ +0.43] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[ +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[ +0.058638] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[ +0.139174] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[Feb25 20:48] BTRFS info (device dm-6): using free space tree
[ +0.02] BTRFS error (device dm-6): Remounting read-write after error is not allowed
[Feb25 20:49] BTRFS error (device dm-6): cleaner transaction attach returned -30
[ +0.238718] BTRFS warning (device dm-6): page private not zero on page 1596642967552
[ +0.03] BTRFS warning (device dm-6): page private not zero on page 1596642971648
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642975744
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642979840
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643672064
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643676160
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643680256
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643684352
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643704832
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643708928
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643713024
[ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643717120
[ +0.28] BTRFS warning (device dm-6): page private not zero on page 2363051098112
[ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051102208
[ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051106304
[ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051110400
[ +0.01] BTRFS warning (device dm-6): page private not zero on page 2368056344576
[ +0.00] BTRFS warning (device dm-6): page private not zero on page 2368056348672
[ +0.01] BTRFS warning (device dm-6): page private not zero on page 2368056352768
[ +0.01] BTRFS warning (device dm-6): page private not zero on page
RE: A Big Thank You, and some Notes on Current Recovery Tools.
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of ein
> Sent: Tuesday, 2 January 2018 9:03 PM
> To: swest...@gmail.com; Kai Krakow
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: A Big Thank You, and some Notes on Current Recovery Tools.
>
> Forgive me if it's not relevant, but I own quite a few disks from that
> series, like:
>
> root@iomega-ordo:~# hdparm -i /dev/sda
> /dev/sda:
> Model=ST2000DM001-1CH164, FwRev=CC27, SerialNo=Z1E6EV85
> Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
>
> root@iomega-acm:~# smartctl -d sat -a /dev/sda
> === START OF INFORMATION SECTION ===
> Device Model:     ST3000DM001-9YN166
> Serial Number:    S1F0PGQJ
> LU WWN Device Id: 5 000c50 0516fce00
> Firmware Version: CC4B
>
> root@iomega-europol:~# smartctl -d sat -a /dev/sda
> smartctl 5.41 2011-06-09 r3365 [armv5tel-linux-2.6.31.8] (local build)
> === START OF INFORMATION SECTION ===
> Device Model:     ST3000DM001-9YN166
> Serial Number:    Z1F1H5KA
> LU WWN Device Id: 5 000c50 04ec18fda
>
> Different locations, different environments, different boards, one more
> stable (the power) than others.
>
> I replaced at least three or four in the past 3 years. All of them died
> because of heavy random write workload (rsnapshot, massive cp -al of
> millions of files every day). In my case every time bad sectors occurred
> too, but I didn't analyze where exactly, it was just a backup
> destination drive. I'm pretty convinced it could be ext2 supers too
> though.

I think the 1-3TB Seagate drives are garbage. Out of 6 drives I replaced
all of them under warranty due to bad sectors, and 2 of them were replaced
twice! As the replacements failed out of warranty they were replaced with
3-4TB HGST drives and I've had no problems ever since. My workload was
just a daily backup store, so they sat there idling about 22 hours a day.
I hear the 4+TB Seagate drives are much better quality but I have no
experience with them.

Paul.
RE: [PATCH RFC] btrfs: self heal from SB fail
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Austin S. Hemmelgarn
> Sent: Friday, 8 December 2017 11:51 PM
> To: Anand Jain; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH RFC] btrfs: self heal from SB fail
>
> On 2017-12-08 02:57, Anand Jain wrote:
> > -EXPERIMENTAL-
> > As of now when primary SB fails we won't self heal and would fail
> > mount, this is an experimental patch which thinks why not go and read
> > backup copy.
>
> I like the concept, and actually think this should be default behavior
> on a filesystem that's already mounted (we fix other errors, why not
> SB's), but I don't think it should be default behavior at mount time for
> the reasons Qu has outlined (picking up old BTRFS SB's after
> reformatting is bad). However, I do think it's useful to be able to ask
> for this behavior on mount, so that you don't need to fight with the
> programs to get a filesystem to mount when the first SB is missing
> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).

I agree with this. The behaviour I'd like to see would be refusal to mount
(without additional mount options), but also: print the needed info to the
kernel log so the user can add the required mount option or read the wiki
for more information, and print some diagnostic info on the primary and
secondary super blocks.

Paul.
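For context: btrfs keeps superblock mirrors at fixed offsets (64KiB, 64MiB, and 256GiB on large enough devices), and a self-heal along these lines boils down to picking the checksum-valid copy with the highest generation. A toy sketch of that selection (not the patch's actual code; the checksum verification itself is elided):

```python
# Fixed btrfs superblock mirror offsets: 64KiB, 64MiB, 256GiB.
SB_OFFSETS = (64 * 1024, 64 * 1024 ** 2, 256 * 1024 ** 3)

def pick_superblock(copies):
    """copies: iterable of (offset, generation, csum_ok) tuples.
    Returns the offset of the newest valid copy, or None if all are bad."""
    valid = [(gen, off) for off, gen, ok in copies if ok]
    return max(valid)[1] if valid else None

# Primary corrupted; the 64MiB mirror is the newest valid copy:
best = pick_superblock([(SB_OFFSETS[0], 0, False),
                        (SB_OFFSETS[1], 5120, True),
                        (SB_OFFSETS[2], 5118, True)])
print(best)  # 67108864
```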
RE: Read before you deploy btrfs + zstd
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Martin Steigerwald
> Sent: Tuesday, 14 November 2017 6:35 PM
> To: dste...@suse.cz; linux-btrfs@vger.kernel.org
> Subject: Re: Read before you deploy btrfs + zstd
>
> Hello David.
>
> David Sterba - 13.11.17, 23:50:
> > while 4.14 is still fresh, let me address some concerns I've seen on
> > linux forums already.
> >
> > The newly added ZSTD support is a feature that has broader impact than
> > just the runtime compression. The btrfs-progs understand filesystem
> > with ZSTD since 4.13. The remaining key part is the bootloader.
> >
> > Up to now, there are no bootloaders supporting ZSTD. This could lead
> > to an unmountable filesystem if the critical files under /boot get
> > accidentally or intentionally compressed by ZSTD.
>
> But otherwise ZSTD is safe to use? Are you aware of any other issues?
>
> I consider switching from LZO to ZSTD on this ThinkPad T520 with
> Sandybridge.

I've been using it since rc2 and had no trouble at all so far. The
filesystem is running faster now (with zstd) than it did uncompressed on
4.13.

Paul.
RE: Why do full balance and deduplication reduce available free space?
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Niccolò Belli
> Sent: Monday, 2 October 2017 9:29 PM
> To: Hans van Kranenburg
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: Why do full balance and deduplication reduce available free space?
>
> Il 2017-10-02 12:16 Hans van Kranenburg ha scritto:
> > On 10/02/2017 12:02 PM, Niccolò Belli wrote:
> >> [...]
> >>
> >> Since I use lots of snapshots [...] I had to create a systemd timer
> >> to perform a full balance and deduplication each night.
> >
> > Can you explain what's your reasoning behind this 'because X it needs
> > Y'? I don't follow.
>
> Available free space is important to me, so I want snapshots to be
> deduplicated as well. Since I cannot deduplicate snapshots because they
> are read-only, the data must already be deduplicated before the
> snapshots are taken. I do not consider the hourly snapshots because in a
> day they will be gone anyway, but daily snapshots will stay there for
> much longer, so I want them to be deduplicated.

I use bees for deduplication and it will quite happily dedupe read-only
snapshots. You could always change them to RW while dedupe is running,
then change back to RO.

Paul.
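The flip-to-RW trick can be done with `btrfs property`. A hypothetical sequence (the snapshot path and the choice of `duperemove` are examples, not a tested recipe; as noted above, bees doesn't need this since it handles read-only snapshots):

```shell
SNAP=/mnt/pool/.snapshots/daily-2017-10-01

btrfs property set "$SNAP" ro false   # make the snapshot writable
duperemove -dr "$SNAP"                # run your dedupe tool of choice
btrfs property set "$SNAP" ro true    # restore the read-only flag
```

Note that toggling the flag breaks the snapshot's suitability as a `btrfs send` parent, so avoid this on snapshots used for incremental backups.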
WARNING: CPU: 1 PID: 13825 at fs/btrfs/backref.c:1255 find_parent_nodes+0xb5c/0x1310
Hi,

Just ran into this warning while running deduplication. There were 10's of
thousands of them over a 24hr period. No other problems were reported.
Filesystem is raid1, freshly converted from single. Zstd compression.
4.14.0-rc2 kernel.

Sep 28 14:57:06 home kernel: [ cut here ]
Sep 28 14:57:06 home kernel: WARNING: CPU: 1 PID: 13825 at fs/btrfs/backref.c:1255 find_parent_nodes+0xb5c/0x1310
Sep 28 14:57:06 home kernel: Modules linked in: l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel cls_u32 sch_htb sch_sfq nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_sane nf_conntrack_sip ts_kmp nf_conntrack_amanda nf_conntrack_snmp nf_conntrack_h323 nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_tftp nf_conntrack_ftp nf_conntrack_irc xt_NETMAP xt_TCPMSS xt_CHECKSUM ipt_rpfilter xt_DSCP xt_dscp xt_statistic xt_CT xt_AUDIT xt_NFLOG xt_time xt_connlimit xt_realm xt_NFQUEUE xt_tcpmss xt_addrtype xt_pkttype iptable_raw xt_TPROXY nf_defrag_ipv6 xt_CLASSIFY xt_mark xt_hashlimit xt_comment xt_length xt_connmark xt_owner xt_recent xt_iprange xt_physdev xt_policy iptable_mangle xt_nat xt_multiport xt_conntrack ipt_REJECT nf_reject_ipv4 ipt_MASQUERADE nf_nat_masquerade_ipv4 ipt_ECN ipt_CLUSTERIP ipt_ah
Sep 28 14:57:06 home kernel: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables nfsd auth_rpcgss oid_registry nfs_acl binfmt_misc dm_cache_smq dm_cache dm_persistent_data dm_bufio dm_bio_prison k10temp hwmon_vid intel_powerclamp coretemp pcbc iTCO_wdt iTCO_vendor_support aesni_intel crypto_simd cryptd glue_helper pcspkr i2c_i801 lpc_ich mfd_core xts aes_x86_64 cbc sha512_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ixgb macvlan igb dca i2c_algo_bit e1000 atl1c fuse nfs lockd grace sunrpc dm_mirror dm_region_hash dm_log dm_mod hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_gyration usbhid xhci_plat_hcd ohci_pci ohci_hcd uhci_hcd usb_storage megaraid_sas megaraid_mbox megaraid_mm megaraid mptsas scsi_transport_sas mptspi scsi_transport_spi mptscsih mptbase
Sep 28 14:57:06 home kernel: sata_inic162x ata_piix sata_nv sata_sil24 pata_jmicron pata_amd pata_mpiix ahci libahci xhci_pci ehci_pci r8169 xhci_hcd mii ehci_hcd
Sep 28 14:57:06 home kernel: CPU: 1 PID: 13825 Comm: crawl Not tainted 4.14.0-rc2 #2
Sep 28 14:57:06 home kernel: Hardware name: System manufacturer System Product Name/P8Z68-V LE, BIOS 4101 05/09/2013
Sep 28 14:57:06 home kernel: task: 8803dde96140 task.stack: c90018f9
Sep 28 14:57:06 home kernel: RIP: 0010:find_parent_nodes+0xb5c/0x1310
Sep 28 14:57:06 home kernel: RSP: 0018:c90018f93b30 EFLAGS: 00010286
Sep 28 14:57:06 home kernel: RAX: RBX: 8803f8453318 RCX: 0001
Sep 28 14:57:06 home kernel: RDX: RSI: 88040b9ca338 RDI: 8802c831bec8
Sep 28 14:57:06 home kernel: RBP: c90018f93c50 R08: 8803bb36d4e0 R09:
Sep 28 14:57:06 home kernel: R10: 8802c831bec8 R11: c90018f93bf0 R12: 0001
Sep 28 14:57:06 home kernel: R13: c90018f93c10 R14: 8803fc295ac0 R15: 8802c831bec8
Sep 28 14:57:06 home kernel: FS: 7f5dd2ca5700() GS:88041ec4() knlGS:
Sep 28 14:57:06 home kernel: CS: 0010 DS: ES: CR0: 80050033
Sep 28 14:57:06 home kernel: CR2: 7f8a30efc000 CR3: 00034037a005 CR4: 001606a0
Sep 28 14:57:06 home kernel: Call Trace:
Sep 28 14:57:06 home kernel:  btrfs_find_all_roots_safe+0x91/0x100
Sep 28 14:57:06 home kernel:  ? btrfs_find_all_roots_safe+0x91/0x100
Sep 28 14:57:06 home kernel:  ? extent_same_check_offsets+0x70/0x70
Sep 28 14:57:06 home kernel:  iterate_extent_inodes+0x1d1/0x260
Sep 28 14:57:06 home kernel:  iterate_inodes_from_logical+0x7d/0xa0
Sep 28 14:57:06 home kernel:  ? iterate_inodes_from_logical+0x7d/0xa0
Sep 28 14:57:06 home kernel:  ? extent_same_check_offsets+0x70/0x70
Sep 28 14:57:06 home kernel:  btrfs_ioctl+0x8aa/0x23a0
Sep 28 14:57:06 home kernel:  ? generic_file_read_iter+0x322/0x7d0
Sep 28 14:57:06 home kernel:  ? _copy_to_user+0x26/0x30
Sep 28 14:57:06 home kernel:  ? cp_new_stat+0x108/0x120
Sep 28 14:57:06 home kernel:  do_vfs_ioctl+0x8d/0x5b0
Sep 28 14:57:06 home kernel:  ? do_vfs_ioctl+0x8d/0x5b0
Sep 28 14:57:06 home kernel:  ? SyS_newfstat+0x35/0x50
Sep 28 14:57:06 home kernel:  SyS_ioctl+0x3c/0x70
Sep 28 14:57:06 home kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
Sep 28 14:57:06 home kernel: RIP: 0033:0x7f5dd2f8afd7
Sep 28 14:57:06 home kernel: RSP: 002b:7f5dd2ca25c8 EFLAGS: 0246 ORIG_RAX: 0010
Sep 28 14:57:06 home kernel: RAX: ffda RBX: 7f5dcc38f0e0 RCX: 7f5dd2f8afd7
Sep 28 14:57:06 home kernel: RDX: 7f5dd2ca2698 RSI: c0389424 RDI: 0003
Sep 28 14:57:06 home kernel: RBP: 0030 R08: R09: 7ffe8b102080
Sep 28 14:57:06 home kernel: R10: 7f5dcc3e0870 R11:
RE: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of David Sterba
> Sent: Sunday, 24 September 2017 11:46 PM
> To: Liu Bo <bo.li@oracle.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data
>
> On Wed, Sep 20, 2017 at 05:50:18PM -0600, Liu Bo wrote:
> > The kernel oops happens at
> >
> > kernel BUG at fs/btrfs/extent_io.c:2104!
> > ...
> > RIP: clean_io_failure+0x263/0x2a0 [btrfs]
> >
> > It's showing that read-repair code is using an improper mirror index.
> > This is due to the fact that compression read's endio hasn't recorded
> > the failed mirror index in %cb->orig_bio.
> >
> > With this, btrfs's read-repair can work properly on reading compressed
> > data.
> >
> > Signed-off-by: Liu Bo <bo.li@oracle.com>
> > Reported-by: Paul Jones <p...@pauljones.id.au>
>
> Reviewed-by: David Sterba <dste...@suse.com>

Tested-by: <p...@pauljones.id.au>

For both patches. I caused the same thing to happen again, this time by
unplugging the wrong hard drive. Applied the patches and the problem
(BUG_ON) is gone. Should this also go to stable? Seems like a rather
glaring problem to me.

Paul.
RE: SSD caching an existing btrfs raid1
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Kai Krakow
> Sent: Thursday, 21 September 2017 6:45 AM
> To: linux-btrfs@vger.kernel.org
> Subject: Re: SSD caching an existing btrfs raid1
>
> Am Wed, 20 Sep 2017 17:51:15 +0200 schrieb Psalle:
>
> > On 19/09/17 17:47, Austin S. Hemmelgarn wrote:
> > (...)
> > >
> > > A better option if you can afford to remove a single device from
> > > that array temporarily is to use bcache. Bcache has one specific
> > > advantage in this case: multiple backend devices can share the same
> > > cache device. This means you don't have to carve out dedicated cache
> > > space for each disk on the SSD and leave some unused space so that
> > > you can add new devices if needed. The downside is that you can't
> > > convert each device in-place, but because you're using BTRFS, you
> > > can still convert the volume as a whole in-place. The procedure for
> > > doing so looks like this:
> > >
> > > 1. Format the SSD as a bcache cache.
> > > 2. Use `btrfs device delete` to remove a single hard drive from the
> > >    array.
> > > 3. Set up the drive you just removed as a bcache backing device
> > >    bound to the cache you created in step 1.
> > > 4. Add the new bcache device to the array.
> > > 5. Repeat from step 2 until the whole array is converted.
> > >
> > > A similar procedure can actually be used to do almost any underlying
> > > storage conversion (for example, switching to whole disk encryption,
> > > or adding LVM underneath BTRFS) provided all your data can fit on
> > > one less disk than you have.
> >
> > Thanks Austin, that's just great. For some reason I had discarded
> > bcache thinking that it would force me to rebuild from scratch, but
> > this kind of incremental migration is exactly what I hoped was
> > possible. I have plenty of space to replace the devices one by one.
> >
> > I will report back my experience in a few days, I hope.
> I've done it exactly that way in the past and it worked flawlessly (but it took 24+ hours). But it was easy for me because I was also adding a third disk to the pool, so existing stuff could easily move.

Device delete takes freaking ages! I would avoid using it if you can. Device replace is much faster.

Paul.
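The five-step bcache conversion quoted above might look roughly like this. Device names (/dev/sdf for the SSD, /dev/sdb–/dev/sdd for the disks) and the mount point are made up, and the commands are only printed so the plan can be reviewed before running anything by hand:

```shell
#!/bin/sh
# Dry-run sketch of the in-place bcache conversion described above.
SSD=/dev/sdf
MNT=/mnt/array
plan() {
    echo "make-bcache -C $SSD"                      # 1. format the SSD as the shared cache
    i=0
    for hdd in /dev/sdb /dev/sdc /dev/sdd; do
        echo "btrfs device delete $hdd $MNT"        # 2. shrink the array by one disk
        echo "make-bcache -B $hdd"                  # 3. turn it into a backing device
        # (then attach it to the cache set via /sys/block/bcacheN/bcache/attach)
        echo "btrfs device add /dev/bcache$i $MNT"  # 4. add the cached device back
        i=$((i + 1))                                # 5. repeat for the next disk
    done
}
plan
```

A balance afterwards would spread data back across all members; and as noted above, each device delete can take a very long time.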
RE: SSD caching an existing btrfs raid1
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Pat Sailor > Sent: Wednesday, 20 September 2017 1:31 AM > To: Btrfs BTRFS> Subject: SSD caching an existing btrfs raid1 > > Hello, > > I have a half-filled raid1 on top of six spinning devices. Now I have come > into > a spare SSD I'd like to use for caching, if possible without having to > rebuild or, > failing that, without having to renounce btrfs and flexible reshaping. > > I've been reading about the several options out there; I thought that > EnhanceIO would be the simplest bet but unfortunately I couldn't get it to > build with my recent kernel (last commits are from years ago). > > Failing that, I read that lvmcache could be the way to go. However, I can't > think of a way of setting it up in which I retain the ability to > add/remove/replace drives as I can do now with pure btrfs; if I opted to drop > btrfs to go to ext4 I still would have to offline the filesystem for > downsizes. > Not a frequent occurrence I hope, but now I'm used to keep working while I > reshape things in btrfs, and it's better if I can avoid large downtimes. > > Is what I want doable at all? Thanks in advance for any > suggestions/experiences to proceed.

When I did mine I used a spare disk to create the initial LVM volume, then used btrfs dev replace to swap one of the raid1 mirrors to the new lvm device. Then repeat for the other mirror. My suggestion for lvm setup is to use two different pools, one for each btrfs mirror. That ensures you don't accidentally have btrfs sharing the one physical disk, or lvm using the same SSD to cache the two discs.

Paul.
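A sketch of that migration with made-up device and volume group names (one VG per btrfs mirror, as suggested); the commands are printed rather than executed so the plan can be checked first:

```shell
#!/bin/sh
# Two separate VGs so the btrfs mirrors can never share a physical disk,
# and so LVM can never end up caching both mirrors from the same SSD.
plan() {
    echo "pvcreate /dev/sdb && vgcreate vg_a /dev/sdb"   # spare disk becomes pool A
    echo "pvcreate /dev/sdc && vgcreate vg_b /dev/sdc"   # freed disk becomes pool B
    echo "lvcreate -l 100%FREE -n mirror_a vg_a"
    echo "lvcreate -l 100%FREE -n mirror_b vg_b"
    # swap each raid1 member onto its LV, one at a time
    echo "btrfs replace start /dev/sdd /dev/vg_a/mirror_a /mnt/array"
    echo "btrfs replace start /dev/sde /dev/vg_b/mirror_b /mnt/array"
}
plan
```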
RE: kernel BUG at fs/btrfs/extent_io.c:1989
> -Original Message- > From: Liu Bo [mailto:bo.li@oracle.com] > Sent: Tuesday, 19 September 2017 3:10 AM > To: Paul Jones <p...@pauljones.id.au> > Cc: linux-btrfs@vger.kernel.org > Subject: Re: kernel BUG at fs/btrfs/extent_io.c:1989 > > > This 'mirror 0' looks fishy, (as mirror comes from btrfs_io_bio->mirror_num, > which should be at least 1 if raid1 setup is in use.) > > Not sure if 4.13.2-gentoo made any changes on btrfs, but can you please > verify with the upstream kernel, say, v4.13? It's basically a vanilla kernel with a handful of unrelated patches. The filesystem fell apart overnight, there were a few thousand checksum errors and eventually it went read-only. I tried to remount it, but got open_ctree failed. Btrfs check segfaulted, lowmem mode completed with so many errors I gave up and will restore from the backup. I think I know the problem now - the lvm cache was in writeback mode (by accident) so during a defrag there would be gigabytes of unwritten data in memory only, which was all lost when the system crashed (motherboard failure). No wonder the filesystem didn't quite survive. I must say though, I'm seriously impressed at the data integrity of BTRFS - there were near 10,000 checksum errors, 4 which were uncorrectable, and from what I could tell nearly all of the data was still intact according to rsync checksums. Cheers, Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
kernel BUG at fs/btrfs/extent_io.c:1989
Hi,

I have a system that crashed during a defrag; upon reboot I got the following trace while resuming the defrag. Filesystem is BTRFS raid1 on lvm+cache, kernel 4.13.2. Check --repair gives lots of warnings about parent transid verify failed, but otherwise completes without issue. Ran scrub, which seems to have fixed most of the issues without crashing:

scrub status for d844164a-239e-4f37-9126-d3b2f3ab72be
scrub started at Mon Sep 18 15:59:05 2017 and finished after 02:04:00
total bytes scrubbed: 2.22TiB with 22890 errors
error details: verify=1078 csum=21812
corrected errors: 22886, uncorrectable errors: 4, unverified errors: 0

I'll see how it goes when I use rsync to verify from the other backup.

Thanks,
Paul.

[ 52.687705] BTRFS error (device dm-15): parent transid verify failed on 6822688718848 wanted 1044475 found 1044411
[ 52.688346] BTRFS info (device dm-15): read error corrected: ino 0 off 6822688718848 (dev /dev/mapper/lvmB-backup--b sector 2340415488)
[ 52.688401] BTRFS info (device dm-15): read error corrected: ino 0 off 6822688722944 (dev /dev/mapper/lvmB-backup--b sector 2340415496)
[ 52.688451] BTRFS info (device dm-15): read error corrected: ino 0 off 6822688727040 (dev /dev/mapper/lvmB-backup--b sector 2340415504)
[ 52.688501] BTRFS info (device dm-15): read error corrected: ino 0 off 6822688731136 (dev /dev/mapper/lvmB-backup--b sector 2340415512)
[ 53.332383] BTRFS error (device dm-15): parent transid verify failed on 6522612940800 wanted 1044486 found 1042732
[ 53.332668] BTRFS info (device dm-15): read error corrected: ino 0 off 6522612940800 (dev /dev/mapper/lvmB-backup--b sector 491844480)
[ 53.332732] BTRFS info (device dm-15): read error corrected: ino 0 off 6522612944896 (dev /dev/mapper/lvmB-backup--b sector 491844488)
[ 53.332794] BTRFS info (device dm-15): read error corrected: ino 0 off 6522612948992 (dev /dev/mapper/lvmB-backup--b sector 491844496)
[ 53.332846] BTRFS info (device dm-15): read error corrected: ino 0 off 6522612953088 (dev
/dev/mapper/lvmB-backup--b sector 491844504)
[ 53.395581] BTRFS error (device dm-15): parent transid verify failed on 6823548452864 wanted 1044475 found 1044413
[ 53.395979] BTRFS info (device dm-15): read error corrected: ino 0 off 6823548452864 (dev /dev/mapper/lvmB-backup--b sector 2342094656)
[ 53.396054] BTRFS info (device dm-15): read error corrected: ino 0 off 6823548456960 (dev /dev/mapper/lvmB-backup--b sector 2342094664)
[ 53.527429] BTRFS error (device dm-15): parent transid verify failed on 6823548583936 wanted 1044475 found 1044413
[ 55.516066] br0: port 1(eth0) entered forwarding state
[ 55.516068] br0: topology change detected, propagating
[ 55.516101] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready
[ 126.354423] BTRFS error (device dm-15): parent transid verify failed on 6522613661696 wanted 1044486 found 1043710
[ 126.354696] repair_io_failure: 6 callbacks suppressed
[ 126.354698] BTRFS info (device dm-15): read error corrected: ino 0 off 6522613661696 (dev /dev/mapper/lvmB-backup--b sector 491845888)
[ 126.354765] BTRFS info (device dm-15): read error corrected: ino 0 off 6522613665792 (dev /dev/mapper/lvmB-backup--b sector 491845896)
[ 126.354824] BTRFS info (device dm-15): read error corrected: ino 0 off 6522613669888 (dev /dev/mapper/lvmB-backup--b sector 491845904)
[ 126.354886] BTRFS info (device dm-15): read error corrected: ino 0 off 6522613673984 (dev /dev/mapper/lvmB-backup--b sector 491845912)
[ 126.484340] BTRFS error (device dm-15): parent transid verify failed on 6517401976832 wanted 1044482 found 1044204
[ 126.484890] BTRFS info (device dm-15): read error corrected: ino 0 off 6517401976832 (dev /dev/mapper/lvmB-backup--b sector 798336768)
[ 126.484939] BTRFS info (device dm-15): read error corrected: ino 0 off 6517401980928 (dev /dev/mapper/lvmB-backup--b sector 798336776)
[ 126.484989] BTRFS info (device dm-15): read error corrected: ino 0 off 6517401985024 (dev /dev/mapper/lvmB-backup--b sector 798336784)
[ 126.485040] BTRFS
info (device dm-15): read error corrected: ino 0 off 6517401989120 (dev /dev/mapper/lvmB-backup--b sector 798336792)
[ 126.667061] BTRFS error (device dm-15): parent transid verify failed on 6523036008448 wanted 1044486 found 1044206
[ 126.667340] BTRFS info (device dm-15): read error corrected: ino 0 off 6523036008448 (dev /dev/mapper/lvm-backup--a sector 375252800)
[ 126.667377] BTRFS info (device dm-15): read error corrected: ino 0 off 6523036012544 (dev /dev/mapper/lvm-backup--a sector 375252808)
[ 126.828898] BTRFS error (device dm-15): parent transid verify failed on 6522547240960 wanted 1044486 found 1044206
[ 126.829325] BTRFS error (device dm-15): parent transid verify failed on 6522547257344 wanted 1044486 found 1043052
[ 126.831141] BTRFS error (device dm-15): parent transid verify failed on 6522547650560 wanted 1044486 found 1044206
[ 126.846967] BTRFS
RE: BUG: BTRFS and O_DIRECT could lead to wrong checksum and wrong data
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Marat Khalili > Sent: Friday, 15 September 2017 7:50 PM > To: Hugo Mills; Goffredo Baroncelli > ; linux-btrfs > Subject: Re: BUG: BTRFS and O_DIRECT could lead to wrong checksum and > wrong data > > May I state my user's point of view: > > I know one application that uses O_DIRECT, and it is subtly broken on BTRFS. > I know no applications that use O_DIRECT and are not broken. > (Really more statistics would help here, probably some exist that provably > work.) According to developers making O_DIRECT work on BTRFS is difficult if > not impossible. Isn't it time to disable O_DIRECT like ZFS does AFAIU? Data > safety is certainly more important than performance gain it may or may not > give some applications.

I agree - I've had trouble with this before because I didn't understand the consequences of using it. It would be better to hide it behind a mount option or similar IMHO.

Paul.
RE: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Qu Wenruo > Sent: Monday, 14 August 2017 4:37 PM > To: Christoph Hellwig; Christoph Anton Mitterer > > Cc: Btrfs BTRFS > Subject: Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? > > > > On 2017年08月12日 15:42, Christoph Hellwig wrote: > > On Sat, Aug 12, 2017 at 02:10:18AM +0200, Christoph Anton Mitterer wrote: > >> Qu Wenruo wrote: > >>> Although Btrfs can disable data CoW, nodatacow also disables data > >>> checksum, which is another main feature for btrfs. > >> > >> Then decoupling of the two should probably decoupled and support for > >> notdatacow+checksumming be implemented?! > > > > And how are you going to write your data and checksum atomically when > > doing in-place updates? > > Exactly, that's the main reason I can figure out why btrfs disables checksum > for nodatacow. But does it matter if it's not strictly atomic? By turning off COW it implies you accept the risk of an ill-timed failure. Although from my point of view any reason that would require COW to be disabled implies you're using the wrong filesystem anyway. Paul.
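For reference, per-file nodatacow is set with chattr; the flag only takes effect on files that are still empty, which is why the sketch below creates the file first (the temp file stands in for, say, a VM image or database file):

```shell
# Disable COW for a single file: create it empty, set +C, then write data.
# Note: on btrfs, nodatacow also means no data checksums, as discussed above.
f=$(mktemp)
if chattr +C "$f" 2>/dev/null; then
    attrs=$(lsattr "$f")    # the 'C' attribute marks the file nodatacow
else
    attrs="nocow flag not supported on this filesystem"
fi
echo "$attrs"
rm -f "$f"
```

Setting +C on a directory makes new files inside it inherit the flag, which avoids the empty-file restriction.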
RE: Btrfs + compression = slow performance and high cpu usage
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Konstantin V. Gavrilenko > Sent: Tuesday, 1 August 2017 7:58 PM > To: Peter Grandi> Cc: Linux fs Btrfs > Subject: Re: Btrfs + compression = slow performance and high cpu usage > > Peter, I don't think the filefrag is showing the correct fragmentation status > of > the file when the compression is used. > At least the one that is installed by default in Ubuntu 16.04 - e2fsprogs | > 1.42.13-1ubuntu1 > > So for example, fragmentation of compressed file is 320 times more then > uncompressed one. > > root@homenas:/mnt/storage/NEW# filefrag test5g-zeroes > test5g-zeroes: 40903 extents found > > root@homenas:/mnt/storage/NEW# filefrag test5g-data > test5g-data: 129 extents found Compressed extents are about 128kb, uncompressed extents are about 128Mb. (can't remember the exact numbers.) I've had trouble with slow filesystems when using compression. The problem seems to go away when removing compression. Paul.
RE: [PATCH] btrfs: allow defrag compress to override NOCOMPRESS attribute
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Anand Jain > Sent: Saturday, 15 July 2017 2:53 PM > To: David Sterba; linux-btrfs@vger.kernel.org > Subject: Re: [PATCH] btrfs: allow defrag compress to override NOCOMPRESS > attribute > > > On 07/13/2017 09:18 PM, David Sterba wrote: > > Currently, the BTRFS_INODE_NOCOMPRESS will prevent any compression > on > > a given file, except when the mount is force-compress. As users have > > reported on IRC, this will also prevent compression when requested by > > defrag (btrfs fi defrag -c file). > > > > The nocompress flag is set automatically by filesystem when the ratios > > are bad and the user would have to manually drop the bit in order to > > make defrag -c work. This is not good from the usability perspective. > > > > This patch will raise priority for the defrag -c over nocompress, ie. > > any file with NOCOMPRESS bit set will get defragmented. The bit will > > remain untouched. > > > > Alternate option was to also drop the nocompress bit and keep the > > decision logic as is, but I think this is not the right solution. > > > Now the compression set through property will act same as '-o compress- > force'. Before this patch it was like '-o compress'. > I am ok to fix that patch with a new patch.

While we are at it, would it be possible to add an option to remove compression? Ie. btrfs fi defrag -c none file Currently this doesn't seem to exist.

Thanks,
Paul.
RE: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Martin Steigerwald > Sent: Sunday, 9 July 2017 5:58 PM > To: Marc MERLIN> Cc: Lu Fengqi ; Btrfs BTRFS bt...@vger.kernel.org>; David Sterba > Subject: Re: 4.11.6 / more corruption / root 15455 has a root item with a more > recent gen (33682) compared to the found root node (0) > > Hello Marc. > > Marc MERLIN - 08.07.17, 21:34: > > Sigh, > > > > This is now the 3rd filesystem I have (on 3 different machines) that > > is getting corruption of some kind (on 4.11.6). > > Anyone else getting corruptions with 4.11? > > I happily switch back to 4.10.17 or even 4.9 if that is the case. I may even > do > so just from your reports. Well, yes, I will do exactly that. I just switch > back > for 4.10 for now. Better be safe, than sorry. No corruption for me - I've been on 4.11 since about .2 and everything seems fine. Currently on 4.11.8 Paul.
RE: Btrfs Compression
> -Original Message- > From: Austin S. Hemmelgarn [mailto:ahferro...@gmail.com] > Sent: Thursday, 6 July 2017 9:52 PM > To: Paul Jones <p...@pauljones.id.au>; linux-btrfs@vger.kernel.org > Subject: Re: Btrfs Compression > > On 2017-07-05 23:19, Paul Jones wrote: > > While reading the thread about adding zstd compression, it occurred to > > me that there is potentially another thing affecting performance - > > Compressed extent size. (correct my terminology if it's incorrect). I > > have two near identical RAID1 filesystems (used for backups) on near > > identical discs (HGST 3T), one compressed and one not. The filesystems > > have about 40 snapshots and are about 50% full. The uncompressed > > filesystem runs at about 60 MB/s, the compressed filesystem about 5-10 > > MB/s. There is noticeably more "noise" from the compressed filesystem > > from all the head thrashing that happens while rsync is happening. > > > > Which brings me to my point - In terms of performance for compression, > > is there some low hanging fruit in adjusting the extent size to be > > more like uncompressed extents so there is not so much seeking > > happening? With spinning discs with large data sets it seems pointless > > making the numerical calculations faster if the discs can't keep up. > > Obviously this is assuming optimisation for speed over compression > > ratio. > > > > Thoughts?
>
> That really depends on too much to be certain. In all likelihood, your CPU or memory are your bottleneck, not your storage devices. The data itself gets compressed in memory, and then sent to the storage device; it's not streamed directly there from the compression thread, so if the CPU was compressing data faster than the storage devices could transfer it, you would (or at least, should) be seeing better performance on the compressed filesystem than the uncompressed one (because you transfer less data on the compressed filesystem), assuming the datasets are functionally identical.
> > That in turn brings up a few other questions: > * What are the other hardware components involved (namely, CPU, RAM, > and storage controller)? If you're using some dinky little Atom or > Cortex-A7 CPU (or almost anything else 32-bit running at less than 2GHz > peak), then that's probably your bottleneck. Similarly, if you've got a cheap > storage controller that needs a lot of attention from the CPU, then that's > probably your bottleneck (you can check this by seeing how much processing > power is being used when just writing to the uncompressed array (check > how much processing power rsync uses copying between two tmpfs mounts, > then subtract that from the total for copying the same data to the > uncompressed filesystem)). Hardware is AMD Phenom II X6 1055T with 8GB DDR3 on the compressed filesystem, Intel i7-3770K with 8GB DDR3 on the uncompressed. Slight difference, but both are up to the task. > * Which compression algorithm are you using, lzo or zlib? If the answer is > zlib, then what you're seeing is generally expected behavior except on > systems with reasonably high-end CPU's and fast memory, because zlib is > _slow_. Zlib. > * Are you storing the same data on both arrays? If not, then that > immediately makes the comparison suspect (if one array is storing lots of > small files and the other is mostly storing small numbers of large files, > then I > would expect the one with lots of small files to get worse performance, and > compression on that one will just make things worse). > This is even more important when using rsync, because the size of the files > involved has a pretty big impact on it's hashing performance and even data > transfer rate (lots of small files == more time spent in syscalls other than > read() and write()). The dataset is rsync-ed to the primary backup and then to the secondary backup, so contains the same content.
> > Additionally, when you're referring to extent size, I assume you mean the > huge number of 128k extents that the FIEMAP ioctl (and at least older > versions of `filefrag`) shows for compressed files? If that's the case, then > it's > important to understand that that's due to an issue with FIEMAP, it doesn't > understand compressed extents in BTRFS correctly, so it shows one extent > per compressed _block_ instead, even if they are internally an extent in > BTRFS. You can verify the actual number of extents by checking how many > runs of continuous 128k 'extents' there are. It was my understanding that compressed extents are significantly smaller in size than uncompressed ones? (like 64k vs 128M? perhaps I'm thinking of something else.) I couldn't find any info about this, but I remember it being mentioned here before. Either way disk io is maxed out so something is different with compression in a way that spinning rust doesn't seem to like. Paul.
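Counting runs of contiguous 128K 'extents', as Austin suggests above, can be scripted. This helper is a sketch that parses `filefrag -v` output and treats rows whose physical offset continues the previous row as one run (the column layout assumed is the one current e2fsprogs prints):

```shell
# Count physically contiguous runs in `filefrag -v` output.
count_runs() {
    awk '$1 ~ /^[0-9]+:$/ {
        gsub(/[.:]+/, " ")             # "0: 0.. 31: 1000.. 1031: 32:" -> plain columns
        # columns are now: ext lstart lend pstart pend length
        if (!seen || $4 != prev_end + 1) runs++
        seen = 1; prev_end = $5
    }
    END { print runs + 0 }'
}
# usage: filefrag -v compressed_file | count_runs
```

If the run count is far below the extent count filefrag reports, the file is less fragmented than it looks and the 128K figures are the FIEMAP per-block artifact described above.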
Btrfs Compression
While reading the thread about adding zstd compression, it occurred to me that there is potentially another thing affecting performance - Compressed extent size. (correct my terminology if it's incorrect). I have two near identical RAID1 filesystems (used for backups) on near identical discs (HGST 3T), one compressed and one not. The filesystems have about 40 snapshots and are about 50% full. The uncompressed filesystem runs at about 60 MB/s, the compressed filesystem about 5-10 MB/s. There is noticeably more "noise" from the compressed filesystem from all the head thrashing that happens while rsync is happening. Which brings me to my point - In terms of performance for compression, is there some low hanging fruit in adjusting the extent size to be more like uncompressed extents so there is not so much seeking happening? With spinning discs with large data sets it seems pointless making the numerical calculations faster if the discs can't keep up. Obviously this is assuming optimisation for speed over compression ratio. Thoughts? Paul.
RE: btrfs, journald logs, fragmentation, and fallocate
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Goffredo Baroncelli > Sent: Saturday, 29 April 2017 3:05 AM > To: Chris Murphy> Cc: Btrfs BTRFS > Subject: Re: btrfs, journald logs, fragmentation, and fallocate > > > In the past I faced the same problems; I collected some data here > http://kreijack.blogspot.it/2014/06/btrfs-and-systemd-journal.html. > Unfortunately the journald files are very bad, because first the data is > written (appended), then the index fields are updated. Unfortunately these > indexes are near after the last write . So fragmentation is unavoidable. Perhaps a better idea for COW filesystems is to store the index in a separate file, and/or rewrite the last 1 MB block (or part thereof) of the data file every time data is appended? That way the data file will use 1MB extents and hopefully avoid ridiculous amounts of metadata. Paul.
RE: About free space fragmentation, metadata write amplification and (no)ssd
-Original Message- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Hans van Kranenburg Sent: Sunday, 9 April 2017 6:19 AM To: linux-btrfs Subject: About free space fragmentation, metadata write amplification and (no)ssd > So... today a real life story / btrfs use case example from the trenches at > work... Snip!!

Great read. I do the same thing for backups on a much smaller scale and it works brilliantly. Two 4T drives in btrfs raid1. I will mention that I recently set up caching using LVM (1 x 300G ssd for each 4T drive), and it's extraordinary how much of a difference it makes. Especially when running deduplication. If it's feasible perhaps you could try it with an NVMe drive.

Paul.
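Attaching a cache LV like the one mentioned could look roughly like this (volume group, LV, and device names are made up; commands are printed, not executed). Writethrough is spelled out deliberately, since another thread in this digest shows what an accidental writeback cache can do to btrfs after a crash:

```shell
#!/bin/sh
# Dry-run sketch: one 300G SSD partition caching one backing LV.
plan() {
    echo "vgextend vg_a /dev/nvme0n1p1"                               # SSD joins the VG
    echo "lvcreate --type cache-pool -L 300G -n cpool_a vg_a /dev/nvme0n1p1"
    echo "lvconvert --type cache --cachepool vg_a/cpool_a vg_a/backup_a"
    echo "lvchange --cachemode writethrough vg_a/backup_a"            # writeback risks dirty data on crash
}
plan
```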
RE: BTRFS: space_info 4 has 18446742286429913088 free, is not full
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Paul Jones > Sent: Friday, 7 October 2016 6:48 PM > To: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com>; Stefan Priebe - > Profihost AG <s.pri...@profihost.ag>; linux-btrfs@vger.kernel.org > Subject: RE: BTRFS: space_info 4 has 18446742286429913088 free, is not full > > > > -Original Message- > > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > > ow...@vger.kernel.org] On Behalf Of Wang Xiaoguang > > Sent: Friday, 7 October 2016 6:17 PM > > To: Stefan Priebe - Profihost AG <s.pri...@profihost.ag>; linux- > > bt...@vger.kernel.org > > Subject: Re: BTRFS: space_info 4 has 18446742286429913088 free, is not > > full > > > > Hi, > > > > On 10/07/2016 03:03 PM, Stefan Priebe - Profihost AG wrote: > > > Dear Wang, > > > > > > can't use v4.8.0 as i always get OOMs and total machine crashes. > > > > > > Complete traces with your patch and some more btrfs patches applied > > > (in the hope in fixes the OOM but it did not): > > > http://pastebin.com/raw/6vmRSDm1 > > I didn't see any such OOMs... > > Can you try holger's tree with my patches. > > > > Regards, > > Xiaoguang Wang > > > > > > 4.8.5 has fixed all the OOM problems for me, so try that one. Sorry, just realised I meant 4.7.5! Paul.
RE: BTRFS: space_info 4 has 18446742286429913088 free, is not full
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Wang Xiaoguang > Sent: Friday, 7 October 2016 6:17 PM > To: Stefan Priebe - Profihost AG; linux- > bt...@vger.kernel.org > Subject: Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full > > Hi, > > On 10/07/2016 03:03 PM, Stefan Priebe - Profihost AG wrote: > > Dear Wang, > > > > can't use v4.8.0 as i always get OOMs and total machine crashes. > > > > Complete traces with your patch and some more btrfs patches applied > > (in the hope in fixes the OOM but it did not): > > http://pastebin.com/raw/6vmRSDm1 > I didn't see any such OOMs... > Can you try holger's tree with my patches. > > Regards, > Xiaoguang Wang > > 4.8.5 has fixed all the OOM problems for me, so try that one. Paul.
RE: gazillions of Incorrect local/global backref count
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Christoph Anton Mitterer > Sent: Sunday, 4 September 2016 2:51 PM > To: linux-btrfs@vger.kernel.org > Subject: gazillions of Incorrect local/global backref count > > Hey. > > I just did a btrfs check on my notebooks root fs, with: > $ uname -a > Linux heisenberg 4.7.0-1-amd64 #1 SMP Debian 4.7.2-1 (2016-08-28) > x86_64 GNU/Linux > $ btrfs --version > btrfs-progs v4.7.1 > > > > during: > checking extents > > it found gazillions of these: > Incorrect local backref count on 1107980288 root 257 owner 17807428 > offset 13568135168 found 2 wanted 3 back 0x2d69990 > Incorrect local backref count on 1107980288 root 257 owner 14055042 > offset 13568135168 found 2 wanted 3 back 0x2d69930 > Incorrect global backref count on 1107980288 found 4 wanted 6 > backpointer mismatch on [1107980288 61440] > Incorrect local backref count on 1108049920 root 257 owner 17807428 > offset 13568262144 found 2 wanted 5 back 0x2d69ac0 > Incorrect local backref count on 1108049920 root 257 owner 14055042 > offset 13568262144 found 2 wanted 5 back 0x2d69b20 > Incorrect global backref count on 1108049920 found 4 wanted 10 > backpointer mismatch on [1108049920 77824] > > See stdout/err[0] logfiles from the check. > > > What do they mean? > > And does this now mean that data is corrupted and I should try to > recover that from a backup? > And if so... how to I map the affected addresses above back to files? > > Or can I somehow simply (and foremost cleanly/perfectly) correct these > errors? The errors are wrong. I nearly ruined my filesystem a few days ago by trying to repair similar errors, thankfully all seems ok. Check again with btrfs-progs 4.6.1 and see if the errors go away, mine did. See open bug https://bugzilla.kernel.org/show_bug.cgi?id=155791 for more details. Cheers, Paul.
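On the quoted question of mapping those logical addresses back to files: on a mounted filesystem, `btrfs inspect-internal logical-resolve` does that. A sketch using one address from the output above (the mount point is hypothetical, and the command is printed rather than run):

```shell
# Resolve a logical byte address from check/scrub output to the file(s) using it.
logical=1107980288      # taken from the "Incorrect local backref count" lines above
mnt=/                   # mount point of the affected filesystem
cmd="btrfs inspect-internal logical-resolve -v $logical $mnt"
echo "$cmd"             # when actually run, prints each file referencing the address
```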
RE: btrfs-progs 4.7, check reports many "incorrect local backref count" messages
> -Original Message- > From: ch...@colorremedies.com [mailto:ch...@colorremedies.com] On > > On Thu, Sep 1, 2016 at 12:51 AM, Paul Jones <p...@pauljones.id.au> wrote: > >> -Original Message- > >> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > >> ow...@vger.kernel.org] On Behalf Of Chris Murphy > >> Sent: Thursday, 1 September 2016 7:59 AM > >> To: Btrfs BTRFS <linux-btrfs@vger.kernel.org> > >> Subject: Re: btrfs-progs 4.7, check reports many "incorrect local > >> backref count" messages > >> > >> This is still happening with btrfs-progs 4.7.1 and there is zero > >> information in the long result what to do about the problem, and > >> whether it's sane to try have --repair fix it, let alone what the original > cause of the problem was. > > > > I just potentially damaged a perfectly good filesystem because of this. I > > was > getting hundreds of "Incorrect local backref count" so I decided to try > repair, > which seemed to complete ok. I then rescanned without repair and btrfs > check eventually crashed with an assertion. That's when I figured something > may be wrong. > > Wait, so you did a --repair and then the following-up check crashes? > But does the file system mount and does it still work? The first is bad > enough, > but if it won't mount this is terrible. Correct. I forgot to mention the crash was with btrfs-progs 4.7.1 and kernel 4.7.2. After that I reverted to btrfs-progs 4.6.1 and kernel 4.6.7 The filesystem still mounted, so I tried to run scrub but that got stuck at 0 bytes (with no errors). I reset the system (reboot didn't finish) and run repair a few more times, and even though there were still errors every time I noticed the last few errors appeared to be the same. I mounted and tried scrub again, which worked the second time. I let it run overnight and it didn't find any errors. I'm just about to rsync (with checksums) from the backup to verify the data is still good. 
I'd suggest btrfs-progs 4.7.x is withdrawn until this can be fixed. Paul.
RE: btrfs-progs 4.7, check reports many "incorrect local backref count" messages
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Chris Murphy > Sent: Thursday, 1 September 2016 7:59 AM > To: Btrfs BTRFS> Subject: Re: btrfs-progs 4.7, check reports many "incorrect local backref > count" messages > > This is still happening with btrfs-progs 4.7.1 and there is zero information > in > the long result what to do about the problem, and whether it's sane to try > have --repair fix it, let alone what the original cause of the problem was.

I just potentially damaged a perfectly good filesystem because of this. I was getting hundreds of "Incorrect local backref count" so I decided to try repair, which seemed to complete ok. I then rescanned without repair and btrfs check eventually crashed with an assertion. That's when I figured something may be wrong. Thankfully I have an offsite backup (this was the onsite backup), but what a pain.

Paul.
RE: btrfs and systemd
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Imran Geriskovan > Sent: Monday, 29 August 2016 9:19 PM > To: Stefan Priebe - Profihost AG> Cc: Qu Wenruo ; linux-btrfs@vger.kernel.org > Subject: Re: btrfs and systemd > > >>> I can't find any fstab setting for systemd to higher this timeout. > >>> There's just the x-systemd.device-timeout but this controls how > >>> long to wait for the device and not for the mount command. > >>> Is there any solution for big btrfs volumes and systemd? > >>> Stefan > > Switch to Runit. > > First time I seriously consider another init on my notebook is when I have a > problem like yours. > > Even when / (root) is mounted just fine, if there is any problem with any > other fstab entry, you'll get into such a situation on systemd. > > Give it a try, appending "init=/usr/bin/runit-init" > to your kernel command line on your bootloader. > You dont need to uninstall any package until getting Runit behave "exactly" > as you like.

Why not just create a Systemd unit (or whatever the proper term is) that runs on boot and runs the mount command manually and doesn't wait for it to return? Seems easier than messing with init systems.

Paul.
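A variant of that idea, as a sketch: rather than a script, a native mount unit can raise the per-mount timeout, since systemd.mount supports TimeoutSec=. The unit name and paths below are made up (the file name must match the Where= path):

```ini
# /etc/systemd/system/mnt-bigfs.mount  (hypothetical volume and paths)
[Unit]
Description=Large btrfs volume that is slow to mount

[Mount]
What=/dev/disk/by-label/bigfs
Where=/mnt/bigfs
Type=btrfs
TimeoutSec=30min

[Install]
WantedBy=multi-user.target
```

With the fstab entry removed and this unit enabled, a slow btrfs mount no longer trips the default 90-second limit at boot.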
RE: linux 4.7.2 & btrfs & rsync & OOM gone crazy
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of E V > Sent: Saturday, 27 August 2016 5:01 AM > To: linux-btrfs> Subject: linux 4.7.2 & btrfs & rsync & OOM gone crazy > > Just upgraded from 4.6.5 to 4.7.2 for my btrfs backup server with 32GB of > ram. Only thing that run's on it is an rsync of an NFS filesystem to the local > btrfs. Cached mem tends to hang out around 26-30GB, but with 4.7.2 the > OOM is now going crazy and trying to kill whatever it can including my ssh and > rsync process. Anyone seen anything similar? > Trying it again with /proc/sys/vm/swappiness at 0 will see if that makes a > difference. Yes, I had the same issue on a backup VM that only has 1G ram - rsync constantly is killed by OOM. Doesn't happen on the main machine with 24G ram. Paul. N�r��yb�X��ǧv�^�){.n�+{�n�߲)w*jg����ݢj/���z�ޖ��2�ޙ&�)ߡ�a�����G���h��j:+v���w��٥
RE: [Not TLS] Re: Reducing impact of periodic btrfs balance
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Graham Cobb > Sent: Thursday, 19 May 2016 8:11 PM > To: linux-btrfs@vger.kernel.org > Subject: Re: [Not TLS] Re: Reducing impact of periodic btrfs balance > > On 19/05/16 05:09, Duncan wrote: > > So to Graham, are these 1.5K snapshots all of the same subvolume, or > > split into snapshots of several subvolumes? If it's all of the same > > subvolume or of only 2-3 subvolumes, you still have some work to do in > > terms of getting down to recommended snapshot levels. Also, if you > > have quotas on and don't specifically need them, try turning them off > > and see if that alone makes it workable. > > I have just under 20 subvolumes but the snapshots are only taken if > something has changed (actually I use btrbk: I am not sure if it takes the > snapshot and then removes it if nothing changed or whether it knows not to > even take it). The most frequently changing subvolumes have just under 400 > snapshots each. I have played with snapshot retention and think it unlikely I > would want to reduce it further. > > I have quotas turned off. At least, I am not using quotas -- how can I double > check it is really turned off? > > I know that very large numbers of snapshots are not recommended, and I > expected the balance to be slow. I was quite prepared for it to take many > days. My full backups take several days and even incrementals take several > hours. What I did not expect, and think is a MUCH more serious problem, is > that the balance prevented use of the disk, holding up all writes to the disk > for (quite literally) hours each. I have not seen that effect mentioned > anywhere! > > That means that for a large, busy data disk, it is impossible to do a balance > unless the server is taken down to single-user mode for the time the balance > takes (presumably still days). 
> I assume this would also apply to doing a RAID rebuild (I am not using multiple disks at the moment).
>
> At the moment I am still using my previous backup strategy alongside the snapshots (that is: rsync-based rsnapshots to another disk daily, with fairly long retentions, and separate daily full/incremental backups using dar to a NAS in another building). I was hoping the btrfs snapshots might replace the daily rsync snapshots but it doesn't look like that will work out.

I do a similar thing - on my main fs I keep only minimal snapshots, like fewer than 10. I rsync (with checksumming off and diff copy on) the fs to the backup fs, which is where all the snapshots live. That fs only gets the occasional 20% balance when it runs out of space, plus weekly scrubs. Performance doesn't seem to suffer that way.

Paul.
RE: Reducing impact of periodic btrfs balance
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Graham Cobb > Sent: Wednesday, 18 May 2016 11:30 PM > To: linux-btrfs@vger.kernel.org > Subject: Reducing impact of periodic btrfs balance > > Hi, > > I have a 6TB btrfs filesystem I created last year (about 60% used). It is my > main data disk for my home server so it gets a lot of usage (particularly > mail). > I do frequent snapshots (using btrbk) so I have a lot of snapshots (about 1500 > now, although it was about double that until I cut back the retention times > recently). > > A while ago I had a "no space" problem (despite fi df, fi show and fi usage > all > agreeing I had over 1TB free). But this email isn't about that. > > As part of fixing that problem, I tried to do a "balance -dusage=20" on the > disk. I was expecting it to have system impact, but it was a major disaster. > The balance didn't just run for a long time, it locked out all activity on > the disk > for hours. A simple "touch" command to create one file took over an hour. > > More seriously, because of that, mail was being lost: all mail delivery timed > out and the timeout error was interpreted as a fatal delivery error causing > mail to be discarded, mailing lists to cancel subscriptions, etc. The balance > never completed, of course. I eventually got it cancelled. > > I have since managed to complete the "balance -dusage=20" by running it > repeatedly with "limit=N" (for small N). I wrote a script to automate that > process, and rerun it every week. If anyone is interested, the script is on > GitHub: https://github.com/GrahamCobb/btrfs-balance-slowly Hi Graham, I've experienced similar problems from time to time. It seems to be fragmentation of the metadata. In my case I have a volume with about 20 million smallish (100k) files scattered through around 20,000 directories, and originally they were created at random. 
Updating the files at a data rate of around 5 MB/s took 100% disk utilisation on Raid1 SSD. After a few iterations I needed to delete the files and start again, this took 4 days!! I cancelled it a few times and tried defrags and balances, but they didn't help. Needless to say, the filesystem was basically unusable at the time. Long story short, I discovered that populating each directory completely, one at a time, alleviated the speed issue. I then remembered that if you run defrag with the compress option it writes out the files again, which also fixes the problem. (Note that there is no option for no compression) So if you are ok with using compression try a defrag with compression. That massively fixed my problems. Regards, Paul.
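The chunked-balance approach described in the quoted message above (rerunning balance with a small limit=N so normal I/O gets a chance between steps) can be sketched roughly like this. This is not the actual btrfs-balance-slowly script; the mount point and the dry-run guard are assumptions added for illustration:

```shell
#!/bin/sh
# Rough sketch of balancing in small steps. MNT is a placeholder path;
# with DRY_RUN=1 (the default here) the commands are only printed.
MNT=${MNT:-/mnt/data}
DRY_RUN=${DRY_RUN:-1}

balance_step() {
    # Relocate at most $2 data chunks that are less than $1% full.
    cmd="btrfs balance start -dusage=$1,limit=$2 $MNT"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"          # show what would run
    else
        $cmd
        sleep 600            # let queued writes drain between steps
    fi
}

# e.g. three small passes instead of one monolithic balance
for pass in 1 2 3; do
    balance_step 20 5
done
```

The real script on GitHub adds retry and exit-code handling on top; this only illustrates the limit= idea that keeps each pass short.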
RE: btrfs goes readonly + No space left on 4.3
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Omar Sandoval > Sent: Tuesday, 3 May 2016 8:06 AM > To: Stefan Priebe> Cc: linux-btrfs@vger.kernel.org > Subject: Re: btrfs goes readonly + No space left on 4.3 > > On Fri, Apr 29, 2016 at 10:48:15PM +0200, Stefan Priebe wrote: > > just want to drop a note that all those ENOSPC msg are gone with v4.5 > > and space_cache=v2. Any plans to make space_cache=v2 default? > > > > Greets, > > Stefan > > Yup, we want to make space_cache=v2 the default at some point. I'm > running it on my own machines and testing it here at Facebook and haven't > run into any issues yet. Besides stability, I also want to make sure there > aren't any performance regressions versus the old free space cache that we > haven't thought about yet. > > Thanks for trying it out :) I have also been testing it and have had no problems. One question I have about it: I use Grub2 to boot my systems directly from a BTRFS root partition (i.e. no separate /boot), I assume Grub shouldn't need to care about free space tree/cache as it's only reading data? I don't know enough about either to know if it's an issue or not. Thanks, Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: BTRFS as image store for KVM?
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Brendan Heading > Sent: Wednesday, 16 September 2015 9:36 PM > To: Duncan <1i5t5.dun...@cox.net> > Cc: linux-btrfs@vger.kernel.org > Subject: Re: BTRFS as image store for KVM? > > > Btrfs has two possible solutions to work around the problem. The > > first one is the autodefrag mount option, which detects file > > fragmentation during the write and queues up the affected file for a > > defragmenting rewrite by a lower priority worker thread. This works > > best on the small end, because as file size increases, so does time to > > actually write it out, and at some point, depending on the size of the > > file and how busy the database/VM is, writes are (trying to) come in > > faster than the file can be rewritten. Typically, there's no problem > > under a quarter GiB, with people beginning to notice performance > > issues at half to 3/4 GiB, tho on fast disks and not too busy VMs/DBs > > (which may well include your home system, depending on what you use > > the VMs for), you might not see problems until size reaches 2 GiB or > > so. As such, autodefrag tends to be a very good option for firefox > > sqlite database files, for instance, as they tend to be small enough > > not to have issues. But it's not going to work so well for multi-GiB VM > images. > > [unlurking for the first time] > > This problem has been faced by a certain very large storage vendor whom I > won't name, who provide an option similar to the above. Reading between > the lines I think their approach is to try to detect which accesses are read- > sequential, and schedule those blocks for rewriting in sequence. They also > have a feature to run as a background job which can be scheduled to run > during an off peak period where they can reorder entire files that are > significantly out of sequence. 
> I'd expect the algorithm is intelligent, i.e. there's no need to rewrite entire large files that are mostly sequential with a few out-of-order sections.
>
> Has anyone considered these options for btrfs? Not being able to run VMs on it is probably going to be a bit of a killer ..

I run VMs on BTRFS using regular consumer-grade SSDs and hardware, and it works great, I think. My guests are Windows Server + MS SQL. Not the most ideal workload, but I care about data integrity so I'm willing to sacrifice a bit of speed for it. Checksums have prevented countless corruption issues. Although, now that I think about it, the spinning rust backup disks are the only ones that have ever had any corruption. I guess SSDs have their own internal checksumming as well. The speed seems quite reasonable, but the server has around 16G of ram free, which I presume is being used as a cache, which seems to help.

Paul.
Cancel device remove?
Hi,

Is there some way to cancel a device remove operation? I have discovered that rebooting will cancel it, but that's not always possible. What I'm after is something the same as cancelling a scrub. I keep running into situations where I want to pause a remove operation for speed reasons.

Thanks!
Paul.
RE: trim not working and irreparable errors from btrfsck
-Original Message-
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Marc Joliet
Sent: Friday, 14 August 2015 6:06 PM
To: linux-btrfs@vger.kernel.org
Subject: Re: trim not working and irreparable errors from btrfsck

On Thu, 13 Aug 2015 17:14:36 -0600, Chris Murphy li...@colorremedies.com wrote:

Right now I think there's no status because a.) no bug report and b.) not enough information.

I was mainly asking because apparently there *is* a patch that helps some people affected by this, but nobody ever commented on it. Perhaps there's a reason for that, but I found it curious. (I see now that it was submitted in early January, in the thread [PATCH V2] Btrfs: really fix trim 0 bytes after a device delete.) I can open a bug (I mean, that's part of being a user of btrfs at this stage), I'm just surprised that nobody else has.

I have to use that patch on one of my systems. I just assumed it was never merged because it wasn't quite ready yet. It seems to work fine for me though.

Paul.
RE: BTRFS disaster (of my own making). Is this recoverable?
-Original Message-
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Chris Murphy
Sent: Thursday, 6 August 2015 2:54 AM
To: Sonic sonicsm...@gmail.com
Cc: Btrfs BTRFS linux-btrfs@vger.kernel.org; Hugo Mills h...@carfax.org.uk
Subject: Re: BTRFS disaster (of my own making). Is this recoverable?

On Wed, Aug 5, 2015 at 6:31 AM, Sonic sonicsm...@gmail.com wrote:

On Tue, Aug 4, 2015 at 4:23 PM, Sonic sonicsm...@gmail.com wrote:

Seems that if there was some way to edit something in those first overwritten 32MB of disc 2 to say "hey, I'm really here, just a bit screwed up", maybe some of the recovery tools could actually work.

Just want to reiterate this thought. The basic error in most cases with the tools at hand is that disc 2 is missing, so there's little the tools can do. Somewhere in those first 32MB should be something to properly identify the disc as part of the array.

Yes, but it was probably uniquely only on that disk, because there's no redundancy for metadata or system chunks. Therefore there's no copy on the other disk to use as a model. The btrfs check command has an option to use other superblocks, so you could try that switch and see if it makes a difference, but it sounds like it's finding backup superblocks automatically. That's the one thing that is pretty much always duplicated on the same disk; for sure the first superblock is munged and would need repair. But there's still other chunks missing... so I don't think it'll help.

Would it be possible to store this type of critical information twice on each disk, at the beginning and end? I thought BTRFS already did that, but I might be thinking of some other filesystem. I've had my share of these types of "oops!" moments as well.

Paul.
RE: trim not working and irreparable errors from btrfsck
-Original Message- From: Lutz Euler [mailto:lutz.eu...@freenet.de] Sent: Sunday, 21 June 2015 12:11 AM To: Christian; Paul Jones; Austin S Hemmelgarn Cc: linux-btrfs@vger.kernel.org Subject: RE: trim not working and irreparable errors from btrfsck Hi Christian, Paul and Austin, Christian wrote: However, fstrim still gives me 0 B (0 bytes) trimmed, so that may be another problem. Is there a way to check if trim works? Paul wrote: I've got the same problem. I've got 2 SSDs with 2 partitions in RAID1, fstrim always works on the 2nd partition but not the first. There are no errors on either filesystem that I know of, but the first one is root so I can't take it offline to run btrfs check. Austin wrote: I'm seeing the same issue here, but with a Crucial brand SSD. Somewhat interestingly, I don't see any issues like this with BTRFS on top of LVM's thin-provisioning volumes, or with any other filesystems, so I think it has something to do with how BTRFS is reporting unused space or how it is submitting the discard requests. Probably you all suffer from the same problem I had a few years ago. It is a bug in how btrfs implements fstrim. To check whether you are a victim of this bug simply run: # btrfs-debug-tree /dev/whatever | grep 'FIRST_CHUNK_TREE CHUNK_ITEM' where /dev/whatever is a device of your filesystem, and interrupt after the first several output lines with C-c. (Officially the filesystem should be unmounted when running btrfs-debug-tree, but that is not necessary as we only read from it and the relevant data doesn't change very often.) You get something like: item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 12947816448) item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 14021558272) ... (This output is from an old version of btrfs-progs. I understand newer version are more verbose, but you should nevertheless easily be able to interpret the output). 
If the first number different from 0 (here, the 12947816448) is larger than the sum of the sizes of the devices the filesystem consists of, bingo.

This has been discussed already in the past and there is a patch. Please see, for the patch:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg40618.html
and for the background:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/15597

Kind regards,
Lutz Euler

I tried the test and the numbers I was getting seemed reasonable; however, I went ahead and applied the patch anyway. Trim now works correctly!

Thanks,
Paul.
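Lutz's check above can be wrapped in a small helper. This is a sketch that parses a saved btrfs-debug-tree dump fed on stdin rather than a live device; the parsing assumes the old output format shown in the quoted message, so adjust it for newer btrfs-progs:

```shell
# Print the first nonzero FIRST_CHUNK_TREE CHUNK_ITEM offset from
# btrfs-debug-tree output on stdin (old btrfs-progs format assumed).
first_chunk_offset() {
    grep 'FIRST_CHUNK_TREE CHUNK_ITEM' |
        sed 's/.*CHUNK_ITEM \([0-9]*\).*/\1/' |
        awk '$1 != 0 { print $1; exit }'
}

# Compare that offset against the summed device size in bytes ($1);
# an offset larger than the devices indicates the fstrim bug above.
check_trim_bug() {
    off=$(first_chunk_offset)
    if [ -n "$off" ] && [ "$off" -gt "$1" ]; then
        echo "affected"
    else
        echo "ok"
    fi
}

# Example (on a live system you would pipe in the real dump):
#   btrfs-debug-tree /dev/whatever | check_trim_bug 512110190592
```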
RE: trim not working and irreparable errors from btrfsck
-Original Message-
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Christian
Sent: Thursday, 18 June 2015 12:34 AM
To: linux-btrfs@vger.kernel.org
Subject: Re: trim not working and irreparable errors from btrfsck

On 06/17/2015 10:22 AM, Chris Murphy wrote:

On Wed, Jun 17, 2015 at 6:56 AM, Christian Dysthe cdys...@gmail.com wrote:

Hi, Sorry for asking more about this. I'm not a developer but trying to learn. In my case I get several errors like this one:

root 2625 inode 353819 errors 400, nbytes wrong

Is it inode 353819 I should focus on, and what is the number after root, in this case 2625?

I'm going to guess it's tree root 2625, which is the same thing as fs tree, which is the same thing as subvolume. Each subvolume has its own inodes. So on a given Btrfs volume, an inode number can exist more than once, but in separate subvolumes. When you use btrfs inspect inode it will list all files with that inode number, but only the one in subvol ID 2625 is what you care about deleting and replacing.

Thanks! Deleting the file for that inode took care of it. No more errors. Restored it from a backup. However, fstrim still gives me "0 B (0 bytes) trimmed", so that may be another problem. Is there a way to check if trim works?

I've got the same problem. I've got 2 SSDs with 2 partitions each in RAID1; fstrim always works on the 2nd partition but not the first. There are no errors on either filesystem that I know of, but the first one is root so I can't take it offline to run btrfs check.

Paul.
RE: kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs
-Original Message-
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Chris Murphy
Sent: Wednesday, 25 March 2015 3:10 AM
To: Chris Mason; Chris Murphy; Btrfs BTRFS
Subject: Re: kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs

On Mon, Mar 23, 2015 at 3:13 PM, Chris Mason c...@fb.com wrote:

The last time we tracked down a similar problem, Josef found it was only on Windows guests. Basically he tracked it down to buffers changing while in flight. I'll take a look.

Looks like cache=none and directsync share O_DIRECT in common. This patch suggests neither of those cache options should be used (for different reasons):
https://github.com/libguestfs/libguestfs/commit/749e947bb0103f19feda0f29b6cbbf3cbfa350da

I stumbled on this testing GNOME Boxes, which is using cache=none, which it probably shouldn't, but nevertheless none and directsync also shouldn't cause problems on Btrfs.

I've got a Windows 2012 guest VM running on a Linux host and I also have trouble with cache=none. There is one particular inode on the BTRFS filesystem that gets csum errors about every 6-18 hours. I swapped just about everything (hardware) trying to find the problem, but then I remembered I was experimenting with cache options. I changed it back to the default and the problem went away. Interestingly, there are no reported errors in the VM.

Paul.
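For anyone wanting to check their own setup: the cache mode lives on the disk's <driver> element in the libvirt domain XML. A hypothetical fragment (the image path and target device are placeholders, not from the message above) with the cache attribute omitted so the hypervisor default is used instead of cache=none or directsync:

```shell
# Write a sample disk definition to a scratch file; omitting the cache
# attribute entirely falls back to the hypervisor default caching mode.
# The image path and target device below are placeholders.
cat > /tmp/disk-fragment.xml <<'EOF'
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/win2012.raw'/>
  <target dev='vda' bus='virtio'/>
</disk>
EOF
```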
Error while balancing
Hi,

The error below was obtained while running a balance with a slightly flakey disk (SSD). I had to powercycle the server to get it to reboot, and mounting with skip_balance made it stop ok. I copied the data to another filesystem and recreated the faulty one, but surprisingly there were no errors while copying the files.

This is about all I could find before the errors started:

Feb 14 19:18:59 vm-server kernel: BTRFS: checksum error at logical 734897631232 on dev /dev/sdg1, sector 720162808, root 5, inode 4835, offset 1546301440, length 4096, links 1 (path: vm/win-server.raw)
Feb 14 19:18:59 vm-server kernel: BTRFS: bdev /dev/sdg1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Feb 14 19:18:59 vm-server kernel: BTRFS: unable to fixup (regular) error at logical 734897631232 on dev /dev/sdg1

While running balance the following happened (about 30 times per second - I did a grep for one minute of /var/log/messages and it was 96 MB!):

Feb 14 21:08:04 vm-server kernel: BTRFS info (device sdb1): found 2023 extents
Feb 14 21:08:15 vm-server kernel: BTRFS info (device sdb1): found 2023 extents
Feb 14 21:08:15 vm-server kernel: BTRFS info (device sdb1): relocating block group 827959803904 flags 17
Feb 14 21:08:24 vm-server kernel: BTRFS info (device sdb1): found 14383 extents
Feb 14 21:08:42 vm-server kernel: BTRFS info (device sdb1): found 14383 extents
Feb 14 21:08:42 vm-server kernel: BTRFS info (device sdb1): relocating block group 826886062080 flags 17
Feb 14 21:08:53 vm-server kernel: BTRFS info (device sdb1): found 18344 extents
Feb 14 21:09:13 vm-server kernel: BTRFS info (device sdb1): found 18344 extents
Feb 14 21:09:13 vm-server kernel: BTRFS info (device sdb1): relocating block group 825812320256 flags 20
Feb 14 21:09:13 vm-server kernel: [ cut here ]
Feb 14 21:09:13 vm-server kernel: WARNING: CPU: 1 PID: 10863 at fs/btrfs/relocation.c:925 build_backref_tree+0x5ca/0xe82()
Feb 14 21:09:13 vm-server kernel: Modules linked in: vhost_net vhost macvtap macvlan xt_TCPMSS
xt_REDIRECT nf_nat_redirect cls_u32 sch_htb sch_sfq nf_conntrack_sip nf_conntrack_ftp nf_conntrack_sane ts_kmp nf_conntrack_amanda nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netbios_ns nf_conntrack_snmp nf_conntrack_broadcast nf_conntrack_tftp nf_conntrack_h323 nf_conntrack_irc xt_CHECKSUM ipt_rpfilter xt_DSCP xt_dscp xt_statistic xt_CT xt_AUDIT xt_NFLOG xt_time xt_connlimit xt_realm xt_NFQUEUE xt_tcpmss xt_addrtype xt_pkttype iptable_raw xt_TPROXY nf_defrag_ipv6 xt_CLASSIFY xt_mark xt_hashlimit xt_comment xt_length xt_connmark xt_owner xt_recent xt_iprange xt_physdev xt_policy iptable_mangle xt_nat xt_multiport xt_conntrack ipt_REJECT nf_reject_ipv4 ipt_MASQUERADE nf_nat_masquerade_ipv4 ipt_ECN ipt_CLUSTERIP ipt_ah iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables nfsd lockd grace sunrpc x86_pkg_temp_thermal coretemp pcspkr microcode r8169 i2c_i801 mii fan thermal battery processor xts gf128mul aes_x86_64 cbc sha512_generic fuse dm_mod hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_gyration sl811_hcd ohci_pci ohci_hcd uhci_hcd usb_storage megaraid_sas megaraid_mbox usbhid megaraid_mm megaraid sata_inic162x ata_piix sata_nv sata_sil24 pata_jmicron pata_amd pata_mpiix ahci libahci ehci_pci ehci_hcd Feb 14 21:09:13 vm-server kernel: CPU: 1 PID: 10863 Comm: btrfs Not tainted 3.19.0-gentoo #1 Feb 14 21:09:13 vm-server kernel: Hardware name: System manufacturer System Product Name/P8Z68-V LE, BIOS 3808 05/10/2012 Feb 14 21:09:13 vm-server kernel: 0009 88058b313918 8154306a 8000 Feb 14 21:09:13 vm-server kernel: 88058b313958 81065bf3 1000 Feb 14 21:09:13 vm-server kernel: 81216991 880602b23800 0001 88060279b800 Feb 14 21:09:13 vm-server kernel: Call Trace: Feb 14 21:09:13 vm-server kernel: [8154306a] dump_stack+0x4f/0x7b Feb 14 21:09:13 vm-server kernel: [81065bf3] warn_slowpath_common+0x97/0xb1 Feb 14 21:09:13 vm-server kernel: [81216991] ? 
build_backref_tree+0x5ca/0xe82 Feb 14 21:09:13 vm-server kernel: [81065ca1] warn_slowpath_null+0x15/0x17 Feb 14 21:09:13 vm-server kernel: [81216991] build_backref_tree+0x5ca/0xe82 Feb 14 21:09:13 vm-server kernel: [811ce112] ? free_root_pointers+0x56/0x56 Feb 14 21:09:13 vm-server kernel: [81217d3b] relocate_tree_blocks+0x1a1/0x4d9 Feb 14 21:09:13 vm-server kernel: [81289f6a] ? debug_smp_processor_id+0x17/0x19 Feb 14 21:09:13 vm-server kernel: [8121364b] ? tree_insert+0x48/0x4c Feb 14 21:09:13 vm-server kernel: [8121626a] ? add_tree_block+0x13c/0x166 Feb 14 21:09:13 vm-server kernel: [812190df] relocate_block_group+0x29c/0x4de Feb 14 21:09:13 vm-server kernel: [8121947a] btrfs_relocate_block_group+0x159/0x26e Feb 14 21:09:13 vm-server kernel: [811f6c19]
csum error shows wrong device
Hi,

I have a failing SSD I need to replace, so I added another disk to my array but didn't get around to removing the faulty one. A few days later I went to remove the faulty one and there were no more errors, I presume because the bad portion of the SSD was now not in use. Because I didn't know which disk to remove (2 are identical) I removed the new one in the hope of triggering the error with the old one so I could note its serial number. This was successful, but btrfs is claiming the error is on the disk that I removed. I presume this is a bug?

This is the command I used to remove the new disks:

btrfs dev del /dev/sda1 /dev/sdb1 /media/fast/

I note in dmesg there was no mention of /dev/sda1. 'Fast' is the filesystem in question.

..
[46474.872519] BTRFS info (device sdb1): relocating block group 382315003904 flags 17
[46477.090985] BTRFS info (device sdb1): found 8 extents
[46479.747538] BTRFS info (device sdb1): found 8 extents
[46479.921532] BTRFS info (device sdb1): relocating block group 383388745728 flags 17
[46482.140193] BTRFS info (device sdb1): found 8 extents
[46484.880514] BTRFS info (device sdb1): found 8 extents
[46485.024671] BTRFS info (device sdb1): relocating block group 384462487552 flags 17
[46487.291166] BTRFS info (device sdb1): found 8 extents
[46490.073738] BTRFS info (device sdb1): found 8 extents
[46490.235649] BTRFS info (device sdb1): relocating block group 385536229376 flags 17
[46492.364366] BTRFS info (device sdb1): found 3903 extents
[46495.326886] BTRFS info (device sdb1): found 3902 extents
[46495.446426] BTRFS info (device sdb1): disk deleted /dev/sdb1
[59516.406169] BTRFS info (device sdb1): csum failed ino 40605 off 54317748224 csum 3288447127 expected csum 2629848265
[59516.406175] BTRFS info (device sdb1): csum failed ino 40605 off 54412632064 csum 4245055996 expected csum 3450832795
[59516.406359] BTRFS: read error corrected: ino 40605 off 54317748224 (dev /dev/sde1 sector 431254408)
[59516.406449] BTRFS: read error corrected:
ino 40605 off 54412632064 (dev /dev/sde1 sector 431439728) [59516.406548] BTRFS info (device sdb1): csum failed ino 40605 off 55994003456 csum 42039089 expected csum 2302529568 [59516.406727] BTRFS: read error corrected: ino 40605 off 55994003456 (dev /dev/sdf1 sector 434440280) [59516.408562] BTRFS info (device sdb1): csum failed ino 40605 off 14022791168 csum 3859837505 expected csum 3337046449 [59516.408573] BTRFS info (device sdb1): csum failed ino 40605 off 55445471232 csum 1108896639 expected csum 3859837505 [59516.408683] BTRFS info (device sdb1): csum failed ino 40605 off 56895586304 csum 2469353283 expected csum 761457267 [59516.408691] BTRFS info (device sdb1): csum failed ino 40605 off 61744017408 csum 1385882588 expected csum 2439148438 [59516.408702] BTRFS info (device sdb1): csum failed ino 40605 off 63789928448 csum 3337046449 expected csum 2586270899 [59516.408717] BTRFS info (device sdb1): csum failed ino 40605 off 14022791168 csum 1108896639 expected csum 3337046449 [59516.408785] BTRFS info (device sdb1): csum failed ino 40605 off 55445471232 csum 3867769240 expected csum 3859837505 [59516.408894] BTRFS: read error corrected: ino 40605 off 56895586304 (dev /dev/sde1 sector 434192096) [59516.409129] BTRFS: read error corrected: ino 40605 off 63789928448 (dev /dev/sde1 sector 574273160) [59516.409230] BTRFS: read error corrected: ino 40605 off 61744017408 (dev /dev/sde1 sector 505265528) [59516.417206] BTRFS: read error corrected: ino 40605 off 62792196096 (dev /dev/sdf1 sector 441426544) [59516.417241] BTRFS: read error corrected: ino 40605 off 59971190784 (dev /dev/sdf1 sector 579309536) [59517.423114] BTRFS: read error corrected: ino 40605 off 59917271040 (dev /dev/sde1 sector 505891968) [59518.423088] BTRFS: read error corrected: ino 40605 off 22458654720 (dev /dev/sde1 sector 400486896) Linux vm-server 3.19.0-gentoo #1 SMP PREEMPT Wed Feb 11 16:48:13 AEDT 2015 x86_64 Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz GenuineIntel GNU/Linux Label: 'Root' 
uuid: 58d27dbd-7c1e-4ef7-8d43-e93df1537b08 Total devices 3 FS bytes used 14.85GiB devid 3 size 40.00GiB used 33.03GiB path /dev/sdf3 devid 4 size 40.00GiB used 33.03GiB path /dev/sde3 devid 6 size 40.00GiB used 0.00B path /dev/sdb2 Label: 'Storage' uuid: 63ec312b-5f7a-4137-a8f0-e877d5a85902 Total devices 2 FS bytes used 1.07TiB devid 1 size 3.64TiB used 1.08TiB path /dev/sdd1 devid 2 size 3.64TiB used 1.08TiB path /dev/sdj1 Label: 'Backup' uuid: 8149e719-022b-4a7a-8465-704e24ba7898 Total devices 4 FS bytes used 1.11TiB devid 1 size 2.73TiB used 1.12TiB path /dev/sdc1 devid 2 size 931.51GiB used 567.00GiB path /dev/sdi1 devid 3 size 931.51GiB used 569.03GiB path /dev/sdh1 devid 4 size 2.73TiB used 7.00GiB path /dev/sdg1 Label: 'Fast' uuid: e15bab07-1fcf-48b1-92d5-a2609f0fe469 Total devices 2 FS bytes used
RE: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
-Original Message- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- ow...@vger.kernel.org] On Behalf Of Martin Steigerwald Sent: Wednesday, 4 February 2015 8:16 PM To: Qu Wenruo Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode. Am Mittwoch, 4. Februar 2015, 15:16:44 schrieb Qu Wenruo: Btrfs's metadata csum is a good mechanism, keeping bit error away from sensitive kernel. But such mechanism will also be too sensitive, like bit error in csum bytes or low all zero bits in nodeptr. It's a trade using error tolerance for stable, and is reasonable for most cases since there is DUP/RAID1/5/6/10 duplication level. But in some case, whatever for development purpose or despair user who can't tolerant all his/her inline data lost, or even crazy QA team hoping btrfs can survive heavy random bits bombing, there are some guys want to get rid of the csum protection and face the crucial raw data no matter what disaster may happen. So, introduce the new '--dangerous' (or destruction/debug if you like) option for btrfsck to reset all csum of tree blocks. I often wondered about this: AFAIK if you get a csum error BTRFS makes this an input/output error. For being able to access the data in place, how about a iwantmycorrupteddataback mount option where BTRFS just logs csum errors but allows one to access the files nonetheless. This could even work together with remount. Maybe it would be good not to allow writing to broken csum blocks, i.e. fail these with input/output error. This way, the csum would not be automatically fixed, *but* one is able to access the broken data, *while* knowing it is broken. I seriously could have used that yesterday - I had a raw VM image with a csum error that wouldn't go away. The VM worked fine (even rebooting) so I figured I would just copy the file to another filesystem and then copy it back. 
Rsync doesn't play nicely with errors, so I used

dd if=disk1 of=/elsewhere/disk1 bs=4096 conv=notrunc,noerror

but after waiting for 100G to copy twice it no longer booted. The backup was only 8 hours old so no big deal, but if it had been a busy day that could have been nasty! (Why I didn't press the backup button before I did the above, I don't know...)

Paul.
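A possible reason the dd copy above produced an unbootable image: with conv=noerror but without sync, GNU dd writes nothing for a failed read, so every byte after a bad block shifts and the image silently corrupts. Adding sync pads each failed or short read up to the block size, keeping offsets stable (safe when the image size is a multiple of bs). A small demonstration on scratch files, with the assumption of GNU dd's status=none flag; on a genuinely failing disk, GNU ddrescue is the better tool:

```shell
# Copy an 8 KiB scratch file with the padded-error flags. Because the
# file size is a multiple of bs, conv=sync changes nothing here and the
# copy is byte-identical; on a read error it would pad instead of shift.
src=$(mktemp) && dst=$(mktemp)
dd if=/dev/urandom of="$src" bs=4096 count=2 status=none
dd if="$src" of="$dst" bs=4096 conv=notrunc,noerror,sync status=none
cmp -s "$src" "$dst" && echo "copies match"
rm -f "$src" "$dst"
```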
RE: RAID1 migrate to bigger disks
-Original Message- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- ow...@vger.kernel.org] On Behalf Of Daniel Pocock Sent: Sunday, 25 January 2015 1:46 AM To: Hugo Mills; linux-btrfs@vger.kernel.org Subject: Re: RAID1 migrate to bigger disks -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 24/01/15 15:36, Hugo Mills wrote: On Sat, Jan 24, 2015 at 03:32:44PM +0100, Daniel Pocock wrote: I've got a RAID1 on two 1TB partitions, /dev/sda3 and /dev/sdb3 I'm adding two new disks, they will have bigger partitions /dev/sdc3 and /dev/sdd3 I'd like the BtrFs to migrate from the old partitions to the new ones as safely and quickly as possible and if it is reasonable to do so, keeping it online throughout the migration. Should I do the following: btrfs device add /dev/sdc3 /dev/sdd3 /mnt/btrfs0 btrfs device delete /dev/sda3 /dev/sdb3 /mnt/btrfs0 or should I do it this way: btrfs device add /dev/sdc3 /mnt/btrfs0 btrfs device delete /dev/sda3 /mnt/btrfs0 btrfs device add /dev/sdd3 /mnt/btrfs0 btrfs device delete /dev/sdb3 /mnt/btrfs0 or is there some other way to go about it? btrfs replace start /dev/sda3 /dev/sdc3 /mountpoint btrfs fi resize 3:max /mountpoint btrfs replace start /dev/sdb3 /dev/sdd3 /mountpoint btrfs fi resize 4:max /mountpoint The 3 and 4 in the resize commands should be the devid of the newly-added device. Thanks for the fast reply In the event of power failure, can I safely shutdown the server during this operation and resume after starting again? I get more than 2 hours runtime from the UPS but I suspect that migrating 1TB will take at least 12 hours. I know that removing a device can be interrupted safely by a reboot - I do it all the time as there is no cancel option for removal. It seems to operate the same way balance does, by moving data around, and when there is nothing left on the disk it is removed and de-associated. Paul. 
RE: Extra info
Another way to defrag the file is to move it to another disk and then move it back. I've had trouble with virtual machine disks before (Windows server raw images) and this has fixed the problem. FYI, 3.17.2 and beyond seems much better now - no crazy slowdowns. Paul.

-Original Message- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Daniele Testa Sent: Friday, 19 December 2014 4:33 AM To: Hugo Mills; Daniele Testa; linux-btrfs@vger.kernel.org Subject: Re: Extra info

I am running the latest Debian stable. However, I used backports to update the kernel to 3.16.

root@s4 /opt/drives/ssd # uname -a
Linux s4.podnix.com 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 GNU/Linux
root@s4 /opt/drives/ssd # btrfs --version
Btrfs v3.14.1

It still reports over-use, so I am running a defrag on the file:

root@s4 /opt/drives/ssd # btrfs filesystem defragment /opt/drives/ssd/disk_208.img

But I see it slowly eats even more disk space during the defrag. I had about 7GB before. When it went down close to 1GB, I cancelled it, as I'm afraid it will corrupt the file if it runs out of space. Do you know how btrfs behaves if it runs out of space during a defrag? Any other ideas how I can solve it? Regards, Daniele

2014-12-18 23:35 GMT+08:00 Hugo Mills h...@carfax.org.uk: On Thu, Dec 18, 2014 at 11:02:34PM +0800, Daniele Testa wrote: Sorry, I did not read the guidelines correctly. Here comes more info:

root@s4 /opt/drives/ssd # uname -a
Linux s4.podnix.com 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1+deb7u1 x86_64 GNU/Linux

This is your problem. I think the difficulty is that writes into the middle of an extent didn't split the extent and allow the overwritten area to be reclaimed, so the whole extent still takes up space. IIRC, Josef fixed this about 18 months ago. You should upgrade your kernel to something that isn't written in cuneiform (like 3.18, say), and defrag the file in question. I think that should fix the problem. 
root@s4 /opt/drives/ssd # btrfs --version
Btrfs v0.19

This is also an antique, and probably needs an upgrade too (although it's less critical than the kernel). Hugo.

root@s4 /opt/drives/ssd # btrfs fi show
Label: none uuid: 752ed11b-defc-4717-b4c9-a9e08ad64ba6 Total devices 1 FS bytes used 404.74GB devid 1 size 410.50GB used 410.50GB path /dev/md3

Regards, Daniele

-- Hugo Mills | Python is executable pseudocode; perl is executable hugo@... carfax.org.uk | line-noise. http://carfax.org.uk/ | PGP: 65E74AC0 | Ben Burton
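Paul's move-it-off-and-back trick from the top of this message can be sketched as below. The paths are placeholders borrowed from Daniele's transcript; any VM using the image must be stopped first, and on btrfs the copy must be a real copy rather than a reflink, or no extents are actually rewritten:

```shell
#!/bin/sh
# Illustration only: "defragment" a file by rewriting it on another
# filesystem and moving it back, so it returns as fresh extents.
IMG=/opt/drives/ssd/disk_208.img    # placeholder: the fragmented file
TMP=/mnt/other-disk/disk_208.img    # placeholder: path on another filesystem

if [ ! -f "$IMG" ]; then
    echo "IMG is a placeholder; edit before running"
else
    # --reflink=never forces a byte-for-byte copy (GNU cp); the mv back
    # across filesystems then rewrites the data as new extents.
    cp --reflink=never "$IMG" "$TMP" &&
    mv "$TMP" "$IMG"
fi
```

This needs enough free space on the scratch filesystem for a full second copy of the file, which is exactly what Daniele was short of here.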
btrfs check - Couldn't open file system
Hi, As the topic says: btrfs check - Couldn't open file system. Check runs fine on all btrfs volumes except one - Backup. There is nothing special about it; it uses the same options as all the other ones (raid1, compress). As you can see in the output below, I double-check the filesystem is unmounted and then try to check it (using a few different devices), but it won't work. Checking another volume works fine. There are no error or warning messages in dmesg. Any ideas? Thanks, Paul.

IWT-VM ~ # uname -a
Linux IWT-VM 3.17.2-gentoo #1 SMP Mon Nov 3 15:46:50 AEDT 2014 x86_64 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz GenuineIntel GNU/Linux
IWT-VM ~ # mount
/dev/sdb2 on / type btrfs (rw,noatime,compress=zlib,ssd,noacl,space_cache)
/dev/sdh1 on /media/fast type btrfs (rw,noatime,ssd,noacl,space_cache)
/dev/sdg3 on /media/backup type btrfs (rw,noatime,compress=zlib,noacl,space_cache)
/dev/sda1 on /media/data type btrfs (rw,noatime,compress=zlib,noacl,space_cache)
IWT-VM ~ # btrfs fi sh
Label: 'Root' uuid: 61f6ce80-6d05-414f-9f0f-3d540fa82f2e
Total devices 2 FS bytes used 6.58GiB
devid 4 size 60.00GiB used 18.03GiB path /dev/sdb2
devid 5 size 59.93GiB used 18.03GiB path /dev/sdi3
Label: 'Fast' uuid: c41dc6db-6f00-4d60-a2f7-acbceb25e4e7
Total devices 3 FS bytes used 396.85GiB
devid 1 size 471.93GiB used 416.02GiB path /dev/sdh1
devid 2 size 412.00GiB used 356.01GiB path /dev/sdi1
devid 3 size 163.57GiB used 108.01GiB path /dev/sdb1
Label: 'Backup' uuid: 92162be2-e52f-42fe-a9fd-da4f26c6abd1
Total devices 5 FS bytes used 2.74TiB
devid 6 size 891.50GiB used 822.00GiB path /dev/sdg3
devid 8 size 698.64GiB used 629.00GiB path /dev/sdd1
devid 9 size 891.51GiB used 821.03GiB path /dev/sde3
devid 10 size 2.73TiB used 2.66TiB path /dev/sdc1
devid 11 size 931.51GiB used 924.00GiB path /dev/sdf1
Label: 'Data' uuid: 89181f84-bace-43f8-9534-693f99c4d033
Total devices 2 FS bytes used 42.50GiB
devid 1 size 279.46GiB used 211.03GiB path /dev/sda1
devid 2 size 279.46GiB used 211.03GiB path /dev/sdj1
Btrfs v3.17

IWT-VM ~ # umount /media/backup
umount: /media/backup: not mounted
IWT-VM ~ # btrfs check /dev/sdg3
Couldn't open file system
IWT-VM ~ # btrfs check /dev/sdd1
Couldn't open file system
IWT-VM ~ # btrfs check /dev/sde3
Couldn't open file system
IWT-VM ~ # btrfs check /dev/sda1
Checking filesystem on /dev/sda1
UUID: 89181f84-bace-43f8-9534-693f99c4d033
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 38060448123 bytes used err is 0
total csum bytes: 44401072
total tree bytes: 110723072
total fs tree bytes: 50872320
total extent tree bytes: 9994240
btree space waste bytes: 12145264
file data blocks allocated: 45618331648 referenced 48120782848
Btrfs v3.17
RE: btrfs check - Couldn't open file system
Thanks for the help. I tried btrfs-progs 3.16, same results. IWT-VM ~ # blkid /dev/sdb1: LABEL=Fast UUID=c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 UUID_SUB=0d26e72e-3848-455f-a250-56b442aa3bec TYPE=btrfs PARTUUID=000f11d6-01 /dev/sdb2: LABEL=Root UUID=61f6ce80-6d05-414f-9f0f-3d540fa82f2e UUID_SUB=cb941fed-b0f5-4a3c-8407-919f8de730b2 TYPE=btrfs PARTUUID=000f11d6-02 /dev/sdd1: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=988edf3f-616a-482d-a29a-dfd0d61185f8 TYPE=btrfs PARTUUID=000963fc-01 /dev/sdc1: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=3be06b82-6e5a-4f3b-a01c-19a0ae75711e TYPE=btrfs PARTLABEL=Linux filesystem PARTUUID=1128e68b-15ff-4279-b7a2-dfd4410d2c77 /dev/sdf1: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=c4573cea-cede-4143-a649-2cc3d97549a5 TYPE=btrfs PARTLABEL=Linux filesystem PARTUUID=4faf614b-8748-4464-b828-c26d1f477ca0 /dev/sde3: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=cfbc161b-8a80-4f8b-85d8-48703a1670ea TYPE=btrfs PARTLABEL=Backup PARTUUID=f8626199-38c0-490e-851d-24230d374ce2 /dev/sdg3: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=6e395f0f-47d0-4fec-b509-19d6f8ff076d TYPE=btrfs PTTYPE=dos PARTLABEL=Backup PARTUUID=588303d8-5af0-48c6-8d85-7ef778c04783 /dev/sdh1: LABEL=Fast UUID=c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 UUID_SUB=e513f51a-0014-4332-a3ea-a1296b22618d TYPE=btrfs PARTLABEL=Fast PARTUUID=348b71c0-3353-46f9-a777-f63c68fdc934 /dev/sdh2: LABEL=Swap2 UUID=10fd61ac-c911-4d0b-8b2e-b72b845fec28 TYPE=swap PARTLABEL=Swap PARTUUID=a09a4bd5-a098-4c6d-a668-627175cdd574 /dev/sdi1: LABEL=Fast UUID=c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 UUID_SUB=2b8930e5-2fb9-4bb0-a67c-aa12aaea1640 TYPE=btrfs PARTUUID=492b4f1e-71a8-4002-baeb-e17e584bd032 /dev/sdi2: LABEL=Swap1 UUID=8f22117a-9cd7-4a50-8f92-b10ed2e890e3 TYPE=swap PARTLABEL=Swap PARTUUID=a09a4bd5-a098-4c6d-a668-627175cdd574 /dev/sdi3: LABEL=Root UUID=61f6ce80-6d05-414f-9f0f-3d540fa82f2e 
UUID_SUB=36f3b7ca-70fd-4188-91c6-87bbb4914ffb TYPE=btrfs PARTUUID=cbdc9b38-d8a5-4b2a-9050-bdfc6d9ee1f1 /dev/sr0: UUID=2011-12-21-00-52-16-00 LABEL=DA 5.0 (2) TYPE=iso9660 /dev/sdj1: LABEL=Data UUID=89181f84-bace-43f8-9534-693f99c4d033 UUID_SUB=2c35cfb9-e7c5-4258-8a66-0eabe763e1d9 TYPE=btrfs PARTUUID=c193216e-01 /dev/sda1: LABEL=Data UUID=89181f84-bace-43f8-9534-693f99c4d033 UUID_SUB=c0dcd706-cffc-41ce-a633-3fa501b8ddd3 TYPE=btrfs PARTUUID=ad3ee8a5-01 /dev/sde1: PARTLABEL=Grub PARTUUID=79155d93-6922-462e-be91-fbc35eb16051 /dev/sde2: PARTUUID=0d4f7267-afe9-45af-a833-f4d618aedbf9 /dev/sdg1: PARTLABEL=Grub PARTUUID=79155d93-6922-462e-be91-fbc35eb16051 /dev/sdg2: PARTUUID=d9e177f6-2b38-4e9f-8c5c-96c0f5234f46 -Original Message- From: Anand Jain [mailto:anand.j...@oracle.com] Sent: Tuesday, 4 November 2014 2:12 PM To: Paul Jones; linux-btrfs@vger.kernel.org Subject: Re: btrfs check - Couldn't open file system very strange. I have no clue, yet. also bit concerned if this turns out to be a real issue. hope you could help to narrow down. can you send `blkid` output from your system. And can you go back to 3.16 and check if you have the same issue. 
thanks, anand IWT-VM ~ # uname -a Linux IWT-VM 3.17.2-gentoo #1 SMP Mon Nov 3 15:46:50 AEDT 2014 x86_64 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz GenuineIntel GNU/Linux IWT-VM ~ # mount /dev/sdb2 on / type btrfs (rw,noatime,compress=zlib,ssd,noacl,space_cache) /dev/sdh1 on /media/fast type btrfs (rw,noatime,ssd,noacl,space_cache) /dev/sdg3 on /media/backup type btrfs (rw,noatime,compress=zlib,noacl,space_cache) /dev/sda1 on /media/data type btrfs (rw,noatime,compress=zlib,noacl,space_cache) IWT-VM ~ # btrfs fi sh Label: 'Root' uuid: 61f6ce80-6d05-414f-9f0f-3d540fa82f2e Total devices 2 FS bytes used 6.58GiB devid4 size 60.00GiB used 18.03GiB path /dev/sdb2 devid5 size 59.93GiB used 18.03GiB path /dev/sdi3 Label: 'Fast' uuid: c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 Total devices 3 FS bytes used 396.85GiB devid1 size 471.93GiB used 416.02GiB path /dev/sdh1 devid2 size 412.00GiB used 356.01GiB path /dev/sdi1 devid3 size 163.57GiB used 108.01GiB path /dev/sdb1 Label: 'Backup' uuid: 92162be2-e52f-42fe-a9fd-da4f26c6abd1 Total devices 5 FS bytes used 2.74TiB devid6 size 891.50GiB used 822.00GiB path /dev/sdg3 devid8 size 698.64GiB used 629.00GiB path /dev/sdd1 devid9 size 891.51GiB used 821.03GiB path /dev/sde3 devid 10 size 2.73TiB used 2.66TiB path /dev/sdc1 devid 11 size 931.51GiB used 924.00GiB path /dev/sdf1 Label: 'Data' uuid: 89181f84-bace-43f8-9534-693f99c4d033 Total devices 2 FS bytes used 42.50GiB devid1 size 279.46GiB used 211.03GiB path /dev/sda1 devid2 size 279.46GiB used 211.03GiB path /dev/sdj1 Btrfs v3.17 IWT-VM ~ # umount /media/backup umount: /media
RE: btrfs check - Couldn't open file system
IWT-VM ~ # umount /media/backup/ umount: /media/backup/: not mounted IWT-VM ~ # btrfs fi show -d Label: 'Fast' uuid: c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 Total devices 3 FS bytes used 332.62GiB devid1 size 471.93GiB used 423.02GiB path /dev/sdh1 devid2 size 412.00GiB used 363.01GiB path /dev/sdi1 devid3 size 163.57GiB used 114.01GiB path /dev/sdb1 Label: 'Root' uuid: 61f6ce80-6d05-414f-9f0f-3d540fa82f2e Total devices 2 FS bytes used 6.58GiB devid4 size 60.00GiB used 18.03GiB path /dev/sdb2 devid5 size 59.93GiB used 18.03GiB path /dev/sdi3 Label: 'Backup' uuid: 92162be2-e52f-42fe-a9fd-da4f26c6abd1 Total devices 5 FS bytes used 2.74TiB devid6 size 891.50GiB used 822.00GiB path /dev/sdg3 devid8 size 698.64GiB used 629.00GiB path /dev/sdd1 devid9 size 891.51GiB used 821.03GiB path /dev/sde3 devid 10 size 2.73TiB used 2.66TiB path /dev/sdc1 devid 11 size 931.51GiB used 924.00GiB path /dev/sdf1 Label: 'Data' uuid: 89181f84-bace-43f8-9534-693f99c4d033 Total devices 2 FS bytes used 42.50GiB devid1 size 279.46GiB used 211.03GiB path /dev/sda1 devid2 size 279.46GiB used 211.03GiB path /dev/sdj1 Btrfs v3.17 -Original Message- From: Anand Jain [mailto:anand.j...@oracle.com] Sent: Tuesday, 4 November 2014 4:34 PM To: Paul Jones; linux-btrfs@vger.kernel.org Subject: Re: btrfs check - Couldn't open file system can you run, btrfs fi show -d when device is unmounted. some func called by __open_ctree_fd() is failing, we need to figure out which. On 11/04/2014 11:27 AM, Paul Jones wrote: Thanks for the help. I tried btrfs-progs 3.16, same results. 
IWT-VM ~ # blkid /dev/sdb1: LABEL=Fast UUID=c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 UUID_SUB=0d26e72e-3848-455f-a250-56b442aa3bec TYPE=btrfs PARTUUID=000f11d6-01 /dev/sdb2: LABEL=Root UUID=61f6ce80-6d05-414f-9f0f-3d540fa82f2e UUID_SUB=cb941fed-b0f5-4a3c-8407-919f8de730b2 TYPE=btrfs PARTUUID=000f11d6-02 /dev/sdd1: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=988edf3f-616a-482d-a29a-dfd0d61185f8 TYPE=btrfs PARTUUID=000963fc-01 /dev/sdc1: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=3be06b82-6e5a-4f3b-a01c-19a0ae75711e TYPE=btrfs PARTLABEL=Linux filesystem PARTUUID=1128e68b-15ff-4279-b7a2-dfd4410d2c77 /dev/sdf1: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=c4573cea-cede-4143-a649-2cc3d97549a5 TYPE=btrfs PARTLABEL=Linux filesystem PARTUUID=4faf614b-8748-4464-b828-c26d1f477ca0 /dev/sde3: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=cfbc161b-8a80-4f8b-85d8-48703a1670ea TYPE=btrfs PARTLABEL=Backup PARTUUID=f8626199-38c0-490e-851d-24230d374ce2 /dev/sdg3: LABEL=Backup UUID=92162be2-e52f-42fe-a9fd-da4f26c6abd1 UUID_SUB=6e395f0f-47d0-4fec-b509-19d6f8ff076d TYPE=btrfs PTTYPE=dos PARTLABEL=Backup PARTUUID=588303d8-5af0-48c6-8d85-7ef778c04783 /dev/sdh1: LABEL=Fast UUID=c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 UUID_SUB=e513f51a-0014-4332-a3ea-a1296b22618d TYPE=btrfs PARTLABEL=Fast PARTUUID=348b71c0-3353-46f9-a777-f63c68fdc934 /dev/sdh2: LABEL=Swap2 UUID=10fd61ac-c911-4d0b-8b2e-b72b845fec28 TYPE=swap PARTLABEL=Swap PARTUUID=a09a4bd5-a098-4c6d-a668-627175cdd574 /dev/sdi1: LABEL=Fast UUID=c41dc6db-6f00-4d60-a2f7-acbceb25e4e7 UUID_SUB=2b8930e5-2fb9-4bb0-a67c-aa12aaea1640 TYPE=btrfs PARTUUID=492b4f1e-71a8-4002-baeb-e17e584bd032 /dev/sdi2: LABEL=Swap1 UUID=8f22117a-9cd7-4a50-8f92-b10ed2e890e3 TYPE=swap PARTLABEL=Swap PARTUUID=a09a4bd5-a098-4c6d-a668-627175cdd574 /dev/sdi3: LABEL=Root UUID=61f6ce80-6d05-414f-9f0f-3d540fa82f2e UUID_SUB=36f3b7ca-70fd-4188-91c6-87bbb4914ffb TYPE=btrfs 
PARTUUID=cbdc9b38-d8a5-4b2a-9050-bdfc6d9ee1f1 /dev/sr0: UUID=2011-12-21-00-52-16-00 LABEL=DA 5.0 (2) TYPE=iso9660 /dev/sdj1: LABEL=Data UUID=89181f84-bace-43f8-9534-693f99c4d033 UUID_SUB=2c35cfb9-e7c5-4258-8a66-0eabe763e1d9 TYPE=btrfs PARTUUID=c193216e-01 /dev/sda1: LABEL=Data UUID=89181f84-bace-43f8-9534-693f99c4d033 UUID_SUB=c0dcd706-cffc-41ce-a633-3fa501b8ddd3 TYPE=btrfs PARTUUID=ad3ee8a5-01 /dev/sde1: PARTLABEL=Grub PARTUUID=79155d93-6922-462e-be91-fbc35eb16051 /dev/sde2: PARTUUID=0d4f7267-afe9-45af-a833-f4d618aedbf9 /dev/sdg1: PARTLABEL=Grub PARTUUID=79155d93-6922-462e-be91-fbc35eb16051 /dev/sdg2: PARTUUID=d9e177f6-2b38-4e9f-8c5c-96c0f5234f46 -Original Message- From: Anand Jain [mailto:anand.j...@oracle.com] Sent: Tuesday, 4 November 2014 2:12 PM To: Paul Jones; linux-btrfs@vger.kernel.org Subject: Re: btrfs check - Couldn't open file system very strange. I have no clue, yet. also bit concerned if this turns out to be a real issue. hope you could help to narrow down. can you send `blkid` output from your system. And can you go back to 3.16 and check if you have the same issue. thanks, anand IWT-VM ~ # uname -a Linux IWT-VM 3.17.2-gentoo #1
3.17.1 blocked task
Hi All, Just found this stack trace in dmesg while running a scrub on one of my file systems. I haven’t seen this reported yet so I thought I should report it ☺ All filesystems are raid1.

vm-server ~ # btrfs fi sh
Label: 'Root' uuid: 58d27dbd-7c1e-4ef7-8d43-e93df1537b08
Total devices 2 FS bytes used 21.29GiB
devid 3 size 40.00GiB used 40.00GiB path /dev/sde3
devid 4 size 40.00GiB used 40.00GiB path /dev/sdd3
Label: 'storage' uuid: df3d4a9c-ed6c-4867-8991-a018276f6f3c
Total devices 5 FS bytes used 2.24TiB
devid 6 size 2.69TiB used 2.10TiB path /dev/sda1
devid 7 size 901.92GiB used 422.00GiB path /dev/sdf1
devid 8 size 892.25GiB used 410.00GiB path /dev/sdg1
devid 9 size 892.25GiB used 412.00GiB path /dev/sdh1
devid 10 size 2.73TiB used 2.08TiB path /dev/sdb2
Label: 'Fast' uuid: 9baf63f7-a9d6-456c-8fdd-1a8fdb21958f
Total devices 2 FS bytes used 352.54GiB
devid 2 size 407.12GiB used 407.12GiB path /dev/sde1
devid 3 size 407.12GiB used 407.12GiB path /dev/sdd1

Linux vm-server 3.17.1-gentoo-r1 #1 SMP PREEMPT Sat Oct 18 16:53:06 AEDT 2014 x86_64 Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz GenuineIntel GNU/Linux

[ 4372.838272] BTRFS: checksum error at logical 14375409094656 on dev /dev/sdb2, sector 1565551496, root 5, inode 3082523, offset 6035542016, length 4096, links 1 (path: shared/backup/Normal/Paul-PC_2014_07_06_13_00_27_323D24.TIB)
[ 4372.838277] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 16, gen 0
[ 4374.425936] BTRFS: fixed up error at logical 14375409094656 on dev /dev/sdb2
[ 4374.442320] BTRFS: checksum error at logical 14375409098752 on dev /dev/sdb2, sector 1565551504, root 5, inode 3082523, offset 6035546112, length 4096, links 1 (path: shared/backup/Normal/Paul-PC_2014_07_06_13_00_27_323D24.TIB)
[ 4374.442325] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0
[ 4374.475939] BTRFS: fixed up error at logical 14375409098752 on dev /dev/sdb2
[ 4374.476219] BTRFS: checksum error at logical 14375409102848 on dev /dev/sdb2, sector 1565551512, root 5, inode 3082523, offset 6035550208, length 4096, links 1 (path: shared/backup/Normal/Paul-PC_2014_07_06_13_00_27_323D24.TIB)
[ 4374.476222] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 18, gen 0
[ 4374.500941] BTRFS: fixed up error at logical 14375409102848 on dev /dev/sdb2
[ 4374.501569] BTRFS: checksum error at logical 14375409106944 on dev /dev/sdb2, sector 1565551520, root 5, inode 3082523, offset 6035554304, length 4096, links 1 (path: shared/backup/Normal/Paul-PC_2014_07_06_13_00_27_323D24.TIB)
[ 4374.501572] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 19, gen 0
[ 4374.534256] BTRFS: fixed up error at logical 14375409106944 on dev /dev/sdb2
[ 4374.534586] BTRFS: checksum error at logical 14375409111040 on dev /dev/sdb2, sector 1565551528, root 5, inode 3082523, offset 6035558400, length 4096, links 1 (path: shared/backup/Normal/Paul-PC_2014_07_06_13_00_27_323D24.TIB)
[ 4374.534589] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 20, gen 0
[ 4374.567606] BTRFS: fixed up error at logical 14375409111040 on dev /dev/sdb2
[ 5396.970316] INFO: task kworker/u16:8:7540 blocked for more than 120 seconds.
[ 5396.970318] Not tainted 3.17.1-gentoo-r1 #1
[ 5396.970319] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[ 5396.970319] kworker/u16:8 D 880302e4a2a0 0 7540 2 0x
[ 5396.970325] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
[ 5396.970326] 88052741b6b8 0046 8805fe758ab0 8805fe758ab0
[ 5396.970328] 00011480 4000 8805ffa13610 8805fe758ab0
[ 5396.970329] 8805fe758ab0 88061f291480 00011480 8805fe758ab0
[ 5396.970331] Call Trace:
[ 5396.970336] [8153a263] schedule+0x65/0x67
[ 5396.970338] [811a0649] wait_current_trans.isra.32+0x94/0xec
[ 5396.970341] [8105b002] ? add_wait_queue+0x44/0x44
[ 5396.970342] [811a195e] start_transaction+0x206/0x472
[ 5396.970343] [811a1c53] btrfs_join_transaction+0x12/0x14
[ 5396.970344] [811a6f5e] run_delalloc_nocow+0x871/0x8bc
[ 5396.970346] [811a7009] run_delalloc_range+0x60/0x2de
[ 5396.970348] [811b9462] writepage_delalloc.isra.35+0xa1/0x125
[ 5396.970350] [811bb287] __extent_writepage+0x135/0x1d9
[ 5396.970351] [811bb4c8] extent_write_cache_pages.isra.28.constprop.46+0x19d/0x2f0
[ 5396.970353] [811bba55] extent_writepages+0x46/0x57
[ 5396.970354] [811a413b] ? btrfs_update_inode_item+0xe9/0xe9
[ 5396.970355] [811a3097] btrfs_writepages+0x23/0x25
[ 5396.970357] [810ac28b] do_writepages+0x19/0x27
[ 5396.970358] [810fd1e9] __writeback_single_inode+0x3e/0xf1
[ 5396.970359] [810fe089] writeback_sb_inodes+0x1bf/0x2da
[ 5396.970360]
RE: Convert btrfs software code to ASIC
Hi Nguyen, Perhaps a better idea would be to use a low-cost, low-power SoM (system-on-module) to run Linux and the btrfs code, and use an FPGA/ASIC to offload compression/encryption/checksums and possibly to act as a RAID controller. Since btrfs will be under heavy development for the foreseeable future, I doubt it would be a good idea to lock it into silicon. With this approach the mature technologies can be hardware-accelerated, while the software parts remain available for easy upgrades. It also significantly reduces risk for your project, and VCs like that sort of thing! Regards, Paul.

-Original Message- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Le Nguyen Tran Sent: Monday, 19 May 2014 9:07 PM To: Fajar A. Nugraha Cc: linux-btrfs Subject: Re: Convert btrfs software code to ASIC

Hi Nugraha, Thank you so much for your information. Frankly speaking, no one can confirm whether a new start-up idea works or not. The probability of failure is always high. However, the benefit if it works is also very high. I do not plan to exactly replicate the C source code. There are always techniques in ASIC design to implement things differently than in software (less flexible but faster). The main advantages of my proposed chip are:
- Very high performance: the performance of an ASIC is normally more than 10x that of a processor, because a processor runs only 1-4 instructions at a time. That is very suitable for a server when there are many requests from users.
- Low cost: inside the chip, we can customize for our function only. In my plan, we do not need a cache (which covers a very large area), and we can use the low-cost 0.18um technology.
- Low power: processors run instructions sequentially and access memory (or cache). As a result, they consume much more power than an ASIC (it can also be 10x higher). Actually, ARM processors like MediaTek's cannot be compared with an ASIC. However, as I mentioned, it is just my draft idea. 
I still need to work more to verify my idea. Thanks. Nguyen.

On 5/19/14, Fajar A. Nugraha l...@fajar.net wrote: On Mon, May 19, 2014 at 3:40 PM, Le Nguyen Tran lntran...@gmail.com wrote:

Hi, I am Nguyen. I am not a software development engineer but an IC (chip) development engineer. I have a plan to develop an IC controller for Network Attached Storage (NAS). The main idea is converting software code into a hardware implementation. Because the chip is customized for NAS, its performance is high and its cost is lower than using a microprocessor like Atom or Xeon (for servers). I plan to use btrfs as the filesystem specification for my NAS. The main point is that I need to understand the btrfs software code in order to convert it into a hardware implementation. I am wondering if any of you can help me. If we can get the chip into good shape, we can start up a company and have our own business.

I'm not sure if that's a good idea. AFAIK btrfs depends a lot on other Linux subsystems (e.g. vfs, block, etc). Rather than converting/reimplementing everything, if your aim is lower cost, you might have an easier time using something like a MediaTek SoC (the ones used in smartphones) and running a custom-built Linux with btrfs support on it. For documentation, https://btrfs.wiki.kernel.org/index.php/Main_Page#Developer_documentation is probably the best place to start -- Fajar
Help with csum failed errors
Hi all, I'm getting some strange errors and I need some help diagnosing where the problem is. You can see from below that the error is "csum failed ino 5641". This is a new SSD that is running in raid1. When I first noticed the error (on both drives) I copied all the data off the drives, reformatted, and copied the data back. I was running 3.13.11 and upgraded to 3.14.2 just in case there was a bugfix. I still had the error on one drive, so I converted the array back to single, ran dd if=/dev/zero of=/dev/sdd1, then re-added sdd1 and rebalanced. No error was reported. I'm also running the root and swap partitions on the same physical drives and they are ok (root is btrfs also), which makes me suspect that the SSD is ok. I did a scrub on both drives and that found no errors. What do I try next?

[44778.232540] BTRFS info (device sdd1): relocating block group 333996621824 flags 1
[44780.458408] BTRFS info (device sdd1): found 339 extents
[44783.494674] BTRFS info (device sdd1): found 339 extents
[44783.546293] BTRFS info (device sdd1): relocating block group 331849138176 flags 1
[44786.143536] BTRFS info (device sdd1): found 164 extents
[44789.256777] BTRFS info (device sdd1): found 164 extents
[49217.915725] kvm: zapping shadow pages for mmio generation wraparound
[141968.885166] BTRFS error (device sdd1): csum failed ino 5641 off 54112157696 csum 2741395493 expected csum 3151521372
[141968.885216] BTRFS error (device sdd1): csum failed ino 5641 off 54412632064 csum 3489516372 expected csum 2741395493
[141968.887816] BTRFS error (device sdd1): csum failed ino 5641 off 27601571840 csum 1878206089 expected csum 3203096954
[141969.887794] BTRFS error (device sdd1): csum failed ino 5641 off 54112157696 csum 2741395493 expected csum 3151521372
[141970.895408] BTRFS error (device sdd1): csum failed ino 5641 off 7849897984 csum 2833474655 expected csum 2585631118
[141970.895437] BTRFS error (device sdd1): csum failed ino 5641 off 8398065664 csum 2001723841 expected csum 2913537154
[141970.895450] BTRFS error (device sdd1): csum failed ino 5641 off 10395713536 csum 2001723841 expected csum 2833474655
[141971.895529] BTRFS error (device sdd1): csum failed ino 5641 off 10395713536 csum 2913537154 expected csum 2833474655
[141971.895541] BTRFS error (device sdd1): csum failed ino 5641 off 7849897984 csum 2913537154 expected csum 2585631118
[141972.894867] BTRFS error (device sdd1): csum failed ino 5641 off 10395713536 csum 369396853 expected csum 2833474655
[145579.088097] BTRFS error (device sdd1): csum failed ino 5641 off 54317748224 csum 1538824619 expected csum 2260594561
[145579.088110] BTRFS error (device sdd1): csum failed ino 5641 off 54412632064 csum 257502146 expected csum 3543777931
[145580.087459] BTRFS error (device sdd1): csum failed ino 5641 off 54317748224 csum 3543777931 expected csum 2260594561
[171255.071570] 3w-9xxx: scsi0: AEN: INFO (0x04:0x0029): Verify started:unit=2.
[181556.516699] BTRFS error (device sdd1): csum failed ino 5641 off 54317748224 csum 1227121981 expected csum 4177569466
[181557.517271] BTRFS error (device sdd1): csum failed ino 5641 off 53179822080 csum 4130109553 expected csum 1722742324
[188752.222042] BTRFS error (device sdd1): csum failed ino 5641 off 54317748224 csum 50100434 expected csum 4177569466
[200568.511268] 3w-9xxx: scsi0: AEN: INFO (0x04:0x002B): Verify completed:unit=2. 
[202095.611465] BTRFS error (device sdd1): csum failed ino 5641 off 12205871104 csum 910010336 expected csum 2948706421
[203143.317917] BTRFS error (device sdd1): csum failed ino 5641 off 55994003456 csum 648604663 expected csum 2485194978

vm-server ~ # btrfs scrub stat /dev/sdd1
scrub status for 9baf63f7-a9d6-456c-8fdd-1a8fdb21958f
scrub started at Sat May 3 17:59:47 2014 and finished after 659 seconds
total bytes scrubbed: 273.64GiB with 0 errors
vm-server ~ # btrfs scrub stat /
scrub status for 58d27dbd-7c1e-4ef7-8d43-e93df1537b08
scrub started at Sat May 3 19:22:58 2014 and finished after 49 seconds
total bytes scrubbed: 18.49GiB with 0 errors
vm-server ~ # uname -a
Linux vm-server 3.14.2-gentoo #1 SMP PREEMPT Thu May 1 00:02:32 EST 2014 x86_64 Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz GenuineIntel GNU/Linux
vm-server ~ # btrfs --version
Btrfs v3.14.1
vm-server ~ # btrfs fi show
Label: 'Root' uuid: 58d27dbd-7c1e-4ef7-8d43-e93df1537b08
Total devices 2 FS bytes used 18.49GiB
devid 3 size 40.00GiB used 35.03GiB path /dev/sde3
devid 4 size 40.00GiB used 35.03GiB path /dev/sdd3
Label: 'storage' uuid: df3d4a9c-ed6c-4867-8991-a018276f6f3c
Total devices 2 FS bytes used 1.13TiB
devid 5 size 2.69TiB used 1.16TiB path /dev/sdb1
devid 6 size 2.69TiB used 1.16TiB path /dev/sda1
Label: 'backup' uuid: b24d05da-6b0a-4ab0-8f2f-21ea5416e9e9
Total devices 3 FS bytes used 899.20GiB
devid 3 size 901.92GiB used 616.03GiB path /dev/sdf1
devid 4 size 892.25GiB
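For reports like this, a rough triage sequence can be sketched as follows. This is an illustration only: the mount point is a placeholder, and inode 5641 is the number from the dmesg output above; whether these subcommands are available depends on the btrfs-progs version installed:

```shell
#!/bin/sh
# Illustration only: narrow down recurring csum errors.
MNT=/mnt/array    # placeholder mount point of the affected filesystem
INO=5641          # inode number reported in the dmesg csum errors

if [ ! -d "$MNT" ] || ! command -v btrfs >/dev/null 2>&1; then
    echo "MNT is a placeholder; edit before running"
else
    # Map the failing inode to a path, to see which file is affected:
    btrfs inspect-internal inode-resolve "$INO" "$MNT"
    # Per-device error counters help separate media errors from
    # RAM/cable/controller problems:
    btrfs device stats "$MNT"
    # Re-verify every copy of every block (-B waits for completion):
    btrfs scrub start -B "$MNT"
fi
```

A scrub that finds nothing while reads still fail, as in this report, tends to point away from the disks themselves and toward the rest of the I/O path (RAM, cables, controller), so a memory test is a reasonable next step.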