Re: updatedb does not index /home when /home is Btrfs
On Fri, Nov 03, 2017 at 06:15:53PM -0600, Chris Murphy wrote:
> Ancient bug, still seems to be a bug.
> https://bugzilla.redhat.com/show_bug.cgi?id=906591
>
> The issue is that updatedb by default will not index bind mounts, but
> Fedora (and probably other distros) by default put /home on a
> subvolume and then mount that subvolume, which is in effect a bind
> mount.
>
> There's a lot of early discussion in 2013 about it, but then it
> dropped off the radar as nobody has any ideas how to fix this in
> mlocate.

I don't see how this would be a bug in btrfs. The same happens if you bind-mount /home (or individual homes), which is a valid and not uncommon setup.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀ Laws we want back: Poland, Dz.U. 1921 nr.30 poz.177 (also Dz.U.
⣾⠁⢰⠒⠀⣿⡁ 1920 nr.11 poz.61): Art.2: An official, guilty of accepting a gift
⢿⡄⠘⠷⠚⠋⠀ or another material benefit, or a promise thereof, [in matters
⠈⠳⣄ relevant to duties], shall be punished by death by shooting.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with file system
On Fri, Nov 03, 2017 at 04:03:44PM -0600, Chris Murphy wrote:
> On Tue, Oct 31, 2017 at 5:28 AM, Austin S. Hemmelgarn wrote:
>
> > If you're running on an SSD (or thinly provisioned storage, or something
> > else which supports discards) and have the 'discard' mount option enabled,
> > then there is no backup metadata tree (this issue was mentioned on the list
> > a while ago, but nobody ever replied), because it's already been discarded.
>
> This is a really good point. I've been running the discard mount option
> for some time now without problems, in a laptop with a Samsung
> Electronics Co Ltd NVMe SSD Controller SM951/PM951.
>
> However, just trying btrfs-debug-tree -b on a specific block address
> for any of the backup root trees listed in the super, only the current
> one returns a valid result. All others fail with checksum errors. And
> even the good one fails with checksum errors within seconds as a new
> tree is created, the super updated, and Btrfs considers the old root
> tree disposable and subject to discard.
>
> So absolutely if I were to have a problem, probably no rollback for
> me. This seems to totally obviate a fundamental part of Btrfs design.

How is this an issue? Discard is issued only once we're positive there's no reference to the freed blocks anywhere. At that point, they're also open for reuse, thus they can be arbitrarily scribbled upon. Unless your hardware is seriously broken (such as lying about barriers, which is nearly-guaranteed data loss on btrfs anyway), there's no way the filesystem will ever reference such blocks.

The corpses of old trees that are left lying around with no discard can at most be used for manual forensics, but whether a given block will have been overwritten or not is a matter of pure luck.

For rollbacks, there are snapshots. Once a transaction has been fully committed, the old version is considered gone.
> > This is ideally something which should be addressed (we need some sort of
> > discard queue for handling in-line discards), but it's not easy to address.
>
> Discard data extents, don't discard metadata extents? Or put them on a
> substantial delay.

Why would you special-case metadata? Metadata that points to overwritten or discarded blocks is of no use either.

Meow!
Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
For what it's worth, cryptsetup 2 now offers a UI for setting up both dm-verity and dm-integrity. https://www.kernel.org/pub/linux/utils/cryptsetup/v2.0/v2.0.0-rc0-ReleaseNotes

While more complicated than Btrfs, it's possible to first create an integrity device on each drive, then add the integrity block devices to mdadm or LVM as physical devices to create the raid1/10/5/6 array. You could do it the other way around, but when you do it as described, a sector read that fails checksum matching causes a read error to be handed off to the md driver, which then does reconstruction from parity. If you only make the integrity volume out of an array, then your file system just gets a read error whenever there's a checksum mismatch; reconstruction isn't possible, but at least you're warned.

---
Chris Murphy
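As a rough sketch of the stacking described above (device names are placeholders, and these commands wipe the named partitions; integritysetup ships with cryptsetup 2.x):

```shell
# Sketch only, assuming two spare partitions /dev/sda1 and /dev/sdb1.
# Create a standalone dm-integrity device on each drive; reads that fail
# checksum verification are returned as I/O errors to the layer above.
integritysetup format /dev/sda1
integritysetup open /dev/sda1 int-a
integritysetup format /dev/sdb1
integritysetup open /dev/sdb1 int-b

# Build the md mirror on top of the integrity devices, so a checksum
# failure surfaces as a read error that md can repair from the other leg.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/mapper/int-a /dev/mapper/int-b
```

Doing it in this order is what enables self-healing: the integrity layer turns silent corruption into a read error below md, instead of above it.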
updatedb does not index /home when /home is Btrfs
Ancient bug, still seems to be a bug. https://bugzilla.redhat.com/show_bug.cgi?id=906591

The issue is that updatedb by default will not index bind mounts, but Fedora (and probably other distros) by default put /home on a subvolume and then mount that subvolume, which is in effect a bind mount.

There's a lot of early discussion in 2013 about it, but then it dropped off the radar as nobody has any ideas how to fix this in mlocate.

--
Chris Murphy
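For what it's worth, mlocate does expose a knob for this; a possible per-system workaround (not a fix for the distro default) is to disable bind-mount pruning in its configuration:

```
# /etc/updatedb.conf (workaround sketch): stop pruning bind mounts,
# so mounted subvolumes like /home get indexed again.
PRUNE_BIND_MOUNTS = "no"
```

The trade-off is that real bind mounts will then be indexed twice (once per mount point), which is presumably why mlocate prunes them in the first place.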
Re: Problem with file system
On Tue, Oct 31, 2017 at 5:28 AM, Austin S. Hemmelgarn wrote:

> If you're running on an SSD (or thinly provisioned storage, or something
> else which supports discards) and have the 'discard' mount option enabled,
> then there is no backup metadata tree (this issue was mentioned on the list
> a while ago, but nobody ever replied), because it's already been discarded.

This is a really good point. I've been running the discard mount option for some time now without problems, in a laptop with a Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951.

However, just trying btrfs-debug-tree -b on a specific block address for any of the backup root trees listed in the super, only the current one returns a valid result. All others fail with checksum errors. And even the good one fails with checksum errors within seconds as a new tree is created, the super updated, and Btrfs considers the old root tree disposable and subject to discard.

So absolutely if I were to have a problem, probably no rollback for me. This seems to totally obviate a fundamental part of Btrfs design.

> This is ideally something which should be addressed (we need some sort of
> discard queue for handling in-line discards), but it's not easy to address.

Discard data extents, don't discard metadata extents? Or put them on a substantial delay.

--
Chris Murphy
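The check described here can be reproduced roughly as follows (device path and block address are placeholders; dump-super/dump-tree are the btrfs-progs successors of btrfs-show-super and btrfs-debug-tree):

```shell
# Sketch, assuming a btrfs filesystem on the placeholder device below.
DEV=/dev/nvme0n1p3

# -f prints the full superblock, including the backup_roots array with
# the backup_tree_root block addresses.
btrfs inspect-internal dump-super -f "$DEV" | grep backup_tree_root

# Probe one of those block addresses directly. On a filesystem mounted
# with -o discard, older backup roots typically fail checksum checks.
btrfs inspect-internal dump-tree -b 123456789 "$DEV"  # address illustrative
```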
Re: [PATCH v3] Btrfs: add support for fallocate's zero range operation
On 11/03/2017 11:20 AM, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> This implements support for the zero range operation of fallocate. For now
> at least it's as simple as possible while reusing most of the existing
> fallocate and hole punching infrastructure.
>
> Signed-off-by: Filipe Manana
> ---
>
> V2: Removed double inode unlock on error path from failure to lock range.
> V3: Factored common code to update isize and inode item into a helper
>     function, plus some minor cleanup.
>
>  fs/btrfs/file.c | 351 +---
>  1 file changed, 285 insertions(+), 66 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index aafcc785f840..2cc1aed1c564 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2448,7 +2448,48 @@ static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
>  	return ret;
>  }
>
> -static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> +static int btrfs_punch_hole_lock_range(struct inode *inode,
> +				       const u64 lockstart,
> +				       const u64 lockend,
> +				       struct extent_state **cached_state)
> +{
> +	while (1) {
> +		struct btrfs_ordered_extent *ordered;
> +		int ret;
> +
> +		truncate_pagecache_range(inode, lockstart, lockend);
> +
> +		lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
> +				 cached_state);
> +		ordered = btrfs_lookup_first_ordered_extent(inode, lockend);
> +
> +		/*
> +		 * We need to make sure we have no ordered extents in this range
> +		 * and nobody raced in and read a page in this range, if we did
> +		 * we need to try again.
> +		 */
> +		if ((!ordered ||
> +		    (ordered->file_offset + ordered->len <= lockstart ||
> +		     ordered->file_offset > lockend)) &&
> +		    !btrfs_page_exists_in_range(inode, lockstart, lockend)) {
> +			if (ordered)
> +				btrfs_put_ordered_extent(ordered);
> +			break;
> +		}
> +		if (ordered)
> +			btrfs_put_ordered_extent(ordered);
> +		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
> +				     lockend, cached_state, GFP_NOFS);
> +		ret = btrfs_wait_ordered_range(inode, lockstart,
> +					       lockend - lockstart + 1);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len,
> +			    bool lock_inode)

The lock_inode parameter may no longer be needed, since it looks to be always true in this version of the patch.

Ed

>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
> @@ -2477,7 +2518,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  	if (ret)
>  		return ret;
>
> -	inode_lock(inode);
> +	if (lock_inode)
> +		inode_lock(inode);
>  	ino_size = round_up(inode->i_size, fs_info->sectorsize);
>  	ret = find_first_non_hole(inode, &offset, &len);
>  	if (ret < 0)
> @@ -2516,7 +2558,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  		truncated_block = true;
>  		ret = btrfs_truncate_block(inode, offset, 0, 0);
>  		if (ret) {
> -			inode_unlock(inode);
> +			if (lock_inode)
> +				inode_unlock(inode);
>  			return ret;
>  		}
>  	}
> @@ -2564,38 +2607,12 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  		goto out_only_mutex;
>  	}
>
> -	while (1) {
> -		struct btrfs_ordered_extent *ordered;
> -
> -		truncate_pagecache_range(inode, lockstart, lockend);
> -
> -		lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
> -				 &cached_state);
> -		ordered = btrfs_lookup_first_ordered_extent(inode, lockend);
> -
> -		/*
> -		 * We need to make sure we have no ordered extents in this range
> -		 * and nobody raced in and read a page in this range, if we did
> -		 * we need to try again.
> -		 */
> -		if ((!ordered ||
> -		    (ordered->file_offset + ordered->len <= lockstart ||
> -		     ordered->file_offset > lockend)) &&
> -		    !btrfs_page_exists_in_range(inode, lockstart, lockend)) {
> -			if (ordered)
> -
[PATCH v3] Btrfs: add support for fallocate's zero range operation
From: Filipe Manana

This implements support for the zero range operation of fallocate. For now at least it's as simple as possible while reusing most of the existing fallocate and hole punching infrastructure.

Signed-off-by: Filipe Manana
---

V2: Removed double inode unlock on error path from failure to lock range.
V3: Factored common code to update isize and inode item into a helper
    function, plus some minor cleanup.

 fs/btrfs/file.c | 351 +---
 1 file changed, 285 insertions(+), 66 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index aafcc785f840..2cc1aed1c564 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2448,7 +2448,48 @@ static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
 	return ret;
 }

-static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
+static int btrfs_punch_hole_lock_range(struct inode *inode,
+				       const u64 lockstart,
+				       const u64 lockend,
+				       struct extent_state **cached_state)
+{
+	while (1) {
+		struct btrfs_ordered_extent *ordered;
+		int ret;
+
+		truncate_pagecache_range(inode, lockstart, lockend);
+
+		lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
+				 cached_state);
+		ordered = btrfs_lookup_first_ordered_extent(inode, lockend);
+
+		/*
+		 * We need to make sure we have no ordered extents in this range
+		 * and nobody raced in and read a page in this range, if we did
+		 * we need to try again.
+		 */
+		if ((!ordered ||
+		    (ordered->file_offset + ordered->len <= lockstart ||
+		     ordered->file_offset > lockend)) &&
+		    !btrfs_page_exists_in_range(inode, lockstart, lockend)) {
+			if (ordered)
+				btrfs_put_ordered_extent(ordered);
+			break;
+		}
+		if (ordered)
+			btrfs_put_ordered_extent(ordered);
+		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
+				     lockend, cached_state, GFP_NOFS);
+		ret = btrfs_wait_ordered_range(inode, lockstart,
+					       lockend - lockstart + 1);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len,
+			    bool lock_inode)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -2477,7 +2518,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	if (ret)
 		return ret;

-	inode_lock(inode);
+	if (lock_inode)
+		inode_lock(inode);
 	ino_size = round_up(inode->i_size, fs_info->sectorsize);
 	ret = find_first_non_hole(inode, &offset, &len);
 	if (ret < 0)
@@ -2516,7 +2558,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		truncated_block = true;
 		ret = btrfs_truncate_block(inode, offset, 0, 0);
 		if (ret) {
-			inode_unlock(inode);
+			if (lock_inode)
+				inode_unlock(inode);
 			return ret;
 		}
 	}
@@ -2564,38 +2607,12 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		goto out_only_mutex;
 	}

-	while (1) {
-		struct btrfs_ordered_extent *ordered;
-
-		truncate_pagecache_range(inode, lockstart, lockend);
-
-		lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
-				 &cached_state);
-		ordered = btrfs_lookup_first_ordered_extent(inode, lockend);
-
-		/*
-		 * We need to make sure we have no ordered extents in this range
-		 * and nobody raced in and read a page in this range, if we did
-		 * we need to try again.
-		 */
-		if ((!ordered ||
-		    (ordered->file_offset + ordered->len <= lockstart ||
-		     ordered->file_offset > lockend)) &&
-		    !btrfs_page_exists_in_range(inode, lockstart, lockend)) {
-			if (ordered)
-				btrfs_put_ordered_extent(ordered);
-			break;
-		}
-		if (ordered)
-			btrfs_put_ordered_extent(ordered);
-		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
-
Mein Liebster
[Translated from German]

Funds Transfer from Isabelle Seyyed.

My Dearest,

I have sent you this e-mail for an open discussion with you. I do not want you to misunderstand this offer in any respect... if it is well with you, I ask for your full cooperation. I have contacted you in trust, to handle an investment in your country/company on my behalf as a potential partner.

My name is Isabelle Seyyed, a 22-year-old girl from Cote D'Ivoire. My father and I escaped from our country in the heat of the civil war, after I lost my mother and two of my older brothers in the war. As a result of the political instability in my country, even after the war, my father established his cocoa and coffee export business in Abidjan, Cote d'Ivoire. He was in Burke, a northern town, negotiating the purchase of a cocoa plantation when he was hit by the rebels who were fighting to take over the government of the country. My father's death has now made me an orphan and thus exposed me to danger.

Before his unfortunate death, my late father called me to his sick bed and told me, as his only surviving daughter, that he had deposited the sum of 4.6 million euros in one of the prominent banks here in our country, with my name as next of kin. As a result of the current insecurity of life and property in this country, I would like to relocate to another country, because there is no more good security in Cote d'Ivoire and no more good universities since this rebel, political and civil war began. I hope you have heard about the war in Cote d'Ivoire.

My reasons for contacting you are listed below:

1. I would like you to help me transfer and invest the sum of four million six hundred thousand euros (€4,600,000.00), which I inherited from my late father before he died.
2. I would like you to help me obtain university admission as soon as I arrive in your country after the transfer of the money.
3. I would like you to be my guardian, since my father is dead.
4. I would like you to help me find good accommodation in your country.

I am prepared to offer you 20% of the total sum as compensation for your efforts after the successful transfer of my inherited money to your nominated account. I would appreciate it most if you could contact me as soon as you receive this message, so that we can discuss this further.

Yours sincerely,
Isabelle Seyyed
btrfs_cleaner lockdep warning in v4.9.56
Hello,

this warning happened during "btrfs subvolume delete" of a read-only snapshot, after the newest snapshot in the series had been "btrfs received".

[96857.000284] ------------[ cut here ]------------
[96857.000307] WARNING: CPU: 1 PID: 371 at kernel/locking/lockdep.c:704 register_lock_class+0x4c8/0x530
[96857.000322] Modules linked in: fuse vfat msdos fat dm_mod nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc xfs ipmi_watchdog libcrc32c raid1 iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm evdev irqbypass serio_raw hpilo hpwdt tpm_tis tpm_tis_core acpi_power_meter tpm button lpc_ich mfd_core md_mod ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 btrfs xor raid6_pq sg sd_mod uas usb_storage crc32c_intel ahci libahci psmouse libata scsi_mod uhci_hcd xhci_pci xhci_hcd tg3 ptp pps_core libphy thermal ehci_pci ehci_hcd usbcore usb_common
[96857.000744] CPU: 1 PID: 371 Comm: btrfs-cleaner Not tainted 4.9.56 #13
[96857.000769] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 06/06/2014
[96857.000795] c9a4fa48 81309da5
[96857.000849] c9a4fa88 81059b7c 02c4
[96857.000901] 8234baf0 880041fca450
[96857.000954] Call Trace:
[96857.000979]  [] dump_stack+0x67/0x92
[96857.001004]  [] __warn+0xcc/0xf0
[96857.001029]  [] warn_slowpath_null+0x18/0x20
[96857.001054]  [] register_lock_class+0x4c8/0x530
[96857.001080]  [] __lock_acquire+0x76/0x7f0
[96857.001105]  [] lock_acquire+0xbe/0x1f0
[96857.001161]  [] ? btrfs_tree_lock+0x89/0x250 [btrfs]
[96857.001188]  [] _raw_write_lock+0x33/0x50
[96857.001233]  [] ? btrfs_tree_lock+0x89/0x250 [btrfs]
[96857.001276]  [] btrfs_tree_lock+0x89/0x250 [btrfs]
[96857.001322]  [] ? find_extent_buffer+0xda/0x1e0 [btrfs]
[96857.001367]  [] ? release_extent_buffer+0xc0/0xc0 [btrfs]
[96857.001409]  [] do_walk_down+0xf0/0x930 [btrfs]
[96857.001450]  [] walk_down_tree+0xb2/0xe0 [btrfs]
[96857.001491]  [] btrfs_drop_snapshot+0x3a9/0x780 [btrfs]
[96857.001517]  [] ? _raw_spin_unlock+0x22/0x30
[96857.001561]  [] ? btrfs_kill_all_delayed_nodes+0xbd/0xd0 [btrfs]
[96857.001617]  [] btrfs_clean_one_deleted_snapshot+0xad/0xe0 [btrfs]
[96857.001672]  [] cleaner_kthread+0x16f/0x1e0 [btrfs]
[96857.001713]  [] ? btree_invalidatepage+0xa0/0xa0 [btrfs]
[96857.001741]  [] kthread+0x116/0x130
[96857.001765]  [] ? kthread_park+0x60/0x60
[96857.001790]  [] ret_from_fork+0x27/0x40
[96857.001814] ---[ end trace ff21435da4cc1bc5 ]---

Regards,
Petr
Re: Problem with file system
On 2017-11-03 03:42, Kai Krakow wrote:
> Am Tue, 31 Oct 2017 07:28:58 -0400 schrieb "Austin S. Hemmelgarn":
>
> > On 2017-10-31 01:57, Marat Khalili wrote:
> > > On 31/10/17 00:37, Chris Murphy wrote:
> > > > But off hand it sounds like hardware was sabotaging the expected
> > > > write ordering. How to test a given hardware setup for that, I
> > > > think, is really overdue. It affects literally every file system,
> > > > and Linux storage technology.
> > > >
> > > > It kinda sounds to me like something other than supers is being
> > > > overwritten too soon, and that's why it's possible for none of the
> > > > backup roots to find a valid root tree, because all four possible
> > > > root trees either haven't actually been written yet (still) or
> > > > they've been overwritten, even though the super is updated. But
> > > > again, it's speculation; we don't actually know why your system
> > > > was no longer mountable.
> > >
> > > Just a detached view: I know hardware should respect
> > > ordering/barriers and such, but how hard is it really to avoid
> > > overwriting at least one complete metadata tree for half an hour
> > > (even better, yet another one for a day)? Just metadata, not data
> > > extents.
> >
> > If you're running on an SSD (or thinly provisioned storage, or
> > something else which supports discards) and have the 'discard' mount
> > option enabled, then there is no backup metadata tree (this issue was
> > mentioned on the list a while ago, but nobody ever replied), because
> > it's already been discarded. This is ideally something which should
> > be addressed (we need some sort of discard queue for handling in-line
> > discards), but it's not easy to address.
> >
> > Otherwise, it becomes a question of space usage on the filesystem,
> > and this is just another reason to keep some extra slack space on the
> > FS (though that doesn't help _much_, it does help). This, in theory,
> > could be addressed, but it probably can't be applied across mounts of
> > a filesystem without an on-disk format change.
>
> Well, maybe inline discard is working at the wrong level. It should
> kick in when the reference through any of the backup roots is dropped,
> not when the current instance is dropped.

Indeed.

> Without knowledge of the internals, I guess discards could be added to
> a queue within a new tree in btrfs, and only added to that queue when
> dropped from the last backup root referencing it. But this will
> probably add some bad performance spikes.

Inline discards can already cause bad performance spikes.

> I wonder how a regular fstrim run through cron applies to this
> problem?

You functionally lose any old (freed) trees, they just get kept around until you call fstrim.
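The periodic trim mentioned here can be as simple as a small cron script (path and schedule are illustrative; fstrim is from util-linux):

```shell
#!/bin/sh
# /etc/cron.weekly/fstrim (illustrative): batch-discard free space on all
# mounted filesystems that support it, as an alternative to -o discard.
# Freed tree blocks then survive on disk until the next weekly run.
exec /sbin/fstrim --all --verbose
```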
Re: defragmenting best practice?
On 2017-11-03 03:26, Kai Krakow wrote:
> Am Thu, 2 Nov 2017 22:47:31 -0400 schrieb Dave:
>
> > On Thu, Nov 2, 2017 at 5:16 PM, Kai Krakow wrote:
> > > You may want to try the btrfs autodefrag mount option and see if it
> > > improves things (tho, the effect may take days or weeks to apply if
> > > you didn't enable it right from the creation of the filesystem).
> > >
> > > Also, autodefrag will probably unshare reflinks on your snapshots.
> > > You may be able to use bees[1] to work against this effect. Its
> > > interaction with autodefrag is not well tested but it works fine for
> > > me. Also, bees is able to reduce some of the fragmentation during
> > > deduplication because it will rewrite extents back into bigger
> > > chunks (but only for duplicated data).
> > >
> > > [1]: https://github.com/Zygo/bees
> >
> > I will look into bees. And yes, I plan to try autodefrag. (I already
> > have it enabled now.)
> >
> > However, I need to understand something about how btrfs send-receive
> > works in regard to reflinks and fragmentation. Say I have 2 snapshots
> > on my live volume. The earlier one of them has already been sent to
> > another block device by btrfs send-receive (full backup). Now defrag
> > runs on the live volume and breaks some percentage of the reflinks.
> > At this point I do an incremental btrfs send-receive using "-p" (or
> > "-c"), with the diff going to the same other block device where the
> > prior snapshot was already sent.
> >
> > Will reflinks be "made whole" (restored) on the receiving block
> > device? Or is the state of the source volume replicated so closely
> > that reflink status is the same on the target? Also, is fragmentation
> > reduced on the receiving block device? My expectation is that
> > fragmentation would be reduced and duplication would be reduced too.
> > In other words, does send-receive result in defragmentation and
> > deduplication too?
>
> As far as I understand, btrfs send/receive doesn't create an exact
> mirror. It just replays the block operations between generation
> numbers. That is: if it finds new blocks referenced between
> generations, it will write a _new_ block to the destination.

That is mostly correct, except it's not a block-level copy. To put it in a heavily simplified manner, send/receive will recreate the subvolume using nothing more than basic file manipulation syscalls (write(), chown(), chmod(), etc.), the clone ioctl, and some extra logic to figure out the correct location to clone from. IOW, it's functionally equivalent to using rsync to copy the data and then deduplicating, albeit a bit smarter about when to deduplicate (and more efficient in that respect).

> So, no, it won't reduce fragmentation or duplication. It just keeps
> reflinks intact as long as such extents weren't touched within the
> generation range. Otherwise they are rewritten as new extents.

A received subvolume will almost always be less fragmented than the source, since everything is received serially, and each file is written out one at a time.

> Autodefrag and deduplication processes will as such probably increase
> duplication at the destination. A developer may have a better clue,
> tho.

In theory, yes, but in practice, not so much. Autodefrag generally operates on very small blocks of data (64k IIRC), and I'm pretty sure it has some heuristic that only triggers it on small random writes, so depending on the workload, it may not be triggering much (for example, it often won't trigger on cache directories, since those almost never have files rewritten in place).
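The incremental send under discussion looks roughly like this (mount points and snapshot names are placeholders):

```shell
# Sketch of incremental send/receive with a shared parent snapshot.
# /mnt/live and /mnt/backup are placeholder mount points.
btrfs subvolume snapshot -r /mnt/live/home /mnt/live/snap1
btrfs send /mnt/live/snap1 | btrfs receive /mnt/backup       # full send

btrfs subvolume snapshot -r /mnt/live/home /mnt/live/snap2
# -p: send only the delta against snap1; on the receiving side, unchanged
# extents are recreated as clones (reflinks) into the received snap1.
btrfs send -p /mnt/live/snap1 /mnt/live/snap2 | btrfs receive /mnt/backup
```

Note that the parent given to -p must already exist (unmodified) on the receiving side, which is why the full send has to happen first.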
Re: [PATCH] btrfs: Move leaf verification to correct timing to avoid false panic for sanity test
On 2017/11/03 18:59, Filipe Manana wrote:
> On Thu, Nov 2, 2017 at 7:04 AM, Qu Wenruo wrote:
> > [BUG]
> > If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will
> > instantly cause kernel panic like:
> >
> > --
> > ...
> > assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853
> > ...
> > Call Trace:
> >  btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs]
> >  setup_items_for_insert+0x385/0x650 [btrfs]
> >  __btrfs_drop_extents+0x129a/0x1870 [btrfs]
> > ...
> > --
> >
> > [Cause]
> > Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check
> > if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y.
> >
> > However some btrfs_mark_buffer_dirty() callers, like
> > setup_items_for_insert(), don't really initialize their item data but
> > only initialize the item pointers, leaving item data uninitialized.
>
> So instead of doing this juggling, the best would be to have it not call
> mark_buffer_dirty(), and leave that responsibility for the caller after
> it initializes the item data. I give you a very good reason for that
> below.

However setup_items_for_insert() is just one of the possible causes; unless we overhaul all btrfs_mark_buffer_dirty() callers, it will be whack-a-mole.

> > This makes tree-checker catch uninitialized data as error, causing
> > such panic.
> >
> > [Fix]
> > The correct timing to check leaf validation should be before write IO or
> > after read IO.
> >
> > Just like we have already done the tree validation check at the btree
> > readpage end io hook, this patch will move the write-time tree checker to
> > csum_dirty_buffer().
> >
> > As csum_dirty_buffer() is called just before submitting the btree write
> > bio, as the call path shows:
> >
> > btree_submit_bio_hook()
> > |- __btree_submit_bio_start()
> >    |- btree_csum_one_bio()
> >       |- csum_dirty_buffer()
> >          |- btrfs_check_leaf()
> >
> > By this we can ensure the leaf passed in is in consistent status, and
> > can check them without causing tons of false alerts.
> >
> > Reported-by: Lakshmipathi.G
> > Signed-off-by: Qu Wenruo
> > ---
> >  fs/btrfs/disk-io.c | 26 +++---
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index efce9a2fa9be..6c17bce2a05e 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -506,6 +506,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
> >  	u64 start = page_offset(page);
> >  	u64 found_start;
> >  	struct extent_buffer *eb;
> > +	int ret;
> >
> >  	eb = (struct extent_buffer *)page->private;
> >  	if (page != eb->pages[0])
> > @@ -524,7 +525,24 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
> >  	ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
> >  				    btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
> >
> > -	return csum_tree_block(fs_info, eb, 0);
> > +	ret = csum_tree_block(fs_info, eb, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> > +	/*
> > +	 * Do extra check before we write the tree block into disk.
> > +	 */
> > +	if (btrfs_header_level(eb) == 0) {
> > +		ret = btrfs_check_leaf(fs_info->tree_root, eb);
> > +		if (ret) {
> > +			btrfs_print_leaf(eb);
> > +			ASSERT(0);
> > +			return ret;
> > +		}
> > +	}
> > +#endif
> > +	return 0;
> >  }
> >
> >  static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
> > @@ -3847,12 +3865,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
> >  	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, buf->len,
> >  				 fs_info->dirty_metadata_batch);
> > -#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> > -	if (btrfs_header_level(buf) == 0 && btrfs_check_leaf(root, buf)) {
> > -		btrfs_print_leaf(buf);
> > -		ASSERT(0);
> > -	}
> > -#endif
>
> So there's a reason why btrfs_check_leaf() was called here, at
> mark_buffer_dirty(), instead of somewhere else like csum_dirty_buffer().
>
> The reason is that once some bad code inserts a key out of order for
> example (or did any other bad stuff that check_leaf() caught before you
> added the tree-checker thing), we would get a trace that pinpoints
> exactly where the bad code is. With this change, we will only know
> something is bad when writeback of the leaf starts, and before that
> happens, the leaf might have been changed dozens of times by many
> different functions (and this happens very often, it's far from being an
> unusual case), in which case the given trace won't tell you which code
> misbehaved. This makes it harder to find out bugs, and as it used to be
> it certainly helped me in the past several times. IOW, I would prefer
> what I
Re: [PATCH 06/11] btrfs: document device locking
Thanks for writing this.

+ * - fs_devices::device_list_mutex (per-fs, with RCU)
+ *
+ *   protects updates to fs_devices::devices, ie. adding and deleting
+ *
+ *   simple list traversal with read-only actions can be done with RCU
+ *   protection
+ *
+ *   may be used to exclude some operations from running concurrently without
+ *   any modifications to the list (see write_all_supers)

+ * - volume_mutex
+ *
+ *   coarse lock owned by a mounted filesystem; used to exclude some operations
+ *   that cannot run in parallel and affect the higher-level properties of the
+ *   filesystem like: device add/deleting/resize/replace, or balance

+ * - chunk_mutex
+ *
+ *   protects chunks, adding or removing during allocation, trim or when
+ *   a new device is added/removed

::

+ * Lock nesting
+ *
+ * uuid_mutex
+ * volume_mutex
+ * device_list_mutex
+ * chunk_mutex
+ * balance_mutex

If we have a list of operations that would consume these locks, then we can map it accordingly for better clarity.

To me it looks like we have too many locks:

- We don't have to differentiate the mounted and unmounted context for device locks.
- Two locks would be sufficient: one for the device list (add/rm, replace, ...) and another for device property changes (resize, trim, ...).

Thanks, Anand
Re: [PATCH] btrfs: Move leaf verification to correct timing to avoid false panic for sanity test
On Thu, Nov 2, 2017 at 7:04 AM, Qu Wenruo wrote: > [BUG] > If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will > instantly cause kernel panic like: > > -- > ... > assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853 > ... > Call Trace: > btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs] > setup_items_for_insert+0x385/0x650 [btrfs] > __btrfs_drop_extents+0x129a/0x1870 [btrfs] > ... > -- > > [Cause] > Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check > if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y. > > However some btrfs_mark_buffer_dirty() caller, like > setup_items_for_insert(), doesn't really initialize its item data but > only initialize its item pointers, leaving item data uninitialized. So instead of doing this juggling, the best would be to have it not call mark_buffer_dirty(), and leave that responsibility for the caller after it initializes the item data. I'll give you a very good reason for that below. > > This makes tree-checker catch uninitialized data as error, causing > such panic. > > [Fix] > The correct timing to check leaf validation should be before write IO or > after read IO. > > Just like we have already done the tree validation check at btree > readpage end io hook, this patch will move the write time tree checker to > csum_dirty_buffer(). > > As csum_dirty_buffer() is called just before submitting btree write bio, as > the call path shows: > > btree_submit_bio_hook() > |- __btree_submit_bio_start() >|- btree_csum_one_bio() > |- csum_dirty_buffer() > |- btrfs_check_leaf() > > By this we can ensure the leaf passed in is in consistent status, and > can check them without causing tons of false alert. 
> > Reported-by: Lakshmipathi.G > Signed-off-by: Qu Wenruo > --- > fs/btrfs/disk-io.c | 26 +++--- > 1 file changed, 19 insertions(+), 7 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index efce9a2fa9be..6c17bce2a05e 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -506,6 +506,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info > *fs_info, struct page *page) > u64 start = page_offset(page); > u64 found_start; > struct extent_buffer *eb; > + int ret; > > eb = (struct extent_buffer *)page->private; > if (page != eb->pages[0]) > @@ -524,7 +525,24 @@ static int csum_dirty_buffer(struct btrfs_fs_info > *fs_info, struct page *page) > ASSERT(memcmp_extent_buffer(eb, fs_info->fsid, > btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0); > > - return csum_tree_block(fs_info, eb, 0); > + ret = csum_tree_block(fs_info, eb, 0); > + if (ret) > + return ret; > + > +#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY > + /* > +* Do extra check before we write the tree block into disk. > +*/ > + if (btrfs_header_level(eb) == 0) { > + ret = btrfs_check_leaf(fs_info->tree_root, eb); > + if (ret) { > + btrfs_print_leaf(eb); > + ASSERT(0); > + return ret; > + } > + } > +#endif > + return 0; > } > > static int check_tree_block_fsid(struct btrfs_fs_info *fs_info, > @@ -3847,12 +3865,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf) > percpu_counter_add_batch(_info->dirty_metadata_bytes, > buf->len, > fs_info->dirty_metadata_batch); > -#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY > - if (btrfs_header_level(buf) == 0 && btrfs_check_leaf(root, buf)) { > - btrfs_print_leaf(buf); > - ASSERT(0); > - } > -#endif So there's a reason why btrfs_check_leaf() was called here, at mark_buffer_dirty(), instead of somewhere else like csum_dirty_buffer(). 
The reason is that once some bad code inserts a key out of order, for example (or does any other bad stuff that check_leaf() caught before you added the tree-checker thing), we would get a trace that pinpoints exactly where the bad code is. With this change, we will only know something is bad when writeback of the leaf starts, and before that happens the leaf might have been changed dozens of times by many different functions (and this happens very often, it's far from being an unusual case), in which case the given trace won't tell you which code misbehaved. This makes it harder to track down bugs, and the check as it used to be certainly helped me several times in the past. IOW, I would prefer what I mentioned earlier or, at the very least, to do those new checks that validate data only at writeback start time. > } > > static void __btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info, > -- > 2.14.3
Re: [PATCH v2] Btrfs: add support for fallocate's zero range operation
On Fri, Nov 3, 2017 at 10:29 AM, Filipe Manana wrote: > On Fri, Nov 3, 2017 at 9:30 AM, Nikolay Borisov wrote: >> >> >> On 25.10.2017 17:59, fdman...@kernel.org wrote: >>> From: Filipe Manana >>> >>> This implements support for the zero range operation of fallocate. For now >>> at least it's as simple as possible while reusing most of the existing >>> fallocate and hole punching infrastructure. >>> >>> Signed-off-by: Filipe Manana >>> --- >>> >>> V2: Removed double inode unlock on error path from failure to lock range. >>> >>> fs/btrfs/file.c | 332 >>> +--- >>> 1 file changed, 290 insertions(+), 42 deletions(-) >>> >>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >>> index aafcc785f840..e0d15c0d1641 100644 >>> --- a/fs/btrfs/file.c >>> +++ b/fs/btrfs/file.c >>> @@ -2448,7 +2448,48 @@ static int find_first_non_hole(struct inode *inode, >>> u64 *start, u64 *len) >>> return ret; >>> } >>> >>> -static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) >>> +static int btrfs_punch_hole_lock_range(struct inode *inode, >>> +const u64 lockstart, >>> +const u64 lockend, >>> +struct extent_state **cached_state) >>> +{ >>> + while (1) { >>> + struct btrfs_ordered_extent *ordered; >>> + int ret; >>> + >>> + truncate_pagecache_range(inode, lockstart, lockend); >>> + >>> + lock_extent_bits(_I(inode)->io_tree, lockstart, lockend, >>> + cached_state); >>> + ordered = btrfs_lookup_first_ordered_extent(inode, lockend); >>> + >>> + /* >>> + * We need to make sure we have no ordered extents in this >>> range >>> + * and nobody raced in and read a page in this range, if we >>> did >>> + * we need to try again. 
>>> + */ >>> + if ((!ordered || >>> + (ordered->file_offset + ordered->len <= lockstart || >>> + ordered->file_offset > lockend)) && >>> + !btrfs_page_exists_in_range(inode, lockstart, lockend)) { >>> + if (ordered) >>> + btrfs_put_ordered_extent(ordered); >>> + break; >>> + } >>> + if (ordered) >>> + btrfs_put_ordered_extent(ordered); >>> + unlock_extent_cached(_I(inode)->io_tree, lockstart, >>> + lockend, cached_state, GFP_NOFS); >>> + ret = btrfs_wait_ordered_range(inode, lockstart, >>> +lockend - lockstart + 1); >>> + if (ret) >>> + return ret; >>> + } >>> + return 0; >>> +} >>> + >>> +static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len, >>> + bool lock_inode) >>> { >>> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); >>> struct btrfs_root *root = BTRFS_I(inode)->root; >>> @@ -2477,7 +2518,8 @@ static int btrfs_punch_hole(struct inode *inode, >>> loff_t offset, loff_t len) >>> if (ret) >>> return ret; >>> >>> - inode_lock(inode); >>> + if (lock_inode) >>> + inode_lock(inode); >>> ino_size = round_up(inode->i_size, fs_info->sectorsize); >>> ret = find_first_non_hole(inode, , ); >>> if (ret < 0) >>> @@ -2516,7 +2558,8 @@ static int btrfs_punch_hole(struct inode *inode, >>> loff_t offset, loff_t len) >>> truncated_block = true; >>> ret = btrfs_truncate_block(inode, offset, 0, 0); >>> if (ret) { >>> - inode_unlock(inode); >>> + if (lock_inode) >>> + inode_unlock(inode); >>> return ret; >>> } >>> } >>> @@ -2564,38 +2607,12 @@ static int btrfs_punch_hole(struct inode *inode, >>> loff_t offset, loff_t len) >>> goto out_only_mutex; >>> } >>> >>> - while (1) { >>> - struct btrfs_ordered_extent *ordered; >>> - >>> - truncate_pagecache_range(inode, lockstart, lockend); >>> - >>> - lock_extent_bits(_I(inode)->io_tree, lockstart, lockend, >>> - _state); >>> - ordered = btrfs_lookup_first_ordered_extent(inode, lockend); >>> - >>> - /* >>> - * We need to make sure we have no ordered extents in this >>> range >>> - * and nobody raced in and read a 
page in this range, if we >>> did >>> - * we need to try again. >>> - */ >>> - if ((!ordered || >>> - (ordered->file_offset + ordered->len <= lockstart || >>> -
Re: [PATCH v2] Btrfs: add support for fallocate's zero range operation
On Fri, Nov 3, 2017 at 9:30 AM, Nikolay Borisov wrote: > > > On 25.10.2017 17:59, fdman...@kernel.org wrote: >> From: Filipe Manana >> >> This implements support for the zero range operation of fallocate. For now >> at least it's as simple as possible while reusing most of the existing >> fallocate and hole punching infrastructure. >> >> Signed-off-by: Filipe Manana >> --- >> >> V2: Removed double inode unlock on error path from failure to lock range. >> >> fs/btrfs/file.c | 332 >> +--- >> 1 file changed, 290 insertions(+), 42 deletions(-) >> >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >> index aafcc785f840..e0d15c0d1641 100644 >> --- a/fs/btrfs/file.c >> +++ b/fs/btrfs/file.c >> @@ -2448,7 +2448,48 @@ static int find_first_non_hole(struct inode *inode, >> u64 *start, u64 *len) >> return ret; >> } >> >> -static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) >> +static int btrfs_punch_hole_lock_range(struct inode *inode, >> +const u64 lockstart, >> +const u64 lockend, >> +struct extent_state **cached_state) >> +{ >> + while (1) { >> + struct btrfs_ordered_extent *ordered; >> + int ret; >> + >> + truncate_pagecache_range(inode, lockstart, lockend); >> + >> + lock_extent_bits(_I(inode)->io_tree, lockstart, lockend, >> + cached_state); >> + ordered = btrfs_lookup_first_ordered_extent(inode, lockend); >> + >> + /* >> + * We need to make sure we have no ordered extents in this >> range >> + * and nobody raced in and read a page in this range, if we did >> + * we need to try again. 
>> + */ >> + if ((!ordered || >> + (ordered->file_offset + ordered->len <= lockstart || >> + ordered->file_offset > lockend)) && >> + !btrfs_page_exists_in_range(inode, lockstart, lockend)) { >> + if (ordered) >> + btrfs_put_ordered_extent(ordered); >> + break; >> + } >> + if (ordered) >> + btrfs_put_ordered_extent(ordered); >> + unlock_extent_cached(_I(inode)->io_tree, lockstart, >> + lockend, cached_state, GFP_NOFS); >> + ret = btrfs_wait_ordered_range(inode, lockstart, >> +lockend - lockstart + 1); >> + if (ret) >> + return ret; >> + } >> + return 0; >> +} >> + >> +static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len, >> + bool lock_inode) >> { >> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); >> struct btrfs_root *root = BTRFS_I(inode)->root; >> @@ -2477,7 +2518,8 @@ static int btrfs_punch_hole(struct inode *inode, >> loff_t offset, loff_t len) >> if (ret) >> return ret; >> >> - inode_lock(inode); >> + if (lock_inode) >> + inode_lock(inode); >> ino_size = round_up(inode->i_size, fs_info->sectorsize); >> ret = find_first_non_hole(inode, , ); >> if (ret < 0) >> @@ -2516,7 +2558,8 @@ static int btrfs_punch_hole(struct inode *inode, >> loff_t offset, loff_t len) >> truncated_block = true; >> ret = btrfs_truncate_block(inode, offset, 0, 0); >> if (ret) { >> - inode_unlock(inode); >> + if (lock_inode) >> + inode_unlock(inode); >> return ret; >> } >> } >> @@ -2564,38 +2607,12 @@ static int btrfs_punch_hole(struct inode *inode, >> loff_t offset, loff_t len) >> goto out_only_mutex; >> } >> >> - while (1) { >> - struct btrfs_ordered_extent *ordered; >> - >> - truncate_pagecache_range(inode, lockstart, lockend); >> - >> - lock_extent_bits(_I(inode)->io_tree, lockstart, lockend, >> - _state); >> - ordered = btrfs_lookup_first_ordered_extent(inode, lockend); >> - >> - /* >> - * We need to make sure we have no ordered extents in this >> range >> - * and nobody raced in and read a page in this range, if we did >> - * we need to try again. 
>> - */ >> - if ((!ordered || >> - (ordered->file_offset + ordered->len <= lockstart || >> - ordered->file_offset > lockend)) && >> - !btrfs_page_exists_in_range(inode, lockstart, lockend)) { >> - if (ordered) >> - btrfs_put_ordered_extent(ordered); >>
Re: [PATCH v2] Btrfs: add support for fallocate's zero range operation
On 25.10.2017 17:59, fdman...@kernel.org wrote: > From: Filipe Manana > > This implements support for the zero range operation of fallocate. For now > at least it's as simple as possible while reusing most of the existing > fallocate and hole punching infrastructure. > > Signed-off-by: Filipe Manana > --- > > V2: Removed double inode unlock on error path from failure to lock range. > > fs/btrfs/file.c | 332 > +--- > 1 file changed, 290 insertions(+), 42 deletions(-) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index aafcc785f840..e0d15c0d1641 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -2448,7 +2448,48 @@ static int find_first_non_hole(struct inode *inode, > u64 *start, u64 *len) > return ret; > } > > -static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) > +static int btrfs_punch_hole_lock_range(struct inode *inode, > +const u64 lockstart, > +const u64 lockend, > +struct extent_state **cached_state) > +{ > + while (1) { > + struct btrfs_ordered_extent *ordered; > + int ret; > + > + truncate_pagecache_range(inode, lockstart, lockend); > + > + lock_extent_bits(_I(inode)->io_tree, lockstart, lockend, > + cached_state); > + ordered = btrfs_lookup_first_ordered_extent(inode, lockend); > + > + /* > + * We need to make sure we have no ordered extents in this range > + * and nobody raced in and read a page in this range, if we did > + * we need to try again. 
> + */ > + if ((!ordered || > + (ordered->file_offset + ordered->len <= lockstart || > + ordered->file_offset > lockend)) && > + !btrfs_page_exists_in_range(inode, lockstart, lockend)) { > + if (ordered) > + btrfs_put_ordered_extent(ordered); > + break; > + } > + if (ordered) > + btrfs_put_ordered_extent(ordered); > + unlock_extent_cached(_I(inode)->io_tree, lockstart, > + lockend, cached_state, GFP_NOFS); > + ret = btrfs_wait_ordered_range(inode, lockstart, > +lockend - lockstart + 1); > + if (ret) > + return ret; > + } > + return 0; > +} > + > +static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len, > + bool lock_inode) > { > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > struct btrfs_root *root = BTRFS_I(inode)->root; > @@ -2477,7 +2518,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t > offset, loff_t len) > if (ret) > return ret; > > - inode_lock(inode); > + if (lock_inode) > + inode_lock(inode); > ino_size = round_up(inode->i_size, fs_info->sectorsize); > ret = find_first_non_hole(inode, , ); > if (ret < 0) > @@ -2516,7 +2558,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t > offset, loff_t len) > truncated_block = true; > ret = btrfs_truncate_block(inode, offset, 0, 0); > if (ret) { > - inode_unlock(inode); > + if (lock_inode) > + inode_unlock(inode); > return ret; > } > } > @@ -2564,38 +2607,12 @@ static int btrfs_punch_hole(struct inode *inode, > loff_t offset, loff_t len) > goto out_only_mutex; > } > > - while (1) { > - struct btrfs_ordered_extent *ordered; > - > - truncate_pagecache_range(inode, lockstart, lockend); > - > - lock_extent_bits(_I(inode)->io_tree, lockstart, lockend, > - _state); > - ordered = btrfs_lookup_first_ordered_extent(inode, lockend); > - > - /* > - * We need to make sure we have no ordered extents in this range > - * and nobody raced in and read a page in this range, if we did > - * we need to try again. 
> - */ > - if ((!ordered || > - (ordered->file_offset + ordered->len <= lockstart || > - ordered->file_offset > lockend)) && > - !btrfs_page_exists_in_range(inode, lockstart, lockend)) { > - if (ordered) > - btrfs_put_ordered_extent(ordered); > - break; > - } > - if (ordered) > - btrfs_put_ordered_extent(ordered); > - unlock_extent_cached(_I(inode)->io_tree,
Re: [PATCH 5/8] btrfs-progs: ctree: Introduce function to create an empty tree
On 10/27/2017 03:29 PM, Qu Wenruo wrote: Introduce a new function, btrfs_create_tree(), to create an empty tree. Currently, there is only one caller that creates a new tree, namely the data reloc tree in mkfs. However it copies the fs tree to create a new root, and this copy-fs-tree method is not a good idea if we only need an empty tree. So introduce a new function, btrfs_create_tree(), to create a new tree, which will handle the following things: 1) New tree root leaf Using generic tree allocation 2) New root item in tree root 3) Modify special tree root pointers in fs_info Only quota_root is supported yet, but can be expanded easily This patch provides the basis to implement quota support in mkfs. Signed-off-by: Qu Wenruo --- ctree.c | 109 ctree.h | 2 ++ 2 files changed, 111 insertions(+) diff --git a/ctree.c b/ctree.c index 4fc33b14000a..c707be58c413 100644 --- a/ctree.c +++ b/ctree.c @@ -22,6 +22,7 @@ #include "repair.h" #include "internal.h" #include "sizes.h" +#include "utils.h" static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, int level); @@ -136,6 +137,114 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, return 0; } +/* + * Create a new tree root, with root objectid set to @objectid. + * + * NOTE: Doesn't support tree with non-zero offset, like tree reloc tree. + */ +int btrfs_create_root(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 objectid) +{ + struct extent_buffer *node; + struct btrfs_root *new_root; + struct btrfs_disk_key disk_key; + struct btrfs_key location; + struct btrfs_root_item root_item = { 0 }; + int ret; + + new_root = malloc(sizeof(*new_root)); + if (!new_root) + return -ENOMEM; + + btrfs_setup_root(new_root, fs_info, objectid); + if (!is_fstree(objectid)) + new_root->track_dirty = 1; + add_root_to_dirty_list(new_root); Since add_root_to_dirty_list only adds roots with track_dirty != 0 to the dirty list, why not write it like the following? 
if (!is_fstree(objectid)) { new_root->track_dirty = 1; add_root_to_dirty_list(new_root); } + + new_root->objectid = objectid; + new_root->root_key.objectid = objectid; These have been initialized in btrfs_setup_root, so we don't need to initialize again. + new_root->root_key.type = BTRFS_ROOT_ITEM_KEY; + new_root->root_key.offset = 0; + + node = btrfs_alloc_free_block(trans, new_root, fs_info->nodesize, + objectid, _key, 0, 0, 0); + if (IS_ERR(node)) { + ret = PTR_ERR(node); + error("failed to create root node for tree %llu: %d (%s)", + objectid, ret, strerror(-ret)); + return ret; + } + new_root->node = node; + + btrfs_set_header_generation(node, trans->transid); + btrfs_set_header_backref_rev(node, BTRFS_MIXED_BACKREF_REV); + btrfs_clear_header_flag(node, BTRFS_HEADER_FLAG_RELOC | + BTRFS_HEADER_FLAG_WRITTEN); + btrfs_set_header_owner(node, objectid); + btrfs_set_header_nritems(node, 0); + btrfs_set_header_level(node, 0); + write_extent_buffer(node, fs_info->fsid, btrfs_header_fsid(), + BTRFS_FSID_SIZE); + ret = btrfs_inc_ref(trans, new_root, node, 0); + if (ret < 0) + goto free; + + /* +* Special tree roots may need to modify pointers in @fs_info +* Only quota is supported yet. +*/ + switch (objectid) { + case BTRFS_QUOTA_TREE_OBJECTID: + if (fs_info->quota_root) { + error("quota root already exists"); + ret = -EEXIST; + goto free; + } + fs_info->quota_root = new_root; + fs_info->quota_enabled = 1; + break; + /* +* Essential trees can't be created by this function, yet. 
+* As we expect such skeleton exists, or a lot of functions like +* btrfs_alloc_free_block() doesn't work at all +*/ + case BTRFS_ROOT_TREE_OBJECTID: + case BTRFS_EXTENT_TREE_OBJECTID: + case BTRFS_CHUNK_TREE_OBJECTID: + case BTRFS_FS_TREE_OBJECTID: + ret = -EEXIST; + goto free; + default: + /* Subvolume trees don't need special handles */ + if (is_fstree(objectid)) + break; + /* Other special trees are not supported yet */ + ret = -ENOTTY; + goto free; + } + btrfs_mark_buffer_dirty(node); + btrfs_set_root_bytenr(_item,
[PATCH RESEND 4/4] btrfs-progs: test: Add test image for lowmem mode referencer count mismatch false alert
Add an image which can reproduce the extent item referencer count mismatch false alert for lowmem mode. Reported-by: Marc MERLIN Signed-off-by: Lu Fengqi --- .../ref_count_mismatch_false_alert.img | Bin 0 -> 4096 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img diff --git a/tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img b/tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img new file mode 100644 index ..85110a813b5d00cb35d23babc70d57510cae19b0 GIT binary patch literal 4096 [base85-encoded binary patch data omitted]
[PATCH RESEND 3/4] btrfs-progs: lowmem check: Fix false alert about referencer count mismatch
The normal back reference counting doesn't count the extent referred to by the extent data in a shared leaf. The check_extent_data_backref function needs to skip leaves whose owner doesn't match the root_id. Reported-by: Marc MERLIN Signed-off-by: Lu Fengqi --- cmds-check.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/cmds-check.c b/cmds-check.c index 5750bb72..a93ac2c8 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -12468,7 +12468,8 @@ static int check_extent_data_backref(struct btrfs_fs_info *fs_info, leaf = path.nodes[0]; slot = path.slots[0]; - if (slot >= btrfs_header_nritems(leaf)) + if (slot >= btrfs_header_nritems(leaf) || + btrfs_header_owner(leaf) != root_id) goto next; btrfs_item_key_to_cpu(leaf, &key, slot); if (key.objectid != objectid || key.type != BTRFS_EXTENT_DATA_KEY) -- 2.15.0
Re: btrfs seed question
On Thu, 12 Oct 2017 09:20:28 -0400, Joseph Dunn wrote: > On Thu, 12 Oct 2017 12:18:01 +0800 > Anand Jain wrote: > > > On 10/12/2017 08:47 AM, Joseph Dunn wrote: > > > After seeing how btrfs seeds work I wondered if it was possible > > > to push specific files from the seed to the rw device. I know > > > that removing the seed device will flush all the contents over to > > > the rw device, but what about flushing individual files on demand? > > > > > > I found that opening a file, reading the contents, seeking back > > > to 0, and writing out the contents does what I want, but I was > > > hoping for a bit less of a hack. > > > > > > Is there maybe an ioctl or something else that might trigger a > > > similar action? > > > > You mean to say - seed-device delete to trigger a copy of only the > > specified or modified files, instead of the whole of the > > seed-device? What's the use case around this? > > > > Not quite. While the seed device is still connected I would like to > force some files over to the rw device. The use case is basically a > much slower link to a seed device holding significantly more data than > we currently need. An example would be a slower iscsi link to the > seed device and a local rw ssd. I would like fast access to a > certain subset of files, likely larger than the memory cache will > accommodate. If at a later time I want to discard the image as a > whole I could unmount the file system or if I want a full local copy > I could delete the seed-device to sync the fs. In the mean time I > would have access to all the files, with some slower (iscsi) and some > faster (ssd) and the ability to pick which ones are in the faster > group at the cost of one content transfer. > > I'm not necessarily looking for a new feature addition, just if there > is some existing call that I can make to push specific files from the > slow mirror to the fast one. 
If I had to push a significant amount of > metadata that would be fine, but the file contents feeding some > computations might be large and useful only to certain clients. > > So far I found that I can re-write the file with the same contents and > thanks to the lack of online dedupe these writes land on the rw mirror > so later reads to that file should not hit the slower mirror. By the > way, if I'm misunderstanding how the read process would work after the > file push please correct me. > > I hope this makes sense but I'll try to clarify further if you have > more questions. You could try to wrap something like bcache on top of the iscsi device, then make it a read-mostly cache (like bcache write-around mode). This probably involves rewriting the iscsi contents to add a bcache header. You could try mdcache instead. Then you sacrifice a few gigabytes of local SSD storage for the caching layer. I guess that you're sharing the seed device with different machines. As bcache will add a protective superblock, you may need to thin-clone the seed image on the source to have independent superblocks for each bcache instance. Not sure how this applies to mdcache as I never used it. But the caching approach is probably the easiest way to go for you. And it's mostly automatic once deployed: you don't have to manually choose which files to move to the sprout... -- Regards, Kai Replies to list-only preferred.
Re: Problem with file system
On Tue, 31 Oct 2017 07:28:58 -0400, "Austin S. Hemmelgarn" wrote: > On 2017-10-31 01:57, Marat Khalili wrote: > > On 31/10/17 00:37, Chris Murphy wrote: > >> But off hand it sounds like hardware was sabotaging the expected > >> write ordering. How to test a given hardware setup for that, I > >> think, is really overdue. It affects literally every file system, > >> and Linux storage technology. > >> > >> It kinda sounds like to me something other than supers is being > >> overwritten too soon, and that's why it's possible for none of the > >> backup roots to find a valid root tree, because all four possible > >> root trees either haven't actually been written yet (still) or > >> they've been overwritten, even though the super is updated. But > >> again, it's speculation, we don't actually know why your system > >> was no longer mountable. > > Just a detached view: I know hardware should respect > > ordering/barriers and such, but how hard is it really to avoid > > overwriting at least one complete metadata tree for half an hour > > (even better, yet another one for a day)? Just metadata, not data > > extents. > If you're running on an SSD (or thinly provisioned storage, or > something else which supports discards) and have the 'discard' mount > option enabled, then there is no backup metadata tree (this issue was > mentioned on the list a while ago, but nobody ever replied), because > it's already been discarded. This is ideally something which should > be addressed (we need some sort of discard queue for handling in-line > discards), but it's not easy to address. > > Otherwise, it becomes a question of space usage on the filesystem, > and this is just another reason to keep some extra slack space on the > FS (though that doesn't help _much_, it does help). This, in theory, > could be addressed, but it probably can't be applied across mounts of > a filesystem without an on-disk format change. Well, maybe inline discard is working at the wrong level. 
It should kick in when the reference through any of the backup roots is dropped, not when the current instance is dropped. Without knowledge of the internals, I guess discards could be added to a queue within a new tree in btrfs, and only added to that queue when dropped from the last backup root referencing it. But this will probably add some bad performance spikes. I wonder how a regular fstrim run through cron applies to this problem? -- Regards, Kai Replies to list-only preferred.
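On the batched-trim alternative Kai asks about: with a periodic fstrim instead of the 'discard' mount option, freed tree blocks survive until the next run, so the backup roots stay forensically useful for at least that window. util-linux ships an fstrim.timer unit for this; an equivalent cron drop-in might look like the following (file path and fstrim location are typical, adjust per distro — this is a sketch, not a tested deployment):

```shell
#!/bin/sh
# /etc/cron.weekly/fstrim -- batched discard instead of the 'discard'
# mount option; blocks freed since the last run are only trimmed here,
# so old tree roots remain readable in between.
exec /usr/sbin/fstrim --all --verbose
```

On systemd distributions, `systemctl enable fstrim.timer` achieves the same weekly cadence without a cron script.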
Re: defragmenting best practice?
On Thu, 2 Nov 2017 22:47:31 -0400, Dave wrote: > On Thu, Nov 2, 2017 at 5:16 PM, Kai Krakow > wrote: > > > > > You may want to try the btrfs autodefrag mount option and see if it > > improves things (tho, the effect may take days or weeks to apply if > > you didn't enable it right from the creation of the filesystem). > > > > Also, autodefrag will probably unshare reflinks on your snapshots. > > You may be able to use bees[1] to work against this effect. Its > > interaction with autodefrag is not well tested but it works fine > > for me. Also, bees is able to reduce some of the fragmentation > > during deduplication because it will rewrite extents back into > > bigger chunks (but only for duplicated data). > > > > [1]: https://github.com/Zygo/bees > > I will look into bees. And yes, I plan to try autodefrag. (I already > have it enabled now.) However, I need to understand something about > how btrfs send-receive works in regard to reflinks and fragmentation. > > Say I have 2 snapshots on my live volume. The earlier one of them has > already been sent to another block device by btrfs send-receive (full > backup). Now defrag runs on the live volume and breaks some percentage > of the reflinks. At this point I do an incremental btrfs send-receive > using "-p" (or "-c") with the diff going to the same other block > device where the prior snapshot was already sent. > > Will reflinks be "made whole" (restored) on the receiving block > device? Or is the state of the source volume replicated so closely > that reflink status is the same on the target? > > Also, is fragmentation reduced on the receiving block device? > > My expectation is that fragmentation would be reduced and duplication > would be reduced too. In other words, does send-receive result in > defragmentation and deduplication too? As far as I understand, btrfs send/receive doesn't create an exact mirror. It just replays the block operations between generation numbers. 
That is: If it finds new blocks referenced between generations, it will write a _new_ block to the destination. So, no, it won't reduce fragmentation or duplication. It just keeps reflinks intact as long as such extents weren't touched within the generation range. Otherwise they are rewritten as new extents. Autodefrag and deduplication processes will as such probably increase duplication at the destination. A developer may have a better clue, tho. -- Regards, Kai Replies to list-only preferred.
Re: defragmenting best practice?
On Fri, 3 Nov 2017 08:58:22 +0300, Marat Khalili wrote:

> On 02/11/17 04:39, Dave wrote:
> > I'm going to make this change now. What would be a good way to
> > implement this so that the change applies to the $HOME/.cache of
> > each user?
> I'd make each user's .cache a symlink (should work, but if it doesn't,
> then a bind mount) to a per-user directory in some separately mounted
> volume with the necessary options.

On a systemd system, each user already has a private tmpfs location at
/run/user/$(id -u). You could add to the central login script:

# CACHE_DIR="/run/user/$(id -u)/cache"
# mkdir -p "$CACHE_DIR" && ln -snf "$CACHE_DIR" "$HOME/.cache"

You should not run this as root (because of mkdir -p). You could wrap
it into an if statement:

# if [ "$(whoami)" != "root" ]; then
#   ...
# fi

--
Regards,
Kai

Replies to list-only preferred.
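A self-contained version of that fragment might look like the sketch below. The helper takes the tmpfs base and home directory as parameters so it can be exercised against throwaway directories; in a real /etc/profile.d script you'd pass /run/user/$(id -u) and $HOME, guarded by the non-root check.

```shell
#!/bin/sh
# Sketch of the per-user tmpfs cache setup. setup_user_cache is
# parameterized so it can be demonstrated safely; in a login script
# you'd call it with /run/user/$(id -u) and $HOME, wrapped in a
# non-root guard such as: [ "$(whoami)" != "root" ].
setup_user_cache() {    # $1 = tmpfs base dir, $2 = home dir
    mkdir -p "$1/cache" && ln -snf "$1/cache" "$2/.cache"
}

# Demonstration against temporary stand-ins for /run/user/$UID and $HOME:
demo=$(mktemp -d)
mkdir -p "$demo/run" "$demo/home"
setup_user_cache "$demo/run" "$demo/home"
ls -ld "$demo/home/.cache"    # now a symlink into the tmpfs stand-in
```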
Re: defragmenting best practice?
On Thu, 2 Nov 2017 22:59:36 -0400, Dave wrote:

> On Thu, Nov 2, 2017 at 7:07 AM, Austin S. Hemmelgarn wrote:
> > On 2017-11-01 21:39, Dave wrote:
> >> I'm going to make this change now. What would be a good way to
> >> implement this so that the change applies to the $HOME/.cache of
> >> each user?
> >>
> >> The simple way would be to create a new subvolume for each existing
> >> user and mount it at $HOME/.cache in /etc/fstab, hard coding that
> >> mount location for each user. I don't mind doing that as there are
> >> only 4 users to consider. One minor concern is that it adds an
> >> unexpected step to the process of creating a new user. Is there a
> >> better way?
> >>
> > The easiest option is to just make sure nobody is logged in and run
> > the following shell script fragment:
> >
> > for dir in /home/* ; do
> >     rm -rf "$dir/.cache"
> >     btrfs subvolume create "$dir/.cache"
> > done
> >
> > And then add something to the user creation scripts to create that
> > subvolume. This approach won't pollute /etc/fstab, will still
> > exclude the directory from snapshots, and doesn't require any
> > hugely creative work to integrate with user creation and deletion.
> >
> > In general, the contents of the .cache directory are just that,
> > cached data. Provided nobody is actively accessing it, it's
> > perfectly safe to just nuke the entire directory...
>
> I like this suggestion. Thank you. I had intended to mount the .cache
> subvolumes with the NODATACOW option. However, with this approach, I
> won't be explicitly mounting the .cache subvolumes. Is it possible to
> use "chattr +C $dir/.cache" in that loop even though it is a
> subvolume? And, is setting the .cache directory to NODATACOW the right
> choice given this scenario? From earlier comments, I believe it is,
> but I want to be sure I understood correctly.

It is important to apply "chattr +C" to the _empty_ directory, because
even when used recursively, it won't apply to already existing,
non-empty files.
But the +C attribute is inherited by newly created files and
directories, so simply follow the "chattr +C on an empty directory"
rule and you're all set.

BTW: You cannot mount subvolumes from an already mounted btrfs device
with different mount options. That is currently not implemented (except
for maybe a very few options). So the fstab approach probably wouldn't
have helped you (depending on your partition layout). You can simply
create subvolumes within the location needed and they are implicitly
mounted. Then change the particular subvolume's CoW behavior with
chattr.

--
Regards,
Kai

Replies to list-only preferred.
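Putting Austin's loop and the chattr advice together gives the dry-run sketch below. The helper only prints the commands it would run, since they need root and an actual btrfs /home; pipe its output to `sh` (as root, with no users logged in) to execute for real.

```shell
#!/bin/sh
# Dry-run sketch: recreate each user's ~/.cache as a fresh subvolume and
# mark it NOCOW while it is still empty, so newly created files inherit
# +C. The function prints the commands rather than executing them.
provision_caches() {    # $1 = directory containing the home directories
    for dir in "$1"/*; do
        [ -d "$dir" ] || continue
        echo "rm -rf $dir/.cache"
        echo "btrfs subvolume create $dir/.cache"
        echo "chattr +C $dir/.cache"    # must run while the subvolume is empty
    done
}

# Demonstration against a throwaway tree standing in for /home:
base=$(mktemp -d)
mkdir "$base/alice" "$base/bob"
provision_caches "$base"
```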
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Thu, 2 Nov 2017 23:24:29 -0400, Dave wrote:

> On Thu, Nov 2, 2017 at 4:46 PM, Kai Krakow wrote:
> > On Wed, 1 Nov 2017 02:51:58 -0400, Dave wrote:
> > > [...]
> [...]
> [...]
> >>
> >> Thanks for confirming. I must have missed those reports. I had
> >> never considered this idea until now -- but I like it.
> >>
> >> Are there any blogs or wikis where people have done something
> >> similar to what we are discussing here?
> >
> > I used rsync before; backup source and destination were both btrfs.
> > I was experiencing the same btrfs bug from time to time on both
> > devices, luckily not at the same time.
> >
> > I instead switched to using borgbackup, and xfs as the destination
> > (to avoid the same-bug-on-two-devices pitfall).
>
> I'm going to stick with btrfs everywhere. My reasoning is that my
> biggest pitfalls will be related to lack of knowledge. So focusing on
> learning one filesystem better (vs poorly learning two) is the better
> strategy for me, given my limited time. (I'm not an IT professional of
> any sort.)
>
> Is there any problem with the Borgbackup repository being on btrfs?

No. I just wanted to point out that keeping backup and source on
different media (which includes different technology, too) is common
best practice and adheres to the 3-2-1 backup strategy.

> > Borgbackup achieves a much higher deduplication density and
> > compression, and as such is also able to store much more backup
> > history in the same storage space. The first run is much slower
> > than rsync (due to enabled compression) but successive runs are
> > much faster (like 20 minutes per backup run instead of 4-5 hours).
> >
> > I'm currently storing 107 TB of backup history in just 2.2 TB of
> > backup space, which counts a little more than one year of history
> > now, containing 56 snapshots.
> > This is my retention policy:
> >
> > * 5 yearly snapshots
> > * 12 monthly snapshots
> > * 14 weekly snapshots (worth around 3 months)
> > * 30 daily snapshots
> >
> > Restore is fast enough, and a snapshot can even be fuse-mounted
> > (tho, in that case mounted access can be very slow navigating
> > directories).
> >
> > With the latest borgbackup version, the backup time increased to
> > around 1 hour from 15-20 minutes in the previous version. That is
> > due to switching the file cache strategy from mtime to ctime. This
> > can be tuned to get back to the old performance, but it may miss
> > some files during backup if you're doing awkward things to file
> > timestamps.
> >
> > I'm also backing up some servers with it now, then use rsync to
> > sync the borg repository to an offsite location.
> >
> > Combined with same-fs local btrfs snapshots with short retention
> > times, this could be a viable solution for you.
>
> Yes, I appreciate the idea. I'm going to evaluate both rsync and
> Borgbackup.
>
> The advantage of rsync, I think, is that it will likely run in just a
> couple of minutes. That will allow me to run it hourly and to keep my
> live volume almost entirely free of snapshots and fully defragmented.
> It's also very simple, as I already have rsync. And since I'm going to
> run btrfs on the backup volume, I can perform hourly snapshots there
> and use Snapper to manage retention. It's all very simple and relies
> on tools I already have and know.
>
> However, the advantages of Borgbackup you mentioned (much higher
> deduplication density and compression) make it worth considering.
> Maybe Borgbackup won't take long to complete successive (incremental)
> backups on my system.

Once the initial full backup has been taken, incremental backups are
extremely fast. At least for me, it works much faster than rsync. And
as with btrfs snapshots, each incremental backup is also a full backup.
It's not like traditional backup software that needs the backup parent
and grandparent to make use of differential and/or incremental backups.

There's one caveat, tho: only one process can access a repository at a
time, that is, you need to serialize different backup jobs if you want
them to go into the same repository. Deduplication is done only within
the same repository. Tho, you might be able to leverage btrfs
deduplication (e.g. using bees) across multiple repositories if you're
not using encrypted repositories. But since you're currently using
send/receive and/or rsync, encrypted storage of the backup doesn't seem
to be an important point for you.

Burp, with its client/server approach, may have an advantage here, tho
its setup seems to be more complicated. Borg is really easy to use. I
never tried burp, tho.

> I'll have to try it to see. It's a very nice looking project. I'm
> surprised I never heard of it before.

It seems to follow similar principles as burp (which I never heard of
previously). It seems like the really good backup software has some
sort of PR problem... ;-)

--
Regards,
Kai

Replies to list-only preferred.
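The retention policy quoted above maps onto borg's prune flags roughly as in the sketch below. It is a dry run: the repository path and archive naming are hypothetical, and the helper only prints the commands instead of invoking borg, which may not be installed.

```shell
#!/bin/sh
# Dry-run sketch of a borg backup + prune cycle matching the retention
# policy above (30 daily / 14 weekly / 12 monthly / 5 yearly). REPO and
# the archive name are hypothetical; the helper prints the commands
# rather than executing them.
REPO="/mnt/backup/borg-repo"

borg_cmds() {
    echo "borg create --compression lz4 $REPO::home-{now} /home"
    echo "borg prune --keep-daily 30 --keep-weekly 14 --keep-monthly 12 --keep-yearly 5 $REPO"
}

borg_cmds
```

Remember the caveat above: borg takes an exclusive lock per repository, so separate backup jobs targeting the same repository must run one after another.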
Re: kernel BUG at fs/btrfs/ctree.h:3457!
Yes, the patch works. I enabled both CONFIG_BTRFS_FS_CHECK_INTEGRITY
and CONFIG_BTRFS_FS_RUN_SANITY_TESTS and applied the above patch. This
also resolved the issue. Thanks.

Cheers,
Lakshmipathi.G
http://www.giis.co.in
http://www.webminal.org