Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Andrei Borzenkov
01.07.2018 02:16, Marc MERLIN wrote:
> Sorry that I missed the beginning of this discussion, but I think this is
> what I documented here after hitting the same problem:

This is similar, yes. IIRC you had a different starting point, though. Here
it should have been possible to use only standard, documented tools,
without any need for low-level surgery, if done right.

> http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html
> 

M-m-m ... the statement "because the source had a Parent UUID value too, I
was actually supposed to set Received UUID on the destination to it" is
entirely off the mark, nor does it even match the subsequent command. You
probably meant to say "because the source had a *Received* UUID value
too, I was actually supposed to set Received UUID on the destination to
it". That is correct. And that is what I meant above - received_uuid is a
misnomer; it is actually used as a common data-set identifier. Two
subvolumes with the same received_uuid are presumed to have identical
content.

Which makes the very idea of being able to freely manipulate it rather
questionable.
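
(For reference, this pairing can be inspected with stock tools before
attempting an incremental send - a minimal sketch, with illustrative
mount points:)

# print UUID / Parent UUID / Received UUID of both sides
btrfs subvolume show /mnt/src/test_snap3
btrfs subvolume show /mnt/dst/test_snap3
# the destination's "Received UUID" should equal the source's "UUID"
# (or the source's "Received UUID", if the source was itself received)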

P.S. Of course, "parent" is also highly ambiguous in the btrfs world. We
really need to come up with acceptable terminology to disambiguate tree
parent, snapshot parent and replication parent. The latter would probably
be better called "base snapshot" (NetApp calls it "common
snapshot"); an error message like "Could not find base snapshot matching
UUID xxx" would be far less ambiguous.

> Marc
> 
> On Sun, Jul 01, 2018 at 01:03:37AM +0200, Hannes Schweizer wrote:
>> On Sat, Jun 30, 2018 at 10:02 PM Andrei Borzenkov wrote:
>>>
>>> 30.06.2018 21:49, Andrei Borzenkov wrote:
 30.06.2018 20:49, Hannes Schweizer wrote:
>>> ...
>
> I've tested a few restore methods beforehand, and simply creating a
> writeable clone from the restored snapshot does not work for me, eg:
> # create some source snapshots
> btrfs sub create test_root
> btrfs sub snap -r test_root test_snap1
> btrfs sub snap -r test_root test_snap2
>
> # send a full and incremental backup to external disk
> btrfs send test_snap2 | btrfs receive /run/media/schweizer/external
> btrfs sub snap -r test_root test_snap3
> btrfs send -c test_snap2 test_snap3 | btrfs receive
> /run/media/schweizer/external
>
> # simulate disappearing source
> btrfs sub del test_*
>
> # restore full snapshot from external disk
> btrfs send /run/media/schweizer/external/test_snap3 | btrfs receive .
>
> # create writeable clone
> btrfs sub snap test_snap3 test_root
>
> # try to continue with backup scheme from source to external
> btrfs sub snap -r test_root test_snap4
>
> # this fails!!
> btrfs send -c test_snap3 test_snap4 | btrfs receive
> /run/media/schweizer/external
> At subvol test_snap4
> ERROR: parent determination failed for 2047
> ERROR: empty stream is not considered valid
>

 Yes, that's expected. An incremental stream always needs a valid parent -
 this will be cloned on the destination and the incremental changes applied
 to it. The "-c" option is just additional sugar on top of that, which might
 reduce the size of the stream; but in this case (i.e. without "-p") it also
 attempts to guess the parent subvolume for test_snap4, and this fails
 because test_snap3 and test_snap4 do not have a common parent, so
 test_snap3 is rejected as a valid parent snapshot. You can restart the
 incremental-forever chain by using an explicit "-p" instead:

 btrfs send -p test_snap3 test_snap4

 Subsequent snapshots (test_snap5 etc.) will all have a common parent with
 their immediate predecessor again, so "-c" will work.

 Note that technically "btrfs send" with a single "-c" option is entirely
 equivalent to "btrfs send -p". Using "-p" would have avoided this issue. :)
 Although this implicit check for a common parent may be considered a good
 thing in this case.

 P.S. Looking at the above, it probably needs to be in the manual page for
 btrfs-send. It took me quite some time to actually understand the meaning
 of "-p" and "-c" and their behavior when they are present.
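
(To make the restart concrete - a minimal sketch reusing the names and the
destination path from the example above:)

# restart the chain with an explicit parent
btrfs send -p test_snap3 test_snap4 | btrfs receive /run/media/schweizer/external

# from here on, consecutive snapshots share a common parent again, so "-c" works
btrfs sub snap -r test_root test_snap5
btrfs send -c test_snap4 test_snap5 | btrfs receive /run/media/schweizer/external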

>>> ...
>
> Is there some way to reset the received_uuid of the following snapshot
> on online?
> ID 258 gen 13742 top level 5 parent_uuid - received_uuid 6c683d90-44f2-ad48-bb84-e9f241800179 uuid 46db1185-3c3e-194e-8d19-7456e532b2f3 path diablo
>

 There is no "official" tool, but this question comes up quite often.
 Search this list; I believe a one-liner using python-btrfs was recently
 posted. Note also that a patch that removes received_uuid when the "ro"
 property is removed has been suggested; hopefully it will be merged at
 some point. Still, I personally consider the ability to flip the read-only
 property a very bad thing that should never have been exposed in the
 first place.
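
(For the archives: the one-liner referred to is, as far as I recall, based
on the set_received_uuid example shipped with python-btrfs. The invocation
below is a sketch - the argument order (received uuid, stransid, stime,
subvolume path) is inferred from a later message in this thread, not from
documentation:)

# set the Received UUID (plus transid/time) on an existing subvolume
./set_received_uuid.py <received-uuid> <stransid> <stime> <subvolume-path>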

>>>
>>> Note that if 

[PATCH] btrfs-progs: free-space-cache: Don't panic when free space cache is corrupted

2018-06-30 Thread Qu Wenruo
In btrfs_add_free_space(), if the free space to be added is already
present, we trigger ASSERT(), which is just another BUG_ON().

Let's remove such BUG_ON() entirely.

Reported-by: Lewis Diamond 
Signed-off-by: Qu Wenruo 
---
 free-space-cache.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/free-space-cache.c b/free-space-cache.c
index 9b83a71ca59a..2ef2d307cc5d 100644
--- a/free-space-cache.c
+++ b/free-space-cache.c
@@ -838,10 +838,8 @@ int btrfs_add_free_space(struct btrfs_free_space_ctl *ctl, u64 offset,
 	try_merge_free_space(ctl, info);
 
 	ret = link_free_space(ctl, info);
-	if (ret) {
+	if (ret)
 		printk(KERN_CRIT "btrfs: unable to add free space :%d\n", ret);
-		BUG_ON(ret == -EEXIST);
-	}
 
 	return ret;
 }
-- 
2.18.0



Re: btrfs check --readonly crash

2018-06-30 Thread Qu Wenruo


On 2018-07-01 09:59, Lewis Diamond wrote:
> Hi,
> I've been told to report this issue to this mailing list.
> 
> sudo btrfs check --readonly /dev/sdc
> Checking filesystem on /dev/sdc
> UUID: 2630aec8-8399-4bd8-9397-8c04953a35d5
> checking extents
> checking free space cache
> there is no free space entry for 1596152365056-1596152381440
> there is no free space entry for 1596152365056-1597220323328
> cache appears valid but isn't 1596146581504
> there is no free space entry for 1613348585472-1613348618240
> there is no free space entry for 1613348585472-1614400192512
> cache appears valid but isn't 1613326450688
> block group 1645538705408 has wrong amount of free space, free space
> cache has 58212352 block group has 58277888
> failed to load free space cache for block group 1645538705408
> block group 1683119669248 has wrong amount of free space, free space
> cache has 52838400 block group has 52953088
> failed to load free space cache for block group 1683119669248
> btrfs: unable to add free space :-17

Your free space cache is corrupted.
And if not handled well, it could damage (and may already have damaged)
your fs further.

You could try "btrfs check --clear-space-cache v1 <device>" to remove the
free space cache completely and then re-try "btrfs check --readonly" to
see if it works.
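
(Concretely, for the device in the report above - note that "v1" is the
space cache version argument, not a device:)

# drop the corrupted v1 free space cache, then re-check
btrfs check --clear-space-cache v1 /dev/sdc
btrfs check --readonly /dev/sdc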

Thanks,
Qu

> free-space-cache.c:843: btrfs_add_free_space: BUG_ON `ret == -EEXIST`
> triggered, value 1
> btrfs(+0x37337)[0x556024d5d337]
> btrfs(btrfs_add_free_space+0x11d)[0x556024d5da2d]
> btrfs(load_free_space_cache+0xde9)[0x556024d5e889]
> btrfs(cmd_check+0x15fe)[0x556024d8b9ee]
> btrfs(main+0x88)[0x556024d38768]
> /usr/lib/libc.so.6(__libc_start_main+0xeb)[0x7f9382a9506b]
> btrfs(_start+0x2a)[0x556024d3888a]
> Aborted
> 
> 
> This happened on a USB HDD duo in RAID 1 which had btrfs issues
> (probably due to a bad controller). One of the HDDs was subsequently
> dropped by a 2-year-old (causing subsequent checks to crash).
> 
> -Lewis
> 





btrfs check --readonly crash

2018-06-30 Thread Lewis Diamond
Hi,
I've been told to report this issue to this mailing list.

sudo btrfs check --readonly /dev/sdc
Checking filesystem on /dev/sdc
UUID: 2630aec8-8399-4bd8-9397-8c04953a35d5
checking extents
checking free space cache
there is no free space entry for 1596152365056-1596152381440
there is no free space entry for 1596152365056-1597220323328
cache appears valid but isn't 1596146581504
there is no free space entry for 1613348585472-1613348618240
there is no free space entry for 1613348585472-1614400192512
cache appears valid but isn't 1613326450688
block group 1645538705408 has wrong amount of free space, free space
cache has 58212352 block group has 58277888
failed to load free space cache for block group 1645538705408
block group 1683119669248 has wrong amount of free space, free space
cache has 52838400 block group has 52953088
failed to load free space cache for block group 1683119669248
btrfs: unable to add free space :-17
free-space-cache.c:843: btrfs_add_free_space: BUG_ON `ret == -EEXIST`
triggered, value 1
btrfs(+0x37337)[0x556024d5d337]
btrfs(btrfs_add_free_space+0x11d)[0x556024d5da2d]
btrfs(load_free_space_cache+0xde9)[0x556024d5e889]
btrfs(cmd_check+0x15fe)[0x556024d8b9ee]
btrfs(main+0x88)[0x556024d38768]
/usr/lib/libc.so.6(__libc_start_main+0xeb)[0x7f9382a9506b]
btrfs(_start+0x2a)[0x556024d3888a]
Aborted


This happened on a USB HDD duo in RAID 1 which had btrfs issues
(probably due to a bad controller). One of the HDDs was subsequently
dropped by a 2-year-old (causing subsequent checks to crash).

-Lewis


Re: A list of bugs in btrfs found by fuzzing

2018-06-30 Thread Xu, Wen
Dear BTRFS developers,

I would like to know whether these issues have been fixed or handled. Thanks.

-Wen

> On Jun 3, 2018, at 6:22 PM, Wen Xu  wrote:
> 
> Hi btrfs maintainers and developers,
> 
> Here are a list of bugs found in upstream kernel recently. Please check:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=199833
> https://bugzilla.kernel.org/show_bug.cgi?id=199835
> https://bugzilla.kernel.org/show_bug.cgi?id=199837
> https://bugzilla.kernel.org/show_bug.cgi?id=199839
> https://bugzilla.kernel.org/show_bug.cgi?id=199847
> https://bugzilla.kernel.org/show_bug.cgi?id=199849
> 
> Thanks,
> Wen



Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Marc MERLIN
Sorry that I missed the beginning of this discussion, but I think this is
what I documented here after hitting the same problem:
http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html

Marc

On Sun, Jul 01, 2018 at 01:03:37AM +0200, Hannes Schweizer wrote:
> On Sat, Jun 30, 2018 at 10:02 PM Andrei Borzenkov  wrote:
> >
> > 30.06.2018 21:49, Andrei Borzenkov wrote:
> > > 30.06.2018 20:49, Hannes Schweizer wrote:
> > ...
> > >>
> > >> I've tested a few restore methods beforehand, and simply creating a
> > >> writeable clone from the restored snapshot does not work for me, eg:
> > >> # create some source snapshots
> > >> btrfs sub create test_root
> > >> btrfs sub snap -r test_root test_snap1
> > >> btrfs sub snap -r test_root test_snap2
> > >>
> > >> # send a full and incremental backup to external disk
> > >> btrfs send test_snap2 | btrfs receive /run/media/schweizer/external
> > >> btrfs sub snap -r test_root test_snap3
> > >> btrfs send -c test_snap2 test_snap3 | btrfs receive
> > >> /run/media/schweizer/external
> > >>
> > >> # simulate disappearing source
> > >> btrfs sub del test_*
> > >>
> > >> # restore full snapshot from external disk
> > >> btrfs send /run/media/schweizer/external/test_snap3 | btrfs receive .
> > >>
> > >> # create writeable clone
> > >> btrfs sub snap test_snap3 test_root
> > >>
> > >> # try to continue with backup scheme from source to external
> > >> btrfs sub snap -r test_root test_snap4
> > >>
> > >> # this fails!!
> > >> btrfs send -c test_snap3 test_snap4 | btrfs receive
> > >> /run/media/schweizer/external
> > >> At subvol test_snap4
> > >> ERROR: parent determination failed for 2047
> > >> ERROR: empty stream is not considered valid
> > >>
> > >
> > > Yes, that's expected. An incremental stream always needs a valid parent -
> > > this will be cloned on the destination and the incremental changes applied
> > > to it. The "-c" option is just additional sugar on top of that, which might
> > > reduce the size of the stream; but in this case (i.e. without "-p") it also
> > > attempts to guess the parent subvolume for test_snap4, and this fails
> > > because test_snap3 and test_snap4 do not have a common parent, so
> > > test_snap3 is rejected as a valid parent snapshot. You can restart the
> > > incremental-forever chain by using an explicit "-p" instead:
> > >
> > > btrfs send -p test_snap3 test_snap4
> > >
> > > Subsequent snapshots (test_snap5 etc.) will all have a common parent with
> > > their immediate predecessor again, so "-c" will work.
> > >
> > > Note that technically "btrfs send" with a single "-c" option is entirely
> > > equivalent to "btrfs send -p". Using "-p" would have avoided this issue. :)
> > > Although this implicit check for a common parent may be considered a good
> > > thing in this case.
> > >
> > > P.S. Looking at the above, it probably needs to be in the manual page for
> > > btrfs-send. It took me quite some time to actually understand the meaning
> > > of "-p" and "-c" and their behavior when they are present.
> > >
> > ...
> > >>
> > >> Is there some way to reset the received_uuid of the following snapshot
> > >> on online?
> > >> ID 258 gen 13742 top level 5 parent_uuid - received_uuid 6c683d90-44f2-ad48-bb84-e9f241800179 uuid 46db1185-3c3e-194e-8d19-7456e532b2f3 path diablo
> > >>
> > >
> > > There is no "official" tool, but this question comes up quite often.
> > > Search this list; I believe a one-liner using python-btrfs was recently
> > > posted. Note also that a patch that removes received_uuid when the "ro"
> > > property is removed has been suggested; hopefully it will be merged at
> > > some point. Still, I personally consider the ability to flip the read-only
> > > property a very bad thing that should never have been exposed in the
> > > first place.
> > >
> >
> > Note that if you remove received_uuid (explicitly or - in the future -
> > implicitly) you will not be able to restart incremental send anymore.
> > Without received_uuid there will be no way to match the source test_snap3
> > with the destination test_snap3. So you *must* preserve it and start with
> > a writable clone.
> >
> > received_uuid is a misnomer. I wish it were named "content_uuid" or
> > "snap_uuid" with the following semantics:
> >
> > 1. When read-only snapshot of writable volume is created, content_uuid
> > is initialized
> >
> > 2. Read-only snapshot of read-only snapshot inherits content_uuid
> >
> > 3. destination of "btrfs send" inherits content_uuid
> >
> > 4. writable snapshot of read-only snapshot clears content_uuid
> >
> > 5. clearing read-only property clears content_uuid
> >
> > This would make it more straightforward to cascade and restart
> > replication by having single subvolume property to match against.
> 
> Indeed, the current terminology is a bit confusing, and the patch
> removing the received_uuid when manually switching ro to false should
> definitely be merged. As recommended, I'll simply create a writeable
> clone of the restored snapshot and use -p instead of -c when restoring
> again 

Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Hannes Schweizer
On Sat, Jun 30, 2018 at 10:02 PM Andrei Borzenkov  wrote:
>
> 30.06.2018 21:49, Andrei Borzenkov wrote:
> > 30.06.2018 20:49, Hannes Schweizer wrote:
> ...
> >>
> >> I've tested a few restore methods beforehand, and simply creating a
> >> writeable clone from the restored snapshot does not work for me, eg:
> >> # create some source snapshots
> >> btrfs sub create test_root
> >> btrfs sub snap -r test_root test_snap1
> >> btrfs sub snap -r test_root test_snap2
> >>
> >> # send a full and incremental backup to external disk
> >> btrfs send test_snap2 | btrfs receive /run/media/schweizer/external
> >> btrfs sub snap -r test_root test_snap3
> >> btrfs send -c test_snap2 test_snap3 | btrfs receive
> >> /run/media/schweizer/external
> >>
> >> # simulate disappearing source
> >> btrfs sub del test_*
> >>
> >> # restore full snapshot from external disk
> >> btrfs send /run/media/schweizer/external/test_snap3 | btrfs receive .
> >>
> >> # create writeable clone
> >> btrfs sub snap test_snap3 test_root
> >>
> >> # try to continue with backup scheme from source to external
> >> btrfs sub snap -r test_root test_snap4
> >>
> >> # this fails!!
> >> btrfs send -c test_snap3 test_snap4 | btrfs receive
> >> /run/media/schweizer/external
> >> At subvol test_snap4
> >> ERROR: parent determination failed for 2047
> >> ERROR: empty stream is not considered valid
> >>
> >
> > Yes, that's expected. An incremental stream always needs a valid parent -
> > this will be cloned on the destination and the incremental changes applied
> > to it. The "-c" option is just additional sugar on top of that, which might
> > reduce the size of the stream; but in this case (i.e. without "-p") it also
> > attempts to guess the parent subvolume for test_snap4, and this fails
> > because test_snap3 and test_snap4 do not have a common parent, so
> > test_snap3 is rejected as a valid parent snapshot. You can restart the
> > incremental-forever chain by using an explicit "-p" instead:
> >
> > btrfs send -p test_snap3 test_snap4
> >
> > Subsequent snapshots (test_snap5 etc.) will all have a common parent with
> > their immediate predecessor again, so "-c" will work.
> >
> > Note that technically "btrfs send" with a single "-c" option is entirely
> > equivalent to "btrfs send -p". Using "-p" would have avoided this issue. :)
> > Although this implicit check for a common parent may be considered a good
> > thing in this case.
> >
> > P.S. Looking at the above, it probably needs to be in the manual page for
> > btrfs-send. It took me quite some time to actually understand the meaning
> > of "-p" and "-c" and their behavior when they are present.
> >
> ...
> >>
> >> Is there some way to reset the received_uuid of the following snapshot
> >> on online?
> > >> ID 258 gen 13742 top level 5 parent_uuid - received_uuid 6c683d90-44f2-ad48-bb84-e9f241800179 uuid 46db1185-3c3e-194e-8d19-7456e532b2f3 path diablo
> >>
> >
> > There is no "official" tool, but this question comes up quite often.
> > Search this list; I believe a one-liner using python-btrfs was recently
> > posted. Note also that a patch that removes received_uuid when the "ro"
> > property is removed has been suggested; hopefully it will be merged at
> > some point. Still, I personally consider the ability to flip the read-only
> > property a very bad thing that should never have been exposed in the
> > first place.
> >
>
> Note that if you remove received_uuid (explicitly or - in the future -
> implicitly) you will not be able to restart incremental send anymore.
> Without received_uuid there will be no way to match the source test_snap3
> with the destination test_snap3. So you *must* preserve it and start with
> a writable clone.
>
> received_uuid is a misnomer. I wish it were named "content_uuid" or
> "snap_uuid" with the following semantics:
>
> 1. When read-only snapshot of writable volume is created, content_uuid
> is initialized
>
> 2. Read-only snapshot of read-only snapshot inherits content_uuid
>
> 3. destination of "btrfs send" inherits content_uuid
>
> 4. writable snapshot of read-only snapshot clears content_uuid
>
> 5. clearing read-only property clears content_uuid
>
> This would make it more straightforward to cascade and restart
> replication by having single subvolume property to match against.

Indeed, the current terminology is a bit confusing, and the patch
removing the received_uuid when manually switching ro to false should
definitely be merged. As recommended, I'll simply create a writeable
clone of the restored snapshot and use -p instead of -c when restoring
again (which kinds of snapshot relations are accepted for incremental
send/receive needs better documentation).

Fortunately, with all your hints regarding received_uuid I was able to
successfully restart the incremental chain WITHOUT restarting from
scratch:
# replace incorrectly propagated received_uuid on destination with
# the actual uuid of the source snapshot
btrfs property set /run/media/schweizer/external/diablo_external.2018-06-24T19-37-39 ro false
set_received_uuid.py de9421c5-d160-2949-bf09-613949b4611c 1089 0.0

Re: So, does btrfs check lowmem take days? weeks?

2018-06-30 Thread Marc MERLIN
On Sat, Jun 30, 2018 at 10:49:07PM +0800, Qu Wenruo wrote:
> But the last abort looks quite likely to be the culprit.
> 
> Would you try to dump the extent tree?
> # btrfs inspect dump-tree -t extent <device> | grep -A50 156909494272

Sure, there you go:

item 25 key (156909494272 EXTENT_ITEM 55320576) itemoff 14943 itemsize 24
        refs 19715 gen 31575 flags DATA
item 26 key (156909494272 EXTENT_DATA_REF 571620086735451015) itemoff 14915 itemsize 28
        extent data backref root 21641 objectid 374857 offset 235175936 count 1452
item 27 key (156909494272 EXTENT_DATA_REF 1765833482087969671) itemoff 14887 itemsize 28
        extent data backref root 23094 objectid 374857 offset 235175936 count 1442
item 28 key (156909494272 EXTENT_DATA_REF 1807626434455810951) itemoff 14859 itemsize 28
        extent data backref root 21503 objectid 374857 offset 235175936 count 1454
item 29 key (156909494272 EXTENT_DATA_REF 1879818091602916231) itemoff 14831 itemsize 28
        extent data backref root 21462 objectid 374857 offset 235175936 count 1454
item 30 key (156909494272 EXTENT_DATA_REF 3610854505775117191) itemoff 14803 itemsize 28
        extent data backref root 23134 objectid 374857 offset 235175936 count 1442
item 31 key (156909494272 EXTENT_DATA_REF 3754675454231458695) itemoff 14775 itemsize 28
        extent data backref root 23052 objectid 374857 offset 235175936 count 1442
item 32 key (156909494272 EXTENT_DATA_REF 5060494667839714183) itemoff 14747 itemsize 28
        extent data backref root 23174 objectid 374857 offset 235175936 count 1440
item 33 key (156909494272 EXTENT_DATA_REF 5476627808561673095) itemoff 14719 itemsize 28
        extent data backref root 22911 objectid 374857 offset 235175936 count 1
item 34 key (156909494272 EXTENT_DATA_REF 6378484416458011527) itemoff 14691 itemsize 28
        extent data backref root 23012 objectid 374857 offset 235175936 count 1442
item 35 key (156909494272 EXTENT_DATA_REF 7338474132555182983) itemoff 14663 itemsize 28
        extent data backref root 21872 objectid 374857 offset 235175936 count 1
item 36 key (156909494272 EXTENT_DATA_REF 7516565391717970823) itemoff 14635 itemsize 28
        extent data backref root 21826 objectid 374857 offset 235175936 count 1452
item 37 key (156909494272 SHARED_DATA_REF 14871537025024) itemoff 14631 itemsize 4
        shared data backref count 10
item 38 key (156909494272 SHARED_DATA_REF 14871617568768) itemoff 14627 itemsize 4
        shared data backref count 73
item 39 key (156909494272 SHARED_DATA_REF 14871619846144) itemoff 14623 itemsize 4
        shared data backref count 59
item 40 key (156909494272 SHARED_DATA_REF 14871623270400) itemoff 14619 itemsize 4
        shared data backref count 68
item 41 key (156909494272 SHARED_DATA_REF 14871623532544) itemoff 14615 itemsize 4
        shared data backref count 70
item 42 key (156909494272 SHARED_DATA_REF 14871626383360) itemoff 14611 itemsize 4
        shared data backref count 76
item 43 key (156909494272 SHARED_DATA_REF 14871635132416) itemoff 14607 itemsize 4
        shared data backref count 60
item 44 key (156909494272 SHARED_DATA_REF 14871649533952) itemoff 14603 itemsize 4
        shared data backref count 79
item 45 key (156909494272 SHARED_DATA_REF 14871862378496) itemoff 14599 itemsize 4
        shared data backref count 70
item 46 key (156909494272 SHARED_DATA_REF 14909667098624) itemoff 14595 itemsize 4
        shared data backref count 72
item 47 key (156909494272 SHARED_DATA_REF 14909669720064) itemoff 14591 itemsize 4
        shared data backref count 58
item 48 key (156909494272 SHARED_DATA_REF 14909734567936) itemoff 14587 itemsize 4
        shared data backref count 73
item 49 key (156909494272 SHARED_DATA_REF 14909920477184) itemoff 14583 itemsize 4
        shared data backref count 79
item 50 key (156909494272 SHARED_DATA_REF 14942279335936) itemoff 14579 itemsize 4
        shared data backref count 79
item 51 key (156909494272 SHARED_DATA_REF 14942304862208) itemoff 14575 itemsize 4
        shared data backref count 72
item 52 key (156909494272 SHARED_DATA_REF 14942348378112) itemoff 14571 itemsize 4
        shared data backref count 67
item 53 key (156909494272 SHARED_DATA_REF 14942366138368) itemoff 14567 itemsize 4
        shared data backref count 51
item 54 key (156909494272 SHARED_DATA_REF 14942384799744) itemoff 14563 itemsize 4
        shared data backref count 64
item 55 key (156909494272 SHARED_DATA_REF 14978234613760) 

Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Andrei Borzenkov
30.06.2018 21:49, Andrei Borzenkov wrote:
> 30.06.2018 20:49, Hannes Schweizer wrote:
...
>>
>> I've tested a few restore methods beforehand, and simply creating a
>> writeable clone from the restored snapshot does not work for me, eg:
>> # create some source snapshots
>> btrfs sub create test_root
>> btrfs sub snap -r test_root test_snap1
>> btrfs sub snap -r test_root test_snap2
>>
>> # send a full and incremental backup to external disk
>> btrfs send test_snap2 | btrfs receive /run/media/schweizer/external
>> btrfs sub snap -r test_root test_snap3
>> btrfs send -c test_snap2 test_snap3 | btrfs receive
>> /run/media/schweizer/external
>>
>> # simulate disappearing source
>> btrfs sub del test_*
>>
>> # restore full snapshot from external disk
>> btrfs send /run/media/schweizer/external/test_snap3 | btrfs receive .
>>
>> # create writeable clone
>> btrfs sub snap test_snap3 test_root
>>
>> # try to continue with backup scheme from source to external
>> btrfs sub snap -r test_root test_snap4
>>
>> # this fails!!
>> btrfs send -c test_snap3 test_snap4 | btrfs receive
>> /run/media/schweizer/external
>> At subvol test_snap4
>> ERROR: parent determination failed for 2047
>> ERROR: empty stream is not considered valid
>>
> 
> Yes, that's expected. An incremental stream always needs a valid parent -
> this will be cloned on the destination and the incremental changes applied
> to it. The "-c" option is just additional sugar on top of that, which might
> reduce the size of the stream; but in this case (i.e. without "-p") it also
> attempts to guess the parent subvolume for test_snap4, and this fails
> because test_snap3 and test_snap4 do not have a common parent, so
> test_snap3 is rejected as a valid parent snapshot. You can restart the
> incremental-forever chain by using an explicit "-p" instead:
>
> btrfs send -p test_snap3 test_snap4
>
> Subsequent snapshots (test_snap5 etc.) will all have a common parent with
> their immediate predecessor again, so "-c" will work.
>
> Note that technically "btrfs send" with a single "-c" option is entirely
> equivalent to "btrfs send -p". Using "-p" would have avoided this issue. :)
> Although this implicit check for a common parent may be considered a good
> thing in this case.
>
> P.S. Looking at the above, it probably needs to be in the manual page for
> btrfs-send. It took me quite some time to actually understand the meaning
> of "-p" and "-c" and their behavior when they are present.
> 
...
>>
>> Is there some way to reset the received_uuid of the following snapshot
>> on online?
>> ID 258 gen 13742 top level 5 parent_uuid - received_uuid 6c683d90-44f2-ad48-bb84-e9f241800179 uuid 46db1185-3c3e-194e-8d19-7456e532b2f3 path diablo
>>
> 
> There is no "official" tool, but this question comes up quite often.
> Search this list; I believe a one-liner using python-btrfs was recently
> posted. Note also that a patch that removes received_uuid when the "ro"
> property is removed has been suggested; hopefully it will be merged at
> some point. Still, I personally consider the ability to flip the read-only
> property a very bad thing that should never have been exposed in the
> first place.
> 

Note that if you remove received_uuid (explicitly or - in the future -
implicitly) you will not be able to restart incremental send anymore.
Without received_uuid there will be no way to match the source test_snap3
with the destination test_snap3. So you *must* preserve it and start with
a writable clone.

received_uuid is a misnomer. I wish it were named "content_uuid" or
"snap_uuid" with the following semantics:

1. When read-only snapshot of writable volume is created, content_uuid
is initialized

2. Read-only snapshot of read-only snapshot inherits content_uuid

3. destination of "btrfs send" inherits content_uuid

4. writable snapshot of read-only snapshot clears content_uuid

5. clearing read-only property clears content_uuid

This would make it more straightforward to cascade and restart
replication by having a single subvolume property to match against.
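
(Today that matching has to be done by hand, roughly like this - a sketch,
with illustrative mount points:)

# uuid of the source snapshot
btrfs sub list -u /mnt/src | grep test_snap3
# destination snapshot whose received uuid matches that uuid
btrfs sub list -R /mnt/dst | grep <uuid-from-above>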


Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Andrei Borzenkov
30.06.2018 20:49, Hannes Schweizer wrote:
> On Sat, Jun 30, 2018 at 8:24 AM Andrei Borzenkov  wrote:
>>
>> Do not reply privately to mails on list.
>>
>> 29.06.2018 22:10, Hannes Schweizer wrote:
>>> On Fri, Jun 29, 2018 at 7:44 PM Andrei Borzenkov wrote:

 28.06.2018 23:09, Hannes Schweizer wrote:
> Hi,
>
> Here's my environment:
> Linux diablo 4.17.0-gentoo #5 SMP Mon Jun 25 00:26:55 CEST 2018 x86_64
> Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz GenuineIntel GNU/Linux
> btrfs-progs v4.17
>
> Label: 'online'  uuid: e4dc6617-b7ed-4dfb-84a6-26e3952c8390
> Total devices 2 FS bytes used 3.16TiB
> devid1 size 1.82TiB used 1.58TiB path /dev/mapper/online0
> devid2 size 1.82TiB used 1.58TiB path /dev/mapper/online1
> Data, RAID0: total=3.16TiB, used=3.15TiB
> System, RAID0: total=16.00MiB, used=240.00KiB
> Metadata, RAID0: total=7.00GiB, used=4.91GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Label: 'offline'  uuid: 5b449116-93e5-473e-aaf5-bf3097b14f29
> Total devices 2 FS bytes used 3.52TiB
> devid1 size 5.46TiB used 3.53TiB path /dev/mapper/offline0
> devid2 size 5.46TiB used 3.53TiB path /dev/mapper/offline1
> Data, RAID1: total=3.52TiB, used=3.52TiB
> System, RAID1: total=8.00MiB, used=512.00KiB
> Metadata, RAID1: total=6.00GiB, used=5.11GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Label: 'external'  uuid: 8bf13621-01f0-4f09-95c7-2c157d3087d0
> Total devices 1 FS bytes used 3.65TiB
> devid1 size 5.46TiB used 3.66TiB path
> /dev/mapper/luks-3c196e96-d46c-4a9c-9583-b79c707678fc
> Data, single: total=3.64TiB, used=3.64TiB
> System, DUP: total=32.00MiB, used=448.00KiB
> Metadata, DUP: total=11.00GiB, used=9.72GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> The following automatic backup scheme is in place:
> hourly:
> btrfs sub snap -r online/root online/root.
>
> daily:
> btrfs sub snap -r online/root online/root.
> btrfs send -c online/root.
> online/root. | btrfs receive offline
> btrfs sub del -c online/root.
>
> monthly:
> btrfs sub snap -r online/root online/root.
> btrfs send -c online/root.
> online/root. | btrfs receive external
> btrfs sub del -c online/root.
>
> Now here are the commands leading up to my problem:
> After the online filesystem suddenly went ro, and btrfs check showed
> massive problems, I decided to start the online array from scratch:
> 1: mkfs.btrfs -f -d raid0 -m raid0 -L "online" /dev/mapper/online0
> /dev/mapper/online1
>
> As you can see from the backup commands above, the snapshots of
> offline and external are not related, so in order to at least keep the
> extensive backlog of the external snapshot set (including all
> reflinks), I decided to restore the latest snapshot from external.
> 2: btrfs send external/root. | btrfs receive online
>
> I wanted to ensure I can restart the incremental backup flow from
> online to external, so I did this
> 3: mv online/root. online/root
> 4: btrfs sub snap -r online/root online/root.
> 5: btrfs property set online/root ro false
>
> Now, I naively expected a simple restart of my automatic backups for
> external should work.
> However after running
> 6: btrfs sub snap -r online/root online/root.
> 7: btrfs send -c online/root.
> online/root. | btrfs receive external

 You just recreated your "online" filesystem from scratch. Where does
 "old_external_reference" come from? You did not show the steps used to
 create it.

> I see the following error:
> ERROR: unlink root/.ssh/agent-diablo-_dev_pts_3 failed. No such file
> or directory
>
> Which is unfortunate, but the second problem actually encouraged me to
> post this message.
> As planned, I had to start the offline array from scratch as well,
> because I no longer had any reference snapshot for incremental backups
> on other devices:
> 8: mkfs.btrfs -f -d raid1 -m raid1 -L "offline" /dev/mapper/offline0
> /dev/mapper/offline1
>
> However restarting the automatic daily backup flow bails out with a
> similar error, although no potentially problematic previous
> incremental snapshots should be involved here!
> ERROR: unlink o925031-987-0/2139527549 failed. No such file or directory
>

 Again - before you can *re*start an incremental-forever sequence you need
 an initial full copy. How exactly did you restart it if no snapshots exist
 either on the source or on the destination?
>>>
>>> Thanks for your help regarding this issue!
>>>
>>> Before the online crash, I've used the following online -> external
>>> backup scheme:
>>> btrfs sub snap -r online/root online/root.
>>> btrfs send 

Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Hannes Schweizer
On Sat, Jun 30, 2018 at 8:24 AM Andrei Borzenkov  wrote:
>
> Do not reply privately to mails on list.
>
> 29.06.2018 22:10, Hannes Schweizer wrote:
> > On Fri, Jun 29, 2018 at 7:44 PM Andrei Borzenkov wrote:
> >>
> >> 28.06.2018 23:09, Hannes Schweizer wrote:
> >>> Hi,
> >>>
> >>> Here's my environment:
> >>> Linux diablo 4.17.0-gentoo #5 SMP Mon Jun 25 00:26:55 CEST 2018 x86_64
> >>> Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz GenuineIntel GNU/Linux
> >>> btrfs-progs v4.17
> >>>
> >>> Label: 'online'  uuid: e4dc6617-b7ed-4dfb-84a6-26e3952c8390
> >>> Total devices 2 FS bytes used 3.16TiB
> >>> devid1 size 1.82TiB used 1.58TiB path /dev/mapper/online0
> >>> devid2 size 1.82TiB used 1.58TiB path /dev/mapper/online1
> >>> Data, RAID0: total=3.16TiB, used=3.15TiB
> >>> System, RAID0: total=16.00MiB, used=240.00KiB
> >>> Metadata, RAID0: total=7.00GiB, used=4.91GiB
> >>> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>>
> >>> Label: 'offline'  uuid: 5b449116-93e5-473e-aaf5-bf3097b14f29
> >>> Total devices 2 FS bytes used 3.52TiB
> >>> devid1 size 5.46TiB used 3.53TiB path /dev/mapper/offline0
> >>> devid2 size 5.46TiB used 3.53TiB path /dev/mapper/offline1
> >>> Data, RAID1: total=3.52TiB, used=3.52TiB
> >>> System, RAID1: total=8.00MiB, used=512.00KiB
> >>> Metadata, RAID1: total=6.00GiB, used=5.11GiB
> >>> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>>
> >>> Label: 'external'  uuid: 8bf13621-01f0-4f09-95c7-2c157d3087d0
> >>> Total devices 1 FS bytes used 3.65TiB
> >>> devid1 size 5.46TiB used 3.66TiB path
> >>> /dev/mapper/luks-3c196e96-d46c-4a9c-9583-b79c707678fc
> >>> Data, single: total=3.64TiB, used=3.64TiB
> >>> System, DUP: total=32.00MiB, used=448.00KiB
> >>> Metadata, DUP: total=11.00GiB, used=9.72GiB
> >>> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>>
> >>>
> >>> The following automatic backup scheme is in place:
> >>> hourly:
> >>> btrfs sub snap -r online/root online/root.
> >>>
> >>> daily:
> >>> btrfs sub snap -r online/root online/root.
> >>> btrfs send -c online/root.
> >>> online/root. | btrfs receive offline
> >>> btrfs sub del -c online/root.
> >>>
> >>> monthly:
> >>> btrfs sub snap -r online/root online/root.
> >>> btrfs send -c online/root.
> >>> online/root. | btrfs receive external
> >>> btrfs sub del -c online/root.
> >>>
> >>> Now here are the commands leading up to my problem:
> >>> After the online filesystem suddenly went ro, and btrfs check showed
> >>> massive problems, I decided to start the online array from scratch:
> >>> 1: mkfs.btrfs -f -d raid0 -m raid0 -L "online" /dev/mapper/online0
> >>> /dev/mapper/online1
> >>>
> >>> As you can see from the backup commands above, the snapshots of
> >>> offline and external are not related, so in order to at least keep the
> >>> extensive backlog of the external snapshot set (including all
> >>> reflinks), I decided to restore the latest snapshot from external.
> >>> 2: btrfs send external/root. | btrfs receive online
> >>>
> >>> I wanted to ensure I can restart the incremental backup flow from
> >>> online to external, so I did this
> >>> 3: mv online/root. online/root
> >>> 4: btrfs sub snap -r online/root online/root.
> >>> 5: btrfs property set online/root ro false
> >>>
> >>> Now, I naively expected a simple restart of my automatic backups for
> >>> external should work.
> >>> However after running
> >>> 6: btrfs sub snap -r online/root online/root.
> >>> 7: btrfs send -c online/root.
> >>> online/root. | btrfs receive external
> >>
> >> You just recreated your "online" filesystem from scratch. Where does
> >> "old_external_reference" come from? You did not show the steps used to
> >> create it.
> >>
> >>> I see the following error:
> >>> ERROR: unlink root/.ssh/agent-diablo-_dev_pts_3 failed. No such file
> >>> or directory
> >>>
> >>> Which is unfortunate, but the second problem actually encouraged me to
> >>> post this message.
> >>> As planned, I had to start the offline array from scratch as well,
> >>> because I no longer had any reference snapshot for incremental backups
> >>> on other devices:
> >>> 8: mkfs.btrfs -f -d raid1 -m raid1 -L "offline" /dev/mapper/offline0
> >>> /dev/mapper/offline1
> >>>
> >>> However restarting the automatic daily backup flow bails out with a
> >>> similar error, although no potentially problematic previous
> >>> incremental snapshots should be involved here!
> >>> ERROR: unlink o925031-987-0/2139527549 failed. No such file or directory
> >>>
> >>
> >> Again - before you can *re*start an incremental-forever sequence you need
> >> an initial full copy. How exactly did you restart it if no snapshots exist
> >> either on the source or on the destination?
> >
> > Thanks for your help regarding this issue!
> >
> > Before the online crash, I've used the following online -> external
> > backup scheme:
> > btrfs sub snap -r online/root online/root.
> > btrfs send -c online/root.
> > online/root. | btrfs receive 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-30 Thread Qu Wenruo



On 2018-06-30 10:44, Marc MERLIN wrote:
> Well, there goes that. After about 18H:
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 1452
> backref.c:466: __add_missing_keys: Assertion `ref->root_id` failed, value 0 
> btrfs(+0x3a232)[0x56091704f232] 
> btrfs(+0x3ab46)[0x56091704fb46] 
> btrfs(+0x3b9f5)[0x5609170509f5] 
> btrfs(btrfs_find_all_roots+0x9)[0x560917050a45] 
> btrfs(+0x572ff)[0x56091706c2ff] 
> btrfs(+0x60b13)[0x560917075b13] 
> btrfs(cmd_check+0x2634)[0x56091707d431] 
> btrfs(main+0x88)[0x560917027260] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93aa508561] 
> btrfs(_start+0x2a)[0x560917026dfa] 
> Aborted 

I think that's the root cause.
Some invalid extent tree backref or bad tree block blows up the backref code.

All previous error messages may be garbage unless you're using Su's
latest branch, as lowmem mode tends to report false alerts on referencer
count mismatches.

But the last abort looks quite likely to be the culprit.

Would you try to dump the extent tree?
# btrfs inspect dump-tree -t extent <device> | grep -A50 156909494272

It should help us locate the culprit and hopefully get some chance to
fix it.

Thanks,
Qu

> 
> That's https://github.com/Damenly/btrfs-progs.git
> 
> Whoops, I didn't use the tmp1 branch, let me try again with that and
> report back, although the problem above is still going to be there since
> I think the only difference will be this, correct?
> https://github.com/Damenly/btrfs-progs/commit/b5851513a12237b3e19a3e71f3ad00b966d25b3a
> 
> Marc
> 


Re: btrfs suddenly thinks it's raid6

2018-06-30 Thread Qu Wenruo


On 2018-06-30 06:11, marble wrote:
> Hey,
> Thanks for the quick reply :-)
> 
>> Did anything like an unexpected power loss happen?
> Power loss may have happened.
>> And would you provide the following data for debugging?
>>
>> # btrfs ins dump-super -fFa /dev/mapper/black
> I attached it.

Looks pretty good.
Then we can go through the normal salvage routine.

Please try the following commands and especially keep an eye on the
stderr output:

(The --follow option was added in recent btrfs-progs releases, so please
ensure your btrfs-progs is up to date.)
# btrfs ins dump-tree -b 1084653568 --follow /dev/mapper/black
# btrfs ins dump-tree -b 1083604992 --follow /dev/mapper/black
# btrfs ins dump-tree -b 1083981824 --follow /dev/mapper/black
# btrfs ins dump-tree -b 1084325888 --follow /dev/mapper/black

Or just use old roots and let btrfs check to judge:
# btrfs check --tree-root 1084751872 /dev/mapper/black
# btrfs check --tree-root 1083801600 /dev/mapper/black
# btrfs check --tree-root 1084145664 /dev/mapper/black
# btrfs check --tree-root 1084473344 /dev/mapper/black


>> And furthermore, what's the device mapper setup for /dev/mapper/black?
>> Is there anything like RAID here?
> I think this is the result of luks. "black" is the name I passed to
> cryptsetup open.

Maybe some power loss screwed up the encryption?
I'm not quite sure what happens when a power loss occurs.

Anyway, let's see how above btrfs check and dump-tree ends.

Thanks,
Qu

>> Thanks,
>> Qu
> Cheers,
> marble
> 





[josef-btrfs:blk-iolatency-v7 14/14] mm/readahead.c:504:6: error: implicit declaration of function 'blk_cgroup_congested'

2018-06-30 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git blk-iolatency-v7
head:   4f16e9aa09862911cb7ec38061a48b91a72142c3
commit: 4f16e9aa09862911cb7ec38061a48b91a72142c3 [14/14] skip readahead if the cgroup is congested
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 4f16e9aa09862911cb7ec38061a48b91a72142c3
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc

All errors (new ones prefixed by >>):

   mm/readahead.c: In function 'page_cache_sync_readahead':
>> mm/readahead.c:504:6: error: implicit declaration of function 'blk_cgroup_congested' [-Werror=implicit-function-declaration]
 if (blk_cgroup_congested())
 ^~~~
   cc1: some warnings being treated as errors

vim +/blk_cgroup_congested +504 mm/readahead.c

   481  
   482  /**
   483   * page_cache_sync_readahead - generic file readahead
   484   * @mapping: address_space which holds the pagecache and I/O vectors
   485   * @ra: file_ra_state which holds the readahead state
   486   * @filp: passed on to ->readpage() and ->readpages()
   487   * @offset: start offset into @mapping, in pagecache page-sized units
   488   * @req_size: hint: total size of the read which the caller is performing in
   489   *            pagecache pages
   490   *
   491   * page_cache_sync_readahead() should be called when a cache miss happened:
   492   * it will submit the read.  The readahead logic may decide to piggyback more
   493   * pages onto the read request if access patterns suggest it will improve
   494   * performance.
   495   */
   496  void page_cache_sync_readahead(struct address_space *mapping,
   497                                 struct file_ra_state *ra, struct file *filp,
   498                                 pgoff_t offset, unsigned long req_size)
   499  {
   500          /* no read-ahead */
   501          if (!ra->ra_pages)
   502                  return;
   503  
 > 504          if (blk_cgroup_congested())
   505                  return;
   506  
   507          /* be dumb */
   508          if (filp && (filp->f_mode & FMODE_RANDOM)) {
   509                  force_page_cache_readahead(mapping, filp, offset, req_size);
   510                  return;
   511          }
   512  
   513          /* do read-ahead */
   514          ondemand_readahead(mapping, ra, filp, false, offset, req_size);
   515  }
   516  EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
   517  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[josef-btrfs:blk-iolatency-v7 14/14] mm/readahead.c:504:6: error: implicit declaration of function 'blk_cgroup_congested'; did you mean 'bdi_rw_congested'?

2018-06-30 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git blk-iolatency-v7
head:   4f16e9aa09862911cb7ec38061a48b91a72142c3
commit: 4f16e9aa09862911cb7ec38061a48b91a72142c3 [14/14] skip readahead if the cgroup is congested
config: sh-allnoconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 4f16e9aa09862911cb7ec38061a48b91a72142c3
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=sh

All errors (new ones prefixed by >>):

   mm/readahead.c: In function 'page_cache_sync_readahead':
>> mm/readahead.c:504:6: error: implicit declaration of function 'blk_cgroup_congested'; did you mean 'bdi_rw_congested'? [-Werror=implicit-function-declaration]
 if (blk_cgroup_congested())
 ^~~~
 bdi_rw_congested
   cc1: some warnings being treated as errors

vim +504 mm/readahead.c

   481  
   482  /**
   483   * page_cache_sync_readahead - generic file readahead
   484   * @mapping: address_space which holds the pagecache and I/O vectors
   485   * @ra: file_ra_state which holds the readahead state
   486   * @filp: passed on to ->readpage() and ->readpages()
   487   * @offset: start offset into @mapping, in pagecache page-sized units
   488   * @req_size: hint: total size of the read which the caller is performing in
   489   *            pagecache pages
   490   *
   491   * page_cache_sync_readahead() should be called when a cache miss happened:
   492   * it will submit the read.  The readahead logic may decide to piggyback more
   493   * pages onto the read request if access patterns suggest it will improve
   494   * performance.
   495   */
   496  void page_cache_sync_readahead(struct address_space *mapping,
   497                                 struct file_ra_state *ra, struct file *filp,
   498                                 pgoff_t offset, unsigned long req_size)
   499  {
   500          /* no read-ahead */
   501          if (!ra->ra_pages)
   502                  return;
   503  
 > 504          if (blk_cgroup_congested())
   505                  return;
   506  
   507          /* be dumb */
   508          if (filp && (filp->f_mode & FMODE_RANDOM)) {
   509                  force_page_cache_readahead(mapping, filp, offset, req_size);
   510                  return;
   511          }
   512  
   513          /* do read-ahead */
   514          ondemand_readahead(mapping, ra, filp, false, offset, req_size);
   515  }
   516  EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
   517  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: btrfs send/receive vs rsync

2018-06-30 Thread Duncan
Marc MERLIN posted on Fri, 29 Jun 2018 09:24:20 -0700 as excerpted:

>> If instead of using a single BTRFS filesystem you used LVM volumes
>> (maybe with thin provisioning and monitoring of the volume group free
>> space) for each of your servers to back up, with one BTRFS filesystem per
>> volume, you would have fewer snapshots per filesystem and would isolate
>> problems in case of corruption. If you eventually decide to start from
>> scratch again this might help a lot in your case.
> 
> So, I already have problems due to too many block layers:
> - raid 5 + ssd - bcache - dmcrypt - btrfs
> 
> I get occasional deadlocks due to upper layers sending more data to the
> lower layer (bcache) than it can process. I'm a bit warry of adding yet
> another layer (LVM), but you're otherwise correct than keeping smaller
> btrfs filesystems would help with performance and containing possible
> damage.
> 
> Has anyone actually done this? :)
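
(For reference, the suggested layout would look roughly like this - a
sketch with made-up device names and sizes, using LVM thin provisioning:)

# one thin pool on the array, one thin volume + btrfs filesystem per server
pvcreate /dev/md0
vgcreate backup /dev/md0
lvcreate --type thin-pool -l 95%FREE -n pool backup
lvcreate --type thin -V 10T --thinpool backup/pool -n server1
mkfs.btrfs -L server1 /dev/backup/server1
# monitor pool usage so thin overcommit never runs out of real space
lvs -o lv_name,data_percent,metadata_percent backup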

So I definitely use (and advocate!) the split-em-up strategy, and I use 
btrfs, but that's pretty much all the similarity we have.

I'm all ssd, having left spinning rust behind.  My strategy avoids 
unnecessary layers like lvm (tho crypt can arguably be necessary), 
preferring direct on-device (gpt) partitioning for simplicity of 
management and disaster recovery.  And my backup and recovery strategy is 
an equally simple mkfs and full-filesystem-fileset copy to an identically 
sized filesystem, with backups easily bootable/mountable in place of the 
working copy if necessary, and multiple backups so if disaster takes out 
the backup I was writing at the same time as the working copy, I still 
have a backup to fall back to.

So it's different enough I'm not sure how much my experience will help 
you.  But I /can/ say the subdivision is nice, as it means I can keep my 
root filesystem read-only by default for reliability, my most-at-risk log 
filesystem tiny for near-instant scrub/balance/check, and my also at risk 
home small as well, with the big media files being on a different 
filesystem that's mostly read-only, so less at risk and needing less 
frequent backups.  The tiny boot and large updates (distro repo, sources, 
ccache) are also separate, and mounted only for boot maintenance or 
updates.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files

2018-06-30 Thread Duncan
Austin S. Hemmelgarn posted on Fri, 29 Jun 2018 14:31:04 -0400 as
excerpted:

> On 2018-06-29 13:58, james harvey wrote:
>> On Fri, Jun 29, 2018 at 1:09 PM, Austin S. Hemmelgarn wrote:
>>> On 2018-06-29 11:15, james harvey wrote:

 On Thu, Jun 28, 2018 at 6:27 PM, Chris Murphy wrote:
>
> And an open question I have about scrub is whether it only ever is
> checking csums, meaning nodatacow files are never scrubbed, or if
> the copies are at least compared to each other?


 Scrub never looks at nodatacow files.  It does not compare the copies
 to each other.

 Qu submitted a patch to make check compare the copies:
 https://patchwork.kernel.org/patch/10434509/

 This hasn't been added to btrfs-progs git yet.

 IMO, I think the offline check should look at nodatacow copies like
 this, but I still think this also needs to be added to scrub.  In the
 patch thread, I discuss my reasons why.  In brief: online scanning;
 this goes along with user's expectation of scrub ensuring mirrored
 data integrity; and recommendations to setup scrub on periodic basis
 to me means it's the place to put it.
>>>
>>> That said, it can't sanely fix things if there is a mismatch. At
>>> least,
>>> not unless BTRFS gets proper generational tracking to handle
>>> temporarily missing devices.  As of right now, sanely fixing things
>>> requires significant manual intervention, as you have to bypass the
>>> device read selection algorithm to be able to look at the state of the
>>> individual copies so that you can pick one to use and forcibly rewrite
>>> the whole file by hand.
>> 
>> Absolutely.  User would need to use manual intervention as you
>> describe, or restore the single file(s) from backup.  But, it's a good
>> opportunity to tell the user they had partial data corruption, even if
>> it can't be auto-fixed.  Otherwise they get intermittent data
>> corruption, depending on which copies are read.

> The thing is though, as things stand right now, you need to manually
> edit the data on-disk directly or restore the file from a backup to fix
> the file.  While it's technically true that you can manually repair this
> type of thing, both of the cases for doing it without those patches I
> mentioned, it's functionally impossible for a regular user to do it
> without potentially losing some data.

[Usual backups rant, user vs. admin variant, nocow/tmpfs edition.  
Regulars can skip as the rest is already predicted from past posts, for 
them. =;^]

"Regular user"?  

"Regular users" don't need to bother with this level of detail.  They 
simply get their "admin" to do it, even if that "admin" is their kid, or 
the kid from next door that's good with computers, or the geek squad (aka 
nsa-agent-squad) guy/gal, doing it... or telling them to install "a real 
OS", meaning whatever MS/Apple/Google something that they know how to 
deal with.

If the "user" is dealing with setting nocow, choosing btrfs in the first 
place, etc, then they're _not_ a "regular user" by definition, they're 
already an admin.

And as any admin learns rather quickly, the value of data is defined by 
the number of backups it's worth having of that data.

Which means it's not a problem.  Either the data had a backup and it's 
(reasonably) trivial to restore the data from that backup, or the data 
was defined by lack of having that backup as of only trivial value, so 
low as to not be worth the time/trouble/resources necessary to make that 
backup in the first place.

Which of course means what was defined as of most value, either the data 
if there was a backup, or the time/trouble/resources that would have gone 
into creating it if not, is *always* saved.

(And of course the same goes for "I had a backup, but it's old", except 
in this case it's the value of the data delta between the backup and 
current.  As soon as it's worth more than the time/trouble/hassle of 
updating the backup, it will by definition be updated.  Not having a 
newer backup available thus simply means the value of the data that 
changed between the last backup and current was simply not enough to 
justify updating the backup, and again, what was of most value is 
*always* saved, either the data, or the time that would have otherwise 
gone into making the newer backup.)

Because while a "regular user" may not know it because it's not his /job/ 
to know it, if there's anything an admin knows *well* it's that the 
working copy of data **WILL** be damaged.  It's not a matter of if, but 
of when, and of whether it'll be a fat-finger mistake, or a hardware or 
software failure, or wetware (theft, ransomware, etc), or wetware (flood, 
fire and the water that put it out damage, etc), tho none of that 
actually matters after all, because in the end, the only thing that 
matters is how the value of that data was defined by the number of 
backups made of it, and how quickly and conveniently at least 

Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Andrei Borzenkov
Do not reply privately to mails on list.

29.06.2018 22:10, Hannes Schweizer wrote:
> On Fri, Jun 29, 2018 at 7:44 PM Andrei Borzenkov  wrote:
>>
>> 28.06.2018 23:09, Hannes Schweizer wrote:
>>> Hi,
>>>
>>> Here's my environment:
>>> Linux diablo 4.17.0-gentoo #5 SMP Mon Jun 25 00:26:55 CEST 2018 x86_64
>>> Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz GenuineIntel GNU/Linux
>>> btrfs-progs v4.17
>>>
>>> Label: 'online'  uuid: e4dc6617-b7ed-4dfb-84a6-26e3952c8390
>>> Total devices 2 FS bytes used 3.16TiB
>>> devid1 size 1.82TiB used 1.58TiB path /dev/mapper/online0
>>> devid2 size 1.82TiB used 1.58TiB path /dev/mapper/online1
>>> Data, RAID0: total=3.16TiB, used=3.15TiB
>>> System, RAID0: total=16.00MiB, used=240.00KiB
>>> Metadata, RAID0: total=7.00GiB, used=4.91GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> Label: 'offline'  uuid: 5b449116-93e5-473e-aaf5-bf3097b14f29
>>> Total devices 2 FS bytes used 3.52TiB
>>> devid1 size 5.46TiB used 3.53TiB path /dev/mapper/offline0
>>> devid2 size 5.46TiB used 3.53TiB path /dev/mapper/offline1
>>> Data, RAID1: total=3.52TiB, used=3.52TiB
>>> System, RAID1: total=8.00MiB, used=512.00KiB
>>> Metadata, RAID1: total=6.00GiB, used=5.11GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> Label: 'external'  uuid: 8bf13621-01f0-4f09-95c7-2c157d3087d0
>>> Total devices 1 FS bytes used 3.65TiB
>>> devid1 size 5.46TiB used 3.66TiB path
>>> /dev/mapper/luks-3c196e96-d46c-4a9c-9583-b79c707678fc
>>> Data, single: total=3.64TiB, used=3.64TiB
>>> System, DUP: total=32.00MiB, used=448.00KiB
>>> Metadata, DUP: total=11.00GiB, used=9.72GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>>
>>> The following automatic backup scheme is in place:
>>> hourly:
>>> btrfs sub snap -r online/root online/root.
>>>
>>> daily:
>>> btrfs sub snap -r online/root online/root.
>>> btrfs send -c online/root.
>>> online/root. | btrfs receive offline
>>> btrfs sub del -c online/root.
>>>
>>> monthly:
>>> btrfs sub snap -r online/root online/root.
>>> btrfs send -c online/root.
>>> online/root. | btrfs receive external
>>> btrfs sub del -c online/root.
>>>
>>> Now here are the commands leading up to my problem:
>>> After the online filesystem suddenly went ro, and btrfs check showed
>>> massive problems, I decided to start the online array from scratch:
>>> 1: mkfs.btrfs -f -d raid0 -m raid0 -L "online" /dev/mapper/online0
>>> /dev/mapper/online1
>>>
>>> As you can see from the backup commands above, the snapshots of
>>> offline and external are not related, so in order to at least keep the
>>> extensive backlog of the external snapshot set (including all
>>> reflinks), I decided to restore the latest snapshot from external.
>>> 2: btrfs send external/root. | btrfs receive online
>>>
>>> I wanted to ensure I can restart the incremental backup flow from
>>> online to external, so I did this
>>> 3: mv online/root. online/root
>>> 4: btrfs sub snap -r online/root online/root.
>>> 5: btrfs property set online/root ro false
>>>
>>> Now, I naively expected a simple restart of my automatic backups for
>>> external should work.
>>> However after running
>>> 6: btrfs sub snap -r online/root online/root.
>>> 7: btrfs send -c online/root.
>>> online/root. | btrfs receive external
>>
>> You just recreated your "online" filesystem from scratch. Where
>> "old_external_reference" comes from? You did not show steps used to
>> create it.
>>
>>> I see the following error:
>>> ERROR: unlink root/.ssh/agent-diablo-_dev_pts_3 failed. No such file
>>> or directory
>>>
>>> Which is unfortunate, but the second problem actually encouraged me to
>>> post this message.
>>> As planned, I had to start the offline array from scratch as well,
>>> because I no longer had any reference snapshot for incremental backups
>>> on other devices:
>>> 8: mkfs.btrfs -f -d raid1 -m raid1 -L "offline" /dev/mapper/offline0
>>> /dev/mapper/offline1
>>>
>>> However restarting the automatic daily backup flow bails out with a
>>> similar error, although no potentially problematic previous
>>> incremental snapshots should be involved here!
>>> ERROR: unlink o925031-987-0/2139527549 failed. No such file or directory
>>>
>>
>> Again - before you can *re*start incremental-forever sequence you need
>> initial full copy. How exactly did you restart it if no snapshots exist
>> either on source or on destination?
> 
> Thanks for your help regarding this issue!
> 
> Before the online crash, I've used the following online -> external
> backup scheme:
> btrfs sub snap -r online/root online/root.
> btrfs send -c online/root.
> online/root. | btrfs receive external
> btrfs sub del -c online/root.
> 
> By sending the existing snapshot from external to online (basically a
> full copy of external/old_external_reference to online/root), it
> should have been possible to restart the monthly online -> external
> backup scheme, right?
> 

You did