Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-06 Thread Marc Joliet
On Monday, 6 March 2017 00:53:40 CET Marc Joliet wrote:
> On Wednesday, 1 March 2017 19:14:07 CET you wrote:
> > In any case, I started btrfs-check on the device itself.
> 
> *Sigh*, I had to restart it, because I forgot to redirect to a file and
> quite frankly wasn't expecting this flood of output, but here's a summary
> of the output after about 2 days:
> 
[snip old output]

OK, it finished last night.  Here's the summary again:

% wc -l btrfs_check_output_20170303.log 
3028222 btrfs_check_output_20170303.log
% grep -v "backref lost" btrfs_check_output_20170303.log | grep -v "check \(leaf\|node\) failed" | grep -v "lost its parent" | grep -v "referencer count"
checking extents
ERROR: block group[3879328546816 1073741824] used 1072840704 but extent items 
used 1129164800
ERROR: block group[4163870130176 1073741824] used 1072259072 but extent items 
used 0
ERROR: block group[4223999672320 1073741824] used 1073664000 but extent items 
used 1074188288
ERROR: block group[4278760505344 1073741824] used 1073377280 but extent items 
used 1073901568
ERROR: block group[4406535782400 1073741824] used 1073627136 but extent items 
used 0
ERROR: extent [3830028140544 4096] referencer bytenr mismatch, wanted: 
3830028140544, have: 3826183847936
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
Checking filesystem on /dev/sdb2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
found 892572778496 bytes used err is -5
total csum bytes: 860790216
total tree bytes: 36906336256
total fs tree bytes: 34551476224
total extent tree bytes: 1230610432
btree space waste bytes: 7446885892
file data blocks allocated: 16359581663232
 referenced 2358137831424
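
As a rough sanity check, the summary numbers above can be related directly.
This is just arithmetic on the values btrfs-check printed (plain awk, values
copied from the output):

```shell
# Relate the btrfs-check summary figures: how much of the btree space is
# waste, and how much of the used space is metadata. Values copied verbatim
# from the summary above.
awk 'BEGIN {
    used  = 892572778496;   # "found ... bytes used"
    tree  = 36906336256;    # "total tree bytes"
    waste = 7446885892;     # "btree space waste bytes"
    printf "btree waste: %.1f%% of tree bytes\n", 100 * waste / tree;
    printf "metadata:    %.1f%% of used bytes\n", 100 * tree / used;
}'
```

That puts btree space waste at roughly a fifth of all tree bytes, with
metadata overall at about 4% of the bytes in use.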

> That's right, slowly approaching 1.5 million lines of btrfs-check output!
> That's *way* more than the last time this error happened, a few weeks
> ago.

As can be seen above, that ballooned to over 3 million lines.  Since the 
output is 4.2 MB even after XZ compression, I put it up on my Dropbox, just 
in case it's interesting to anybody:

https://www.dropbox.com/s/h6ftqpygfr4vsks/btrfs_check_output_20170303.log.xz?dl=0

Since this is my backup drive, and this is the second time within a month 
that it has had problems like this, *and* I've got both the btrfs-image dump 
and the btrfs-check output, I'm going to go ahead and reformat it, so that my 
three computers are finally backed up again.

Oh, and for what it's worth, I did test against a 4.8 kernel, and pretty much 
immediately got the "forced RO" error, just like with 4.9.  I didn't try 
anything older (or newer).

As a last step, I'll probably collect all the information I have and post it 
to Bugzilla when I have a chance, since others might hit this, too (Kai did 
before me, after all).

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


signature.asc
Description: This is a digitally signed message part.


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-05 Thread Marc Joliet
On Wednesday, 1 March 2017 19:14:07 CET you wrote:
>  In any
> case, I started btrfs-check on the device itself.

*Sigh*, I had to restart it, because I forgot to redirect to a file and quite 
frankly wasn't expecting this flood of output, but here's a summary of the 
output after about 2 days:

% wc -l btrfs_check_output_20170303.log 
1360252 btrfs_check_output_20170303.log
% grep -v "backref lost" btrfs_check_output_20170303.log | grep -v "check \(leaf\|node\) failed" | grep -v "lost its parent" | grep -v "referencer count"
checking extents
ERROR: block group[3879328546816 1073741824] used 1072840704 but extent items 
used 1129164800
ERROR: block group[4163870130176 1073741824] used 1072259072 but extent items 
used 0
ERROR: block group[4223999672320 1073741824] used 1073664000 but extent items 
used 1074188288
ERROR: block group[4278760505344 1073741824] used 1073377280 but extent items 
used 1073901568
ERROR: block group[4406535782400 1073741824] used 1073627136 but extent items 
used 0
ERROR: extent [3830028140544 4096] referencer bytenr mismatch, wanted: 
3830028140544, have: 3826183847936
% tail btrfs_check_output_20170303.log 
ERROR: data extent[3924538445824 4096] backref lost
ERROR: data extent[3933464903680 4096] backref lost
ERROR: data extent[3924538531840 4096] backref lost
ERROR: data extent[3839131480064 4096] backref lost
ERROR: data extent[3834701750272 4096] backref lost
ERROR: data extent[3873087918080 4096] backref lost
ERROR: data extent[3873072283648 4096] backref lost
ERROR: data extent[3873088090112 8192] backref lost
ERROR: data extent[3873072287744 4096] backref lost
ERROR: data extent[3856294449152 4096] backref lost
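
As an aside, rather than grep -v'ing each error class away, the classes can
be tallied; a sketch, with a tiny inline sample standing in for the real log
file:

```shell
# Tally the btrfs-check error classes instead of filtering them out with
# grep -v. A small inline sample stands in for the real
# btrfs_check_output_20170303.log so the pipeline runs as-is.
sample='ERROR: data extent[3924538445824 4096] backref lost
ERROR: data extent[3933464903680 4096] backref lost
ERROR: shared extent[3826242740224 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: extent[3837768077312, 24576] referencer count mismatch (root: 33174, owner: 1277577, offset: 4767744) wanted: 1, have: 0'
printf '%s\n' "$sample" \
    | grep -o 'backref lost\|lost its parent\|referencer count mismatch' \
    | sort | uniq -c | sort -rn
```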

That's right, slowly approaching 1.5 million lines of btrfs-check output!  
That's *way* more than the last time this error happened, a few weeks ago.
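
For context, the "block group ... used X but extent items used Y" errors
quoted above boil down to a simple cross-check: the byte count recorded in
the block group item disagrees with the sum of the extent items inside its
range. A toy sketch of that comparison (values copied from the log; the
field names and the check itself are illustrative, not btrfs-progs
internals):

```shell
# Toy model of the block-group cross-check: sum the extent items that fall
# inside the block group's range and compare the total with the block group
# item's 'used' field. Values are copied from one of the errors in the log.
awk 'BEGIN {
    bg_start = 4163870130176; bg_len = 1073741824;  # block group range
    bg_used  = 1072259072;                          # "used" per the bg item
    found    = 0;   # bytes of extent items found inside the range
    if (found != bg_used)
        printf "ERROR: block group[%.0f %.0f] used %.0f but extent items used %.0f\n",
               bg_start, bg_len, bg_used, found;
}'
```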

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-03 Thread Marc Joliet
On Friday 03 March 2017 09:00:10 Qu Wenruo wrote:
> > FWIW, as per my later messages, after mounting with clear_cache and
> > letting
> > btrfs-cleaner finish, btrfs-check did *not* print out those errors after
> > running again.  It's now about two weeks later that the file system is
> > showing problems again.
> 
> If btrfs-check didn't print out *any* errors, then it should be mostly
> fine.  (Unless there is some case we don't expose yet.)
> 
> The problem should be caused by the kernel, AFAIK.

So you think it could be a regression in 4.9?  Should I try 4.10?  Or is it 
more likely just an undiscovered bug?

> > Oh, and just in case it's relevant, the file system was created with
> > btrfs-
> > convert (a long time, maybe 1.5 years ago, though; it was originally
> > ext4).
> 
> Not sure if it's related.
> But at least for that old a convert, its chunk layout is somewhat rare
> and sometimes even bug-prone.
> 
> Did you balance the btrfs after the convert? If so, then it should be
> more like a traditional btrfs.

Yes, I'm fairly certain I did that, as that is what the btrfs wiki recommends.

> Personally speaking, I don't think it's relevant to your bug; it looks
> much like a normal extent tree corruption as seen on the mailing list.

OK, so is there anything else I can do?

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-03 Thread Marc Joliet
On Friday 03 March 2017 09:09:57 Qu Wenruo wrote:
> At 03/02/2017 05:44 PM, Marc Joliet wrote:
> > On Wednesday 01 March 2017 19:14:07 Marc Joliet wrote:
> >> In any
> >> case, I started btrfs-check on the device itself.
> > 
> > OK, it's still running, but the output so far is:
> > 
> > # btrfs check --mode=lowmem --progress /dev/sdb2
> > Checking filesystem on /dev/sdb2
> > UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
> > [snip: repeated "lost its parent" errors, all with parent 3827251183616]

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Qu Wenruo

At 03/02/2017 05:43 PM, Marc Joliet wrote:

On Thursday 02 March 2017 08:43:53 Qu Wenruo wrote:

At 02/02/2017 08:01 PM, Marc Joliet wrote:

On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:

Hello list!


Hi list


[kernel message snipped]


Btrfs --repair refused to repair the filesystem telling me something
about compressed extents and an unsupported case, wanting me to take an
image and send it to the devs. *sigh*


I haven't tried a repair yet; it's a big file system, and btrfs-check is
still running:

# btrfs check -p /dev/sdd2
Checking filesystem on /dev/sdd2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858


Normal transid error; I can't say whether it's harmless, but at
least something went wrong.


Ignoring transid failure
leaf parent key incorrect 3829276291072
bad block 3829276291072


That's somewhat of a big problem for that tree block.

If this tree block is an extent tree block, it's no wonder the kernel
output a warning and aborted the transaction.

You could try "btrfs-debug-tree -b 3829276291072 <device>" to show the
content of the tree block.


# btrfs-debug-tree -b 3829276291072 /dev/sdb2
btrfs-progs v4.9
node 3829276291072 level 1 items 70 free 51 generation 292525 owner 2
fs uuid f97b3cda-15e8-418b-bb9b-235391ef2a38
chunk uuid 1cee580c-3442-4717-9300-8514dd8ff297
key (3828594696192 METADATA_ITEM 0) block 3828933423104 (934798199)
gen 292523
key (3828594925568 METADATA_ITEM 0) block 3829427818496 (934918901)
gen 292525
key (3828595109888 METADATA_ITEM 0) block 3828895723520 (934788995)
gen 292523
key (3828595232768 METADATA_ITEM 0) block 3829202751488 (934863953)
gen 292524
key (3828595412992 METADATA_ITEM 0) block 3829097209856 (934838186)
gen 292523
key (3828595572736 TREE_BLOCK_REF 33178) block 3829235073024
(934871844) gen 292524
key (3828595744768 METADATA_ITEM 0) block 3829128351744 (934845789)
gen 292524
key (3828595982336 METADATA_ITEM 0) block 3829146484736 (934850216)
gen 292524
key (3828596187136 METADATA_ITEM 1) block 3829097234432 (934838192)
gen 292523
key (3828596387840 TREE_BLOCK_REF 33527) block 3829301653504
(934888099) gen 292525
key (3828596617216 METADATA_ITEM 0) block 3828885737472 (934786557)
gen 292523
key (3828596838400 METADATA_ITEM 0) block 3828885741568 (934786558)
gen 292523
key (3828597047296 METADATA_ITEM 0) block 3829320552448 (934892713)
gen 292525
key (3828597231616 METADATA_ITEM 0) block 3828945653760 (934801185)
gen 292523
key (3828597383168 METADATA_ITEM 0) block 3829276299264 (934881909)
gen 292525
key (3828597641216 METADATA_ITEM 1) block 3829349351424 (934899744)
gen 292525
key (3828597866496 METADATA_ITEM 0) block 3829364776960 (934903510)
gen 292525
key (3828598067200 METADATA_ITEM 0) block 3828598321152 (934716387)
gen 292522
key (3828598259712 METADATA_ITEM 0) block 3829422968832 (934917717)
gen 292525
key (3828598415360 TREE_BLOCK_REF 33252) block 3828885803008
(934786573) gen 292523
key (3828598665216 METADATA_ITEM 0) block 3828937863168 (934799283)
gen 292523
key (3828598829056 METADATA_ITEM 0) block 3828885811200 (934786575)
gen 292523
key (3828599054336 METADATA_ITEM 0) block 3829363744768 (934903258)
gen 292525
key (3828599246848 METADATA_ITEM 0) block 3828915838976 (934793906)
gen 292523
key (3828599504896 METADATA_ITEM 0) block 3829436194816 (934920946)
gen 292525
key (3828599672832 METADATA_ITEM 0) block 3828905140224 (934791294)
gen 292523
key (3828599771136 METADATA_ITEM 0) block 382923776 (934895831)
gen 292525
key (3828599988224 METADATA_ITEM 0) block 3829087199232 (934835742)
gen 292523
key (3828600135680 METADATA_ITEM 0) block 3828885827584 (934786579)
gen 292523
key (3828600389632 METADATA_ITEM 0) block 3829436284928 (934920968)
gen 292525
key (3828600528896 METADATA_ITEM 0) block 3829316214784 (934891654)
gen 292525
key (3828600729600 METADATA_ITEM 0) block 3828885905408 (934786598)
gen 292523
key (3828600934400 METADATA_ITEM 0) block 3829384486912 (934908322)
gen 292525
key (3828601143296 METADATA_ITEM 0) block 3829423611904 (934917874)
gen 292525
key (3828601356288 METADATA_ITEM 0) block 3829113688064 (934842209)
gen 292524
key (3828601556992 METADATA_ITEM 0) block 3829134540800 (934847300)
gen 292524
key (3828601696256 METADATA_ITEM 0) block 3829181837312 (934858847)
gen 292524
key (3828601823232 METADATA_ITEM 0) block 3829157421056 (934852886)
gen 292524
key (3828602015744 TREE_BLOCK_REF 32943) block 3829316218880
(934891655) gen 292525
  

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Qu Wenruo

At 03/02/2017 05:44 PM, Marc Joliet wrote:

On Wednesday 01 March 2017 19:14:07 Marc Joliet wrote:

In any
case, I started btrfs-check on the device itself.


OK, it's still running, but the output so far is:

# btrfs check --mode=lowmem --progress /dev/sdb2
Checking filesystem on /dev/sdb2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
[snip: repeated "lost its parent" errors, all with parent 3827251183616]


The "shared extent lost its parent" errors are all about the same extent, 
3827251183616.

It would be nice if you could paste the output of btrfs-debug-tree -b 
3827251183616 to check which tree it belongs to.
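
For the record, that observation can be verified straight from the log by
extracting and deduplicating the parent bytenrs; a sketch, with a few sample
lines from the log standing in for the full output:

```shell
# Confirm that every "lost its parent" error names the same parent bytenr:
# pull the parent field out of each line and deduplicate. The sample lines
# are copied from the log; feed the real file through the same pipeline.
printf '%s\n' \
  'ERROR: shared extent[3826242740224 4096] lost its parent (parent: 3827251183616, level: 0)' \
  'ERROR: shared extent[3826442825728 4096] lost its parent (parent: 3827251183616, level: 0)' \
  'ERROR: shared extent[3833930141696 4096] lost its parent (parent: 3827251183616, level: 0)' \
  | grep -o 'parent: [0-9]*' | sort -u
```

A single line of output means a single damaged parent node.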



ERROR: extent[3837768077312, 24576] referencer count mismatch (root: 33174,
owner: 1277577, offset: 4767744) wanted: 1, have: 0
[snip many more referencer count mismatches]

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Marc Joliet
On Wednesday 01 March 2017 19:14:07 Marc Joliet wrote:
> In any 
> case, I started btrfs-check on the device itself.

OK, it's still running, but the output so far is:

# btrfs check --mode=lowmem --progress /dev/sdb2
Checking filesystem on /dev/sdb2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
ERROR: shared extent[3826242740224 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3826442825728 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3826744471552 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827106349056 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827141001216 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827150958592 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827251724288 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827433795584 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827536166912 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827536183296 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3827621646336 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3828179406848 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3828267970560 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3828284530688 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3828714246144 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3828794187776 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3829161340928 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3829373693952 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3830252130304 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3830421159936 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3830439141376 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3830441398272 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3830785138688 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831099297792 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831128768512 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831371513856 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831535570944 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831591952384 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831799398400 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831829250048 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3831829512192 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832011440128 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832011767808 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832023920640 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832024678400 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832027316224 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832028762112 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832030236672 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832030330880 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832161079296 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832164904960 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832164945920 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3832613765120 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3833727565824 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3833914073088 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3833929310208 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: shared extent[3833930141696 4096] lost its parent (parent: 
3827251183616, level: 0)
ERROR: extent[3837768077312, 24576] referencer count mismatch (root: 33174, 
owner: 1277577, offset: 4767744) wanted: 1, have: 0
[snip many more referencer count mismatches]
ERROR: extent[3878247383040, 8192] referencer count mismatch (root: 33495, 
owner: 2688918, offset: 3874816) wanted: 2, have: 3
ERROR: block group[3879328546816 1073741824] used 1072840704 but extent items 
used 1129164800

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Marc Joliet
On Thursday 02 March 2017 08:43:53 Qu Wenruo wrote:
> At 02/02/2017 08:01 PM, Marc Joliet wrote:
> > On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:
> >> Hello list!
> > 
> > Hi list
> 
> [kernel message snipped]
> 
> >> Btrfs --repair refused to repair the filesystem telling me something
> >> about compressed extents and an unsupported case, wanting me to take an
> >> image and send it to the devs. *sigh*
> > 
> > I haven't tried a repair yet; it's a big file system, and btrfs-check is
> > still running:
> > 
> > # btrfs check -p /dev/sdd2
> > Checking filesystem on /dev/sdd2
> > UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> 
> Normal transid error; I can't say whether it's harmless, but at
> least something went wrong.
> 
> > Ignoring transid failure
> > leaf parent key incorrect 3829276291072
> > bad block 3829276291072
> 
> That's somewhat of a big problem for that tree block.
> 
> If this tree block is an extent tree block, it's no wonder the kernel
> output a warning and aborted the transaction.
> 
> You could try "btrfs-debug-tree -b 3829276291072 <device>" to show the
> content of the tree block.

# btrfs-debug-tree -b 3829276291072 /dev/sdb2 
btrfs-progs v4.9
node 3829276291072 level 1 items 70 free 51 generation 292525 owner 2
fs uuid f97b3cda-15e8-418b-bb9b-235391ef2a38
chunk uuid 1cee580c-3442-4717-9300-8514dd8ff297
[snip: the same node dump as quoted earlier in the thread]

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-01 Thread Qu Wenruo

At 02/02/2017 08:01 PM, Marc Joliet wrote:

On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:

Hello list!


Hi list

[kernel message snipped]



Btrfs --repair refused to repair the filesystem telling me something
about compressed extents and an unsupported case, wanting me to take an
image and send it to the devs. *sigh*


I haven't tried a repair yet; it's a big file system, and btrfs-check is still
running:

# btrfs check -p /dev/sdd2
Checking filesystem on /dev/sdd2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858


A normal transid error; I can't say whether it's harmless, but at 
least something went wrong.



Ignoring transid failure
leaf parent key incorrect 3829276291072
bad block 3829276291072


That's somewhat of a big problem for that tree block.

If this tree block is an extent tree block, it's no wonder the kernel 
output a warning and aborted the transaction.


You could try "btrfs-debug-tree -b 3829276291072 <device>" to show the 
content of the tree block.


If it's an extent tree block, then I'm afraid that's the problem.
I'm not sure whether repair can fix such a problem, but at least from what I 
see in the btrfs-progs fsck self-test cases, it doesn't handle extent tree 
errors well.




ERROR: errors found in extent allocation tree or chunk allocation
block group 4722282987520 has wrong amount of free space
failed to load free space cache for block group 4722282987520
checking free space cache [O]
root 32018 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32089 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32091 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32092 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32107 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32189 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32190 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32191 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32265 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32266 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32409 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32410 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32411 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32412 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32413 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32631 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32632 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32633 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32634 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32635 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32636 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32718 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32732 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096


File extent holes are completely fine; they are one of the few problems we can 
fix safely in btrfs check.
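All of those per-snapshot reports describe the same hole in the same inode. As a sketch (the check output is assumed to have been redirected to a log file; a two-entry sample is inlined here for illustration), the unique (inode, hole) pairs can be pulled out with awk:

```shell
# Collapse the repeated per-snapshot "file extent discount" reports down
# to the unique (inode, hole) pairs they describe.  The sample log below
# stands in for the redirected btrfs-check output.
cat > check.log <<'EOF'
root 32018 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 32089 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
EOF
# Remember the inode from each "discount" line; print it with each hole.
awk '/file extent discount/ { inode = $4 }
     /^start:/              { print "inode", inode, $0 }' check.log | sort -u
```

Here the output collapses to a single line, confirming that the many snapshots all share one hole in inode 95066.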


But the earlier extent tree error is not.

If lowmem mode also reports only the same problems (file extent discount 
plus the extent tree error), then there is a chance that --init-extent-tree may 
help.

It will be super time consuming, though.

Thanks,
Qu


checking fs roots [o]

I know that the "file extent discount" errors are fixable from my previous
email to this ML, but what about the rest?  From looking through the ML
archives it seems that --repair won't be able to fix the transid failures.  It
seems that one person had success with the "usebackuproot" mount option, which 
I haven't tried yet.

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-01 Thread Marc Joliet
On Wednesday 01 March 2017 19:14:07 Marc Joliet wrote:
> > > Also, the image is complete, so I only need to find somewhere where I
> > > can
> > > upload a 9.4 GB file.
> >
> > 
> >
> > Is it a compressed dump? Dumped with btrfs-image -c9?
> 
> It was created with:
> 
> btrfs-image -s -w /dev/sdb2 - | xz -9 --stdout >
> ./btrfs_backup_drive_2.img.xz
> 
> (Mainly because I felt more comfortable using a separate compression
> utility,  not for any rational reason.  Although if you really meant
> "image" above, I have the feeling I'll regret this decision.)

Ah, never mind, it's not as big as I was worried it would be:

% xz -l btrfs_backup_drive.img.xz 
 Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
1   1  9,589.6 MiB 81.4 GiB  0.115  CRC64   
btrfs_backup_drive.img.xz

So 81.4 GiB uncompressed.  I have the space to uncompress it if need be.
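For what it's worth, the ratio xz reports is consistent with the listed sizes (a throwaway arithmetic check, converting 81.4 GiB to MiB):

```shell
# 9589.6 MiB compressed / (81.4 GiB * 1024 MiB/GiB) uncompressed,
# which should match the Ratio column in the xz -l listing above.
awk 'BEGIN { printf "%.3f\n", 9589.6 / (81.4 * 1024) }'
```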

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


signature.asc
Description: This is a digitally signed message part.


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-01 Thread Marc Joliet
On Wednesday 01 March 2017 19:14:07 Marc Joliet wrote:
> > And btrfs check --mode=lowmem is also recommended, as in some rare cases
> > lowmem mode can detect bugs which the original mode doesn't.
> 
> I did see differences in output the last time around (again, see my
> previous  messages in this thread), so I'll run with lowmem.

OK, just to prevent confusion: it turns out I never posted the output of 
btrfs-check with --mode=lowmem from the first time I was seeing these errors, 
but I remember the output being (slightly) different from the original mode.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-01 Thread Marc Joliet
On Wednesday 01 March 2017 17:32:35 Qu Wenruo wrote:
> At 03/01/2017 04:23 PM, Marc Joliet wrote:
> > On Tuesday 28 February 2017 23:14:54 Marc Joliet wrote:
> >> I think I'm at that point now myself, unless anybody has any other ideas.
> > 
> > For example, could the --init-extent-tree option to btrfs-check help,
> > given
> > that I needed to pass -w to btrfs-image?

(First off, thanks for taking the time to respond!)

> --init-extent-tree should normally be avoided, unless you're 100% sure
> that only the extent tree is corrupted, and you have enough faith that
> --init-extent-tree can finish.
> 
> Or it will mostly make things worse.

OK, that's why I asked :) .

> Before trying any RW btrfs check operations, did you run btrfs check on
> the image?

If by image you mean the device in question, not since the last time (see my 
previous messages).  Or can I really run btrfs check on the image?  In any 
case, I started btrfs-check on the device itself.

> And btrfs check --mode=lowmem is also recommended, as in some rare cases
> lowmem mode can detect bugs which the original mode doesn't.

I did see differences in output the last time around (again, see my previous 
messages in this thread), so I'll run with lowmem.  It won't be done until 
tomorrow, though.

> (And it also helps us to enhance lowmem mode)

OK

> > Also, the image is complete, so I only need to find somewhere where I can
> > upload a 9.4 GB file.
> 
> Is it a compressed dump? Dumped with btrfs-image -c9?

It was created with:

btrfs-image -s -w /dev/sdb2 - | xz -9 --stdout > ./btrfs_backup_drive_2.img.xz

(Mainly because I felt more comfortable using a separate compression utility, 
not for any rational reason.  Although if you really meant "image" above, I 
have the feeling I'll regret this decision.)

> If so, such a large one won't help much unless some developer is
> really interested in inspecting the image.

When I started without compression, it was something like 46 GB, after only 
1-2 hours, so I expect the uncompressed size to be... very large :-/ .

> Thanks,
> Qu
> 
> > Greetings
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-01 Thread Qu Wenruo



At 03/01/2017 04:23 PM, Marc Joliet wrote:

On Tuesday 28 February 2017 23:14:54 Marc Joliet wrote:

I think I'm at that point now myself, unless anybody has any other ideas.


For example, could the --init-extent-tree option to btrfs-check help, given
that I needed to pass -w to btrfs-image?


--init-extent-tree should normally be avoided, unless you're 100% sure 
that only the extent tree is corrupted, and you have enough faith that 
--init-extent-tree can finish.


Or it will mostly make things worse.

Before trying any RW btrfs check operations, did you run btrfs check on 
the image?


And btrfs check --mode=lowmem is also recommended, as in some rare cases 
lowmem mode can detect bugs which the original mode doesn't.


(And it also helps us to enhance lowmem mode)


Also, the image is complete, so I only need to find somewhere where I can
upload a 9.4 GB file.


Is it a compressed dump? Dumped with btrfs-image -c9?

If so, such a large one won't help much unless some developer is 
really interested in inspecting the image.


Thanks,
Qu



Greetings






Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-01 Thread Marc Joliet
On Tuesday 28 February 2017 23:14:54 Marc Joliet wrote:
> I think I'm at that point now myself, unless anybody has any other ideas.

For example, could the --init-extent-tree option to btrfs-check help, given 
that I needed to pass -w to btrfs-image?

Also, the image is complete, so I only need to find somewhere where I can 
upload a 9.4 GB file.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-17 Thread Kai Krakow
On Tue, 14 Feb 2017 13:52:48 +0100, Marc Joliet wrote:

> Hi again,
> 
> so, it seems that I've solved the problem:  After having to
> umount/mount the FS several times to get btrfs-cleaner to finish, I
> thought of the "failed to load free space [...], rebuilding it now"
> type errors and decided to try the clear_cache mount option.  Since
> then, my home server has been running its backups regularly again.
> Furthermore, I was able to back up my desktop again via send/recv (the
> rsync based backup is still running, but I expect it to succeed).
> The kernel log has also stayed clean.
> 
> Kai, I'd be curious whether clear_cache will help in your case, too,
> if you haven't tried it already.

The FS in question has been rebuilt from scratch. AFAIR it wasn't even
mountable back then, or at least it froze the system shortly after mounting.

I needed the system back in usable state, so I had to recreate and
restore the system.

Next time, if it happens again (fingers crossed it does not), I'll give
it a try.

-- 
Regards,
Kai

Replies to list-only preferred.


pgp8pHpMpE5QT.pgp
Description: Digital signature from OpenPGP


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-14 Thread Marc Joliet
Hi again,

so, it seems that I've solved the problem:  After having to umount/mount the 
FS several times to get btrfs-cleaner to finish, I thought of the "failed to 
load free space [...], rebuilding it now" type errors and decided to try the 
clear_cache mount option.  Since then, my home server has been running its 
backups regularly again.  Furthermore, I was able to back up my desktop again via 
send/recv (the rsync based backup is still running, but I expect it to 
succeed).  The kernel log has also stayed clean.

Kai, I'd be curious whether clear_cache will help in your case, too, if you 
haven't tried it already.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-11 Thread Marc Joliet
On Saturday 11 February 2017 03:01:39 Kai Krakow wrote:
> On Fri, 10 Feb 2017 23:15:03 +0100, Marc Joliet wrote:
> > # btrfs filesystem df /media/MARCEC_BACKUP/
> > Data, single: total=851.00GiB, used=831.36GiB
> > System, DUP: total=64.00MiB, used=120.00KiB
> > Metadata, DUP: total=13.00GiB, used=10.38GiB
> > Metadata, single: total=1.12GiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> > 
> > Hmm, I take it that the single metadata is a leftover from running
> > --repair?
> 
> It's more probably a remnant of an incomplete balance operation or an
> older mkfs version. I'd simply rebalance metadata to fix this.
> 
> I don't think that btrfs-repair would migrate missing metadata
> duplicates back to single profile, it would more likely trigger
> recreating the missing duplicates. But I'm not sure.

I'm fairly certain it's from the repair operation; the device didn't have any 
single metadata before (and it still doesn't really, notice that used=0.00B).  
So my best guess is that it was allocated during the --repair.

> If it is a result of the repair operation, that could be an
> interesting clue. Could it explain "error -17" from your logs? But that
> would mean the duplicates were already missing before the repair
> operation and triggered that problem. So the question is, why are those
> duplicates missing in the first place as a result of normal operation?
> From your logs:

I certainly didn't create them myself; the device has always had metadata=dup 
(you can check previous threads of mine, which also contain "btrfs fi df" 
output).  So I don't think it has anything to do with the failures.

> ---8<--- snip
> Feb 02 22:49:14 thetick kernel: BTRFS: device label MARCEC_BACKUP devid
> 1 transid 283903 /dev/sdd2
> Feb 02 22:49:19 thetick kernel: EXT4-fs (sdd1): mounted filesystem with
> ordered data mode. Opts: (null)
> Feb 03 00:18:52 thetick kernel: BTRFS info (device sdd2): use zlib
> compression Feb 03 00:18:52 thetick kernel: BTRFS info (device sdd2):
> disk space caching is enabled
> Feb 03 00:18:52 thetick kernel: BTRFS info (device sdd2): has skinny
> extents Feb 03 00:20:09 thetick kernel: BTRFS info (device sdd2): The
> free space cache file (3967375376384) is invalid. skip it
> Feb 03 01:05:58 thetick kernel: [ cut here ]
> Feb 03 01:05:58 thetick kernel: WARNING: CPU: 1 PID: 26544 at
> fs/btrfs/extent- tree.c:2967 btrfs_run_delayed_refs+0x26c/0x290
> Feb 03 01:05:58 thetick kernel: BTRFS: Transaction aborted (error -17)
> --->8--- snap
> 
> "error -17" being "object already exists". My only theory would be that this
> has a direct connection to you finding the single metadata profile.
> Like in "the kernel thinks the object already exists when it really
> didn't, and as a result the object is there only once now", aka "single
> metadata".
> 
> But I'm no dev and no expert on the internals.

Again, I don't think it has anything to do with the single metadata.  A "btrfs 
balance start -mprofile=single" got rid of the (empty) single metadata blocks.

What I *can* say is that the error seems to be transient.  For example, I ran 
an rsync last night that failed with the same error as before.  After 
unmounting and mounting the FS again, I could run it to completion (and 
verified some files).

Anyway, here's the whole log since this morning.  I can't correlate individual 
stack traces to specific actions any more, but the rough order of actions on 
my part (not counting unmount/mount cycles, those are easy enough to find):

- backups from the home server run, then
- btrfs-cleaner drops old snapshots, after which
- I run rsync (which fails), then
- I run rsync again (which succeeds), then
- I run "balance start -mprofile=single", which succeeds (I might have run 
this before rsync, I'm not sure), after which
- I start backups on my laptop (which fails after I finally went to bed, at 
about 3:58).

After that I just try "btrbk -r run" a few times, which managed to transfer 
the first two snapshots before failing.

Feb 11 00:00:06 diefledermaus kernel: usb 1-1: new high-speed USB device 
number 4 using ehci-pci
Feb 11 00:00:06 diefledermaus kernel: usb 1-1: New USB device found, 
idVendor=0480, idProduct=d010
Feb 11 00:00:06 diefledermaus kernel: usb 1-1: New USB device strings: Mfr=1, 
Product=2, SerialNumber=3
Feb 11 00:00:06 diefledermaus kernel: usb 1-1: Product: External USB 3.0
Feb 11 00:00:06 diefledermaus kernel: usb 1-1: Manufacturer: Toshiba
Feb 11 00:00:06 diefledermaus kernel: usb 1-1: SerialNumber: 20130421020612
Feb 11 00:00:07 diefledermaus kernel: usb-storage 1-1:1.0: USB Mass Storage 
device detected
Feb 11 00:00:07 diefledermaus kernel: usb-storage 1-1:1.0: Quirks match for 
vid 0480 pid d010: 2000
Feb 11 00:00:07 diefledermaus kernel: scsi host6: usb-storage 1-1:1.0
Feb 11 00:00:07 diefledermaus kernel: usbcore: registered new interface driver 
usb-storage
Feb 11 00:00:16 diefledermaus kernel: scsi 6:0:0:0: 

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-10 Thread Kai Krakow
On Fri, 10 Feb 2017 23:15:03 +0100, Marc Joliet wrote:

> # btrfs filesystem df /media/MARCEC_BACKUP/   
> Data, single: total=851.00GiB, used=831.36GiB
> System, DUP: total=64.00MiB, used=120.00KiB
> Metadata, DUP: total=13.00GiB, used=10.38GiB
> Metadata, single: total=1.12GiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> Hmm, I take it that the single metadata is a leftover from running
> --repair?

It's more probably a remnant of an incomplete balance operation or an
older mkfs version. I'd simply rebalance metadata to fix this.

I don't think that btrfs-repair would migrate missing metadata
duplicates back to single profile, it would more likely trigger
recreating the missing duplicates. But I'm not sure.

If it is a result of the repair operation, that could be an
interesting clue. Could it explain "error -17" from your logs? But that
would mean the duplicates were already missing before the repair
operation and triggered that problem. So the question is, why are those
duplicates missing in the first place as a result of normal operation?
From your logs:

---8<--- snip
Feb 02 22:49:14 thetick kernel: BTRFS: device label MARCEC_BACKUP devid
1 transid 283903 /dev/sdd2
Feb 02 22:49:19 thetick kernel: EXT4-fs (sdd1): mounted filesystem with 
ordered data mode. Opts: (null)
Feb 03 00:18:52 thetick kernel: BTRFS info (device sdd2): use zlib
compression Feb 03 00:18:52 thetick kernel: BTRFS info (device sdd2):
disk space caching is enabled
Feb 03 00:18:52 thetick kernel: BTRFS info (device sdd2): has skinny
extents Feb 03 00:20:09 thetick kernel: BTRFS info (device sdd2): The
free space cache file (3967375376384) is invalid. skip it
Feb 03 01:05:58 thetick kernel: [ cut here ]
Feb 03 01:05:58 thetick kernel: WARNING: CPU: 1 PID: 26544 at
fs/btrfs/extent- tree.c:2967 btrfs_run_delayed_refs+0x26c/0x290
Feb 03 01:05:58 thetick kernel: BTRFS: Transaction aborted (error -17)
--->8--- snap

"error -17" being "object already exists". My only theory would be that this
has a direct connection to you finding the single metadata profile.
Like in "the kernel thinks the object already exists when it really
didn't, and as a result the object is there only once now", aka "single
metadata".

But I'm no dev and no expert on the internals.
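For reference, the -17 in these traces is simply a negative errno value. A quick way to confirm which error that is (a sketch, not from the thread itself; it uses Python's errno tables, which mirror the kernel's on Linux):

```shell
# "errno=-17" in the kernel log is -EEXIST, which btrfs reports as
# "Object already exists".  Look up the number and message:
python3 -c 'import errno, os; print(errno.EEXIST, os.strerror(errno.EEXIST))'
```

On Linux this prints errno 17 with the standard "File exists" message, matching the btrfs wording "Object already exists".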

-- 
Regards,
Kai

Replies to list-only preferred.


pgp64OIVK8aRx.pgp
Description: Digital signature from OpenPGP


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-10 Thread Marc Joliet
Sorry for the late reply, see below for why :) .

On Friday 03 February 2017 23:44:10 Kai Krakow wrote:
> On Thu, 02 Feb 2017 13:01:03 +0100, Marc Joliet wrote:
[...]
> > > Btrfs --repair refused to repair the filesystem telling me something
> > > about compressed extents and an unsupported case, wanting me to
> > > take an image and send it to the devs. *sigh*
> > 
> > I haven't tried a repair yet; it's a big file system, and btrfs-check
> > is still running:
> > 
> > # btrfs check -p /dev/sdd2
> > Checking filesystem on /dev/sdd2
> > UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
> > parent transid verify failed on 3829276291072 wanted 224274 found
> > 283858 parent transid verify failed on 3829276291072 wanted 224274
> > found 283858 parent transid verify failed on 3829276291072 wanted
> > 224274 found 283858 parent transid verify failed on 3829276291072
> > wanted 224274 found 283858 Ignoring transid failure
> > leaf parent key incorrect 3829276291072
> > bad block 3829276291072
> > 
> > ERROR: errors found in extent allocation tree or chunk allocation
> > block group 4722282987520 has wrong amount of free space
> > failed to load free space cache for block group 4722282987520
> > checking free space cache [O]
> > root 32018 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32089 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32091 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32092 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32107 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32189 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32190 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32191 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32265 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32266 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32409 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32410 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32411 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32412 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32413 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32631 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32632 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32633 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32634 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32635 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32636 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32718 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > root 32732 inode 95066 errors 100, file extent discount
> > 
> > Found file extent holes:
> > start: 413696, len: 4096
> > 
> > checking fs roots [o]
> > 
> > I know that the "file extent discount" errors are fixable from my
> > previous email to this ML, but what about the rest?  From looking
> > through the ML archives it seems that --repair won't be able to fix
> > the transid failures.  It seems that one person had success with the
> > "usebackuproot" mount option, which I haven't tried yet.

Well, at first it seemed that the problem had fixed itself.  After check 
finished, I tried mounting the FS normally, which succeeded.  Shortly after, 
btrfs-cleaner started removing deleted snapshots.  About 40 minutes later I got 
the error again:


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-03 Thread Kai Krakow
On Thu, 02 Feb 2017 13:01:03 +0100, Marc Joliet wrote:

> On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:
> > Hello list!  
> 
> Hi list
> 
> > It happened again. While using VirtualBox the following crash
> > happened, btrfs check found a lot of errors which it couldn't
> > repair. Earlier that day my system crashed, which may have already
> > introduced errors into my filesystem. Apparently, I couldn't create
> > an image (not enough space available); I can only give this trace
> > from dmesg:
> > 
> > [44819.903435] [ cut here ]
> > [44819.903443] WARNING: CPU: 3 PID: 2787 at
> > fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x26c/0x290
> > [44819.903444] BTRFS: Transaction aborted (error -17)
> > [44819.903445] Modules linked in: nls_iso8859_15 nls_cp437 vfat fat
> > fuse rfcomm veth af_packet ipt_MASQUERADE nf_nat_masquerade_ipv4
> > iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> > nf_conntrack bridge stp llc w83627ehf bnep hwmon_vid cachefiles
> > btusb btintel bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek
> > snd_hda_codec_generic snd_hda_intel snd_hda_codec rfkill snd_hwdep
> > snd_hda_core snd_pcm snd_timer coretemp hwmon snd r8169 mii
> > kvm_intel kvm iTCO_wdt iTCO_vendor_support rtc_cmos irqbypass
> > soundcore ip_tables uas usb_storage nvidia_drm(PO) vboxpci(O)
> > vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO)
> > nvidia(PO) efivarfs unix ipv6 [44819.903484] CPU: 3 PID: 2787 Comm:
> > BrowserBlocking Tainted: P   O4.7.2-gentoo #2
> > [44819.903485] Hardware name: To Be Filled By O.E.M. To Be Filled
> > By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013 [44819.903487]
> >  8130af2d 8800b7d03d20 
> > [44819.903489]  810865fa 880409374428 8800b7d03d70
> > 8803bf299760 [44819.903491]  ffef
> > 8803f677f000 8108666a [44819.903493] Call Trace:
> > [44819.903496]  [] ? dump_stack+0x46/0x59
> > [44819.903500]  [] ? __warn+0xba/0xe0
> > [44819.903502]  [] ? warn_slowpath_fmt+0x4a/0x50
> > [44819.903504]  [] ?
> > btrfs_run_delayed_refs+0x26c/0x290 [44819.903507]
> > [] ? btrfs_release_path+0xe/0x80 [44819.903509]
> > [] ? btrfs_start_dirty_block_groups+0x2da/0x420
> > [44819.903511] [] ?
> > btrfs_commit_transaction+0x143/0x990 [44819.903514]
> > [] ? kmem_cache_free+0x165/0x180 [44819.903516]
> > [] ? btrfs_wait_ordered_range+0x7c/0x110
> > [44819.903518] [] ? btrfs_sync_file+0x286/0x360
> > [44819.903522] [] ? do_fsync+0x33/0x60
> > [44819.903524]  [] ? SyS_fdatasync+0xa/0x10
> > [44819.903528]  [] ?
> > entry_SYSCALL_64_fastpath+0x13/0x8f [44819.903529] ---[ end trace
> > 6944811e170a0e57 ]--- [44819.903531] BTRFS: error (device bcache2)
> > in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
> > [44819.903533] BTRFS info (device bcache2): forced readonly  
> 
> I got the same error myself, with this stack trace:
> 
> -- Logs begin at Fr 2016-04-01 17:07:28 CEST, end at Mi 2017-02-01
> 22:03:57 CET. --
> Feb 01 01:46:26 diefledermaus kernel: [ cut here
> ] Feb 01 01:46:26 diefledermaus kernel: WARNING: CPU: 1
> PID: 16727 at fs/btrfs/extent-tree.c:2967
> btrfs_run_delayed_refs+0x278/0x2b0 Feb 01 01:46:26 diefledermaus
> kernel: BTRFS: Transaction aborted (error -17) Feb 01 01:46:26
> diefledermaus kernel: BTRFS: error (device sdb2) in
> btrfs_run_delayed_refs:2967: errno=-17 Object already exists Feb 01
> 01:46:27 diefledermaus kernel: BTRFS info (device sdb2): forced
> readonly Feb 01 01:46:27 diefledermaus kernel: Modules linked in: msr
> ctr ccm tun arc4 snd_hda_codec_idt applesmc snd_hda_codec_generic
> input_polldev hwmon snd_hda_intel ath5k snd_hda_codec mac80211
> snd_hda_core ath snd_pcm cfg80211 snd_timer video acpi_cpufreq snd
> backlight sky2 rfkill processor button soundcore sg usb_storage
> sr_mod cdrom ata_generic pata_acpi uhci_hcd ahci libahci ata_piix
> libata ehci_pci ehci_hcd Feb 01 01:46:27 diefledermaus kernel: CPU: 1
> PID: 16727 Comm: kworker/u4:0 Not tainted 4.9.6-gentoo #1
> Feb 01 01:46:27 diefledermaus kernel: Hardware name: Apple Inc. 
> Macmini2,1/Mac-F4208EAA, BIOS MM21.88Z.009A.B00.0706281359
> 06/28/07 Feb 01 01:46:27 diefledermaus kernel: Workqueue:
> btrfs-extent-refs btrfs_extent_refs_helper
> Feb 01 01:46:27 diefledermaus kernel:  
> 812cf739 c9000285fd60 
> Feb 01 01:46:27 diefledermaus kernel:  8104908a
> 8800428df1e0 c9000285fdb0 0020
> Feb 01 01:46:27 diefledermaus kernel:  880003c1b1b8
> 8800bb73e900  810490fa
> Feb 01 01:46:27 diefledermaus kernel: Call Trace:
> Feb 01 01:46:27 diefledermaus kernel:  [] ? 
> dump_stack+0x46/0x5d
> Feb 01 01:46:27 diefledermaus kernel:  [] ?
> __warn+0xba/0xe0 Feb 01 01:46:27 diefledermaus kernel:
> [] ? warn_slowpath_fmt+0x4a/0x50
> Feb 01 01:46:27 diefledermaus kernel:  [] ? 
> 

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-02 Thread Marc Joliet
On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:
> Hello list!

Hi list

> It happened again. While using VirtualBox the following crash happened,
> btrfs check found a lot of errors which it couldn't repair. Earlier
> that day my system crashed, which may have already introduced errors into my
> filesystem. Apparently, I couldn't create an image (not enough space
> available); I can only give this trace from dmesg:
> 
> [44819.903435] [ cut here ]
> [44819.903443] WARNING: CPU: 3 PID: 2787 at fs/btrfs/extent-tree.c:2963
> btrfs_run_delayed_refs+0x26c/0x290 [44819.903444] BTRFS: Transaction
> aborted (error -17)
> [44819.903445] Modules linked in: nls_iso8859_15 nls_cp437 vfat fat fuse
> rfcomm veth af_packet ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp
> llc w83627ehf bnep hwmon_vid cachefiles btusb btintel bluetooth
> snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
> snd_hda_intel snd_hda_codec rfkill snd_hwdep snd_hda_core snd_pcm snd_timer
> coretemp hwmon snd r8169 mii kvm_intel kvm iTCO_wdt iTCO_vendor_support
> rtc_cmos irqbypass soundcore ip_tables uas usb_storage nvidia_drm(PO)
> vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO)
> nvidia(PO) efivarfs unix ipv6 [44819.903484] CPU: 3 PID: 2787 Comm:
> BrowserBlocking Tainted: P   O4.7.2-gentoo #2 [44819.903485]
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS
> L2.16A 02/22/2013 [44819.903487]   8130af2d
> 8800b7d03d20  [44819.903489]  810865fa
> 880409374428 8800b7d03d70 8803bf299760 [44819.903491] 
>  ffef 8803f677f000 8108666a
> [44819.903493] Call Trace:
> [44819.903496]  [] ? dump_stack+0x46/0x59
> [44819.903500]  [] ? __warn+0xba/0xe0
> [44819.903502]  [] ? warn_slowpath_fmt+0x4a/0x50
> [44819.903504]  [] ? btrfs_run_delayed_refs+0x26c/0x290
> [44819.903507]  [] ? btrfs_release_path+0xe/0x80
> [44819.903509]  [] ?
> btrfs_start_dirty_block_groups+0x2da/0x420 [44819.903511] 
> [] ? btrfs_commit_transaction+0x143/0x990 [44819.903514] 
> [] ? kmem_cache_free+0x165/0x180 [44819.903516] 
> [] ? btrfs_wait_ordered_range+0x7c/0x110 [44819.903518] 
> [] ? btrfs_sync_file+0x286/0x360 [44819.903522] 
> [] ? do_fsync+0x33/0x60
> [44819.903524]  [] ? SyS_fdatasync+0xa/0x10
> [44819.903528]  [] ? entry_SYSCALL_64_fastpath+0x13/0x8f
> [44819.903529] ---[ end trace 6944811e170a0e57 ]---
> [44819.903531] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2963:
> errno=-17 Object already exists [44819.903533] BTRFS info (device bcache2):
> forced readonly

I got the same error myself, with this stack trace:

-- Logs begin at Fr 2016-04-01 17:07:28 CEST, end at Mi 2017-02-01 22:03:57 
CET. --
Feb 01 01:46:26 diefledermaus kernel: [ cut here ]
Feb 01 01:46:26 diefledermaus kernel: WARNING: CPU: 1 PID: 16727 at 
fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0x278/0x2b0
Feb 01 01:46:26 diefledermaus kernel: BTRFS: Transaction aborted (error -17)
Feb 01 01:46:26 diefledermaus kernel: BTRFS: error (device sdb2) in 
btrfs_run_delayed_refs:2967: errno=-17 Object already exists
Feb 01 01:46:27 diefledermaus kernel: BTRFS info (device sdb2): forced 
readonly
Feb 01 01:46:27 diefledermaus kernel: Modules linked in: msr ctr ccm tun arc4 
snd_hda_codec_idt applesmc snd_hda_codec_generic input_polldev hwmon 
snd_hda_intel ath5k snd_hda_codec mac80211 snd_hda_core ath snd_pcm cfg80211 
snd_timer video acpi_cpufreq snd backlight sky2 rfkill processor button 
soundcore sg usb_storage sr_mod cdrom ata_generic pata_acpi uhci_hcd ahci 
libahci ata_piix libata ehci_pci ehci_hcd
Feb 01 01:46:27 diefledermaus kernel: CPU: 1 PID: 16727 Comm: kworker/u4:0 Not 
tainted 4.9.6-gentoo #1
Feb 01 01:46:27 diefledermaus kernel: Hardware name: Apple Inc. 
Macmini2,1/Mac-F4208EAA, BIOS MM21.88Z.009A.B00.0706281359 06/28/07
Feb 01 01:46:27 diefledermaus kernel: Workqueue: btrfs-extent-refs 
btrfs_extent_refs_helper
Feb 01 01:46:27 diefledermaus kernel:   812cf739 
c9000285fd60 
Feb 01 01:46:27 diefledermaus kernel:  8104908a 8800428df1e0 
c9000285fdb0 0020
Feb 01 01:46:27 diefledermaus kernel:  880003c1b1b8 8800bb73e900 
 810490fa
Feb 01 01:46:27 diefledermaus kernel: Call Trace:
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
dump_stack+0x46/0x5d
Feb 01 01:46:27 diefledermaus kernel:  [] ? __warn+0xba/0xe0
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
warn_slowpath_fmt+0x4a/0x50
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
btrfs_run_delayed_refs+0x278/0x2b0
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
delayed_ref_async_start+0x84/0xa0
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
process_one_work+0x126/0x310
Feb 01 01:46:27 diefledermaus kernel:  [] ? 

[4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-28 Thread Kai Krakow
Hello list!

It happened again. While using VirtualBox, the following crash occurred,
and btrfs check found a lot of errors that it couldn't repair. Earlier
that day my system crashed, which may already have introduced errors into
my filesystem. Unfortunately, I couldn't create an image (not enough
space available), so I can only give this trace from dmesg:

[44819.903435] [ cut here ]
[44819.903443] WARNING: CPU: 3 PID: 2787 at fs/btrfs/extent-tree.c:2963 
btrfs_run_delayed_refs+0x26c/0x290
[44819.903444] BTRFS: Transaction aborted (error -17)
[44819.903445] Modules linked in: nls_iso8859_15 nls_cp437 vfat fat fuse rfcomm 
veth af_packet ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc 
w83627ehf bnep hwmon_vid cachefiles btusb btintel bluetooth snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec rfkill 
snd_hwdep snd_hda_core snd_pcm snd_timer coretemp hwmon snd r8169 mii kvm_intel 
kvm iTCO_wdt iTCO_vendor_support rtc_cmos irqbypass soundcore ip_tables uas 
usb_storage nvidia_drm(PO) vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) 
nvidia_modeset(PO) nvidia(PO) efivarfs unix ipv6
[44819.903484] CPU: 3 PID: 2787 Comm: BrowserBlocking Tainted: P   O
4.7.2-gentoo #2
[44819.903485] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 
Pro3, BIOS L2.16A 02/22/2013
[44819.903487]   8130af2d 8800b7d03d20 

[44819.903489]  810865fa 880409374428 8800b7d03d70 
8803bf299760
[44819.903491]   ffef 8803f677f000 
8108666a
[44819.903493] Call Trace:
[44819.903496]  [] ? dump_stack+0x46/0x59
[44819.903500]  [] ? __warn+0xba/0xe0
[44819.903502]  [] ? warn_slowpath_fmt+0x4a/0x50
[44819.903504]  [] ? btrfs_run_delayed_refs+0x26c/0x290
[44819.903507]  [] ? btrfs_release_path+0xe/0x80
[44819.903509]  [] ? 
btrfs_start_dirty_block_groups+0x2da/0x420
[44819.903511]  [] ? btrfs_commit_transaction+0x143/0x990
[44819.903514]  [] ? kmem_cache_free+0x165/0x180
[44819.903516]  [] ? btrfs_wait_ordered_range+0x7c/0x110
[44819.903518]  [] ? btrfs_sync_file+0x286/0x360
[44819.903522]  [] ? do_fsync+0x33/0x60
[44819.903524]  [] ? SyS_fdatasync+0xa/0x10
[44819.903528]  [] ? entry_SYSCALL_64_fastpath+0x13/0x8f
[44819.903529] ---[ end trace 6944811e170a0e57 ]---
[44819.903531] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2963: 
errno=-17 Object already exists
[44819.903533] BTRFS info (device bcache2): forced readonly


Since I had to get back up and running fast, I restored from backup. I
have since bought an extra 3TB of backup space and put together a rescue
system with all the tools on a USB stick, so the next time this happens I
may be able to create an image of the broken filesystem.
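
For reference, btrfs-image can capture such an image: it stores the
filesystem metadata only (file data is left out), so it usually needs far
less space than the filesystem itself. A minimal sketch, with placeholder
device and destination paths, that echoes the command instead of running
it:

```shell
#!/bin/sh
# Hedged sketch: capture a metadata-only image of a broken (and
# unmounted) btrfs filesystem to send to the developers.
# Both paths are placeholders; substitute your own.
DEV=${1:-/dev/sdX2}
IMG=${2:-/mnt/rescue/fs.img}
# -c9 = best compression, -t4 = four worker threads.
# The command is only echoed here; drop the 'echo' to run it for real.
echo "btrfs-image -c9 -t4 $DEV $IMG"
```

btrfs-image also takes -s to sanitize file names in the image, in case
the metadata contains anything private.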

btrfs check --repair refused to repair the filesystem, telling me
something about compressed extents and an unsupported case, and asking me
to take an image and send it to the devs. *sigh*

The system is kernel 4.7.2 on Gentoo Linux with the latest stable
VirtualBox. VirtualBox was using the VDI image format without nocow. I
have now reverted to using nocow on the VDI files and hope it doesn't
strike again too soon. I haven't tried again yet; first I need to refresh
my backup, which takes a while.
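
Setting nocow is done with chattr; a detail worth noting is that +C only
takes effect on newly created files, so an existing VDI keeps its
copy-on-write extents until it is rewritten. A sketch under the
assumption that the images live in a dedicated directory (the path below
is made up), echoing the commands rather than executing them:

```shell
#!/bin/sh
# Hedged sketch: mark a VM image directory nocow so that files created
# in it afterwards inherit the attribute. VMDIR is a placeholder.
VMDIR=${1:-/var/vms}
# Existing images must be rewritten (e.g. cp --reflink=never) to lose
# their CoW extents; chattr +C alone does not convert them.
# Commands are echoed only; remove 'echo' to apply them.
echo "chattr +C $VMDIR"
echo "lsattr -d $VMDIR"   # a 'C' in the flags confirms the attribute
```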

The filesystem runs on 3x 1TB SATA drives (metadata raid1, data raid0)
through bcache in writeback mode, backed by a 500GB 850 Evo - if that
matters.

The problem occurred during high I/O on 4.7.2; I previously ran 4.6.6,
which didn't show it. Part of the culprit may be that I was using the bfq
patches - I have removed them for now and gone back to the deadline I/O
scheduler. The bfq patches froze my system a few times when I booted
4.7.2, which may already have damaged my btrfs (although it shouldn't,
right? btrfs is transactional). The last time this happened (on an
earlier kernel), bfq may have been part of the problem, too. So I suspect
bfq does something that breaks the filesystem, or at least interferes
badly with btrfs transactions, since otherwise they shouldn't break. You
may want to run your test suites with bfq as well (or with different I/O
schedulers in general).
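
For completeness, the scheduler switch itself happens through sysfs; this
sketch assumes the device is sdb and only reads the current setting (the
active scheduler is shown in brackets), with the write shown as a
comment:

```shell
#!/bin/sh
# Hedged sketch: inspect the I/O scheduler of a block device.
# /sys/block/sdb/queue/scheduler is the path for sdb; adjust as needed.
SCHED_FILE=${1:-/sys/block/sdb/queue/scheduler}
# The file lists all available schedulers, with the active one in
# brackets, e.g.: noop [deadline] cfq
[ -r "$SCHED_FILE" ] && cat "$SCHED_FILE"
# To switch (as root):  echo deadline > "$SCHED_FILE"
true   # don't let a missing sysfs file fail the script
```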

My home partition is mounted as a subvolume:
/dev/bcache0 on /home type btrfs 
(rw,noatime,compress=lzo,nossd,space_cache,autodefrag,subvolid=261,subvol=/home)


-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html