Re: Another ENOSPC situation

2016-04-05 Thread Austin S. Hemmelgarn

On 2016-04-02 01:43, Chris Murphy wrote:
> On Fri, Apr 1, 2016 at 10:55 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted:
>>
>>> [4/502]mh@swivel:~$ sudo btrfs fi usage /
>>> Overall:
>>> Device size: 600.00GiB
>>> Device allocated:600.00GiB
>>> Device unallocated:1.00MiB
>>
>> That's the problem right there.  The admin didn't do his job and spot the
>> near full allocation issue
>
> I don't yet agree this is an admin problem. This is the 2nd or 3rd
> case we've seen only recently where there's plenty of space in all
> chunk types and yet ENOSPC happens, seemingly only because there's no
> unallocated space remaining. I don't know that this is a regression
> for sure, but it sure seems like one.

I personally don't think it's a regression.  I've hit this myself before
(although I make a point not to anymore; having to jump through hoops to
the degree I did to get the FS working again tends to provide a pretty
big incentive to not let it happen again).  I know a couple of other
people who have and never reported it here or on IRC, and I'd be willing
to bet that the reason we're seeing it recently is that more 'regular'
users (in contrast to system administrators or developers) are using
BTRFS, and they tend to be more likely to hit such issues (because
they're not as likely to know about them in the first place, let alone
how to avoid them).

>>> Data,single: Size:553.93GiB, Used:405.73GiB
>>>    /dev/mapper/swivelbtr 553.93GiB
>>>
>>> Metadata,DUP: Size:23.00GiB, Used:3.83GiB
>>>    /dev/mapper/swivelbtr  46.00GiB
>>>
>>> System,DUP: Size:32.00MiB, Used:112.00KiB
>>>    /dev/mapper/swivelbtr  64.00MiB
>>>
>>> Unallocated:
>>>    /dev/mapper/swivelbtr   1.00MiB
>>> [5/503]mh@swivel:~$
>>
>> Both data and metadata have several GiB free, data ~140 GiB free, and
>> metadata isn't into global reserve, so the system isn't totally wedged,
>> only partially, due to the lack of unallocated space.
>
> Unallocated space alone hasn't ever caused this that I can remember.
> It's most often been totally full metadata chunks, with free space in
> allocated data chunks, with no unallocated space out of which to
> create another metadata chunk to write out changes.
>
> There should be plenty of space for either a -dusage=1 or -musage=1
> balance to free up a bunch of partially allocated chunks. Offhand I
> don't think the profiles filter is helpful in this case.
>
> OK so where I could be wrong is that I'm expecting balance doesn't
> require allocated space to work. I'd expect that it can COW extents
> from one chunk into another existing chunk (of the same type) and then
> once that's successful, free up that chunk, i.e. revert it back to
> unallocated. If balance can only copy into newly allocated chunks,
> that seems like a big problem. I thought that problems had been fixed
> a very long time ago.

Balance has always allocated new chunks.  This is IMHO one of the big
issues with the current implementation of it (the other being that it
can't be made asynchronous without some creative userspace work).  If we
aren't converting chunk types and we're on a single device FS, we should
be tail-packing existing chunks before we try to allocate new ones.

> And what we don't see from 'usage' that we will see from 'df' is the
> GlobalReserve values. I'd like to see that.
>
> Anyway, in the meantime there is a work around:
>
> btrfs dev add
>
> Just add a device, even if it's an 8GiB flash drive. But it can be a
> spare space on a partition, or it can be a logical volume, or whatever
> you want. That'll add some gigs of unallocated space. Now the balance
> will work, or for absolutely sure there's a bug (and a new one because
> this has always worked in the past). After whatever filtered or full
> balance is done, make sure to 'btrfs dev rem' and confirm it's gone
> with 'btrfs fi show' before removing the device. It's a two device
> volume until that device is successfully removed and is in something
> of a fragile state until then because any loss of data on that 2nd
> device has a good chance of face planting the file system.

If you can ensure with a relative degree of certainty that you won't
lose power or crash, and you have lots of RAM, a small ramdisk (or even
zram) works well for this too.  I wouldn't use either personally for a
critical filesystem (I'd pull out the disk and hook it up internally to
another system with spare disk space and handle things there), but both
options should work fine.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Another ENOSPC situation

2016-04-02 Thread Duncan
Chris Murphy posted on Fri, 01 Apr 2016 23:43:46 -0600 as excerpted:

> On Fri, Apr 1, 2016 at 10:55 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted:
> 
>>> [4/502]mh@swivel:~$ sudo btrfs fi usage /
>>> Overall:
>>> Device size: 600.00GiB
>>> Device allocated:600.00GiB
>>> Device unallocated:1.00MiB
>>
>> That's the problem right there.  The admin didn't do his job and spot
>> the near full allocation issue
> 
> 
> I don't yet agree this is an admin problem. This is the 2nd or 3rd case
> we've seen only recently where there's plenty of space in all chunk
> types and yet ENOSPC happens, seemingly only because there's no
> unallocated space remaining. I don't know that this is a regression for
> sure, but it sure seems like one.

Notice that he said _balance_ failed with ENOSPC.  He did _NOT_ say he 
was getting it in ordinary usage, just yet.  Which would fit a 100% 
allocated situation, with plenty of space left in both data and metadata 
chunks.  The plenty of space left inside the chunks would keep ordinary 
usage from running into problems just yet, but balance really /does/ need 
room to allocate at least one new chunk in order to properly handle the 
chunk rewrite via COW.  (At least for data, metadata seems to work a bit 
differently.  See below.)

Balance has always failed with ENOSPC if there was no unallocated space 
left.  It used to happen all the time, before btrfs learned how to delete 
empty chunks in 3.17, but while that helps, it only works for literally 
/empty/ chunks.  Chunks with even a single block/node still in use don't 
get deleted automatically.

What I think is happening now is that, while the empty-chunk deletion 
introduced in 3.17 helped, enough time has passed since then that people 
with particular usage patterns (I'd strongly suspect those with heavy 
snapshotting) are beginning to report the problem again.  Their chunks 
tend not to empty out completely the way they do with other usage 
patterns, so deleting only the fully empty chunks hasn't been enough to 
keep up with allocation in their particular use-cases.

>>> Data,single: Size:553.93GiB, Used:405.73GiB
>>>/dev/mapper/swivelbtr 553.93GiB
>>>
>>> Metadata,DUP: Size:23.00GiB, Used:3.83GiB
>>>/dev/mapper/swivelbtr  46.00GiB
>>>
>>> System,DUP: Size:32.00MiB, Used:112.00KiB
>>>/dev/mapper/swivelbtr  64.00MiB
>>>
>>> Unallocated:
>>>/dev/mapper/swivelbtr   1.00MiB
>>> [5/503]mh@swivel:~$
>>
>> Both data and metadata have several GiB free, data ~140 GiB free, and
>> metadata isn't into global reserve, so the system isn't totally wedged,
>> only partially, due to the lack of unallocated space.
> 
> Unallocated space alone hasn't ever caused this that I can remember.
> It's most often been totally full metadata chunks, with free space in
> allocated data chunks, with no unallocated space out of which to create
> another metadata chunk to write out changes.

Unallocated space alone doesn't cause ENOSPC with normal operations; for 
those you're correct, running out of either data or metadata space is 
required as well.  (Normally it's metadata that runs out, but I recall 
seeing one post from someone who had metadata room but full data.  The 
behavior was.. "interesting", as he could do renames, etc, and even 
create small files as long as they were small enough to stay in 
metadata.  As soon as he tried to do anything that needed an actual data 
extent, however, ENOSPC.)

But balance has always required space to allocate at least one chunk, as 
COW means the existing chunk can't be released until everything is 
rewritten into the new one.
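
As a rough illustration of that rule of thumb, the check below asks whether the unallocated space could hold one new chunk. It is only a sketch under stated assumptions: the 256 MiB metadata chunk and the doubling for DUP are the figures used in this message, the 1 GiB data chunk size is an assumption added for the example, and the names are hypothetical. The kernel's allocator may behave differently (the following paragraphs speculate about exactly that).

```python
MIB = 2**20
GIB = 2**30

# Raw copies written per chunk for the two profiles seen in this thread.
COPIES = {"single": 1, "DUP": 2}

# Nominal chunk sizes. 256 MiB for metadata is the figure from this
# message; 1 GiB for data is an assumption made for illustration.
NOMINAL_CHUNK = {"Data": 1 * GIB, "Metadata": 256 * MIB}


def raw_space_needed(kind, profile):
    """Raw device space one new chunk of this type and profile would take."""
    return NOMINAL_CHUNK[kind] * COPIES[profile]


def can_allocate_chunk(unallocated, kind, profile):
    """True if the unallocated space could hold one nominal chunk."""
    return unallocated >= raw_space_needed(kind, profile)


if __name__ == "__main__":
    unallocated = 1 * MIB  # "Device unallocated: 1.00MiB" from the report
    for kind, profile in (("Data", "single"), ("Metadata", "DUP")):
        ok = can_allocate_chunk(unallocated, kind, profile)
        print(f"{kind},{profile}: room for a new chunk? {ok}")
```

With the reported 1.00 MiB unallocated, neither a data chunk nor a DUP metadata chunk fits under these nominal sizes, which matches the immediate ENOSPC on balance.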

Tho it seems that btrfs can sometimes either write very small metadata 
chunks, which don't forget are dup by default on a single device, as they 
are in this case.  He has 1 MiB unallocated.  Split in half that's 512 
KiB.  I'm not sure if btrfs can go that small, but if it can, and it can 
find a low enough usage metadata chunk to write into it, freeing the 
larger metadata chunk...

Or maybe btrfs can actually use the global reserve for that, since global 
reserve is part of metadata.  If it can, a 512 MiB global reserve would 
be just large enough to write the two copies of a nominally 256 MiB 
metadata chunk.

Either way, I've seen a number of times now where btrfs was able to 
balance metadata, when it had less than the 256 (*2 if dup) MiB 
unallocated that would normally be required.  Maybe it /is/ able to use 
global reserve for that, which would allow it to work, as long as 
metadata isn't so tight that it's already using global reserve.  That's 
actually what I bet it's doing, now that I think about it.  Because as 
long as the global reserve isn't being used, 512 MiB of global reserve 
would be exactly 2*256 MiB metadata chunks, and if they're 

Re: Another ENOSPC situation

2016-04-01 Thread Chris Murphy
On Fri, Apr 1, 2016 at 10:55 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted:

>> [4/502]mh@swivel:~$ sudo btrfs fi usage /
>> Overall:
>> Device size: 600.00GiB
>> Device allocated:600.00GiB
>> Device unallocated:1.00MiB
>
> That's the problem right there.  The admin didn't do his job and spot the
> near full allocation issue


I don't yet agree this is an admin problem. This is the 2nd or 3rd
case we've seen only recently where there's plenty of space in all
chunk types and yet ENOSPC happens, seemingly only because there's no
unallocated space remaining. I don't know that this is a regression
for sure, but it sure seems like one.



>>
>> Data,single: Size:553.93GiB, Used:405.73GiB
>>/dev/mapper/swivelbtr 553.93GiB
>>
>> Metadata,DUP: Size:23.00GiB, Used:3.83GiB
>>/dev/mapper/swivelbtr  46.00GiB
>>
>> System,DUP: Size:32.00MiB, Used:112.00KiB
>>/dev/mapper/swivelbtr  64.00MiB
>>
>> Unallocated:
>>/dev/mapper/swivelbtr   1.00MiB
>> [5/503]mh@swivel:~$
>
> Both data and metadata have several GiB free, data ~140 GiB free, and
> metadata isn't into global reserve, so the system isn't totally wedged,
> only partially, due to the lack of unallocated space.

Unallocated space alone hasn't ever caused this that I can remember.
It's most often been totally full metadata chunks, with free space in
allocated data chunks, with no unallocated space out of which to
create another metadata chunk to write out changes.

There should be plenty of space for either a -dusage=1 or -musage=1
balance to free up a bunch of partially allocated chunks. Offhand I
don't think the profiles filter is helpful in this case.
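
For readers unfamiliar with the usage filter, here is a minimal sketch of the selection rule as documented for btrfs balance: only block groups whose usage is under the given percentage are picked. The Chunk class and the sample numbers are hypothetical, and the sketch ignores any special cases in the real implementation.

```python
from dataclasses import dataclass

GIB = 2**30


@dataclass
class Chunk:
    """Hypothetical stand-in for a block group; sizes are in bytes."""
    length: int
    used: int


def usage_filter(chunks, percent):
    """Return the chunks whose used space is under `percent` of their length."""
    return [c for c in chunks if c.used * 100 < c.length * percent]


if __name__ == "__main__":
    # Three made-up 1 GiB chunks: 0.5%, 1.5% and 70% used.
    chunks = [
        Chunk(GIB, GIB // 200),
        Chunk(GIB, GIB * 3 // 200),
        Chunk(GIB, GIB * 7 // 10),
    ]
    for percent in (1, 2, 75):
        picked = usage_filter(chunks, percent)
        print(f"usage={percent}: {len(picked)} of {len(chunks)} chunks selected")
```

Read this way, the "had to relocate 0 out of 602 chunks" result reported for -dusage=1 suggests that none of the data chunks on that filesystem was under 1% used, so that filter alone had nothing to free.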

OK so where I could be wrong is that I'm expecting balance doesn't
require allocated space to work. I'd expect that it can COW extents
from one chunk into another existing chunk (of the same type) and then
once that's successful, free up that chunk, i.e. revert it back to
unallocated. If balance can only copy into newly allocated chunks,
that seems like a big problem. I thought that problems had been fixed
a very long time ago.

And what we don't see from 'usage' that we will see from 'df' is the
GlobalReserve values. I'd like to see that.

Anyway, in the meantime there is a work around:

btrfs dev add

Just add a device, even if it's an 8GiB flash drive. But it can be a
spare space on a partition, or it can be a logical volume, or whatever
you want. That'll add some gigs of unallocated space. Now the balance
will work, or for absolutely sure there's a bug (and a new one because
this has always worked in the past). After whatever filtered or full
balance is done, make sure to 'btrfs dev rem' and confirm it's gone
with 'btrfs fi show' before removing the device. It's a two device
volume until that device is successfully removed and is in something
of a fragile state until then because any loss of data on that 2nd
device has a good chance of face planting the file system.
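
The order of operations above can be written down as a small dry-run helper. This is a sketch, not a tested recovery script: it only builds and prints the commands, and the device path /dev/sdX1 and the usage value 10 are placeholders chosen for the example.

```python
import shlex


def workaround_commands(mountpoint, spare_device, usage_percent):
    """Build (but do not run) the add / balance / remove / verify sequence."""
    return [
        # 1. Temporarily add a second device to gain unallocated space.
        ["btrfs", "device", "add", spare_device, mountpoint],
        # 2. Run a filtered balance to compact partially used data chunks.
        ["btrfs", "balance", "start", f"-dusage={usage_percent}", mountpoint],
        # 3. Migrate everything off the temporary device and drop it.
        ["btrfs", "device", "remove", spare_device, mountpoint],
        # 4. Confirm the volume is back to one device before unplugging.
        ["btrfs", "filesystem", "show", mountpoint],
    ]


if __name__ == "__main__":
    # "/dev/sdX1" and 10 are placeholders, not values from this thread.
    for argv in workaround_commands("/", "/dev/sdX1", 10):
        print(shlex.join(argv))
```

Printing rather than executing is deliberate: each step should be checked by hand, and the temporary device must stay attached until the remove step has completed.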



-- 
Chris Murphy


Re: Another ENOSPC situation

2016-04-01 Thread Duncan
Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted:

> Hi,
> 
> just for a change, this is another btrfs on a different host. The host
> is also running Debian unstable with mainline kernels, the btrfs in
> question was created (not converted) in March 2015 with btrfs-tools
> 3.17. It is the root fs of my main work notebook which is under
> workstation load, with lots of snapshots being created and deleted.
> 
> Balance immediately fails with ENOSPC
> 
> balance -dprofiles=single -dusage=1 goes through "fine" ("had to
> relocate 0 out of 602 chunks")
> 
> balance -dprofiles=single -dusage=2 also ENOSPCes immediately.
> 
> [4/502]mh@swivel:~$ sudo btrfs fi usage /
> Overall:
> Device size: 600.00GiB
> Device allocated:600.00GiB
> Device unallocated:1.00MiB

That's the problem right there.  The admin didn't do his job and spot the 
near full allocation issue (perhaps with the help of some script set to 
run periodically and tell him about it) before it got critical, and now 
there's no room left to balance, to fix the problem.

This despite the fact that the admin chose to run a not yet entirely 
stable filesystem that's well known to run off the rails in precisely 
this sort of way, occasionally, with specific use-cases such as heavy 
snapshotting more often than others.
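
A periodic check of the kind mentioned above could look roughly like the following. It is a sketch under assumptions: the 5 GiB threshold is an arbitrary example value, the function names are hypothetical, and the parser only expects the "Device size" and "Device unallocated" lines in the form shown in the quoted output.

```python
import re
import subprocess

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_size(text):
    """Convert a string such as '600.00GiB' into bytes."""
    m = re.fullmatch(r"([\d.]+)\s*([KMGT]iB|B)", text.strip())
    if not m:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def overall_fields(usage_text):
    """Extract 'Device size' and 'Device unallocated' from the usage output."""
    fields = {}
    for line in usage_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Device size", "Device unallocated"):
            fields[key] = parse_size(value)
    return fields


def allocation_warning(usage_text, min_unallocated=5 * 2**30):
    """Return a warning string if unallocated space is below the threshold."""
    fields = overall_fields(usage_text)
    unallocated = fields["Device unallocated"]
    if unallocated < min_unallocated:
        return (f"WARNING: only {unallocated / 2**20:.2f} MiB unallocated "
                f"of {fields['Device size'] / 2**30:.2f} GiB")
    return None


def read_usage(mountpoint):
    """Run 'btrfs filesystem usage' (normally needs root) and return its text."""
    result = subprocess.run(["btrfs", "filesystem", "usage", mountpoint],
                            check=True, capture_output=True, text=True)
    return result.stdout
```

From cron, something like `print(allocation_warning(read_usage("/")))` would be enough to get a mail when the threshold is crossed.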

> Device missing:  0.00B
> Used:413.40GiB
> Free (estimated):148.20GiB  (min: 148.20GiB)

Tho the used vs. free isn't all that bad... it's just that the allocated 
vs. unallocated was allowed to run off the rails and get the filesystem 
in a bind.

But that does mean it should be possible to do something about it. =:^)

> Data ratio:   1.00
> Metadata ratio:   2.00
> Global reserve:  512.00MiB  (used: 0.00B)
> 
> Data,single: Size:553.93GiB, Used:405.73GiB
>/dev/mapper/swivelbtr 553.93GiB
> 
> Metadata,DUP: Size:23.00GiB, Used:3.83GiB
>/dev/mapper/swivelbtr  46.00GiB
> 
> System,DUP: Size:32.00MiB, Used:112.00KiB
>/dev/mapper/swivelbtr  64.00MiB
> 
> Unallocated:
>/dev/mapper/swivelbtr   1.00MiB
> [5/503]mh@swivel:~$

Both data and metadata have several GiB free, data ~140 GiB free, and 
metadata isn't into global reserve, so the system isn't totally wedged, 
only partially, due to the lack of unallocated space.
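
To make the numbers concrete, the sketch below subtracts Used from Size for each chunk type in the quoted output. The regular expression is an assumption about the line layout as it appears in this thread, not a general parser for btrfs output.

```python
import re

# Per-type lines copied from the "btrfs fi usage" output quoted above.
USAGE_SAMPLE = """\
Data,single: Size:553.93GiB, Used:405.73GiB
Metadata,DUP: Size:23.00GiB, Used:3.83GiB
System,DUP: Size:32.00MiB, Used:112.00KiB
"""

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

LINE_RE = re.compile(
    r"^(?P<type>\w+),(?P<profile>\w+):\s*"
    r"Size:(?P<size>[\d.]+)(?P<su>[KMGT]iB|B),\s*"
    r"Used:(?P<used>[\d.]+)(?P<uu>[KMGT]iB|B)"
)


def to_bytes(number, unit):
    return float(number) * UNITS[unit]


def chunk_slack(text):
    """Return {type: (profile, free bytes inside already allocated chunks)}."""
    slack = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            free = to_bytes(m["size"], m["su"]) - to_bytes(m["used"], m["uu"])
            slack[m["type"]] = (m["profile"], free)
    return slack


if __name__ == "__main__":
    for kind, (profile, free) in chunk_slack(USAGE_SAMPLE).items():
        print(f"{kind:<9}{profile:<7}{free / 2**30:8.2f} GiB free in allocated chunks")
```

For this filesystem that works out to roughly 148.20 GiB free inside data chunks and 19.17 GiB inside metadata chunks, all of it in space that is already allocated.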

> btrfs balance -mprofiles seems to do something. One kworker and one
> btrfs-transaction process hog one CPU core each for hours, while
> blocking the filesystem for minutes apiece, which leads to the host
> being nearly unusable up to the point of "clock and mouse pointer
> frozen for nearly ten minutes".
> 
> The btrfs balance cancel I issued after four hours of this state took
> eleven minutes alone to complete.

It's worth noting as an aside that Linux isn't necessarily tuned for 
interactivity by default, tho there are definitely ways to make it more 
so.  Additionally, on some mobos at least, it's possible to tweak the 
BIOS balance between interactivity and thruput.  An old Tyan board (PCI 
not the newer PCIE, which avoids some of the problems with multiple 
dedicated buses) I had was tilted a bit heavily toward thruput, which did 
make sense as it was actually a server board, until I tweaked things a 
bit.  That made a LOT of difference, curing the dragging, but also curing 
occasional audio runouts, etc.  Turns out it was simply tuned to do huge 
bus "packets" (I forgot the proper in-context term, and that board died a 
few years ago, so...), increasing thruput, but also increasing latency 
beyond what the sound card and keyboard/mouse (or in that case the human 
operating them) could reasonably deal with.  By shortening the PCI 
"packet length", it reduced thruput a bit but greatly improved latency, 
letting other users have their turn when they needed it, not some time 
later.

Of course in addition to PCIE putting many of those things on dedicated 
buses these days, ssds are so much faster that a lot of things that could 
potentially be problems on spinning rust, simply don't tend to be issues 
on ssds.  As much as anything, I think that's what a lot of users 
bothered by such problems are turning to, and I'd bet that's a good part 
of why SSDs are as popular as they are, as well.  I know I've simply not 
had many of the problems here that others had, and while I think part of 
it is the multiple relatively small but independent filesystems and part 
of it may be because I don't use snapshotting, I also think a major part 
of it is simply that the SSDs I'm running btrfs on are simply so much 
faster than spinning rust that the problems either don't occur, or if 
they do, they're done before I even notice them.

FWIW, I do still use spinning rust, but for my media partition and 
(second) backups, not for anything speed critical at all.  And FWIW, I 
still use reiserfs on 

Re: Another ENOSPC situation

2016-04-01 Thread Henk Slager
On Fri, Apr 1, 2016 at 10:40 PM, Marc Haber  wrote:
> On Fri, Apr 01, 2016 at 09:20:52PM +0200, Henk Slager wrote:
>> On Fri, Apr 1, 2016 at 6:50 PM, Marc Haber  
>> wrote:
>> > On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
>> >> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
>> >> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber 
>> >> >  wrote:
>> >> > > btrfs balance -mprofiles seems to do something. One kworker and one
>> >> > > btrfs-transaction process hog one CPU core each for hours, while
>> >> > > blocking the filesystem for minutes apiece, which leads to the host
>> >> > > being nearly unusable up to the point of "clock and mouse pointer
>> >> > > frozen for nearly ten minutes".
>> >> >
>> >> > I assume you still have your every 10 minutes snapshotting running
>> >> > while balancing?
>> >>
>> >> No, I disabled the cronjob before trying the balance. I might be
>> >> crazy, but not stup^wunexperienced.
>> >
>> > That being said, I would still expect the code not to allow _this_
>> > kind of effect on the entire system when two allegedly incompatible
>> > operations run simultaneously. I mean, Linux is a multi-user,
>> > multi-tasking operating system where one simply cannot expect all
>> > processes to be cooperative to each other. We have the operating
>> > systems to prevent this kind of issues, not to cause them.
>>
>> Maybe look at it differently: Does user mh have trouble using this
>> laptop w.r.t. storing files?
>
> No. I would have cried murder otherwise.
>
>> In openSUSE Tumbleweed (the snapshot from end of march), root access
>> is needed to change the default snapshotting config, otherwise you
>> will have a 10 year history. After that change has been done according
>> to needs of the user, there is no need to run manual balance.
>
> So you are saying that balancing a filesystem should never be
> necessary? Or what are you trying to say?

There is a package, btrfsmaintenance, which does balancing for the
user after it is configured by root according to the user's wishes and
needs.

The key thing I want to say is that you should change your snapshotting
rate and/or policy. It has been hinted at before, and I think it is more
a psychological issue than a technical one.


Re: Another ENOSPC situation

2016-04-01 Thread Marc Haber
On Fri, Apr 01, 2016 at 09:20:52PM +0200, Henk Slager wrote:
> On Fri, Apr 1, 2016 at 6:50 PM, Marc Haber  
> wrote:
> > On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
> >> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> >> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber  
> >> > wrote:
> >> > > btrfs balance -mprofiles seems to do something. One kworker and one
> >> > > btrfs-transaction process hog one CPU core each for hours, while
> >> > > blocking the filesystem for minutes apiece, which leads to the host
> >> > > being nearly unusable up to the point of "clock and mouse pointer
> >> > > frozen for nearly ten minutes".
> >> >
> >> > I assume you still have your every 10 minutes snapshotting running
> >> > while balancing?
> >>
> >> No, I disabled the cronjob before trying the balance. I might be
> >> crazy, but not stup^wunexperienced.
> >
> > That being said, I would still expect the code not to allow _this_
> > kind of effect on the entire system when two allegedly incompatible
> > operations run simultaneously. I mean, Linux is a multi-user,
> > multi-tasking operating system where one simply cannot expect all
> > processes to be cooperative to each other. We have the operating
> > systems to prevent this kind of issues, not to cause them.
> 
> Maybe look at it differently: Does user mh have trouble using this
> laptop w.r.t. storing files?

No. I would have cried murder otherwise.

> In openSUSE Tumbleweed (the snapshot from end of march), root access
> is needed to change the default snapshotting config, otherwise you
> will have a 10 year history. After that change has been done according
> to needs of the user, there is no need to run manual balance.

So you are saying that balancing a filesystem should never be
necessary? Or what are you trying to say?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax:   *49 6224 1600421


Re: Another ENOSPC situation

2016-04-01 Thread Henk Slager
On Fri, Apr 1, 2016 at 6:50 PM, Marc Haber  wrote:
> On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
>> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
>> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber  
>> > wrote:
>> > > btrfs balance -mprofiles seems to do something. One kworker and one
>> > > btrfs-transaction process hog one CPU core each for hours, while
>> > > blocking the filesystem for minutes apiece, which leads to the host
>> > > being nearly unusable up to the point of "clock and mouse pointer
>> > > frozen for nearly ten minutes".
>> >
>> > I assume you still have your every 10 minutes snapshotting running
>> > while balancing?
>>
>> No, I disabled the cronjob before trying the balance. I might be
>> crazy, but not stup^wunexperienced.
>
> That being said, I would still expect the code not to allow _this_
> kind of effect on the entire system when two allegedly incompatible
> operations run simultaneously. I mean, Linux is a multi-user,
> multi-tasking operating system where one simply cannot expect all
> processes to be cooperative to each other. We have the operating
> systems to prevent this kind of issues, not to cause them.

Maybe look at it differently: Does user mh have trouble using this
laptop w.r.t. storing files?

In openSUSE Tumbleweed (the snapshot from end of march), root access
is needed to change the default snapshotting config, otherwise you
will have a 10 year history. After that change has been done according
to needs of the user, there is no need to run manual balance.


Re: Another ENOSPC situation

2016-04-01 Thread Marc Haber
On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber  
> > wrote:
> > > btrfs balance -mprofiles seems to do something. One kworker and one
> > > btrfs-transaction process hog one CPU core each for hours, while
> > > blocking the filesystem for minutes apiece, which leads to the host
> > > being nearly unusable up to the point of "clock and mouse pointer
> > > frozen for nearly ten minutes".
> > 
> > I assume you still have your every 10 minutes snapshotting running
> > while balancing?
> 
> No, I disabled the cronjob before trying the balance. I might be
> crazy, but not stup^wunexperienced.

That being said, I would still expect the code not to allow _this_
kind of effect on the entire system when two allegedly incompatible
operations run simultaneously. I mean, Linux is a multi-user,
multi-tasking operating system where one simply cannot expect all
processes to be cooperative to each other. We have the operating
systems to prevent this kind of issues, not to cause them.

Greetings
Marc



Re: Another ENOSPC situation

2016-04-01 Thread Marc Haber
On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber  
> wrote:
> > btrfs balance -mprofiles seems to do something. One kworker and one
> > btrfs-transaction process hog one CPU core each for hours, while
> > blocking the filesystem for minutes apiece, which leads to the host
> > being nearly unusable up to the point of "clock and mouse pointer
> > frozen for nearly ten minutes".
> 
> I assume you still have your every 10 minutes snapshotting running
> while balancing?

No, I disabled the cronjob before trying the balance. I might be
crazy, but not stup^wunexperienced.

Greetings
Marc



Re: Another ENOSPC situation

2016-04-01 Thread Henk Slager
On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber  wrote:
> Hi,
>
> just for a change, this is another btrfs on a different host. The host
> is also running Debian unstable with mainline kernels, the btrfs in
> question was created (not converted) in March 2015 with btrfs-tools
> 3.17. It is the root fs of my main work notebook which is under
> workstation load, with lots of snapshots being created and deleted.
>
> Balance immediately fails with ENOSPC
>
> balance -dprofiles=single -dusage=1 goes through "fine" ("had to
> relocate 0 out of 602 chunks")
>
> balance -dprofiles=single -dusage=2 also ENOSPCes immediately.
>
> [4/502]mh@swivel:~$ sudo btrfs fi usage /
> Overall:
> Device size: 600.00GiB
> Device allocated:600.00GiB
> Device unallocated:1.00MiB
> Device missing:  0.00B
> Used:413.40GiB
> Free (estimated):148.20GiB  (min: 148.20GiB)
> Data ratio:   1.00
> Metadata ratio:   2.00
> Global reserve:  512.00MiB  (used: 0.00B)
>
> Data,single: Size:553.93GiB, Used:405.73GiB
>/dev/mapper/swivelbtr 553.93GiB
>
> Metadata,DUP: Size:23.00GiB, Used:3.83GiB
>/dev/mapper/swivelbtr  46.00GiB
>
> System,DUP: Size:32.00MiB, Used:112.00KiB
>/dev/mapper/swivelbtr  64.00MiB
>
> Unallocated:
>/dev/mapper/swivelbtr   1.00MiB
> [5/503]mh@swivel:~$
>
> btrfs balance -mprofiles seems to do something. One kworker and one
> btrfs-transaction process hog one CPU core each for hours, while
> blocking the filesystem for minutes apiece, which leads to the host
> being nearly unusable up to the point of "clock and mouse pointer
> frozen for nearly ten minutes".

I assume you still have your every 10 minutes snapshotting running
while balancing?

> The btrfs balance cancel I issued after four hours of this state took
> eleven minutes alone to complete.
>
> These are all log entries that were obtained after starting btrfs
> balance -mprofiles on 09:43
> Apr  1 12:18:21 swivel kernel: [253651.970413] BTRFS info (device dm-14): found 3523 extents
> Apr  1 12:18:21 swivel kernel: [253652.035572] BTRFS info (device dm-14): relocating block group 1538365849600 flags 36
> Apr  1 13:30:57 swivel kernel: [258007.653597] BTRFS info (device dm-14): found 3585 extents
> Apr  1 13:30:57 swivel kernel: [258007.746541] BTRFS info (device dm-14): relocating block group 1536755236864 flags 36
> Apr  1 13:49:39 swivel kernel: [259130.296184] BTRFS info (device dm-14): found 3047 extents
> Apr  1 13:49:39 swivel kernel: [259130.357314] BTRFS info (device dm-14): relocating block group 1528702173184 flags 36
> Apr  1 14:30:00 swivel kernel: [261550.776348] BTRFS info (device dm-14): found 4200 extents
>
> This kernel trace from 11:16 is not btrfs-related, is it? I guess it's
> bluetooth related since it happened simultaneously to the bluetooth
> device popping out and in:
> Apr  1 11:16:38 swivel kernel: [249948.993751] usb 1-1.4: USB disconnect, device number 39
> Apr  1 11:16:38 swivel systemd[1]: Starting Load/Save RF Kill Switch Status...
> Apr  1 11:16:38 swivel systemd[1]: Started Load/Save RF Kill Switch Status.
> Apr  1 11:16:38 swivel systemd[1]: bluetooth.target: Unit not needed anymore. Stopping.
> Apr  1 11:16:38 swivel systemd[1]: Stopped target Bluetooth.
> Apr  1 11:16:38 swivel laptop-mode: Laptop mode
> Apr  1 11:16:38 swivel laptop-mode: enabled, not active
> Apr  1 11:16:39 swivel kernel: [249949.211549] usb 1-1.4: new full-speed USB device number 40 using ehci-pci
> Apr  1 11:16:39 swivel kernel: [249949.308386] usb 1-1.4: New USB device found, idVendor=0a5c, idProduct=217f
> Apr  1 11:16:39 swivel kernel: [249949.308397] usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> Apr  1 11:16:39 swivel kernel: [249949.308402] usb 1-1.4: Product: Broadcom Bluetooth Device
> Apr  1 11:16:39 swivel kernel: [249949.308407] usb 1-1.4: Manufacturer: Broadcom Corp
> Apr  1 11:16:39 swivel kernel: [249949.308412] usb 1-1.4: SerialNumber: CCAF78F1274F
> Apr  1 11:16:39 swivel systemd[1]: Reached target Bluetooth.
> Apr  1 11:16:39 swivel kernel: [249949.507794] ------------[ cut here ]------------
> Apr  1 11:16:39 swivel kernel: [249949.507810] WARNING: CPU: 1 PID: 11 at arch/x86/kernel/cpu/perf_event_intel_ds.c:325 reserve_ds_buffers+0x102/0x326()
> Apr  1 11:16:39 swivel kernel: [249949.507813] alloc_bts_buffer: BTS buffer allocation failure
> Apr  1 11:16:39 swivel kernel: [249949.507816] Modules linked in: cpuid hid_generic usbhid hid e1000e tun ctr ccm rfcomm bridge stp llc cpufreq_userspace cpufreq_stats cpufreq_conservative cpufreq_powersave nf_conntrack_netlink nfnetlink bnep binfmt_misc intel_rapl x86_pkg_temp_thermal arc4 intel_powerclamp kvm_intel kvm irqbypass iwldvm snd_hda_codec_conexant 

Another ENOSPC situation

2016-04-01 Thread Marc Haber
Hi,

just for a change, this is another btrfs on a different host. The host
is also running Debian unstable with mainline kernels, the btrfs in
question was created (not converted) in March 2015 with btrfs-tools
3.17. It is the root fs of my main work notebook which is under
workstation load, with lots of snapshots being created and deleted.

Balance immediately fails with ENOSPC.

balance -dprofiles=single -dusage=1 goes through "fine" ("had to
relocate 0 out of 602 chunks")

balance -dprofiles=single -dusage=2 also ENOSPCes immediately.

[4/502]mh@swivel:~$ sudo btrfs fi usage /
Overall:
    Device size:         600.00GiB
    Device allocated:    600.00GiB
    Device unallocated:    1.00MiB
    Device missing:          0.00B
    Used:                413.40GiB
    Free (estimated):    148.20GiB  (min: 148.20GiB)
    Data ratio:               1.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)

Data,single: Size:553.93GiB, Used:405.73GiB
   /dev/mapper/swivelbtr 553.93GiB

Metadata,DUP: Size:23.00GiB, Used:3.83GiB
   /dev/mapper/swivelbtr  46.00GiB

System,DUP: Size:32.00MiB, Used:112.00KiB
   /dev/mapper/swivelbtr  64.00MiB

Unallocated:
   /dev/mapper/swivelbtr   1.00MiB
[5/503]mh@swivel:~$ 
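
As a rough cross-check of the figures above, here is a minimal sketch in
Python. It is not anything btrfs-progs runs; the variable names are made up
for illustration and the numbers are simply copied from the output. It adds
up what has been handed out to chunks and how much of that is still empty.

```python
# Figures copied from the `btrfs fi usage /` output above, in GiB.
DEVICE_SIZE = 600.00
data_size, data_used = 553.93, 405.73
meta_size, meta_used = 23.00, 3.83   # logical size; DUP stores it twice
system_raw = 64.00 / 1024            # 64.00MiB of raw space for System,DUP

# Raw space consumed by chunk allocation (metadata counted twice for DUP).
meta_raw = meta_size * 2
allocated = data_size + meta_raw + system_raw

# Space that is allocated to chunks but not used inside them.
data_slack = data_size - data_used
meta_slack = meta_size - meta_used

print(f"allocated to chunks:         {allocated:.2f} GiB of {DEVICE_SIZE:.2f} GiB")
print(f"free inside data chunks:     {data_slack:.2f} GiB")
print(f"free inside metadata chunks: {meta_slack:.2f} GiB")
```

The data slack matches the "Free (estimated)" line, so the space is there;
it is just all tied up in already-allocated chunks, with only 1.00MiB left
unallocated.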

btrfs balance -mprofiles seems to do something: one kworker and one
btrfs-transaction process each hog a CPU core for hours while
blocking the filesystem for minutes at a time, which leaves the host
nearly unusable, up to the point of "clock and mouse pointer
frozen for nearly ten minutes".

The btrfs balance cancel I issued after four hours of this state
alone took eleven minutes to complete.

These are all the log entries that were produced after starting btrfs
balance -mprofiles at 09:43:
Apr  1 12:18:21 swivel kernel: [253651.970413] BTRFS info (device dm-14): found 
3523 extents
Apr  1 12:18:21 swivel kernel: [253652.035572] BTRFS info (device dm-14): 
relocating block group 1538365849600 flags 36
Apr  1 13:30:57 swivel kernel: [258007.653597] BTRFS info (device dm-14): found 
3585 extents
Apr  1 13:30:57 swivel kernel: [258007.746541] BTRFS info (device dm-14): 
relocating block group 1536755236864 flags 36
Apr  1 13:49:39 swivel kernel: [259130.296184] BTRFS info (device dm-14): found 
3047 extents
Apr  1 13:49:39 swivel kernel: [259130.357314] BTRFS info (device dm-14): 
relocating block group 1528702173184 flags 36
Apr  1 14:30:00 swivel kernel: [261550.776348] BTRFS info (device dm-14): found 
4200 extents
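
To put numbers on how slowly this is moving, here is a small sketch in
Python that measures the gaps between the log timestamps above and decodes
the "flags 36" value. The year is an assumption taken from the thread date,
the helper name decode_flags is made up for illustration, and the bit values
are the block group type/profile bits from the kernel's btrfs headers.

```python
from datetime import datetime

# Wall-clock times of the distinct events in the log excerpt above.
# The date is assumed from the thread date; only the time of day is logged.
events = ["12:18:21", "13:30:57", "13:49:39", "14:30:00"]
times = [datetime.strptime(f"2016-04-01 {t}", "%Y-%m-%d %H:%M:%S") for t in events]

# Seconds between consecutive events, i.e. roughly one block group each.
gaps_sec = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# Block group type and profile bits (subset relevant here).
FLAGS = {1: "DATA", 2: "SYSTEM", 4: "METADATA",
         8: "RAID0", 16: "RAID1", 32: "DUP", 64: "RAID10"}

def decode_flags(value):
    """Return the names of all known bits set in a block group flags value."""
    return [name for bit, name in FLAGS.items() if value & bit]

for gap in gaps_sec:
    print(f"{gap // 60} min {gap % 60} s")
print(decode_flags(36))
```

So each of these block groups is a metadata DUP block group, and relocating
a single one takes somewhere between about 19 and 73 minutes.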

This kernel trace from 11:16 is not btrfs-related, is it? I guess it is
bluetooth-related, since it happened at the same time as the bluetooth
device popping out and in:
Apr  1 11:16:38 swivel kernel: [249948.993751] usb 1-1.4: USB disconnect, 
device number 39
Apr  1 11:16:38 swivel systemd[1]: Starting Load/Save RF Kill Switch Status...
Apr  1 11:16:38 swivel systemd[1]: Started Load/Save RF Kill Switch Status.
Apr  1 11:16:38 swivel systemd[1]: bluetooth.target: Unit not needed anymore. 
Stopping.
Apr  1 11:16:38 swivel systemd[1]: Stopped target Bluetooth.
Apr  1 11:16:38 swivel laptop-mode: Laptop mode
Apr  1 11:16:38 swivel laptop-mode: enabled, not active
Apr  1 11:16:39 swivel kernel: [249949.211549] usb 1-1.4: new full-speed USB 
device number 40 using ehci-pci
Apr  1 11:16:39 swivel kernel: [249949.308386] usb 1-1.4: New USB device found, 
idVendor=0a5c, idProduct=217f
Apr  1 11:16:39 swivel kernel: [249949.308397] usb 1-1.4: New USB device 
strings: Mfr=1, Product=2, SerialNumber=3
Apr  1 11:16:39 swivel kernel: [249949.308402] usb 1-1.4: Product: Broadcom 
Bluetooth Device
Apr  1 11:16:39 swivel kernel: [249949.308407] usb 1-1.4: Manufacturer: 
Broadcom Corp
Apr  1 11:16:39 swivel kernel: [249949.308412] usb 1-1.4: SerialNumber: 
CCAF78F1274F
Apr  1 11:16:39 swivel systemd[1]: Reached target Bluetooth.
Apr  1 11:16:39 swivel kernel: [249949.507794] ------------[ cut here 
]------------
Apr  1 11:16:39 swivel kernel: [249949.507810] WARNING: CPU: 1 PID: 11 at 
arch/x86/kernel/cpu/perf_event_intel_ds.c:325 reserve_ds_buffers+0x102/0x326()
Apr  1 11:16:39 swivel kernel: [249949.507813] alloc_bts_buffer: BTS buffer 
allocation failure
Apr  1 11:16:39 swivel kernel: [249949.507816] Modules linked in: cpuid 
hid_generic usbhid hid e1000e tun ctr ccm rfcomm bridge stp llc 
cpufreq_userspace cpufreq_stats cpufreq_conservative cpufreq_powersave 
nf_conntrack_netlink nfnetlink bnep binfmt_misc intel_rapl x86_pkg_temp_thermal 
arc4 intel_powerclamp kvm_intel kvm irqbypass iwldvm snd_hda_codec_conexant 
snd_hda_codec_generic mac80211 input_leds btusb btbcm i2c_i801 snd_hda_intel 
btintel snd_hda_codec bluetooth iwlwifi snd_hda_core cfg80211 snd_hwdep sg 
snd_pcm_oss snd_mixer_oss lpc_ich mfd_core snd_pcm shpchp snd_timer 
thinkpad_acpi nvram snd battery soundcore rfkill ac tpm_tis tpm evdev processor 
xt_TCPMSS xt_tcpudp iptable_mangle iptable_filter