Re: OOM: Better, but still there on

2016-12-17 Thread Tetsuo Handa
Nils Holland wrote:
> On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> > On 2016/12/17 21:59, Nils Holland wrote:
> > > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> > >> mount -t tracefs none /debug/trace
> > >> echo 1 > /debug/trace/events/vmscan/enable
> > >> cat /debug/trace/trace_pipe > trace.log
> > >>
> > >> should help
> > >> [...]
> > >
> > > No problem! I enabled writing the trace data to a file and then tried
> > > to trigger another OOM situation. That worked, this time without a
> > > complete kernel panic, but with only my processes being killed and the
> > > system becoming unresponsive.
> >
> > In an OOM situation, writing to a file on disk is unlikely to work. Logging
> > over the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> > if you are using bash) may work better. (I wish we could do it from the
> > kernel so that /bin/cat is not disturbed by delays due to page faults.)
> >
> > If you can configure netconsole for logging OOM killer messages and a
> > UDP socket for logging trace_pipe messages, udplogger at
> > https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> > might be a good fit for logging both outputs, with timestamps, into a single file.
>
> Actually, I decided to give this a try once more on machine #2, i.e.
> not the one that produced the previous trace, but the other one.
>
> I logged via netconsole as well as 'cat /debug/trace/trace_pipe' via
> the network to another machine running udplogger. After the machine
> had been freshly booted and I had set up the logging, unpacking of the
> firefox source tarball started. After it had been unpacking for a
> while, the first load of trace messages started to appear. Some time
> later, OOMs started to appear - I've got quite a lot of them in my
> capture file this time.

Thank you for capturing. I think it worked well. Let's wait for Michal.

The first OOM killer invocation was

  2016-12-17 21:36:56 192.168.17.23:6665 [ 1276.828639] Killed process 3894 (xz) total-vm:68640kB, anon-rss:65920kB, file-rss:1696kB, shmem-rss:0kB

and the last OOM killer invocation was

  2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800677] Killed process 3070 (screen) total-vm:7440kB, anon-rss:960kB, file-rss:2360kB, shmem-rss:0kB

and trace output was sent until

  2016-12-17 21:37:07 192.168.17.23:48468 kworker/u4:4-3896  [000]   1287.202958: mm_shrink_slab_start: super_cache_scan+0x0/0x170 f4436ed4: nid: 0 objects to shrink 86 gfp_flags GFP_NOFS|__GFP_NOFAIL pgs_scanned 32 lru_pgs 406078 cache items 412 delta 0 total_scan 86

which (I hope) should be sufficient for analysis.
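
The interval between those two kills can be pulled out of the capture mechanically. What follows is only a minimal sketch in Python, assuming each udplogger record sits on a single line in the format shown above; the function name `oom_kills` and the regular expression are illustrative and not part of udplogger.

```python
import re
from datetime import datetime

# Matches "<date> <time> <ip>:<port> [ <uptime>] Killed process <pid> (<comm>) ..."
# The layout is taken from the two sample lines in this message (an assumption
# about the rest of the capture file).
KILL_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ "
    r"\[\s*(\d+\.\d+)\] Killed process (\d+) \((.+?)\)"
)

def oom_kills(lines):
    """Return (wall_clock, uptime_seconds, pid, comm) for every OOM kill line."""
    kills = []
    for line in lines:
        m = KILL_RE.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            kills.append((stamp, float(m.group(2)), int(m.group(3)), m.group(4)))
    return kills

sample = [
    "  2016-12-17 21:36:56 192.168.17.23:6665 [ 1276.828639] Killed process 3894 (xz) "
    "total-vm:68640kB, anon-rss:65920kB, file-rss:1696kB, shmem-rss:0kB",
    "  2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800677] Killed process 3070 (screen) "
    "total-vm:7440kB, anon-rss:960kB, file-rss:2360kB, shmem-rss:0kB",
]

kills = oom_kills(sample)
first, last = kills[0], kills[-1]
print(f"{len(kills)} kills, {last[1] - first[1]:.1f}s of uptime between first and last")
```

Running the same function over the whole capture file (for example `oom_kills(open(path))`) would list every kill, which makes it easy to see where the trace output stops relative to the kills.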

>
> Unfortunately, the reclaim trace messages stopped a while after the first
> OOM messages showed up - most likely my "cat" had been killed at that
> point or became unresponsive. :-/
>
> In the end, the machine didn't completely panic, but after nothing new
> showed up being logged via the network, I walked up to the
> machine and found it in a state where I couldn't really log in to it
> anymore, but all that worked was, as always, a magic SysRequest reboot.

There is a known issue (since Linux 2.6.32) in which all memory allocation requests
get stuck because of a kswapd vs. shrink_inactive_list() livelock that occurs when
the system is almost OOM ( http://lkml.kernel.org/r/20160211225929.GU14668@dastard ).
If we hit it, not even the "page allocation stalls for" messages show up.

Even if we didn't hit it, although agetty and sshd were still alive

  2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800614] [ 2800] 0  2800 1152  494   6   30 0 agetty
  2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800618] [ 2802] 0  2802 1457 1055   6   30 -1000 sshd

memory allocations were being delayed for far too long

  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034624] btrfs-transacti: page alloction stalls for 93995ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034628] CPU: 1 PID: 1949 Comm: btrfs-transacti Not tainted 4.9.0-gentoo #3
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034630] Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034638]  f162f94c c142bd8e 0001  f162f970 c110ad7e c1b58833 02400840
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034645]  f162f978 f162f980 c1b55814 f162f960 0160 f162fa38 c110b78c 02400840
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034652]  c1b55814 00016f2b  0040  f21d f21d 0001
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034653] Call Trace:
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034660]  [] dump_stack+0x47/0x69
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034666]  [] warn_alloc+0xce/0xf0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034671]  [] __alloc_pages_nodemask+0x97c/0xd30
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034678]  [

Re: OOM: Better, but still there on

2016-12-17 Thread Xin Zhou
Hi,
The system is supposed to have a special memory reservation for core dumps and
other debug info when it encounters a panic; the size seems to be configurable.
Thanks,
Xin
 
 

Sent: Saturday, December 17, 2016 at 6:44 AM
From: "Tetsuo Handa" 
To: "Nils Holland" , "Michal Hocko" 
Cc: linux-ker...@vger.kernel.org, linux...@kvack.org, "Chris Mason" 
, "David Sterba" , linux-btrfs@vger.kernel.org
Subject: Re: OOM: Better, but still there on
On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
>
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that in case it was still logging something
> to the trace file, it could continue to do so some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

In an OOM situation, writing to a file on disk is unlikely to work. Logging
over the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
if you are using bash) may work better. (I wish we could do it from the
kernel so that /bin/cat is not disturbed by delays due to page faults.)

If you can configure netconsole for logging OOM killer messages and a
UDP socket for logging trace_pipe messages, udplogger at
https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
might be a good fit for logging both outputs, with timestamps, into a single file.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html


Re: mount raid1 gives open_ctree failed

2016-12-17 Thread Kai Stian Olstad

On 25. nov. 2016 21:19, Kai Stian Olstad wrote:

I have problem mounting my 3 disk raid1.
This happened after upgrading from Kubuntu 14.04 to 16.04.


I finally found the problem.
Since I needed to reboot after the upgrade I decided to add some disks,
and in order to do that I needed to move around some of the other disks.
And the disks (6 TB) for this btrfs raid1 happened to land on an HBA that
doesn't support disks larger than 2 TB.


I moved them to the motherboard's SATA connectors and they mounted as if
nothing had happened.


--
Kai Stian Olstad
PS: I really need to replace those HBAs


Re: OOM: Better, but still there on

2016-12-17 Thread Nils Holland
On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> On 2016/12/17 21:59, Nils Holland wrote:
> > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> >> mount -t tracefs none /debug/trace
> >> echo 1 > /debug/trace/events/vmscan/enable
> >> cat /debug/trace/trace_pipe > trace.log
> >>
> >> should help
> >> [...]
> > 
> > No problem! I enabled writing the trace data to a file and then tried
> > to trigger another OOM situation. That worked, this time without a
> > complete kernel panic, but with only my processes being killed and the
> > system becoming unresponsive.
> 
> In an OOM situation, writing to a file on disk is unlikely to work. Logging
> over the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> if you are using bash) may work better. (I wish we could do it from the
> kernel so that /bin/cat is not disturbed by delays due to page faults.)
> 
> If you can configure netconsole for logging OOM killer messages and a
> UDP socket for logging trace_pipe messages, udplogger at
> https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> might be a good fit for logging both outputs, with timestamps, into a single file.

Actually, I decided to give this a try once more on machine #2, i.e.
not the one that produced the previous trace, but the other one.

I logged via netconsole as well as 'cat /debug/trace/trace_pipe' via
the network to another machine running udplogger. After the machine
had been freshly booted and I had set up the logging, unpacking of the
firefox source tarball started. After it had been unpacking for a
while, the first load of trace messages started to appear. Some time
later, OOMs started to appear - I've got quite a lot of them in my
capture file this time.

Unfortunately, the reclaim trace messages stopped a while after the first
OOM messages showed up - most likely my "cat" had been killed at that
point or became unresponsive. :-/

In the end, the machine didn't completely panic, but after nothing new
showed up being logged via the network, I walked up to the
machine and found it in a state where I couldn't really log in to it
anymore, but all that worked was, as always, a magic SysRequest reboot.

The complete log, from machine boot right up to the point where it
wouldn't really do anything anymore, is up again on my web server (~42
MB, 928 KB packed):

http://ftp.tisys.org/pub/misc/teela_2016-12-17.log.xz

Greetings
Nils


Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures

2016-12-17 Thread Xin Zhou


Hi Jari,
 
Like other file systems, btrfs keeps several copies of the superblock.
See "man btrfs-check", "man btrfs-rescue" and the related commands for more
details.
Regards,
Xin
 
 

Sent: Saturday, December 17, 2016 at 2:06 AM
From: "Jari Seppälä" 
To: linux-btrfs@vger.kernel.org
Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no help 
from recovery procedures
Syslog tells:
[ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
[ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
[ 135.462544] BTRFS error (device sdb1): open_ctree failed

What has been done:
* All "btrfs rescue" options

Info on system
* fs on external SSD via USB
* kernel 4.9.0 (tried with 4.8.13)
* btrfs-tools 4.4
* Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2012-12-16

Any help appreciated. Around 300G of TV recordings on the drive, which of 
course will eventually come as replays.

Jari
--
*** Jari Seppälä



Re: OOM: Better, but still there on

2016-12-17 Thread Nils Holland
On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> On 2016/12/17 21:59, Nils Holland wrote:
> > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> >> mount -t tracefs none /debug/trace
> >> echo 1 > /debug/trace/events/vmscan/enable
> >> cat /debug/trace/trace_pipe > trace.log
> >>
> >> should help
> >> [...]
> > 
> > No problem! I enabled writing the trace data to a file and then tried
> > to trigger another OOM situation. That worked, this time without a
> > complete kernel panic, but with only my processes being killed and the
> > system becoming unresponsive.
> > [...]
> 
> In an OOM situation, writing to a file on disk is unlikely to work. Logging
> over the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> if you are using bash) may work better. (I wish we could do it from the
> kernel so that /bin/cat is not disturbed by delays due to page faults.)
> 
> If you can configure netconsole for logging OOM killer messages and a
> UDP socket for logging trace_pipe messages, udplogger at
> https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> might be a good fit for logging both outputs, with timestamps, into a single file.

Thanks for the hint, sounds very sane! I'll try to go that route for
the next log / trace I produce. Of course, if Michal says that the
trace file I've already posted, and which has been logged to file, is
useless and would have been better if I had instead logged to a
different machine via the network, I could also repeat the current
experiment and produce a new file at any time. :-)

Greetings
Nils


Re: OOM: Better, but still there on

2016-12-17 Thread Tetsuo Handa
On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
> 
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that in case it was still logging something
> to the trace file, it could continue to do so some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

In an OOM situation, writing to a file on disk is unlikely to work. Logging
over the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
if you are using bash) may work better. (I wish we could do it from the
kernel so that /bin/cat is not disturbed by delays due to page faults.)

If you can configure netconsole for logging OOM killer messages and a
UDP socket for logging trace_pipe messages, udplogger at
https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
might be a good fit for logging both outputs, with timestamps, into a single file.
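
To illustrate what the receiving side has to do, here is a minimal Python sketch of a UDP receiver that prefixes every received line with the receive time and the sender address and appends it to one file. It is not udplogger and makes no claim to match its behaviour; the names `stamp` and `serve`, the port number and the file name are assumptions chosen for the example.

```python
import socket
from datetime import datetime

def stamp(payload: bytes, addr, now=None) -> str:
    """Prefix each line of one datagram with receive time and sender ip:port."""
    now = now or datetime.now()
    prefix = f"{now:%Y-%m-%d %H:%M:%S} {addr[0]}:{addr[1]} "
    text = payload.decode("utf-8", errors="replace")
    return "".join(prefix + line + "\n" for line in text.splitlines())

def serve(port: int, path: str) -> None:
    """Append every datagram received on `port` to `path` until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(path, "a", encoding="utf-8") as out:
        sock.bind(("", port))
        while True:
            payload, addr = sock.recvfrom(65535)
            out.write(stamp(payload, addr))
            out.flush()  # keep the file current in case the sender dies

# Hypothetical invocation on the logging machine:
# serve(6666, "capture.log")
```

Because netconsole and the bash redirection use different source ports, both streams can be sent to the same listening port and still be told apart by the sender field in each line.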



Re: OOM: Better, but still there on

2016-12-17 Thread Nils Holland
On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> On Fri 16-12-16 19:47:00, Nils Holland wrote:
> > 
> > Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages 
> > freed, 10219 pages still pinned.
> > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd invoked oom-killer: 
> > gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), 
> > nodemask=0, order=1, oom_score_adj=0
> > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd cpuset=/ mems_allowed=0
> [...]
> > Dec 16 18:56:29 boerne.fritz.box kernel: Normal free:41008kB min:41100kB 
> > low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB 
> > active_file:470556kB inactive_file:148kB unevictable:0kB 
> > writepending:1616kB present:897016kB managed:831480kB mlocked:0kB 
> > slab_reclaimable:213172kB slab_unreclaimable:86236kB kernel_stack:1864kB 
> > pagetables:3572kB bounce:0kB free_pcp:532kB local_pcp:456kB free_cma:0kB
> 
> this is a GFP_KERNEL allocation so it cannot use the highmem zone again.
> There is no anonymous memory in this zone but the allocation
> context implies the full reclaim context so the file LRU should be
> reclaimable. For some reason ~470MB of the active file LRU is still
> there. This is quite unexpected. It is harder to tell more without
> further data. It would be great if you could enable reclaim related
> tracepoints:
> 
> mount -t tracefs none /debug/trace
> echo 1 > /debug/trace/events/vmscan/enable
> cat /debug/trace/trace_pipe > trace.log
> 
> should help
> [...]
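
The numbers in the quoted Normal zone line can be checked mechanically. The following is only a rough Python sketch: it compares `free` directly with `min` and ignores lowmem reserves and the other terms of the kernel's real watermark check, and the helper name `zone_fields` is made up for this example.

```python
import re

def zone_fields(line):
    """Return {name: kB} for every 'name:<n>kB' field in a zone report line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)kB", line)}

# Fields copied from the quoted OOM report (Normal zone).
zone = ("Normal free:41008kB min:41100kB low:51372kB high:61644kB "
        "active_anon:0kB inactive_anon:0kB active_file:470556kB "
        "inactive_file:148kB unevictable:0kB writepending:1616kB "
        "present:897016kB managed:831480kB")

f = zone_fields(zone)
below_min = f["free"] < f["min"]
file_share = f["active_file"] / f["managed"]
print(f"free below min watermark: {below_min} (short by {f['min'] - f['free']} kB)")
print(f"active file LRU share of managed memory: {file_share:.0%}")
```

This only restates the observation in the quoted text: the zone is just under its min watermark while a large part of its managed memory sits on the active file LRU.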

No problem! I enabled writing the trace data to a file and then tried
to trigger another OOM situation. That worked, this time without a
complete kernel panic, but with only my processes being killed and the
system becoming unresponsive. When that happened, I let it run for
another minute or two so that in case it was still logging something
to the trace file, it could continue to do so some time longer. Then I
rebooted with the only thing that still worked, i.e. by means of magic
SysRequest.

The trace file has actually become rather big (around 21 MB). I didn't
dare to cut anything from it because I didn't want to risk deleting
something that might turn out important. So, due to the size, I'm not
attaching the trace file to this message, but it's up compressed
(about 536 KB) to be grabbed at:

http://ftp.tisys.org/pub/misc/trace.log.xz

For reference, here's the OOM report that goes along with this
incident and the trace file:

Dec 17 13:31:06 boerne.fritz.box kernel: Purging GPU memory, 145 pages freed, 10287 pages still pinned.
Dec 17 13:31:07 boerne.fritz.box kernel: awesome invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
Dec 17 13:31:07 boerne.fritz.box kernel: awesome cpuset=/ mems_allowed=0
Dec 17 13:31:07 boerne.fritz.box kernel: CPU: 1 PID: 5599 Comm: awesome Not tainted 4.9.0-gentoo #3
Dec 17 13:31:07 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c18
Dec 17 13:31:07 boerne.fritz.box kernel:  c1433406
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37d48
Dec 17 13:31:07 boerne.fritz.box kernel:  c5319280
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c48
Dec 17 13:31:07 boerne.fritz.box kernel:  c1170011
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c9c
Dec 17 13:31:07 boerne.fritz.box kernel:  00200286
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c48
Dec 17 13:31:07 boerne.fritz.box kernel:  c1438fff
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c4c
Dec 17 13:31:07 boerne.fritz.box kernel:  c72479c0
Dec 17 13:31:07 boerne.fritz.box kernel:  c60dd200
Dec 17 13:31:07 boerne.fritz.box kernel:  c5319280
Dec 17 13:31:07 boerne.fritz.box kernel:  c1ad1899
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37d48
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c8c
Dec 17 13:31:07 boerne.fritz.box kernel:  c1114407
Dec 17 13:31:07 boerne.fritz.box kernel:  c10513a5
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c78
Dec 17 13:31:07 boerne.fritz.box kernel:  c11140a1
Dec 17 13:31:07 boerne.fritz.box kernel:  0005
Dec 17 13:31:07 boerne.fritz.box kernel:  
Dec 17 13:31:07 boerne.fritz.box kernel:  
Dec 17 13:31:07 boerne.fritz.box kernel: Call Trace:
Dec 17 13:31:07 boerne.fritz.box kernel:  [] dump_stack+0x47/0x61
Dec 17 13:31:07 boerne.fritz.box kernel:  [] dump_header+0x5f/0x175
Dec 17 13:31:07 boerne.fritz.box kernel:  [] ? ___ratelimit+0x7f/0xe0
Dec 17 13:31:07 boerne.fritz.box kernel:  [] oom_kill_process+0x207/0x3c0
Dec 17 13:31:07 boerne.fritz.box kernel:  [] ? has_capability_noaudit+0x15/0x20
Dec 17 13:31:07 boerne.fritz.box kernel:  [] ? oom_badness.part.13+0xb1/0x120
Dec 17 13:31:07 boerne.fritz.box kernel:  [] out_of_memory+0xd4/0x270
Dec 17 13:31:07 boerne.fritz.box kernel:  [] __alloc_pages_nodemask+0xcf5/0xd60
Dec 17 13:31:07 boerne.fritz.box kernel:  [] ? skb_queue_purge+0x30/0x30
Dec 17 13:31:07 boerne.fritz.box kernel:  [] 
alloc_skb_with_fr

Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically

2016-12-17 Thread Tetsuo Handa
Michal Hocko wrote:
> On Fri 16-12-16 12:31:51, Johannes Weiner wrote:
>>> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>>     */
>>>    WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
>>>  
>>> +   /*
>>> +    * Help non-failing allocations by giving them access to memory
>>> +    * reserves but do not use ALLOC_NO_WATERMARKS because this
>>> +    * could deplete whole memory reserves which would just make
>>> +    * the situation worse
>>> +    */
>>> +   page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
>>> +   if (page)
>>> +           goto got_pg;
>>> +
>>> +
>>
>> But this should be a separate patch, IMO.
>>
>> Do we observe GFP_NOFS lockups when we don't do this? 
> 
> this is hard to tell but considering users like grow_dev_page we can get
> stuck with a very slow progress I believe. Those allocations could see
> some help.
> 
>> Don't we risk
>> premature exhaustion of the memory reserves, and it's better to wait
>> for other reclaimers to make some progress instead?
> 
> waiting for other reclaimers would be preferable but we should at least
> give these some priority, which is what ALLOC_HARDER should help with.
> 
>> Should we give
>> reserve access to all GFP_NOFS allocations, or just the ones from a
>> reclaim/cleaning context?
> 
> I would focus only for those which are important enough. Which are those
> is a harder question. But certainly those with GFP_NOFAIL are important
> enough.
> 
>> All that should go into the changelog of a separate allocation booster
>> patch, I think.
> 
> The reason I did both in the same patch is to address the concern about
> potential lockups when NOFS|NOFAIL cannot make any progress. I've chosen
> ALLOC_HARDER to give the minimum portion of the reserves so that we do
> not risk other high priority users to be blocked out but still help a
> bit at least and prevent from starvation when other reclaimers are
> faster to consume the reclaimed memory.
> 
> I can extend the changelog of course but I believe that having both
> changes together makes some sense. NOFS|NOFAIL allocations are not all
> that rare and sometimes we really depend on them making a further
> progress.
> 

I feel that allowing access to memory reserves based on __GFP_NOFAIL might not
make sense. My understanding is that the actual I/O operations triggered by I/O
requests from filesystem code are processed by other threads. Even if we grant
access to memory reserves to GFP_NOFS | __GFP_NOFAIL allocations by fs code,
I think it is possible that memory allocations by the underlying bio code
fail to make further progress unless they are granted access to memory reserves
as well.

Below is a typical trace which I observe in an OOM lockup situation (though
this trace is from an OOM stress test using XFS).


[ 1845.187246] MemAlloc: kworker/2:1(14498) flags=0x4208060 switches=323636 
seq=48 gfp=0x240(GFP_NOIO) order=0 delay=430400 uninterruptible
[ 1845.187248] kworker/2:1 D12712 14498  2 0x0080
[ 1845.187251] Workqueue: events_freezable_power_ disk_events_workfn
[ 1845.187252] Call Trace:
[ 1845.187253]  ? __schedule+0x23f/0xba0
[ 1845.187254]  schedule+0x38/0x90
[ 1845.187255]  schedule_timeout+0x205/0x4a0
[ 1845.187256]  ? del_timer_sync+0xd0/0xd0
[ 1845.187257]  schedule_timeout_uninterruptible+0x25/0x30
[ 1845.187258]  __alloc_pages_nodemask+0x1035/0x10e0
[ 1845.187259]  ? alloc_request_struct+0x14/0x20
[ 1845.187261]  alloc_pages_current+0x96/0x1b0
[ 1845.187262]  ? bio_alloc_bioset+0x20f/0x2e0
[ 1845.187264]  bio_copy_kern+0xc4/0x180
[ 1845.187265]  blk_rq_map_kern+0x6f/0x120
[ 1845.187268]  __scsi_execute.isra.23+0x12f/0x160
[ 1845.187270]  scsi_execute_req_flags+0x8f/0x100
[ 1845.187271]  sr_check_events+0xba/0x2b0 [sr_mod]
[ 1845.187274]  cdrom_check_events+0x13/0x30 [cdrom]
[ 1845.187275]  sr_block_check_events+0x25/0x30 [sr_mod]
[ 1845.187276]  disk_check_events+0x5b/0x150
[ 1845.187277]  disk_events_workfn+0x17/0x20
[ 1845.187278]  process_one_work+0x1fc/0x750
[ 1845.187279]  ? process_one_work+0x167/0x750
[ 1845.187279]  worker_thread+0x126/0x4a0
[ 1845.187280]  kthread+0x10a/0x140
[ 1845.187281]  ? process_one_work+0x750/0x750
[ 1845.187282]  ? kthread_create_on_node+0x60/0x60
[ 1845.187283]  ret_from_fork+0x2a/0x40


I think that this GFP_NOIO allocation request needs to consume more memory
reserves than a GFP_NOFS allocation request in order to make progress.
Do we want to add __GFP_NOFAIL to this GFP_NOIO allocation request so that it
can access memory reserves in the same way as a GFP_NOFS | __GFP_NOFAIL
allocation request?



Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures

2016-12-17 Thread Jari Seppälä
Syslog tells:
[  135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
[  135.446260] BTRFS error (device sdb1): superblock contains fatal errors
[  135.462544] BTRFS error (device sdb1): open_ctree failed

What has been done:
* All "btrfs rescue" options

Info on system
* fs on external SSD via USB
* kernel 4.9.0 (tried with 4.8.13)
* btrfs-tools 4.4
* Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2012-12-16

Any help appreciated. Around 300G of TV recordings on the drive, which of 
course will eventually come as replays.

Jari
--
*** Jari Seppälä



Re: btrfs-check finds file extent holes

2016-12-17 Thread Marc Joliet
On Saturday 17 December 2016 00:18:13 Marc Joliet wrote:
> Is this something that btrfs-check can safely repair, or that is perhaps
> even  harmless?

Never mind, I just found that this has been repairable since btrfs-progs 3.19.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: btrfs-check finds file extent holes

2016-12-17 Thread Marc Joliet
OK, btrfs-check finished about an hour after I sent this. Here's the complete
output:

# btrfs check /dev/sdd2   
Checking filesystem on /dev/sdd2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
checking extents
checking free space cache
checking fs roots
root 30634 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30635 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30636 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30657 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30746 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30747 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30764 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30834 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30835 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30915 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30916 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30942 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31038 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31053 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31366 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31367 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31368 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31385 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31425 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31473 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31499 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31554 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31572 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31606 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31653 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 31680 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
found 904425616176 bytes used err is 1
total csum bytes: 873691128
total tree bytes: 11120295936
total fs tree bytes: 8620965888
total extent tree bytes: 1368756224
btree space waste bytes: 2415249740
file data blocks allocated: 19427350777856
 referenced 1003936649216
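
Since the report repeats the same three lines for each root, it can be condensed. Below is a minimal Python sketch that groups the reported holes by inode and offset; it assumes the output format shown above, and the function name `summarize` is illustrative rather than anything provided by btrfs-progs.

```python
import re
from collections import defaultdict

ROOT_RE = re.compile(r"^root (\d+) inode (\d+) errors (\d+), (.+)$")
HOLE_RE = re.compile(r"^start: (\d+), len: (\d+)$")

def summarize(lines):
    """Map (inode, start, len) to the list of roots that report that hole."""
    holes = defaultdict(list)
    current = None  # (root, inode) of the most recent "root ..." line
    for line in lines:
        line = line.strip()
        m = ROOT_RE.match(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            continue
        m = HOLE_RE.match(line)
        if m and current:
            root, inode = current
            holes[(inode, int(m.group(1)), int(m.group(2)))].append(root)
    return dict(holes)

# First two entries of the output above, used as sample input.
sample = """\
root 30634 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
root 30635 inode 95066 errors 100, file extent discount
Found file extent holes:
start: 413696, len: 4096
""".splitlines()

for (inode, start, length), roots in summarize(sample).items():
    print(f"inode {inode}: hole at {start}+{length} in {len(roots)} roots: {roots}")
```

Applied to the full output, this would collapse the 26 root entries into a single line, because every one of them reports the same inode, start and length. That would be consistent with the roots being snapshots sharing one file, although the output alone does not prove it.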

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

