Re: OOM: Better, but still there on
Nils Holland wrote:
> On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> > On 2016/12/17 21:59, Nils Holland wrote:
> > > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> > >> mount -t tracefs none /debug/trace
> > >> echo 1 > /debug/trace/events/vmscan/enable
> > >> cat /debug/trace/trace_pipe > trace.log
> > >>
> > >> should help
> > >> [...]
> > >
> > > No problem! I enabled writing the trace data to a file and then tried
> > > to trigger another OOM situation. That worked, this time without a
> > > complete kernel panic, but with only my processes being killed and the
> > > system becoming unresponsive.
> >
> > Under OOM situation, writing to a file on disk is unlikely to work. Maybe
> > logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> > if you are using bash) works better. (I wish we could do it from the kernel
> > so that /bin/cat is not disturbed by delays due to page faults.)
> >
> > If you can configure netconsole for logging OOM killer messages and a
> > UDP socket for logging trace_pipe messages, udplogger at
> > https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> > might fit for logging both outputs with timestamps into a single file.
>
> Actually, I decided to give this a try once more on machine #2, i.e.
> not the one that produced the previous trace, but the other one.
>
> I logged via netconsole as well as 'cat /debug/trace/trace_pipe' via
> the network to another machine running udplogger. After the machine
> had been freshly booted and I had set up the logging, unpacking of the
> firefox source tarball started. After it had been unpacking for a
> while, the first load of trace messages started to appear. Some time
> later, OOMs started to appear - I've got quite a lot of them in my
> capture file this time.

Thank you for capturing. I think it worked well. Let's wait for Michal.
The first OOM killer invocation was

2016-12-17 21:36:56 192.168.17.23:6665 [ 1276.828639] Killed process 3894 (xz) total-vm:68640kB, anon-rss:65920kB, file-rss:1696kB, shmem-rss:0kB

and the last OOM killer invocation was

2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800677] Killed process 3070 (screen) total-vm:7440kB, anon-rss:960kB, file-rss:2360kB, shmem-rss:0kB

and trace output was sent until

2016-12-17 21:37:07 192.168.17.23:48468 kworker/u4:4-3896 [000] 1287.202958: mm_shrink_slab_start: super_cache_scan+0x0/0x170 f4436ed4: nid: 0 objects to shrink 86 gfp_flags GFP_NOFS|__GFP_NOFAIL pgs_scanned 32 lru_pgs 406078 cache items 412 delta 0 total_scan 86

which (I hope) should be sufficient for analysis.

> > Unfortunately, the reclaim trace messages stopped a while after the first
> > OOM messages showed up - most likely my "cat" had been killed at that
> > point or became unresponsive. :-/
> >
> > In the end, the machine didn't completely panic, but after nothing new
> > showed up being logged via the network, I walked up to the
> > machine and found it in a state where I couldn't really log in to it
> > anymore; all that worked was, as always, a magic SysRequest reboot.

There is a known issue (since Linux 2.6.32) that all memory allocation requests get stuck due to a kswapd vs. shrink_inactive_list() livelock which occurs under an almost-OOM situation ( http://lkml.kernel.org/r/20160211225929.GU14668@dastard ). If we hit it, even "page allocation stalls for " messages do not show up.
Even if we didn't hit it: although agetty and sshd were still alive

2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800614] [ 2800] 0 2800 1152 494 6 30 0 agetty
2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800618] [ 2802] 0 2802 1457 1055 6 30 -1000 sshd

memory allocation was delayed far too long:

2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034624] btrfs-transacti: page allocation stalls for 93995ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034628] CPU: 1 PID: 1949 Comm: btrfs-transacti Not tainted 4.9.0-gentoo #3
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034630] Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034638] f162f94c c142bd8e 0001 f162f970 c110ad7e c1b58833 02400840
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034645] f162f978 f162f980 c1b55814 f162f960 0160 f162fa38 c110b78c 02400840
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034652] c1b55814 00016f2b 0040 f21d f21d 0001
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034653] Call Trace:
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034660] [] dump_stack+0x47/0x69
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034666] [] warn_alloc+0xce/0xf0
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034671] [] __alloc_pages_nodemask+0x97c/0xd30
2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034678] [
Re: OOM: Better, but still there on
Hi,

The system is supposed to have a special memory reservation for coredump and other debug info when encountering a panic; the size seems to be configurable.

Thanks,
Xin

Sent: Saturday, December 17, 2016 at 6:44 AM
From: "Tetsuo Handa"
To: "Nils Holland", "Michal Hocko"
Cc: linux-ker...@vger.kernel.org, linux...@kvack.org, "Chris Mason", "David Sterba", linux-btrfs@vger.kernel.org
Subject: Re: OOM: Better, but still there on

On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
>
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that in case it was still logging something
> to the trace file, it could continue to do so some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

Under OOM situation, writing to a file on disk is unlikely to work. Maybe logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port" if you are using bash) works better. (I wish we could do it from the kernel so that /bin/cat is not disturbed by delays due to page faults.)

If you can configure netconsole for logging OOM killer messages and a UDP socket for logging trace_pipe messages, udplogger at https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/ might fit for logging both outputs with timestamps into a single file.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mount raid1 gives open_ctree failed
On 25. nov. 2016 21:19, Kai Stian Olstad wrote:
> I have problem mounting my 3 disk raid1. This happened after upgrading
> from Kubuntu 14.04 to 16.04.

I finally found the problem. Since I needed to reboot after the upgrade, I decided to add some disks, and in order to do that I needed to move around some of the other disks. And the disks (6 TB) for this btrfs raid1 happened to land on an HBA that doesn't support disks larger than 2 TB. I moved them to the motherboard's SATA connections and they mounted like nothing had happened.

--
Kai Stian Olstad

PS: I really need to replace those HBAs
Re: OOM: Better, but still there on
On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> On 2016/12/17 21:59, Nils Holland wrote:
> > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> >> mount -t tracefs none /debug/trace
> >> echo 1 > /debug/trace/events/vmscan/enable
> >> cat /debug/trace/trace_pipe > trace.log
> >>
> >> should help
> >> [...]
> >
> > No problem! I enabled writing the trace data to a file and then tried
> > to trigger another OOM situation. That worked, this time without a
> > complete kernel panic, but with only my processes being killed and the
> > system becoming unresponsive.
>
> Under OOM situation, writing to a file on disk is unlikely to work. Maybe
> logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> if you are using bash) works better. (I wish we could do it from the kernel
> so that /bin/cat is not disturbed by delays due to page faults.)
>
> If you can configure netconsole for logging OOM killer messages and a
> UDP socket for logging trace_pipe messages, udplogger at
> https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> might fit for logging both outputs with timestamps into a single file.

Actually, I decided to give this a try once more on machine #2, i.e. not the one that produced the previous trace, but the other one.

I logged via netconsole as well as 'cat /debug/trace/trace_pipe' via the network to another machine running udplogger. After the machine had been freshly booted and I had set up the logging, unpacking of the firefox source tarball started. After it had been unpacking for a while, the first load of trace messages started to appear. Some time later, OOMs started to appear - I've got quite a lot of them in my capture file this time.

Unfortunately, the reclaim trace messages stopped a while after the first OOM messages showed up - most likely my "cat" had been killed at that point or became unresponsive.
:-/

In the end, the machine didn't completely panic, but after nothing new showed up being logged via the network, I walked up to the machine and found it in a state where I couldn't really log in to it anymore; all that worked was, as always, a magic SysRequest reboot.

The complete log, from machine boot right up to the point where it wouldn't really do anything anymore, is up again on my web server (~42 MB, 928 KB packed):

http://ftp.tisys.org/pub/misc/teela_2016-12-17.log.xz

Greetings
Nils
Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Hi Jari,

Like other file systems, btrfs keeps copies of its superblock. Try running "man btrfs check", "man btrfs rescue" and related commands for more details.

Regards,
Xin

Sent: Saturday, December 17, 2016 at 2:06 AM
From: "Jari Seppälä"
To: linux-btrfs@vger.kernel.org
Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures

Syslog tells:

[ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
[ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
[ 135.462544] BTRFS error (device sdb1): open_ctree failed

What has been done:
* All "btrfs rescue" options

Info on system:
* fs on external SSD via USB
* kernel 4.9.0 (tried with 4.8.13)
* btrfs-tools 4.4
* Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2012-12-16

Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come as replays.

Jari
--
*** Jari Seppälä
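For reference, a sketch of how one might inspect and restore from the superblock copies mentioned above. The device name /dev/sdb1 is taken from the quoted syslog; run this against your actual device, ideally only after taking an image of it with dd or similar:

```shell
# Show all superblock mirrors (primary at 64 KiB, copies at 64 MiB and 256 GiB)
btrfs inspect-internal dump-super -a /dev/sdb1

# Attempt to repair the primary superblock from a good backup copy
btrfs rescue super-recover -v /dev/sdb1

# Or run a read-only check using backup superblock mirror #1 instead of the primary
btrfs check -s 1 /dev/sdb1
```

Note that "system chunk array too small 0 < 97" suggests the primary superblock itself is damaged, which is exactly the case super-recover targets; whether a backup copy survived the bad USB removal is another question.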
Re: OOM: Better, but still there on
On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> On 2016/12/17 21:59, Nils Holland wrote:
> > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> >> mount -t tracefs none /debug/trace
> >> echo 1 > /debug/trace/events/vmscan/enable
> >> cat /debug/trace/trace_pipe > trace.log
> >>
> >> should help
> >> [...]
> >
> > No problem! I enabled writing the trace data to a file and then tried
> > to trigger another OOM situation. That worked, this time without a
> > complete kernel panic, but with only my processes being killed and the
> > system becoming unresponsive.
> > [...]
>
> Under OOM situation, writing to a file on disk is unlikely to work. Maybe
> logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> if you are using bash) works better. (I wish we could do it from the kernel
> so that /bin/cat is not disturbed by delays due to page faults.)
>
> If you can configure netconsole for logging OOM killer messages and a
> UDP socket for logging trace_pipe messages, udplogger at
> https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> might fit for logging both outputs with timestamps into a single file.

Thanks for the hint, sounds very sane! I'll try to go that route for the next log / trace I produce. Of course, if Michal says that the trace file I've already posted, which has been logged to a file, is useless and that it would have been better if I had instead logged to a different machine via the network, I could also repeat the current experiment and produce a new file at any time. :-)

Greetings
Nils
Re: OOM: Better, but still there on
On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
>
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that in case it was still logging something
> to the trace file, it could continue to do so some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

Under OOM situation, writing to a file on disk is unlikely to work. Maybe logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port" if you are using bash) works better. (I wish we could do it from the kernel so that /bin/cat is not disturbed by delays due to page faults.)

If you can configure netconsole for logging OOM killer messages and a UDP socket for logging trace_pipe messages, udplogger at https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/ might fit for logging both outputs with timestamps into a single file.
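The netconsole-plus-UDP setup described above can be sketched as follows. All IP addresses, the port numbers, the interface name and the MAC address are placeholders to be replaced with your own; the netconsole parameter format is src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac:

```shell
# On the machine under test: send kernel messages (including OOM killer
# output) to a log host at 192.168.0.2 via netconsole
modprobe netconsole netconsole=6665@192.168.0.1/eth0,6665@192.168.0.2/00:11:22:33:44:55

# Enable the vmscan tracepoints and stream trace_pipe to the same log
# host over UDP, using bash's /dev/udp pseudo-device as suggested above
mount -t tracefs none /debug/trace
echo 1 > /debug/trace/events/vmscan/enable
cat /debug/trace/trace_pipe > /dev/udp/192.168.0.2/48468
```

On the log host, udplogger (or even two "nc -lu" instances) would then receive both streams; the advantage over logging to a local file is that neither stream depends on the struggling machine's own disk I/O making progress.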
Re: OOM: Better, but still there on
On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> On Fri 16-12-16 19:47:00, Nils Holland wrote:
> >
> > Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages
> > freed, 10219 pages still pinned.
> > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd invoked oom-killer:
> > gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK),
> > nodemask=0, order=1, oom_score_adj=0
> > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd cpuset=/ mems_allowed=0
> [...]
> > Dec 16 18:56:29 boerne.fritz.box kernel: Normal free:41008kB min:41100kB
> > low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB
> > active_file:470556kB inactive_file:148kB unevictable:0kB
> > writepending:1616kB present:897016kB managed:831480kB mlocked:0kB
> > slab_reclaimable:213172kB slab_unreclaimable:86236kB kernel_stack:1864kB
> > pagetables:3572kB bounce:0kB free_pcp:532kB local_pcp:456kB free_cma:0kB
>
> this is a GFP_KERNEL allocation so it cannot use the highmem zone again.
> There is no anonymous memory in this zone but the allocation
> context implies the full reclaim context so the file LRU should be
> reclaimable. For some reason ~470MB of the active file LRU is still
> there. This is quite unexpected. It is harder to tell more without
> further data. It would be great if you could enable reclaim related
> tracepoints:
>
> mount -t tracefs none /debug/trace
> echo 1 > /debug/trace/events/vmscan/enable
> cat /debug/trace/trace_pipe > trace.log
>
> should help
> [...]

No problem! I enabled writing the trace data to a file and then tried to trigger another OOM situation. That worked, this time without a complete kernel panic, but with only my processes being killed and the system becoming unresponsive. When that happened, I let it run for another minute or two so that in case it was still logging something to the trace file, it could continue to do so some time longer. Then I rebooted with the only thing that still worked, i.e.
by means of magic SysRequest. The trace file has actually become rather big (around 21 MB). I didn't dare to cut anything from it because I didn't want to risk deleting something that might turn out important. So, due to the size, I'm not attaching the trace file to this message, but it's up compressed (about 536 KB) to be grabbed at: http://ftp.tisys.org/pub/misc/trace.log.xz For reference, here's the OOM report that goes along with this incident and the trace file: Dec 17 13:31:06 boerne.fritz.box kernel: Purging GPU memory, 145 pages freed, 10287 pages still pinned. Dec 17 13:31:07 boerne.fritz.box kernel: awesome invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0 Dec 17 13:31:07 boerne.fritz.box kernel: awesome cpuset=/ mems_allowed=0 Dec 17 13:31:07 boerne.fritz.box kernel: CPU: 1 PID: 5599 Comm: awesome Not tainted 4.9.0-gentoo #3 Dec 17 13:31:07 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c18 Dec 17 13:31:07 boerne.fritz.box kernel: c1433406 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37d48 Dec 17 13:31:07 boerne.fritz.box kernel: c5319280 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c48 Dec 17 13:31:07 boerne.fritz.box kernel: c1170011 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c9c Dec 17 13:31:07 boerne.fritz.box kernel: 00200286 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c48 Dec 17 13:31:07 boerne.fritz.box kernel: c1438fff Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c4c Dec 17 13:31:07 boerne.fritz.box kernel: c72479c0 Dec 17 13:31:07 boerne.fritz.box kernel: c60dd200 Dec 17 13:31:07 boerne.fritz.box kernel: c5319280 Dec 17 13:31:07 boerne.fritz.box kernel: c1ad1899 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37d48 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c8c Dec 17 13:31:07 boerne.fritz.box kernel: c1114407 Dec 17 13:31:07 boerne.fritz.box kernel: c10513a5 Dec 17 13:31:07 boerne.fritz.box kernel: 
c5a37c78 Dec 17 13:31:07 boerne.fritz.box kernel: c11140a1 Dec 17 13:31:07 boerne.fritz.box kernel: 0005 Dec 17 13:31:07 boerne.fritz.box kernel: Dec 17 13:31:07 boerne.fritz.box kernel: Dec 17 13:31:07 boerne.fritz.box kernel: Call Trace: Dec 17 13:31:07 boerne.fritz.box kernel: [] dump_stack+0x47/0x61 Dec 17 13:31:07 boerne.fritz.box kernel: [] dump_header+0x5f/0x175 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? ___ratelimit+0x7f/0xe0 Dec 17 13:31:07 boerne.fritz.box kernel: [] oom_kill_process+0x207/0x3c0 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? has_capability_noaudit+0x15/0x20 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? oom_badness.part.13+0xb1/0x120 Dec 17 13:31:07 boerne.fritz.box kernel: [] out_of_memory+0xd4/0x270 Dec 17 13:31:07 boerne.fritz.box kernel: [] __alloc_pages_nodemask+0xcf5/0xd60 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? skb_queue_purge+0x30/0x30 Dec 17 13:31:07 boerne.fritz.box kernel: [] alloc_skb_with_fr
Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Michal Hocko wrote: > On Fri 16-12-16 12:31:51, Johannes Weiner wrote: >>> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int >>> order, >>> */ >>> WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER); >>> >>> + /* >>> +* Help non-failing allocations by giving them access to memory >>> +* reserves but do not use ALLOC_NO_WATERMARKS because this >>> +* could deplete whole memory reserves which would just make >>> +* the situation worse >>> +*/ >>> + page = __alloc_pages_cpuset_fallback(gfp_mask, order, >>> ALLOC_HARDER, ac); >>> + if (page) >>> + goto got_pg; >>> + >> >> But this should be a separate patch, IMO. >> >> Do we observe GFP_NOFS lockups when we don't do this? > > this is hard to tell but considering users like grow_dev_page we can get > stuck with a very slow progress I believe. Those allocations could see > some help. > >> Don't we risk >> premature exhaustion of the memory reserves, and it's better to wait >> for other reclaimers to make some progress instead? > > waiting for other reclaimers would be preferable but we should at least > give these some priority, which is what ALLOC_HARDER should help with. > >> Should we give >> reserve access to all GFP_NOFS allocations, or just the ones from a >> reclaim/cleaning context? > > I would focus only for those which are important enough. Which are those > is a harder question. But certainly those with GFP_NOFAIL are important > enough. > >> All that should go into the changelog of a separate allocation booster >> patch, I think. > > The reason I did both in the same patch is to address the concern about > potential lockups when NOFS|NOFAIL cannot make any progress. I've chosen > ALLOC_HARDER to give the minimum portion of the reserves so that we do > not risk other high priority users to be blocked out but still help a > bit at least and prevent from starvation when other reclaimers are > faster to consume the reclaimed memory. 
> > I can extend the changelog of course but I believe that having both > changes together makes some sense. NOFS|NOFAIL allocations are not all > that rare and sometimes we really depend on them making a further > progress. > I feel that allowing access to memory reserves based on __GFP_NOFAIL might not make sense. My understanding is that actual I/O operation triggered by I/O requests by filesystem code are processed by other threads. Even if we grant access to memory reserves to GFP_NOFS | __GFP_NOFAIL allocations by fs code, I think that it is possible that memory allocations by underlying bio code fails to make a further progress unless memory reserves are granted as well. Below is a typical trace which I observe under OOM lockuped situation (though this trace is from an OOM stress test using XFS). [ 1845.187246] MemAlloc: kworker/2:1(14498) flags=0x4208060 switches=323636 seq=48 gfp=0x240(GFP_NOIO) order=0 delay=430400 uninterruptible [ 1845.187248] kworker/2:1 D12712 14498 2 0x0080 [ 1845.187251] Workqueue: events_freezable_power_ disk_events_workfn [ 1845.187252] Call Trace: [ 1845.187253] ? __schedule+0x23f/0xba0 [ 1845.187254] schedule+0x38/0x90 [ 1845.187255] schedule_timeout+0x205/0x4a0 [ 1845.187256] ? del_timer_sync+0xd0/0xd0 [ 1845.187257] schedule_timeout_uninterruptible+0x25/0x30 [ 1845.187258] __alloc_pages_nodemask+0x1035/0x10e0 [ 1845.187259] ? alloc_request_struct+0x14/0x20 [ 1845.187261] alloc_pages_current+0x96/0x1b0 [ 1845.187262] ? bio_alloc_bioset+0x20f/0x2e0 [ 1845.187264] bio_copy_kern+0xc4/0x180 [ 1845.187265] blk_rq_map_kern+0x6f/0x120 [ 1845.187268] __scsi_execute.isra.23+0x12f/0x160 [ 1845.187270] scsi_execute_req_flags+0x8f/0x100 [ 1845.187271] sr_check_events+0xba/0x2b0 [sr_mod] [ 1845.187274] cdrom_check_events+0x13/0x30 [cdrom] [ 1845.187275] sr_block_check_events+0x25/0x30 [sr_mod] [ 1845.187276] disk_check_events+0x5b/0x150 [ 1845.187277] disk_events_workfn+0x17/0x20 [ 1845.187278] process_one_work+0x1fc/0x750 [ 1845.187279] ? 
process_one_work+0x167/0x750
[ 1845.187279] worker_thread+0x126/0x4a0
[ 1845.187280] kthread+0x10a/0x140
[ 1845.187281] ? process_one_work+0x750/0x750
[ 1845.187282] ? kthread_create_on_node+0x60/0x60
[ 1845.187283] ret_from_fork+0x2a/0x40

I think that this GFP_NOIO allocation request needs to consume more memory reserves than a GFP_NOFS allocation request in order to make progress. Do we want to add __GFP_NOFAIL to this GFP_NOIO allocation request so that it is allowed access to memory reserves in the same way as the GFP_NOFS | __GFP_NOFAIL allocation request?
Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Syslog tells:

[ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
[ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
[ 135.462544] BTRFS error (device sdb1): open_ctree failed

What has been done:
* All "btrfs rescue" options

Info on system:
* fs on external SSD via USB
* kernel 4.9.0 (tried with 4.8.13)
* btrfs-tools 4.4
* Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2012-12-16

Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come as replays.

Jari
--
*** Jari Seppälä
Re: btrfs-check finds file extent holes
On Saturday 17 December 2016 00:18:13 Marc Joliet wrote:
> Is this something that btrfs-check can safely repair, or that is perhaps
> even harmless?

Never mind, I just found that this has been repairable since btrfs-progs 3.19.

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup
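For anyone finding this thread later, the repair mentioned above would be run offline against the unmounted device (/dev/sdd2 is the device from the check output elsewhere in this thread). A sketch; --repair is widely regarded as a last resort, so back up or image the device first:

```shell
# The filesystem must not be mounted while btrfs check runs
umount /dev/sdd2

# Read-only dry run: lists the "file extent discount" / extent hole errors
btrfs check /dev/sdd2

# Actual repair; fixing file extent holes requires btrfs-progs >= 3.19
btrfs check --repair /dev/sdd2
```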
Re: btrfs-check finds file extent holes
OK, btrfs-check finished about an hour after I sent this, here's the complete output:

# btrfs check /dev/sdd2
Checking filesystem on /dev/sdd2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
checking extents
checking free space cache
checking fs roots
root 30634 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30635 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30636 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30657 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30746 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30747 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30764 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30834 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30835 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30915 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30916 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 30942 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31038 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31053 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31366 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31367 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31368 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31385 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31425 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31473 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31499 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31554 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31572 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31606 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31653 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
root 31680 inode 95066 errors 100, file extent discount
Found file extent holes:
        start: 413696, len: 4096
found 904425616176 bytes used err is 1
total csum bytes: 873691128
total tree bytes: 11120295936
total fs tree bytes: 8620965888
total extent tree bytes: 1368756224
btree space waste bytes: 2415249740
file data blocks allocated: 19427350777856
 referenced 1003936649216

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup