Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 26.07.2018 21:50, Andrew Morton wrote: > On Thu, 26 Jul 2018 10:45:54 +0200 David Hildenbrand wrote: > >>> Does each user of PG_balloon check for PG_reserved? If this is the case >>> then yes this would be OK. >>> >> >> I can only spot one user of PageBalloon() at all (fs/proc/page.c) , >> which makes me wonder if this bit is actually still relevant. I think >> the last "real" user was removed with >> >> commit b1123ea6d3b3da25af5c8a9d843bd07ab63213f4 >> Author: Minchan Kim >> Date: Tue Jul 26 15:23:09 2016 -0700 >> >> mm: balloon: use general non-lru movable page feature >> >> Now, VM has a feature to migrate non-lru movable pages so balloon >> doesn't need custom migration hooks in migrate.c and compaction.c. >> >> >> The only user of PG_balloon in general is >> "include/linux/balloon_compaction.h", used effectively only by >> virtio_balloon. >> >> All such pages are allocated via balloon_page_alloc() and never set >> reserved. >> >> So to me it looks like PG_balloon could be easily reused, especially to >> also exclude virtio-balloon pages from dumps. > > Agree. Maintaining a thingy for page-types.c which hardly anyone uses > (surely) isn't sufficient justification for consuming a page flag. We > should check with the virtio developers first, but this does seem to be > begging to be reclaimed. Okay, I'll be looking into reusing this flag to mark pages as fake/logical offline (e.g. "PG_offline"), so it can be used by - memory onlining/offlining code ("page is offline" e.g. PG_reserved && PG_offline) - balloon drivers ("page is logically offline" e.g. !PG_reserved && PG_offline) In dump tools, we can then skip reading these pages ("page not used by the system, might contain stale data or might not even be accessible"). Can you drop these two patches for now? I'll try to rework patch nr 1 to more closely match what PG_reserved actually means. Patch nr 2 might no longer be necessary if we agree on something like PG_offline. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Thu, 26 Jul 2018 10:45:54 +0200 David Hildenbrand wrote: > > Does each user of PG_balloon check for PG_reserved? If this is the case > > then yes this would be OK. > > > > I can only spot one user of PageBalloon() at all (fs/proc/page.c) , > which makes me wonder if this bit is actually still relevant. I think > the last "real" user was removed with > > commit b1123ea6d3b3da25af5c8a9d843bd07ab63213f4 > Author: Minchan Kim > Date: Tue Jul 26 15:23:09 2016 -0700 > > mm: balloon: use general non-lru movable page feature > > Now, VM has a feature to migrate non-lru movable pages so balloon > doesn't need custom migration hooks in migrate.c and compaction.c. > > > The only user of PG_balloon in general is > "include/linux/balloon_compaction.h", used effectively only by > virtio_balloon. > > All such pages are allocated via balloon_page_alloc() and never set > reserved. > > So to me it looks like PG_balloon could be easily reused, especially to > also exclude virtio-balloon pages from dumps. Agree. Maintaining a thingy for page-types.c which hardly anyone uses (surely) isn't sufficient justification for consuming a page flag. We should check with the virtio developers first, but this does seem to be begging to be reclaimed.
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 26.07.2018 10:30, Michal Hocko wrote: > On Thu 26-07-18 10:22:41, David Hildenbrand wrote: >> On 24.07.2018 09:22, Michal Hocko wrote: >>> On Mon 23-07-18 19:12:58, David Hildenbrand wrote: On 23.07.2018 13:45, Vlastimil Babka wrote: > On 07/20/2018 02:34 PM, David Hildenbrand wrote: >> Dumping tools (like makedumpfile) right now don't exclude reserved pages. >> So reserved pages might be access by dump tools although nobody except >> the owner should touch them. > > Are you sure about that? Or maybe I understand wrong. Maybe it changed > recently, but IIRC pages that are backing memmap (struct pages) are also > PG_reserved. And you definitely do want those in the dump. I proposed a new flag/value to mask pages that are logically offline but Michal wanted me to go into this direction. While we can special case struct pages in dump tools ("we have to read/interpret them either way, so we can also dump them"), it smells like my original attempt was cleaner. Michal? >>> >>> But we do not have many page flags spare and even if we have one or two >>> this doesn't look like the use for them. So I still think we should try >>> the PageReserved way. >>> >> >> So as a summary, the only real approach that would be acceptable is >> using PageReserved + some other identifier to mark pages as "logically >> offline". >> >> I wonder what identifier could be used, as this has to be consistent for >> all reserved pages (to avoid false positives). >> >> Using other pageflags in combination might be possible, but then we have >> to make assumptions about all users of PageReserved right now. >> >> As far as I can see (and as has been discussed), page_type could be >> used. If we don't want to consume a new bit, we could overload/reuse the >> "PG_balloon" bit. >> >> >> E.g. "PG_balloon" set -> exclude page from dump > > Does each user of PG_balloon check for PG_reserved? If this is the case > then yes this would be OK. > I can only spot one user of PageBalloon() at all (fs/proc/page.c) , which makes me wonder if this bit is actually still relevant. I think the last "real" user was removed with commit b1123ea6d3b3da25af5c8a9d843bd07ab63213f4 Author: Minchan Kim Date: Tue Jul 26 15:23:09 2016 -0700 mm: balloon: use general non-lru movable page feature Now, VM has a feature to migrate non-lru movable pages so balloon doesn't need custom migration hooks in migrate.c and compaction.c. The only user of PG_balloon in general is "include/linux/balloon_compaction.h", used effectively only by virtio_balloon. All such pages are allocated via balloon_page_alloc() and never set reserved. So to me it looks like PG_balloon could be easily reused, especially to also exclude virtio-balloon pages from dumps. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 26.07.2018 10:27, Michal Hocko wrote: > On Wed 25-07-18 16:20:41, David Hildenbrand wrote: >> On 25.07.2018 15:51, Michal Hocko wrote: >>> On Tue 24-07-18 16:13:09, David Hildenbrand wrote: >>> [...] So I see right now: - Pg_reserved + e.g. new page type (or some other unique identifier in combination with Pg_reserved) -> Avoid reads of pages we know are offline - extend is_ram_page() -> Fake zero memory for pages we know are offline Or even both (avoid reading and don't crash the kernel if it is being done). >>> >>> I really fail to see how that can work without kernel being aware of >>> PageOffline. What will/should happen if you run an old kdump tool on a >>> kernel with this partially offline memory? >>> >> >> New kernel with old dump tool: >> >> a) we have not fixed up is_ram_page() >> >> -> crash, as we access memory we shouldn't > > this is not acceptable, right? You do not want to crash your crash > kernel ;) Well, the same can happen today with PageHWPoison. The "new" kernel will happily access such pages and crash as far as I understand (it has has no idea of the old struct pages). Of course, this is "less likely" than what I describe. > >> b) we have fixed up is_ram_page() >> >> -> We have a callback to check for applicable memory in the hypervisor >> whether the parts are accessible / online or not accessible / offline. >> (e.g. via a device driver that controls a certain memory region) >> >> -> Don't read, but fake a page full of 0 >> >> >> So instead of the kernel being aware of it, it asks via is_ram_page() >> the hypervisor. > > I am still confused why do we even care about hypervisor. What if > somebody wants to have partial memory hotplug on native OS? Good point I was ignoring so far (too much focusing on my use case I assume). So for these, we would have to catch illegal accesses and a) report them (-EINVAL / - EIO) as you said b) fake a zero page I assume catching illegal accesses should be possible. Might require some work across all architectures. Still, dump tools should in addition not even try to read if possible. > >> I don't think a) is a problem. AFAICS, we have to update makedumpfile >> for every new kernel. We can perform changes and update makedumpfile >> to be compatible with new dump tools. > > Not really. You simply do not crash the kernel just because you are > trying to dump the already crashed kernel. > >> E.g. remember SECTION_IS_ONLINE you introduced ? It broke dump >> tools and required > > But has it crashed the kernel when reading the dump? If yes then the > whole dumping is fragile as hell... No, I think it simply didn't work. At least that's what I assume ;) I was rather saying that dump tools may have to be fixed up to work with a new kernel. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Thu 26-07-18 10:22:41, David Hildenbrand wrote: > On 24.07.2018 09:22, Michal Hocko wrote: > > On Mon 23-07-18 19:12:58, David Hildenbrand wrote: > >> On 23.07.2018 13:45, Vlastimil Babka wrote: > >>> On 07/20/2018 02:34 PM, David Hildenbrand wrote: > Dumping tools (like makedumpfile) right now don't exclude reserved pages. > So reserved pages might be access by dump tools although nobody except > the owner should touch them. > >>> > >>> Are you sure about that? Or maybe I understand wrong. Maybe it changed > >>> recently, but IIRC pages that are backing memmap (struct pages) are also > >>> PG_reserved. And you definitely do want those in the dump. > >> > >> I proposed a new flag/value to mask pages that are logically offline but > >> Michal wanted me to go into this direction. > >> > >> While we can special case struct pages in dump tools ("we have to > >> read/interpret them either way, so we can also dump them"), it smells > >> like my original attempt was cleaner. Michal? > > > > But we do not have many page flags spare and even if we have one or two > > this doesn't look like the use for them. So I still think we should try > > the PageReserved way. > > > > So as a summary, the only real approach that would be acceptable is > using PageReserved + some other identifier to mark pages as "logically > offline". > > I wonder what identifier could be used, as this has to be consistent for > all reserved pages (to avoid false positives). > > Using other pageflags in combination might be possible, but then we have > to make assumptions about all users of PageReserved right now. > > As far as I can see (and as has been discussed), page_type could be > used. If we don't want to consume a new bit, we could overload/reuse the > "PG_balloon" bit. > > > E.g. "PG_balloon" set -> exclude page from dump Does each user of PG_balloon check for PG_reserved? If this is the case then yes this would be OK. -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Wed 25-07-18 16:20:41, David Hildenbrand wrote: > On 25.07.2018 15:51, Michal Hocko wrote: > > On Tue 24-07-18 16:13:09, David Hildenbrand wrote: > > [...] > >> So I see right now: > >> > >> - Pg_reserved + e.g. new page type (or some other unique identifier in > >> combination with Pg_reserved) > >> -> Avoid reads of pages we know are offline > >> - extend is_ram_page() > >> -> Fake zero memory for pages we know are offline > >> > >> Or even both (avoid reading and don't crash the kernel if it is being > >> done). > > > > I really fail to see how that can work without kernel being aware of > > PageOffline. What will/should happen if you run an old kdump tool on a > > kernel with this partially offline memory? > > > > New kernel with old dump tool: > > a) we have not fixed up is_ram_page() > > -> crash, as we access memory we shouldn't this is not acceptable, right? You do not want to crash your crash kernel ;) > b) we have fixed up is_ram_page() > > -> We have a callback to check for applicable memory in the hypervisor > whether the parts are accessible / online or not accessible / offline. > (e.g. via a device driver that controls a certain memory region) > > -> Don't read, but fake a page full of 0 > > > So instead of the kernel being aware of it, it asks via is_ram_page() > the hypervisor. I am still confused why do we even care about hypervisor. What if somebody wants to have partial memory hotplug on native OS? > I don't think a) is a problem. AFAICS, we have to update makedumpfile > for every new kernel. We can perform changes and update makedumpfile > to be compatible with new dump tools. Not really. You simply do not crash the kernel just because you are trying to dump the already crashed kernel. > E.g. remember SECTION_IS_ONLINE you introduced ? It broke dump > tools and required But has it crashed the kernel when reading the dump? If yes then the whole dumping is fragile as hell... -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 09:22, Michal Hocko wrote: > On Mon 23-07-18 19:12:58, David Hildenbrand wrote: >> On 23.07.2018 13:45, Vlastimil Babka wrote: >>> On 07/20/2018 02:34 PM, David Hildenbrand wrote: Dumping tools (like makedumpfile) right now don't exclude reserved pages. So reserved pages might be access by dump tools although nobody except the owner should touch them. >>> >>> Are you sure about that? Or maybe I understand wrong. Maybe it changed >>> recently, but IIRC pages that are backing memmap (struct pages) are also >>> PG_reserved. And you definitely do want those in the dump. >> >> I proposed a new flag/value to mask pages that are logically offline but >> Michal wanted me to go into this direction. >> >> While we can special case struct pages in dump tools ("we have to >> read/interpret them either way, so we can also dump them"), it smells >> like my original attempt was cleaner. Michal? > > But we do not have many page flags spare and even if we have one or two > this doesn't look like the use for them. So I still think we should try > the PageReserved way. > So as a summary, the only real approach that would be acceptable is using PageReserved + some other identifier to mark pages as "logically offline". I wonder what identifier could be used, as this has to be consistent for all reserved pages (to avoid false positives). Using other pageflags in combination might be possible, but then we have to make assumptions about all users of PageReserved right now. As far as I can see (and as has been discussed), page_type could be used. If we don't want to consume a new bit, we could overload/reuse the "PG_balloon" bit. E.g. "PG_balloon" set -> exclude page from dump PG_balloon + !PG_reserved -> ballooned page (state as of now) PG_balloon + PG_reserved -> offline page This way, even pages inflated by virtio_balloon would not be touched. Opinions? -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 25.07.2018 15:51, Michal Hocko wrote: > On Tue 24-07-18 16:13:09, David Hildenbrand wrote: > [...] >> So I see right now: >> >> - Pg_reserved + e.g. new page type (or some other unique identifier in >> combination with Pg_reserved) >> -> Avoid reads of pages we know are offline >> - extend is_ram_page() >> -> Fake zero memory for pages we know are offline >> >> Or even both (avoid reading and don't crash the kernel if it is being done). > > I really fail to see how that can work without kernel being aware of > PageOffline. What will/should happen if you run an old kdump tool on a > kernel with this partially offline memory? > New kernel with old dump tool: a) we have not fixed up is_ram_page() -> crash, as we access memory we shouldn't b) we have fixed up is_ram_page() -> We have a callback to check for applicable memory in the hypervisor whether the parts are accessible / online or not accessible / offline. (e.g. via a device driver that controls a certain memory region) -> Don't read, but fake a page full of 0 So instead of the kernel being aware of it, it asks via is_ram_page() the hypervisor. I don't think a) is a problem. AFAICS, we have to update makedumpfile for every new kernel. We can perform changes and update makedumpfile to be compatible with new dump tools. E.g. remember SECTION_IS_ONLINE you introduced ? It broke dump tools and required commit 4bf4f2b0a855ccf4c7ffe13290778e92b2f5bbc9 Author: Pratyush Anand Date: Thu Aug 17 12:47:13 2017 +0900 [PATCH v2] Fix SECTION_MAP_MASK for kernel >= v.13 * Required for kernel 4.13 commit 2d070eab2e82 "mm: consider zone which is not fully populated to have holes" added a new flag SECTION_IS_ONLINE and therefore SECTION_MAP_MASK has been changed. We are not able to find correct mem_map in makedumpfile for kernel version v4.13-rc1 and onward because of the above kernel change. This patch fixes the MASK value keeping the code backward compatible Signed-off-by: Pratyush Anand Same would apply for the new combination of PageReserved + X, where we tell dump tools to exclude this page. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Tue 24-07-18 16:13:09, David Hildenbrand wrote: [...] > So I see right now: > > - Pg_reserved + e.g. new page type (or some other unique identifier in > combination with Pg_reserved) > -> Avoid reads of pages we know are offline > - extend is_ram_page() > -> Fake zero memory for pages we know are offline > > Or even both (avoid reading and don't crash the kernel if it is being done). I really fail to see how that can work without kernel being aware of PageOffline. What will/should happen if you run an old kdump tool on a kernel with this partially offline memory? -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 15:35, Michal Hocko wrote: > On Tue 24-07-18 15:27:51, David Hildenbrand wrote: >> On 24.07.2018 15:13, Michal Hocko wrote: >>> On Tue 24-07-18 14:17:12, David Hildenbrand wrote: On 24.07.2018 09:25, Michal Hocko wrote: > On Mon 23-07-18 19:20:43, David Hildenbrand wrote: >> On 23.07.2018 14:30, Michal Hocko wrote: >>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: On 07/20/2018 02:34 PM, David Hildenbrand wrote: > Dumping tools (like makedumpfile) right now don't exclude reserved > pages. > So reserved pages might be access by dump tools although nobody except > the owner should touch them. Are you sure about that? Or maybe I understand wrong. Maybe it changed recently, but IIRC pages that are backing memmap (struct pages) are also PG_reserved. And you definitely do want those in the dump. >>> >>> You are right. reserve_bootmem_region will make all early bootmem >>> allocations (including those backing memmaps) PageReserved. I have asked >>> several times but I haven't seen a satisfactory answer yet. Why do we >>> even care for kdump about those. If they are reserved the nobody should >>> really look at those specific struct pages and manipulate them. Kdump >>> tools are using a kernel interface to read the content. If the specific >>> content is backed by a non-existing memory then they should simply not >>> return anything. >>> >> >> "new kernel" provides an interface to read memory from "old kernel". >> >> The new kernel has no idea about >> - which memory was added/online in the old kernel >> - where struct pages of the old kernel are and what their content is >> - which memory is save to touch and which not >> >> Dump tools figure all that out by interpreting the VMCORE. They e.g. >> identify "struct pages" and see if they should be dumped. The "new >> kernel" only allows to read that memory. It cannot hinder to crash the >> system (e.g. if a dump tool would try to read a hwpoison page). >> >> So how should the "new kernel" know if a page can be touched or not? > > I am sorry I am not familiar with kdump much. But from what I remember > it reads from /proc/vmcore and implementation of this interface should > simply return EINVAL or alike when you try to dump inaccessible memory > range. Oh, and BTW, while something like -EINVAL could work, we usually don't want to try to read certain pages at all (e.g. ballooned pages - accessing the page might work but involves quite some overhead in the hypervisor). So we should either handle this in dump tools (reserved + ...?) or while doing the read similar to XEN (is_ram_page()). >>> >>> Yes, I think this is the proper way. Just test for PageOnline >>> in read_from_oldmem/copy_oldmem_page. Btw. we already page >>> pfn_to_online_page which performs the per-section online/offline >>> status. This should be extendable to consider your new PageOffline >>> state. >> >> That is the important bit: >> >> What the new kernel sees is not what the old kernel saw. >> >> Checking for pfn_to_online_page() from >> read_from_oldmem/copy_oldmem_page() is plain wrong. >> >> E.g. ACPI hotplug memory is not even added in the new kernel - see >> "acpi_no_memhotplug" which is used in kdump environments. >> >> The only thing we can do is >> - query the hypervisor >> - try to access and get an exception > > But we do preserve struct page's (aka memmap) from the crash kernel, > don't we? So you have the whole state there. Or am I missing something? > Yes, they are preserved but we don't interpret them, that is up to dump tools. We only provide access to the vmcore, which includes read/writing the memory indicated in it. The struct pages are simply part of the vmcore. Completely hidden from the new kernel. Finding/interpreting the struct pages is not (and most probably should never) be done in the kernel. E.g. The old kernel could be a different kernel version, different memory configuration (!SPARSE, SPARSE ...), page flags could be different ... it's not a straight forward access. That's why dump tools interpret struct pages instead. And also why I want a simple identifier in them so user space dump tools can figure out "this page is better not to be touched, the content is stale or not accessible". So I see right now: - Pg_reserved + e.g. new page type (or some other unique identifier in combination with Pg_reserved) -> Avoid reads of pages we know are offline - extend is_ram_page() -> Fake zero memory for pages we know are offline Or even both (avoid reading and don't crash the kernel if it is being done). I am not a friend of the "try to access and get an exception" approach. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Tue 24-07-18 15:27:51, David Hildenbrand wrote: > On 24.07.2018 15:13, Michal Hocko wrote: > > On Tue 24-07-18 14:17:12, David Hildenbrand wrote: > >> On 24.07.2018 09:25, Michal Hocko wrote: > >>> On Mon 23-07-18 19:20:43, David Hildenbrand wrote: > On 23.07.2018 14:30, Michal Hocko wrote: > > On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: > >> On 07/20/2018 02:34 PM, David Hildenbrand wrote: > >>> Dumping tools (like makedumpfile) right now don't exclude reserved > >>> pages. > >>> So reserved pages might be access by dump tools although nobody except > >>> the owner should touch them. > >> > >> Are you sure about that? Or maybe I understand wrong. Maybe it changed > >> recently, but IIRC pages that are backing memmap (struct pages) are > >> also > >> PG_reserved. And you definitely do want those in the dump. > > > > You are right. reserve_bootmem_region will make all early bootmem > > allocations (including those backing memmaps) PageReserved. I have asked > > several times but I haven't seen a satisfactory answer yet. Why do we > > even care for kdump about those. If they are reserved the nobody should > > really look at those specific struct pages and manipulate them. Kdump > > tools are using a kernel interface to read the content. If the specific > > content is backed by a non-existing memory then they should simply not > > return anything. > > > > "new kernel" provides an interface to read memory from "old kernel". > > The new kernel has no idea about > - which memory was added/online in the old kernel > - where struct pages of the old kernel are and what their content is > - which memory is save to touch and which not > > Dump tools figure all that out by interpreting the VMCORE. They e.g. > identify "struct pages" and see if they should be dumped. The "new > kernel" only allows to read that memory. It cannot hinder to crash the > system (e.g. if a dump tool would try to read a hwpoison page). > > So how should the "new kernel" know if a page can be touched or not? > >>> > >>> I am sorry I am not familiar with kdump much. But from what I remember > >>> it reads from /proc/vmcore and implementation of this interface should > >>> simply return EINVAL or alike when you try to dump inaccessible memory > >>> range. > >> > >> Oh, and BTW, while something like -EINVAL could work, we usually don't > >> want to try to read certain pages at all (e.g. ballooned pages - > >> accessing the page might work but involves quite some overhead in the > >> hypervisor). > >> > >> So we should either handle this in dump tools (reserved + ...?) or while > >> doing the read similar to XEN (is_ram_page()). > > > > Yes, I think this is the proper way. Just test for PageOnline > > in read_from_oldmem/copy_oldmem_page. Btw. we already page > > pfn_to_online_page which performs the per-section online/offline > > status. This should be extendable to consider your new PageOffline > > state. > > That is the important bit: > > What the new kernel sees is not what the old kernel saw. > > Checking for pfn_to_online_page() from > read_from_oldmem/copy_oldmem_page() is plain wrong. > > E.g. ACPI hotplug memory is not even added in the new kernel - see > "acpi_no_memhotplug" which is used in kdump environments. > > The only thing we can do is > - query the hypervisor > - try to access and get an exception But we do preserve struct page's (aka memmap) from the crash kernel, don't we? So you have the whole state there. Or am I missing something? -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 15:13, Michal Hocko wrote: > On Tue 24-07-18 14:17:12, David Hildenbrand wrote: >> On 24.07.2018 09:25, Michal Hocko wrote: >>> On Mon 23-07-18 19:20:43, David Hildenbrand wrote: On 23.07.2018 14:30, Michal Hocko wrote: > On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: >> On 07/20/2018 02:34 PM, David Hildenbrand wrote: >>> Dumping tools (like makedumpfile) right now don't exclude reserved >>> pages. >>> So reserved pages might be access by dump tools although nobody except >>> the owner should touch them. >> >> Are you sure about that? Or maybe I understand wrong. Maybe it changed >> recently, but IIRC pages that are backing memmap (struct pages) are also >> PG_reserved. And you definitely do want those in the dump. > > You are right. reserve_bootmem_region will make all early bootmem > allocations (including those backing memmaps) PageReserved. I have asked > several times but I haven't seen a satisfactory answer yet. Why do we > even care for kdump about those. If they are reserved the nobody should > really look at those specific struct pages and manipulate them. Kdump > tools are using a kernel interface to read the content. If the specific > content is backed by a non-existing memory then they should simply not > return anything. > "new kernel" provides an interface to read memory from "old kernel". The new kernel has no idea about - which memory was added/online in the old kernel - where struct pages of the old kernel are and what their content is - which memory is save to touch and which not Dump tools figure all that out by interpreting the VMCORE. They e.g. identify "struct pages" and see if they should be dumped. The "new kernel" only allows to read that memory. It cannot hinder to crash the system (e.g. if a dump tool would try to read a hwpoison page). So how should the "new kernel" know if a page can be touched or not? >>> >>> I am sorry I am not familiar with kdump much. But from what I remember >>> it reads from /proc/vmcore and implementation of this interface should >>> simply return EINVAL or alike when you try to dump inaccessible memory >>> range. >> >> Oh, and BTW, while something like -EINVAL could work, we usually don't >> want to try to read certain pages at all (e.g. ballooned pages - >> accessing the page might work but involves quite some overhead in the >> hypervisor). >> >> So we should either handle this in dump tools (reserved + ...?) or while >> doing the read similar to XEN (is_ram_page()). > > Yes, I think this is the proper way. Just test for PageOnline > in read_from_oldmem/copy_oldmem_page. Btw. we already page > pfn_to_online_page which performs the per-section online/offline > status. This should be extendable to consider your new PageOffline > state. That is the important bit: What the new kernel sees is not what the old kernel saw. Checking for pfn_to_online_page() from read_from_oldmem/copy_oldmem_page() is plain wrong. E.g. ACPI hotplug memory is not even added in the new kernel - see "acpi_no_memhotplug" which is used in kdump environments. The only thing we can do is - query the hypervisor - try to access and get an exception -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Tue 24-07-18 14:17:12, David Hildenbrand wrote: > On 24.07.2018 09:25, Michal Hocko wrote: > > On Mon 23-07-18 19:20:43, David Hildenbrand wrote: > >> On 23.07.2018 14:30, Michal Hocko wrote: > >>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: > On 07/20/2018 02:34 PM, David Hildenbrand wrote: > > Dumping tools (like makedumpfile) right now don't exclude reserved > > pages. > > So reserved pages might be access by dump tools although nobody except > > the owner should touch them. > > Are you sure about that? Or maybe I understand wrong. Maybe it changed > recently, but IIRC pages that are backing memmap (struct pages) are also > PG_reserved. And you definitely do want those in the dump. > >>> > >>> You are right. reserve_bootmem_region will make all early bootmem > >>> allocations (including those backing memmaps) PageReserved. I have asked > >>> several times but I haven't seen a satisfactory answer yet. Why do we > >>> even care for kdump about those. If they are reserved the nobody should > >>> really look at those specific struct pages and manipulate them. Kdump > >>> tools are using a kernel interface to read the content. If the specific > >>> content is backed by a non-existing memory then they should simply not > >>> return anything. > >>> > >> > >> "new kernel" provides an interface to read memory from "old kernel". > >> > >> The new kernel has no idea about > >> - which memory was added/online in the old kernel > >> - where struct pages of the old kernel are and what their content is > >> - which memory is save to touch and which not > >> > >> Dump tools figure all that out by interpreting the VMCORE. They e.g. > >> identify "struct pages" and see if they should be dumped. The "new > >> kernel" only allows to read that memory. It cannot hinder to crash the > >> system (e.g. if a dump tool would try to read a hwpoison page). > >> > >> So how should the "new kernel" know if a page can be touched or not? > > > > I am sorry I am not familiar with kdump much. But from what I remember > > it reads from /proc/vmcore and implementation of this interface should > > simply return EINVAL or alike when you try to dump inaccessible memory > > range. > > Oh, and BTW, while something like -EINVAL could work, we usually don't > want to try to read certain pages at all (e.g. ballooned pages - > accessing the page might work but involves quite some overhead in the > hypervisor). > > So we should either handle this in dump tools (reserved + ...?) or while > doing the read similar to XEN (is_ram_page()). Yes, I think this is the proper way. Just test for PageOnline in read_from_oldmem/copy_oldmem_page. Btw. we already page pfn_to_online_page which performs the per-section online/offline status. This should be extendable to consider your new PageOffline state. > I wonder if we could convert the early allocated memory (PG_reserved) at > some point (buddy initialized) into ordinary "simply allocated" memory. I do not think so. There is good reason why we keep them reserved. There are many pfn walkers that simply shouldn't touch those pages. Maybe we can achieve a page reserve type for all usages but that will be a larger project I am afraid. -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Tue 24-07-18 14:22:06, Vlastimil Babka wrote: > On 07/24/2018 01:19 PM, Michal Hocko wrote: > >> When creating a crashdump, I definitely need the pages containing memmap > >> included in the dump, so I can inspect the struct pages. But this is a > >> bit recursive issue, so I'll try making it clearer: > >> > >> 1) there are kernel pages with data (e.g. slab) that I typically need in > >> the dump, and are not PageReserved > >> 2) there are struct pages for pages 1) in the memmap that physically > >> hold the pageflags for 1), and these are PageReserved > >> 3) there are struct pages for pages 2) somewhere else in the memmap, > >> physically hold the pageflags for 2). They are probably also > >> PageReserved themselves ? and self-referencing. > >> > >> Excluding PageReserved from dump means there won't be cases 2) and 3) in > >> the dump, which at least for case 2) is making such dump almost useless > >> in many cases. > > > > Yes, we cannot simply exclude all PageReserved pages. I was merely > > suggesting to rule out new special PageReserved pages that are denoting > > offline pages. The same could be applied to HWPoison pages > > So how about marking them with some "page type" that we got after > Matthew's struct page reorg? I assume the pages we're talking about are > in a state that they don't need the mapcount/mapping field or whatever > unions with the page type... but I guess some care would be needed to > not have false positives when the union field is actually used but > happens to look like the new type. The idea was to use PageReserved because those pages are generally ignored by MM and then reuse some parts of the struct page. It belongs to the owner of the page and nothing should be really used from it at the time when you mark it offline. So pagetype or something else is merely an implementation detail. -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 14:22, Vlastimil Babka wrote: > On 07/24/2018 01:19 PM, Michal Hocko wrote: >>> When creating a crashdump, I definitely need the pages containing memmap >>> included in the dump, so I can inspect the struct pages. But this is a >>> bit recursive issue, so I'll try making it clearer: >>> >>> 1) there are kernel pages with data (e.g. slab) that I typically need in >>> the dump, and are not PageReserved >>> 2) there are struct pages for pages 1) in the memmap that physically >>> hold the pageflags for 1), and these are PageReserved >>> 3) there are struct pages for pages 2) somewhere else in the memmap, >>> physically hold the pageflags for 2). They are probably also >>> PageReserved themselves ? and self-referencing. >>> >>> Excluding PageReserved from dump means there won't be cases 2) and 3) in >>> the dump, which at least for case 2) is making such dump almost useless >>> in many cases. >> >> Yes, we cannot simply exclude all PageReserved pages. I was merely >> suggesting to rule out new special PageReserved pages that are denoting >> offline pages. The same could be applied to HWPoison pages > > So how about marking them with some "page type" that we got after > Matthew's struct page reorg? I assume the pages we're talking about are > in a state that they don't need the mapcount/mapping field or whatever > unions with the page type... but I guess some care would be needed to > not have false positives when the union field is actually used but > happens to look like the new type. > Had that implemented, Michal didn't like it so far. ("waste of one bit") -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 07/24/2018 01:19 PM, Michal Hocko wrote: >> When creating a crashdump, I definitely need the pages containing memmap >> included in the dump, so I can inspect the struct pages. But this is a >> bit recursive issue, so I'll try making it clearer: >> >> 1) there are kernel pages with data (e.g. slab) that I typically need in >> the dump, and are not PageReserved >> 2) there are struct pages for pages 1) in the memmap that physically >> hold the pageflags for 1), and these are PageReserved >> 3) there are struct pages for pages 2) somewhere else in the memmap, >> physically hold the pageflags for 2). They are probably also >> PageReserved themselves ? and self-referencing. >> >> Excluding PageReserved from dump means there won't be cases 2) and 3) in >> the dump, which at least for case 2) is making such dump almost useless >> in many cases. > > Yes, we cannot simply exclude all PageReserved pages. I was merely > suggesting to rule out new special PageReserved pages that are denoting > offline pages. The same could be applied to HWPoison pages So how about marking them with some "page type" that we got after Matthew's struct page reorg? I assume the pages we're talking about are in a state that they don't need the mapcount/mapping field or whatever unions with the page type... but I guess some care would be needed to not have false positives when the union field is actually used but happens to look like the new type.
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 09:25, Michal Hocko wrote: > On Mon 23-07-18 19:20:43, David Hildenbrand wrote: >> On 23.07.2018 14:30, Michal Hocko wrote: >>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: On 07/20/2018 02:34 PM, David Hildenbrand wrote: > Dumping tools (like makedumpfile) right now don't exclude reserved pages. > So reserved pages might be access by dump tools although nobody except > the owner should touch them. Are you sure about that? Or maybe I understand wrong. Maybe it changed recently, but IIRC pages that are backing memmap (struct pages) are also PG_reserved. And you definitely do want those in the dump. >>> >>> You are right. reserve_bootmem_region will make all early bootmem >>> allocations (including those backing memmaps) PageReserved. I have asked >>> several times but I haven't seen a satisfactory answer yet. Why do we >>> even care for kdump about those. If they are reserved the nobody should >>> really look at those specific struct pages and manipulate them. Kdump >>> tools are using a kernel interface to read the content. If the specific >>> content is backed by a non-existing memory then they should simply not >>> return anything. >>> >> >> "new kernel" provides an interface to read memory from "old kernel". >> >> The new kernel has no idea about >> - which memory was added/online in the old kernel >> - where struct pages of the old kernel are and what their content is >> - which memory is save to touch and which not >> >> Dump tools figure all that out by interpreting the VMCORE. They e.g. >> identify "struct pages" and see if they should be dumped. The "new >> kernel" only allows to read that memory. It cannot hinder to crash the >> system (e.g. if a dump tool would try to read a hwpoison page). >> >> So how should the "new kernel" know if a page can be touched or not? > > I am sorry I am not familiar with kdump much. But from what I remember > it reads from /proc/vmcore and implementation of this interface should > simply return EINVAL or alike when you try to dump inaccessible memory > range. Oh, and BTW, while something like -EINVAL could work, we usually don't want to try to read certain pages at all (e.g. ballooned pages - accessing the page might work but involves quite some overhead in the hypervisor). So we should either handle this in dump tools (reserved + ...?) or while doing the read similar to XEN (is_ram_page()). I wonder if we could convert the early allocated memory (PG_reserved) at some point (buddy initialized) into ordinary "simply allocated" memory. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Tue 24-07-18 11:47:02, Vlastimil Babka wrote: > On 07/23/2018 02:30 PM, Michal Hocko wrote: > > On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: > >> On 07/20/2018 02:34 PM, David Hildenbrand wrote: > >>> Dumping tools (like makedumpfile) right now don't exclude reserved pages. > >>> So reserved pages might be access by dump tools although nobody except > >>> the owner should touch them. > >> > >> Are you sure about that? Or maybe I understand wrong. Maybe it changed > >> recently, but IIRC pages that are backing memmap (struct pages) are also > >> PG_reserved. And you definitely do want those in the dump. > > > > You are right. reserve_bootmem_region will make all early bootmem > > allocations (including those backing memmaps) PageReserved. I have asked > > several times but I haven't seen a satisfactory answer yet. Why do we > > even care for kdump about those. If they are reserved the nobody should > > really look at those specific struct pages and manipulate them. Kdump > > tools are using a kernel interface to read the content. If the specific > > content is backed by a non-existing memory then they should simply not > > return anything. > > When creating a crashdump, I definitely need the pages containing memmap > included in the dump, so I can inspect the struct pages. But this is a > bit recursive issue, so I'll try making it clearer: > > 1) there are kernel pages with data (e.g. slab) that I typically need in > the dump, and are not PageReserved > 2) there are struct pages for pages 1) in the memmap that physically > hold the pageflags for 1), and these are PageReserved > 3) there are struct pages for pages 2) somewhere else in the memmap, > physically hold the pageflags for 2). They are probably also > PageReserved themselves ? and self-referencing. > > Excluding PageReserved from dump means there won't be cases 2) and 3) in > the dump, which at least for case 2) is making such dump almost useless > in many cases. Yes, we cannot simply exclude all PageReserved pages. I was merely suggesting to rule out new special PageReserved pages that are denoting offline pages. The same could be applied to HWPoison pages -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 07/24/2018 09:22 AM, Michal Hocko wrote: > On Mon 23-07-18 19:12:58, David Hildenbrand wrote: >> On 23.07.2018 13:45, Vlastimil Babka wrote: >>> On 07/20/2018 02:34 PM, David Hildenbrand wrote: Dumping tools (like makedumpfile) right now don't exclude reserved pages. So reserved pages might be access by dump tools although nobody except the owner should touch them. >>> >>> Are you sure about that? Or maybe I understand wrong. Maybe it changed >>> recently, but IIRC pages that are backing memmap (struct pages) are also >>> PG_reserved. And you definitely do want those in the dump. >> >> I proposed a new flag/value to mask pages that are logically offline but >> Michal wanted me to go into this direction. >> >> While we can special case struct pages in dump tools ("we have to >> read/interpret them either way, so we can also dump them"), it smells >> like my original attempt was cleaner. Michal? > > But we do not have many page flags spare and even if we have one or two > this doesn't look like the use for them. So I still think we should try > the PageReserved way. First we would have to audit everything that's using PageReserved and might be important for the crash dump to be useful. memmap might not be the only case...
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 07/23/2018 02:30 PM, Michal Hocko wrote: > On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: >> On 07/20/2018 02:34 PM, David Hildenbrand wrote: >>> Dumping tools (like makedumpfile) right now don't exclude reserved pages. >>> So reserved pages might be access by dump tools although nobody except >>> the owner should touch them. >> >> Are you sure about that? Or maybe I understand wrong. Maybe it changed >> recently, but IIRC pages that are backing memmap (struct pages) are also >> PG_reserved. And you definitely do want those in the dump. > > You are right. reserve_bootmem_region will make all early bootmem > allocations (including those backing memmaps) PageReserved. I have asked > several times but I haven't seen a satisfactory answer yet. Why do we > even care for kdump about those. If they are reserved the nobody should > really look at those specific struct pages and manipulate them. Kdump > tools are using a kernel interface to read the content. If the specific > content is backed by a non-existing memory then they should simply not > return anything. When creating a crashdump, I definitely need the pages containing memmap included in the dump, so I can inspect the struct pages. But this is a bit recursive issue, so I'll try making it clearer: 1) there are kernel pages with data (e.g. slab) that I typically need in the dump, and are not PageReserved 2) there are struct pages for pages 1) in the memmap that physically hold the pageflags for 1), and these are PageReserved 3) there are struct pages for pages 2) somewhere else in the memmap, physically hold the pageflags for 2). They are probably also PageReserved themselves ? and self-referencing. Excluding PageReserved from dump means there won't be cases 2) and 3) in the dump, which at least for case 2) is making such dump almost useless in many cases.
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 10:53, Michal Hocko wrote: > On Tue 24-07-18 10:46:20, David Hildenbrand wrote: >> On 24.07.2018 09:25, Michal Hocko wrote: >>> On Mon 23-07-18 19:20:43, David Hildenbrand wrote: On 23.07.2018 14:30, Michal Hocko wrote: > On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: >> On 07/20/2018 02:34 PM, David Hildenbrand wrote: >>> Dumping tools (like makedumpfile) right now don't exclude reserved >>> pages. >>> So reserved pages might be access by dump tools although nobody except >>> the owner should touch them. >> >> Are you sure about that? Or maybe I understand wrong. Maybe it changed >> recently, but IIRC pages that are backing memmap (struct pages) are also >> PG_reserved. And you definitely do want those in the dump. > > You are right. reserve_bootmem_region will make all early bootmem > allocations (including those backing memmaps) PageReserved. I have asked > several times but I haven't seen a satisfactory answer yet. Why do we > even care for kdump about those. If they are reserved the nobody should > really look at those specific struct pages and manipulate them. Kdump > tools are using a kernel interface to read the content. If the specific > content is backed by a non-existing memory then they should simply not > return anything. > "new kernel" provides an interface to read memory from "old kernel". The new kernel has no idea about - which memory was added/online in the old kernel - where struct pages of the old kernel are and what their content is - which memory is save to touch and which not Dump tools figure all that out by interpreting the VMCORE. They e.g. identify "struct pages" and see if they should be dumped. The "new kernel" only allows to read that memory. It cannot hinder to crash the system (e.g. if a dump tool would try to read a hwpoison page). So how should the "new kernel" know if a page can be touched or not? >>> >>> I am sorry I am not familiar with kdump much. But from what I remember >>> it reads from /proc/vmcore and implementation of this interface should >>> simply return EINVAL or alike when you try to dump inaccessible memory >>> range. >> >> I assume the main problem with this approach is that we would always >> have to fallback to reading old memory from vmcore page by page. e.g. >> makedumpfile will always try to read bigger bunches. I also assume the >> reason HWPOISON is handled in dump tools instead of in the kernel using >> the mechanism you describe is the case. > > Is falling back to page-by-page for some ranges a real problem? I mean > most of pages will simply be there so you can go in larger chunks. Once > you get EINVAL, you just fall back to page-by-page for that particular > range. > Looking at makedumpfile code, I assume implementation wise it should be possible. They always try to read 256 pages at a time. If we get an -EINVAL (-EIO?) we could fallback to reading page by page. This implies having to properly handle exceptions when accessing memory. Not sure if that will be easy. Maybe is_ram_page() is the better alternative, because it hinders us from trying to read invalid memory (or memory with random content) in the first place. Will have to think about this and look into the details. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Tue 24-07-18 10:46:20, David Hildenbrand wrote: > On 24.07.2018 09:25, Michal Hocko wrote: > > On Mon 23-07-18 19:20:43, David Hildenbrand wrote: > >> On 23.07.2018 14:30, Michal Hocko wrote: > >>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: > On 07/20/2018 02:34 PM, David Hildenbrand wrote: > > Dumping tools (like makedumpfile) right now don't exclude reserved > > pages. > > So reserved pages might be access by dump tools although nobody except > > the owner should touch them. > > Are you sure about that? Or maybe I understand wrong. Maybe it changed > recently, but IIRC pages that are backing memmap (struct pages) are also > PG_reserved. And you definitely do want those in the dump. > >>> > >>> You are right. reserve_bootmem_region will make all early bootmem > >>> allocations (including those backing memmaps) PageReserved. I have asked > >>> several times but I haven't seen a satisfactory answer yet. Why do we > >>> even care for kdump about those. If they are reserved the nobody should > >>> really look at those specific struct pages and manipulate them. Kdump > >>> tools are using a kernel interface to read the content. If the specific > >>> content is backed by a non-existing memory then they should simply not > >>> return anything. > >>> > >> > >> "new kernel" provides an interface to read memory from "old kernel". > >> > >> The new kernel has no idea about > >> - which memory was added/online in the old kernel > >> - where struct pages of the old kernel are and what their content is > >> - which memory is save to touch and which not > >> > >> Dump tools figure all that out by interpreting the VMCORE. They e.g. > >> identify "struct pages" and see if they should be dumped. The "new > >> kernel" only allows to read that memory. It cannot hinder to crash the > >> system (e.g. if a dump tool would try to read a hwpoison page). > >> > >> So how should the "new kernel" know if a page can be touched or not? > > > > I am sorry I am not familiar with kdump much. But from what I remember > > it reads from /proc/vmcore and implementation of this interface should > > simply return EINVAL or alike when you try to dump inaccessible memory > > range. > > I assume the main problem with this approach is that we would always > have to fallback to reading old memory from vmcore page by page. e.g. > makedumpfile will always try to read bigger bunches. I also assume the > reason HWPOISON is handled in dump tools instead of in the kernel using > the mechanism you describe is the case. Is falling back to page-by-page for some ranges a real problem? I mean most of pages will simply be there so you can go in larger chunks. Once you get EINVAL, you just fall back to page-by-page for that particular range. -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 09:25, Michal Hocko wrote: > On Mon 23-07-18 19:20:43, David Hildenbrand wrote: >> On 23.07.2018 14:30, Michal Hocko wrote: >>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: On 07/20/2018 02:34 PM, David Hildenbrand wrote: > Dumping tools (like makedumpfile) right now don't exclude reserved pages. > So reserved pages might be access by dump tools although nobody except > the owner should touch them. Are you sure about that? Or maybe I understand wrong. Maybe it changed recently, but IIRC pages that are backing memmap (struct pages) are also PG_reserved. And you definitely do want those in the dump. >>> >>> You are right. reserve_bootmem_region will make all early bootmem >>> allocations (including those backing memmaps) PageReserved. I have asked >>> several times but I haven't seen a satisfactory answer yet. Why do we >>> even care for kdump about those. If they are reserved the nobody should >>> really look at those specific struct pages and manipulate them. Kdump >>> tools are using a kernel interface to read the content. If the specific >>> content is backed by a non-existing memory then they should simply not >>> return anything. >>> >> >> "new kernel" provides an interface to read memory from "old kernel". >> >> The new kernel has no idea about >> - which memory was added/online in the old kernel >> - where struct pages of the old kernel are and what their content is >> - which memory is save to touch and which not >> >> Dump tools figure all that out by interpreting the VMCORE. They e.g. >> identify "struct pages" and see if they should be dumped. The "new >> kernel" only allows to read that memory. It cannot hinder to crash the >> system (e.g. if a dump tool would try to read a hwpoison page). >> >> So how should the "new kernel" know if a page can be touched or not? > > I am sorry I am not familiar with kdump much. But from what I remember > it reads from /proc/vmcore and implementation of this interface should > simply return EINVAL or alike when you try to dump inaccessible memory > range. I assume the main problem with this approach is that we would always have to fallback to reading old memory from vmcore page by page. e.g. makedumpfile will always try to read bigger bunches. I also assume the reason HWPOISON is handled in dump tools instead of in the kernel using the mechanism you describe is the case. One way to avoid this would be to silently "read zero". Although not nice, it avoids having to touch dump tools. E.g. fs/proc/vmcore.c:read_from_oldmem() has a hook called "pfn_is_ram()". This is the hook for XEN I mentioned previously. -> register_oldmem_pfn_is_ram() However this callback right now assumes that there is a "global hypervisor implemented way of checking whether a page is accessible". We don't want anything like that in KVM. I could imagine extending this register mechanism in a way that - we can have multiple callbacks - we can return something like "Yes" / "No" / "Don't know" So we could have multiple devices (controlling a memory area) register there and when called, they could see if they are responsible for that area and query the hypervisor (e.g. using virtio). Might be complicated but the last resort. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Mon 23-07-18 19:20:43, David Hildenbrand wrote: > On 23.07.2018 14:30, Michal Hocko wrote: > > On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: > >> On 07/20/2018 02:34 PM, David Hildenbrand wrote: > >>> Dumping tools (like makedumpfile) right now don't exclude reserved pages. > >>> So reserved pages might be access by dump tools although nobody except > >>> the owner should touch them. > >> > >> Are you sure about that? Or maybe I understand wrong. Maybe it changed > >> recently, but IIRC pages that are backing memmap (struct pages) are also > >> PG_reserved. And you definitely do want those in the dump. > > > > You are right. reserve_bootmem_region will make all early bootmem > > allocations (including those backing memmaps) PageReserved. I have asked > > several times but I haven't seen a satisfactory answer yet. Why do we > > even care for kdump about those. If they are reserved the nobody should > > really look at those specific struct pages and manipulate them. Kdump > > tools are using a kernel interface to read the content. If the specific > > content is backed by a non-existing memory then they should simply not > > return anything. > > > > "new kernel" provides an interface to read memory from "old kernel". > > The new kernel has no idea about > - which memory was added/online in the old kernel > - where struct pages of the old kernel are and what their content is > - which memory is save to touch and which not > > Dump tools figure all that out by interpreting the VMCORE. They e.g. > identify "struct pages" and see if they should be dumped. The "new > kernel" only allows to read that memory. It cannot hinder to crash the > system (e.g. if a dump tool would try to read a hwpoison page). > > So how should the "new kernel" know if a page can be touched or not? I am sorry I am not familiar with kdump much. But from what I remember it reads from /proc/vmcore and implementation of this interface should simply return EINVAL or alike when you try to dump inaccessible memory range. > The *only* way would be to have an interface to the hypervisor where we > "sense" if a memory location is safe to touch. I remember that xen or > hyper-v does that - they fake a zero page in that case, after querying > the hypervisor. But this does not sound like a clean approach to me, > especially es we need yet another hypervisor interface to sense for > memory provided via "some" device. > > If we can find a way to just tag pages as "don't touch", it would be the > easiest and cleanest solution in my opinion. If only we could have much more spare room in struct pages... -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Mon 23-07-18 19:12:58, David Hildenbrand wrote: > On 23.07.2018 13:45, Vlastimil Babka wrote: > > On 07/20/2018 02:34 PM, David Hildenbrand wrote: > >> Dumping tools (like makedumpfile) right now don't exclude reserved pages. > >> So reserved pages might be access by dump tools although nobody except > >> the owner should touch them. > > > > Are you sure about that? Or maybe I understand wrong. Maybe it changed > > recently, but IIRC pages that are backing memmap (struct pages) are also > > PG_reserved. And you definitely do want those in the dump. > > I proposed a new flag/value to mask pages that are logically offline but > Michal wanted me to go into this direction. > > While we can special case struct pages in dump tools ("we have to > read/interpret them either way, so we can also dump them"), it smells > like my original attempt was cleaner. Michal? But we do not have many page flags spare and even if we have one or two this doesn't look like the use for them. So I still think we should try the PageReserved way. -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 23.07.2018 14:30, Michal Hocko wrote: > On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: >> On 07/20/2018 02:34 PM, David Hildenbrand wrote: >>> Dumping tools (like makedumpfile) right now don't exclude reserved pages. >>> So reserved pages might be access by dump tools although nobody except >>> the owner should touch them. >> >> Are you sure about that? Or maybe I understand wrong. Maybe it changed >> recently, but IIRC pages that are backing memmap (struct pages) are also >> PG_reserved. And you definitely do want those in the dump. > > You are right. reserve_bootmem_region will make all early bootmem > allocations (including those backing memmaps) PageReserved. I have asked > several times but I haven't seen a satisfactory answer yet. Why do we > even care for kdump about those. If they are reserved the nobody should > really look at those specific struct pages and manipulate them. Kdump > tools are using a kernel interface to read the content. If the specific > content is backed by a non-existing memory then they should simply not > return anything. > "new kernel" provides an interface to read memory from "old kernel". The new kernel has no idea about - which memory was added/online in the old kernel - where struct pages of the old kernel are and what their content is - which memory is save to touch and which not Dump tools figure all that out by interpreting the VMCORE. They e.g. identify "struct pages" and see if they should be dumped. The "new kernel" only allows to read that memory. It cannot hinder to crash the system (e.g. if a dump tool would try to read a hwpoison page). So how should the "new kernel" know if a page can be touched or not? The *only* way would be to have an interface to the hypervisor where we "sense" if a memory location is safe to touch. I remember that xen or hyper-v does that - they fake a zero page in that case, after querying the hypervisor. But this does not sound like a clean approach to me, especially es we need yet another hypervisor interface to sense for memory provided via "some" device. If we can find a way to just tag pages as "don't touch", it would be the easiest and cleanest solution in my opinion. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 23.07.2018 13:45, Vlastimil Babka wrote: > On 07/20/2018 02:34 PM, David Hildenbrand wrote: >> Dumping tools (like makedumpfile) right now don't exclude reserved pages. >> So reserved pages might be access by dump tools although nobody except >> the owner should touch them. > > Are you sure about that? Or maybe I understand wrong. Maybe it changed > recently, but IIRC pages that are backing memmap (struct pages) are also > PG_reserved. And you definitely do want those in the dump. I proposed a new flag/value to mask pages that are logically offline but Michal wanted me to go into this direction. While we can special case struct pages in dump tools ("we have to read/interpret them either way, so we can also dump them"), it smells like my original attempt was cleaner. Michal? > >> This is relevant in virtual environments where we soon might want to >> report certain reserved pages to the hypervisor and they might no longer >> be accessible - what already was documented for reserved pages a long >> time ago ("might not even exist"). >> >> David Hildenbrand (2): >> mm: clarify semantics of reserved pages >> kdump: include PG_reserved value in VMCOREINFO >> >> include/linux/page-flags.h | 4 ++-- >> kernel/crash_core.c| 1 + >> 2 files changed, 3 insertions(+), 2 deletions(-) >> > -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On Mon 23-07-18 13:45:18, Vlastimil Babka wrote: > On 07/20/2018 02:34 PM, David Hildenbrand wrote: > > Dumping tools (like makedumpfile) right now don't exclude reserved pages. > > So reserved pages might be access by dump tools although nobody except > > the owner should touch them. > > Are you sure about that? Or maybe I understand wrong. Maybe it changed > recently, but IIRC pages that are backing memmap (struct pages) are also > PG_reserved. And you definitely do want those in the dump. You are right. reserve_bootmem_region will make all early bootmem allocations (including those backing memmaps) PageReserved. I have asked several times but I haven't seen a satisfactory answer yet. Why do we even care for kdump about those. If they are reserved the nobody should really look at those specific struct pages and manipulate them. Kdump tools are using a kernel interface to read the content. If the specific content is backed by a non-existing memory then they should simply not return anything. -- Michal Hocko SUSE Labs
Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 07/20/2018 02:34 PM, David Hildenbrand wrote: > Dumping tools (like makedumpfile) right now don't exclude reserved pages. > So reserved pages might be access by dump tools although nobody except > the owner should touch them. Are you sure about that? Or maybe I understand wrong. Maybe it changed recently, but IIRC pages that are backing memmap (struct pages) are also PG_reserved. And you definitely do want those in the dump. > This is relevant in virtual environments where we soon might want to > report certain reserved pages to the hypervisor and they might no longer > be accessible - what already was documented for reserved pages a long > time ago ("might not even exist"). > > David Hildenbrand (2): > mm: clarify semantics of reserved pages > kdump: include PG_reserved value in VMCOREINFO > > include/linux/page-flags.h | 4 ++-- > kernel/crash_core.c| 1 + > 2 files changed, 3 insertions(+), 2 deletions(-) >