Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/22/20 at 12:05pm, David Hildenbrand wrote:
> On 22.04.20 11:57, Baoquan He wrote:
> > On 04/22/20 at 11:24am, David Hildenbrand wrote:
> >> On 22.04.20 11:17, Baoquan He wrote:
> >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote:
> >>>>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
> >>>>>> don't pass the efi, it won't get the SRAT table correctly, if I remember
> >>>>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> >>>>>> ACPI only, this won't happen on bare metal though. Need to check
> >>>>>> carefully. I have been using kvm guest with uefi firmware recently.
> >>>>>
> >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
> >>>>>
> >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
> >>>>> not part of any efi tables whatsoever. So I assume the kexec kernel
> >>>>> will not detect it automatically (good!), instead load the virtio-mem
> >>>>> driver and let it add memory back to the system.
> >>>>>
> >>>>> I should probably play with kexec and virtio-mem once I have some spare
> >>>>> cycles ... to find out what's broken and needs to be addressed :)
> >>>>
> >>>> FWIW, I just gave virtio-mem and kexec/kdump a try.
> >>>>
> >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> >>>> The kexec kernel only uses memory in the crash region. The virtio-mem
> >>>> driver properly bails out due to is_kdump_kernel().
> >>>
> >>> Right, kdump is not impacted by later added memory.
> >>>
> >>>>
> >>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> >>>> to get placed on virtio-mem memory (pure luck due to the left-to-right
> >>>> search). Memory added by virtio-mem is not getting added to the e820
> >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> >>>> right memory is readded.
> >>>
> >>> kexec_file_load just behaves as you tested. It doesn't collect later
> >>> added memory to e820 because it uses e820_table_kexec directly to pass
> >>> e820 to the kexec-ed kernel. However, this e820_table_kexec is only
> >>> updated during the boot stage. I tried hot adding a DIMM after boot; the
> >>> kexec-ed kernel doesn't have it in e820 during bootup, but it's
> >>> recognized and added during ACPI scanning. I think we should update
> >>> e820_table_kexec when hot add/remove memory, at least for DIMM. Not sure
> >>> if DLPAR, virtio-mem, balloon will need to be added into
> >>> e820_table_kexec too, and if this is expected behaviour.
> >>>
> >>> But whatever we do, it won't impact the kexec file loading, because of
> >>> the bottom-up searching strategy. Just adding them into e820_table_kexec
> >>> will make it consistent with cold reboot, which recognizes them and gets
> >>> them into e820 during bootup.
> >>
> >> Yeah, I think whatever a cold-booted kernel will see is what the kexec-ed
> >> kernel should see. Not more, not less.
> >>
> >> Regarding virtio-mem: Not in e820 on cold-boot.
> >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
> >> IIRC. I think on real HW it can be different.
> >
> > Yeah, DIMMs under KVM won't show up in the e820 map. But this is not a
> > feature of QEMU/KVM, it is a defect. I once asked Igor, who is the
> > developer of QEMU/KVM guest in this area, why we don't make the kvm
> > guest recognize hotpluggable DIMMs and add them into the e820 map. He
> > said he had tried to do it, but this will corrupt the guest on Hyper-V.
> > So he had to revert the
>
> Yeah, I remember that this had to be reverted due to something breaking.
> But OTOH, it allows us to online coldplugged DIMMs online_movable
> easily, so I'd say it's even a feature (although it does not behave like
> the real HW we have).
>
> I use this extensively when testing memory hot(un)plug via coldplugged
> DIMMs.
>
> I do wonder if there is real HW, where this is also the case.

None that I know of. Hotplug on real HW includes two parts; boot memory
being hotpluggable is the more flexible one. It allows people to replace a
bad DIMM. And you can see the code in the boot stage has been adjusted a
lot for this purpose; at that time, people hadn't thought about kvm guests.

>
> > commit on qemu. So I think we can leave it for now for both real HW and
> > kvm, or update the e820_table_kexec to include added DIMM for both real
> > HW and KVM. I hope one day KVM dev will find a way to conquer the defect
> > on Hyper-V and make the e820 map consistent with bare metal. After all,
> > kvm guest is trying to imitate real HW for the most part.
> >
> > Anyway, I will think about the e820_table_kexec updating. See if we can
> > do something about it.
>
> Yeah, for DIMMs on real HW it might definitely make sense. We might be
> able to hook into updates of /sys/firmware/memmap on memory add/remove.
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 22.04.20 11:57, Baoquan He wrote:
> On 04/22/20 at 11:24am, David Hildenbrand wrote:
>> On 22.04.20 11:17, Baoquan He wrote:
>>> On 04/21/20 at 03:29pm, David Hildenbrand wrote:
>>>>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
>>>>>> don't pass the efi, it won't get the SRAT table correctly, if I remember
>>>>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>>>>> ACPI only, this won't happen on bare metal though. Need to check
>>>>>> carefully. I have been using kvm guest with uefi firmware recently.
>>>>>
>>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>>>>
>>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>>>>> not part of any efi tables whatsoever. So I assume the kexec kernel
>>>>> will not detect it automatically (good!), instead load the virtio-mem
>>>>> driver and let it add memory back to the system.
>>>>>
>>>>> I should probably play with kexec and virtio-mem once I have some spare
>>>>> cycles ... to find out what's broken and needs to be addressed :)
>>>>
>>>> FWIW, I just gave virtio-mem and kexec/kdump a try.
>>>>
>>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
>>>> The kexec kernel only uses memory in the crash region. The virtio-mem
>>>> driver properly bails out due to is_kdump_kernel().
>>>
>>> Right, kdump is not impacted by later added memory.
>>>
>>>>
>>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>>>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>>>> search). Memory added by virtio-mem is not getting added to the e820
>>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>>>> right memory is readded.
>>>
>>> kexec_file_load just behaves as you tested. It doesn't collect later
>>> added memory to e820 because it uses e820_table_kexec directly to pass
>>> e820 to the kexec-ed kernel. However, this e820_table_kexec is only
>>> updated during the boot stage. I tried hot adding a DIMM after boot; the
>>> kexec-ed kernel doesn't have it in e820 during bootup, but it's
>>> recognized and added during ACPI scanning. I think we should update
>>> e820_table_kexec when hot add/remove memory, at least for DIMM. Not sure
>>> if DLPAR, virtio-mem, balloon will need to be added into
>>> e820_table_kexec too, and if this is expected behaviour.
>>>
>>> But whatever we do, it won't impact the kexec file loading, because of
>>> the bottom-up searching strategy. Just adding them into e820_table_kexec
>>> will make it consistent with cold reboot, which recognizes them and gets
>>> them into e820 during bootup.
>>
>> Yeah, I think whatever a cold-booted kernel will see is what the kexec-ed
>> kernel should see. Not more, not less.
>>
>> Regarding virtio-mem: Not in e820 on cold-boot.
>> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
>> IIRC. I think on real HW it can be different.
>
> Yeah, DIMMs under KVM won't show up in the e820 map. But this is not a
> feature of QEMU/KVM, it is a defect. I once asked Igor, who is the
> developer of QEMU/KVM guest in this area, why we don't make the kvm
> guest recognize hotpluggable DIMMs and add them into the e820 map. He
> said he had tried to do it, but this will corrupt the guest on Hyper-V.
> So he had to revert the

Yeah, I remember that this had to be reverted due to something breaking.
But OTOH, it allows us to online coldplugged DIMMs online_movable
easily, so I'd say it's even a feature (although it does not behave like
the real HW we have).

I use this extensively when testing memory hot(un)plug via coldplugged
DIMMs.

I do wonder if there is real HW, where this is also the case.

> commit on qemu. So I think we can leave it for now for both real HW and
> kvm, or update the e820_table_kexec to include added DIMM for both real
> HW and KVM. I hope one day KVM dev will find a way to conquer the defect
> on Hyper-V and make the e820 map consistent with bare metal. After all,
> kvm guest is trying to imitate real HW for the most part.
>
> Anyway, I will think about the e820_table_kexec updating. See if we can
> do something about it.

Yeah, for DIMMs on real HW it might definitely make sense. We might be
able to hook into updates of /sys/firmware/memmap on memory add/remove.

--
Thanks,

David / dhildenb
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/22/20 at 11:24am, David Hildenbrand wrote:
> On 22.04.20 11:17, Baoquan He wrote:
> > On 04/21/20 at 03:29pm, David Hildenbrand wrote:
> >>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
> >>>> don't pass the efi, it won't get the SRAT table correctly, if I remember
> >>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> >>>> ACPI only, this won't happen on bare metal though. Need to check
> >>>> carefully. I have been using kvm guest with uefi firmware recently.
> >>>
> >>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
> >>>
> >>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
> >>> not part of any efi tables whatsoever. So I assume the kexec kernel
> >>> will not detect it automatically (good!), instead load the virtio-mem
> >>> driver and let it add memory back to the system.
> >>>
> >>> I should probably play with kexec and virtio-mem once I have some spare
> >>> cycles ... to find out what's broken and needs to be addressed :)
> >>
> >> FWIW, I just gave virtio-mem and kexec/kdump a try.
> >>
> >> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> >> The kexec kernel only uses memory in the crash region. The virtio-mem
> >> driver properly bails out due to is_kdump_kernel().
> >
> > Right, kdump is not impacted by later added memory.
> >
> >>
> >> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> >> to get placed on virtio-mem memory (pure luck due to the left-to-right
> >> search). Memory added by virtio-mem is not getting added to the e820
> >> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> >> right memory is readded.
> >
> > kexec_file_load just behaves as you tested. It doesn't collect later
> > added memory to e820 because it uses e820_table_kexec directly to pass
> > e820 to the kexec-ed kernel. However, this e820_table_kexec is only
> > updated during the boot stage. I tried hot adding a DIMM after boot; the
> > kexec-ed kernel doesn't have it in e820 during bootup, but it's
> > recognized and added during ACPI scanning. I think we should update
> > e820_table_kexec when hot add/remove memory, at least for DIMM. Not sure
> > if DLPAR, virtio-mem, balloon will need to be added into
> > e820_table_kexec too, and if this is expected behaviour.
> >
> > But whatever we do, it won't impact the kexec file loading, because of
> > the bottom-up searching strategy. Just adding them into e820_table_kexec
> > will make it consistent with cold reboot, which recognizes them and gets
> > them into e820 during bootup.
>
> Yeah, I think whatever a cold-booted kernel will see is what the kexec-ed
> kernel should see. Not more, not less.
>
> Regarding virtio-mem: Not in e820 on cold-boot.
> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
> IIRC. I think on real HW it can be different.

Yeah, DIMMs under KVM won't show up in the e820 map. But this is not a
feature of QEMU/KVM, it is a defect. I once asked Igor, who is the
developer of QEMU/KVM guest in this area, why we don't make the kvm
guest recognize hotpluggable DIMMs and add them into the e820 map. He
said he had tried to do it, but this will corrupt the guest on Hyper-V.
So he had to revert the commit on qemu. So I think we can leave it for
now for both real HW and kvm, or update the e820_table_kexec to include
added DIMM for both real HW and KVM. I hope one day KVM dev will find a
way to conquer the defect on Hyper-V and make the e820 map consistent
with bare metal. After all, kvm guest is trying to imitate real HW for
the most part.

Anyway, I will think about the e820_table_kexec updating. See if we can
do something about it.
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 22.04.20 11:17, Baoquan He wrote:
> On 04/21/20 at 03:29pm, David Hildenbrand wrote:
>>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
>>>> don't pass the efi, it won't get the SRAT table correctly, if I remember
>>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>>> ACPI only, this won't happen on bare metal though. Need to check
>>>> carefully. I have been using kvm guest with uefi firmware recently.
>>>
>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>>
>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>>> not part of any efi tables whatsoever. So I assume the kexec kernel
>>> will not detect it automatically (good!), instead load the virtio-mem
>>> driver and let it add memory back to the system.
>>>
>>> I should probably play with kexec and virtio-mem once I have some spare
>>> cycles ... to find out what's broken and needs to be addressed :)
>>
>> FWIW, I just gave virtio-mem and kexec/kdump a try.
>>
>> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
>> The kexec kernel only uses memory in the crash region. The virtio-mem
>> driver properly bails out due to is_kdump_kernel().
>
> Right, kdump is not impacted by later added memory.
>
>>
>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>> search). Memory added by virtio-mem is not getting added to the e820
>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>> right memory is readded.
>
> kexec_file_load just behaves as you tested. It doesn't collect later
> added memory to e820 because it uses e820_table_kexec directly to pass
> e820 to the kexec-ed kernel. However, this e820_table_kexec is only
> updated during the boot stage. I tried hot adding a DIMM after boot; the
> kexec-ed kernel doesn't have it in e820 during bootup, but it's
> recognized and added during ACPI scanning. I think we should update
> e820_table_kexec when hot add/remove memory, at least for DIMM. Not sure
> if DLPAR, virtio-mem, balloon will need to be added into
> e820_table_kexec too, and if this is expected behaviour.
>
> But whatever we do, it won't impact the kexec file loading, because of
> the bottom-up searching strategy. Just adding them into e820_table_kexec
> will make it consistent with cold reboot, which recognizes them and gets
> them into e820 during bootup.

Yeah, I think whatever a cold-booted kernel will see is what the kexec-ed
kernel should see. Not more, not less.

Regarding virtio-mem: Not in e820 on cold-boot.
Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
IIRC. I think on real HW it can be different.

--
Thanks,

David / dhildenb
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/21/20 at 03:29pm, David Hildenbrand wrote:
> >> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
> >> don't pass the efi, it won't get the SRAT table correctly, if I remember
> >> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> >> ACPI only, this won't happen on bare metal though. Need to check
> >> carefully. I have been using kvm guest with uefi firmware recently.
> >
> > Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
> >
> > I'm also asking because of virtio-mem. Memory added via virtio-mem is
> > not part of any efi tables whatsoever. So I assume the kexec kernel
> > will not detect it automatically (good!), instead load the virtio-mem
> > driver and let it add memory back to the system.
> >
> > I should probably play with kexec and virtio-mem once I have some spare
> > cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().

Right, kdump is not impacted by later added memory.

>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.

kexec_file_load just behaves as you tested. It doesn't collect later
added memory to e820 because it uses e820_table_kexec directly to pass
e820 to the kexec-ed kernel. However, this e820_table_kexec is only
updated during the boot stage. I tried hot adding a DIMM after boot; the
kexec-ed kernel doesn't have it in e820 during bootup, but it's
recognized and added during ACPI scanning. I think we should update
e820_table_kexec when hot add/remove memory, at least for DIMM. Not sure
if DLPAR, virtio-mem, balloon will need to be added into
e820_table_kexec too, and if this is expected behaviour.

But whatever we do, it won't impact the kexec file loading, because of
the bottom-up searching strategy. Just adding them into e820_table_kexec
will make it consistent with cold reboot, which recognizes them and gets
them into e820 during bootup.

>
> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel. I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.

Yes, kexec_load will read memory regions from /sys/firmware/memmap/ or
/proc/iomem. Making it right seems a little harder. We can export them to
/proc/iomem or /sys/firmware/memmap/ and mark them with 'hotplug', but
the attribute of which zone they belong to is not easy to tell.

We are proactively and widely testing kexec_file_load on x86_64, s390,
arm64 by adding test cases into CKI.

>
>
> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.
>
> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").
>
> Baoquan, any opinion on that?
>
> --
> Thanks,
>
> David / dhildenb
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>> search). Memory added by virtio-mem is not getting added to the e820
>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>> right memory is readded.
>
> This sounds like a bug.

This is how virtio-mem wants its memory to get handled.

>
>> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
>> is added to the e820 map, which is wrong. Memory that should not be
>> touched will be touched by the kexec kernel. I assume kexec-tools just
>> goes ahead and adds anything it can find in /proc/iomem (or
>> /sys/firmware/memmap/) to the e820 map of the new kernel.
>>
>> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
>> similarly added to the e820 map and, therefore, won't be able to be
>> onlined MOVABLE easily.
>
> This sounds like correct behavior to me. If you add memory to the
> system it is treated as memory to the system.

Yeah, I would agree if we are talking about DIMMs, but this memory is
special. It's added via a paravirtualized interface and will contain
holes, especially after unplug. While memory in these holes can usually
be read, it should not be written. More on that below.

>
> If we need to make it a special kind of memory with special rules we can
> have some kind of special marking for the memory. But hotplugged is not
> in itself a sufficient criterion to say don't use this as normal memory.

Agreed. It is special, though.

>
> If I take a huge server and I plug in an extra dimm it is just memory.

Agreed.

[...]

>
> Now perhaps virtualization needs a special tier of memory that should
> only be used for cases where the memory is easily movable.
>
> I am not familiar with virtio-mem but my skim of the initial design
> is that virtio-mem was not designed to be such a special tier of memory.
> Perhaps something has changed?
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html

Yes, a lot changed. See
https://lkml.kernel.org/r/20200311171422.10484-1-da...@redhat.com for
the latest-greatest design overview.

>
>> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
>> indicating it in /proc/iomem in a special way ("System RAM
>> (hotplugged)"/"System RAM (virtio-mem)").
>
> How does the kernel memory allocator treat this memory?

So what virtio-mem does is add memory sections on demand and populate
within these sections the requested amount of memory. E.g., if 64MB are
requested, it will add a 128MB section/resource but only make the first
64MB accessible (via the hypervisor) and only give the first 64MB to the
buddy. This way of adding memory is similar to what the Xen and Hyper-V
balloon drivers do when hotplugging memory.

When requested to plug more memory, it might go ahead and make (parts
of) the remaining 64MB accessible and give them to the buddy. In case it
cannot "fill any holes", it will add a new section.

When requested to unplug memory, it will try to remove the added (here
64MB) memory from the buddy and tell the hypervisor about it.

So, it has some similarity to ballooning in virtual environments;
however, it manages its own device memory only and can therefore give
better guarantees and detect malicious guests.

Right now, I think the right approach would be to not create
/sys/firmware/memmap entries for memory that virtio-mem added.

[...]

>
> p.s. Please excuse me for jumping in I may be missing some important
> context, but what I read when I saw this message in my inbox just seemed
> very wrong.

Yeah, still, thanks for having a look. Please let me know if you need
more information.

--
Thanks,

David / dhildenb
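The plug behaviour described above (request 64MB, get a 128MB section of which only the first 64MB is usable) can be made concrete with a toy model. This is only a sketch under simplifying assumptions, not the driver's logic: the class name `VirtioMemModel`, its methods, and the rule that each section is filled strictly from its start are made up for illustration, and unplug is left out.

```python
SECTION_SIZE = 128 << 20  # 128MB memory section, as in the example above


class VirtioMemModel:
    """Toy model (hypothetical): tracks how much of each added section is plugged."""

    def __init__(self, base):
        self.base = base
        self.plugged = []  # plugged bytes per section, filled from the section start

    def plug(self, nbytes):
        # First try to "fill holes" in sections that were already added.
        for i, used in enumerate(self.plugged):
            take = min(SECTION_SIZE - used, nbytes)
            self.plugged[i] += take
            nbytes -= take
        # Whatever is left needs new sections.
        while nbytes > 0:
            take = min(SECTION_SIZE, nbytes)
            self.plugged.append(take)
            nbytes -= take

    def resources(self):
        """Inclusive (start, end) of every added section, i.e. the whole resource."""
        return [(self.base + i * SECTION_SIZE,
                 self.base + (i + 1) * SECTION_SIZE - 1)
                for i in range(len(self.plugged))]

    def plugged_ranges(self):
        """Inclusive (start, end) of the part of each section that is really usable."""
        return [(self.base + i * SECTION_SIZE,
                 self.base + i * SECTION_SIZE + used - 1)
                for i, used in enumerate(self.plugged) if used]


model = VirtioMemModel(base=0x100000000)
model.plug(64 << 20)
for start, end in model.resources():
    print(f"resource {start:#x}-{end:#x}")
for start, end in model.plugged_ranges():
    print(f"plugged  {start:#x}-{end:#x}")
```

The gap between `resources()` and `plugged_ranges()` is the hole that a new kernel must not write to if it is handed the whole resource as ordinary RAM.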
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
David Hildenbrand writes:

>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
>>> don't pass the efi, it won't get the SRAT table correctly, if I remember
>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>> ACPI only, this won't happen on bare metal though. Need to check
>>> carefully. I have been using kvm guest with uefi firmware recently.
>>
>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>
>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>> not part of any efi tables whatsoever. So I assume the kexec kernel
>> will not detect it automatically (good!), instead load the virtio-mem
>> driver and let it add memory back to the system.
>>
>> I should probably play with kexec and virtio-mem once I have some spare
>> cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().
>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.

This sounds like a bug.

> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel. I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.

This sounds like correct behavior to me. If you add memory to the
system it is treated as memory to the system.

If we need to make it a special kind of memory with special rules we can
have some kind of special marking for the memory. But hotplugged is not
in itself a sufficient criterion to say don't use this as normal memory.

If I take a huge server and I plug in an extra dimm it is just memory.

For a similarly huge server I might want to have memory that the system
booted with unpluggable, in case hardware error reporting notices a dimm
generating a lot of memory errors.

Now perhaps virtualization needs a special tier of memory that should
only be used for cases where the memory is easily movable.

I am not familiar with virtio-mem but my skim of the initial design
is that virtio-mem was not designed to be such a special tier of memory.
Perhaps something has changed?

https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html

> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.

No.

> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").

How does the kernel memory allocator treat this memory?

The logic is simple. If the kernel memory allocator treats that memory
as ordinary memory available for all uses it should be presented as
ordinary memory available for all uses.

If the kernel memory allocator treats that memory as special memory only
available for uses that we can easily free later and give back to the
system, AKA it is special and not ordinary memory, we should mark it as
such.

Eric

p.s. Please excuse me for jumping in I may be missing some important
context, but what I read when I saw this message in my inbox just seemed
very wrong.
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 21.04.20 15:29, David Hildenbrand wrote:
>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
>>> don't pass the efi, it won't get the SRAT table correctly, if I remember
>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>> ACPI only, this won't happen on bare metal though. Need to check
>>> carefully. I have been using kvm guest with uefi firmware recently.
>>
>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>
>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>> not part of any efi tables whatsoever. So I assume the kexec kernel
>> will not detect it automatically (good!), instead load the virtio-mem
>> driver and let it add memory back to the system.
>>
>> I should probably play with kexec and virtio-mem once I have some spare
>> cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().
>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.
>
> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel. I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.
>
>
> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.
>
> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").

I just realized that *not* creating /sys/firmware/memmap/ entries for
virtio-mem memory seems to be the right thing to do.

--
Thanks,

David / dhildenb
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we
>> don't pass the efi, it won't get the SRAT table correctly, if I remember
>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>> ACPI only, this won't happen on bare metal though. Need to check
>> carefully. I have been using kvm guest with uefi firmware recently.
>
> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>
> I'm also asking because of virtio-mem. Memory added via virtio-mem is
> not part of any efi tables whatsoever. So I assume the kexec kernel
> will not detect it automatically (good!), instead load the virtio-mem
> driver and let it add memory back to the system.
>
> I should probably play with kexec and virtio-mem once I have some spare
> cycles ... to find out what's broken and needs to be addressed :)

FWIW, I just gave virtio-mem and kexec/kdump a try.

a) kdump seems to work. Memory added by virtio-mem is getting dumped.
The kexec kernel only uses memory in the crash region. The virtio-mem
driver properly bails out due to is_kdump_kernel().

b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
to get placed on virtio-mem memory (pure luck due to the left-to-right
search). Memory added by virtio-mem is not getting added to the e820
map. Once the virtio-mem driver comes back up in the kexec kernel, the
right memory is readded.

c) "kexec -c -l" does not work properly. All memory added by virtio-mem
is added to the e820 map, which is wrong. Memory that should not be
touched will be touched by the kexec kernel. I assume kexec-tools just
goes ahead and adds anything it can find in /proc/iomem (or
/sys/firmware/memmap/) to the e820 map of the new kernel.

Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
similarly added to the e820 map and, therefore, won't be able to be
onlined MOVABLE easily.


At least for virtio-mem, I would either have to
a) Not support "kexec -c -l". A viable option if we would be planning on
not supporting it either way in the long term. I could block this
in-kernel somehow eventually.

b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
indicating it in /proc/iomem in a special way ("System RAM
(hotplugged)"/"System RAM (virtio-mem)").

Baoquan, any opinion on that?

--
Thanks,

David / dhildenb
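Option b) can be sketched from the user-space side. The following is a minimal sketch, not kexec-tools code: it assumes the kernel would label such ranges "System RAM (virtio-mem)" in /proc/iomem, which is only a proposal in this thread, and the function name `kexec_ram_ranges` and the sample map are made up for illustration.

```python
import re

# One /proc/iomem line: optional indentation, "start-end : name" (hex addresses).
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def kexec_ram_ranges(iomem_text):
    """Return (start, end) for top-level entries named exactly "System RAM".

    Entries carrying an extra label such as "System RAM (virtio-mem)"
    (a hypothetical marking) are skipped, so they would not end up in the
    memory map built for the new kernel.
    """
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if not match:
            continue
        indent, start, end, name = match.groups()
        if indent:
            continue  # nested child resource (e.g. "Kernel code"), not a RAM range
        if name.strip() == "System RAM":
            ranges.append((int(start, 16), int(end, 16)))
    return ranges


# Made-up sample in /proc/iomem layout; the last entry uses the assumed label.
SAMPLE = """\
00001000-0009fbff : System RAM
00100000-7ffdffff : System RAM
  01000000-01e00000 : Kernel code
100000000-13fffffff : System RAM (virtio-mem)
"""

for start, end in kexec_ram_ranges(SAMPLE):
    print(f"{start:#x}-{end:#x}")
```

Matching the name exactly means any specially labelled range is left out by default, which is the conservative choice for memory that may contain holes.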
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
>> kexec_walk_memblock() has the option for "kbuf->top_down". Only
>> kexec_walk_resources() seems to ignore it.
>
> Yeah, that top down searching is done in a found low mem area. Means
> firstly search an available region bottom up, then put kernel top down
> in that region. The reason is our iomem res is linked with singly linked
> list. So we can only search bottom up efficiently.
>
> kexec_load is doing the real top down searching, so kernel will be put
> at the top of system ram. I ever tried to change it to support top down
> searching for kexec_file_load too with patches, since QE and customers
> are often confused with this difference when debugging.
>
> Andrew may remember this, he suggested me to change the singly linked list
> to doubly linked list for iomem res, then do the top down searching for
> kexec_file_load. I tried with some effort, the change introduced too much
> code change, I just gave up finally.

Well, at least right now this seems to be the right approach (hotplug),
lol :)

>
> http://archive.lwn.net:8080/devicetree/20180718024944.577-1-...@redhat.com/
>
> I can see that top down searching for kexec can avoid the highly used
> low memory region, esp under 4G, for dma, kinds of firmware reserving,
> etc. And customers/QE of kexec get used to it. I can change kexec_file_load
> to top down too with a simple way if people really complain it. But now,
> seems bottom up is not bad too.

Ah, I understand the problem. Maybe a simple "optimization" would be to
start searching bottom-up from e.g., 2GB/4GB first. If nothing was found,
search bottom-up from 0-2GB/4GB etc.

>
>>
>> So I think in case of memblocks (e.g., arm64), this still applies?
>
> Yeah, aren't you trying to remove it? I haven't read your patches
> carefully, maybe I got it wrong. And arm64 even can't support the hot added

For arm64 we're still creating memblocks for hotplugged memory, but I
guess it's not too hard to stop doing that.

> memory being able to recorded into firmware, seems it's not so ready,
> won't they change that design in the future?

It seems to be incomplete, yes. No idea if it's fixable, no arm64 expert ...

>>>>>> - powerpc to filter out all LMBs that can be removed (assuming not all
>>>>>> memory corresponds to LMBs that can be removed, otherwise we're in
>>>>>> trouble ... :) )
>>>>>> - virtio-mem to filter out all memory it added.
>>>>>> - hyper-v to filter out partially backed memory blocks (esp. the last
>>>>>> memory block it added and only partially backed it by memory).
>>>>>>
>>>>>> This would make it work for kexec_file_load(), however, I do wonder how
>>>>>> we would want to approach that from userspace kexec-tools when handling
>>>>>> it from kexec_load().
>>>>>
>>>>> Let's make kexec_file_load work firstly. Since this work is only first
>>>>> step to make kexec-ed kernel not break memory hotplug. After kexec
>>>>> rebooting, the KASLR may locate kernel into hotpluggable area too.
>>>>
>>>> Can you elaborate how that would work?
>>>
>>> Well, boot memory can be hotplugged or not after boot, they are marked
>>> in uefi tables, the current kexec doesn't save and pass them into 2nd
>>> kernel, when kexec kernel bootup, it need read them and avoid them to
>>> randomize kernel into.
>>
>> What about e.g., memory hotplugged by ACPI? I would assume, that the
>> kexec kernel will not make use of that (IOW detected that) until the
>> ACPI driver comes up and re-detects + adds that memory.
>>
>> Or how would that machinery work in case we have a DIMM hotplugged via ACPI?
>
> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
> pass the efi, it won't get the SRAT table correctly, if I remember
> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> ACPI only, this won't happen on bare metal though. Need check carefully.
> I have been using kvm guest with uefi firmware recently.

Yeah, I can imagine that bare metal is different. kvm only uses ACPI.

I'm also asking because of virtio-mem. Memory added via virtio-mem is
not part of any efi tables or whatsoever. So I assume the kexec kernel
will not detect it automatically (good!), instead load the virtio-mem
driver and let it add memory back to the system.

I should probably play with kexec and virtio-mem once I have some spare
cycles ... to find out what's broken and needs to be addressed :)

--
Thanks,

David / dhildenb
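The "optimization" suggested in this message (search bottom-up starting at a split point such as 4GB, and only fall back to the low range if nothing fits) can be sketched in a few lines of Python. This is a rough model under stated assumptions, not kernel code: free memory is a sorted list of `(start, end)` pairs with an exclusive end, and the helper names `find_hole` and `find_hole_prefer_high` are hypothetical.

```python
GIB = 1 << 30


def find_hole(free_ranges, size, lo, hi):
    """Bottom-up search: lowest start inside [lo, hi) where `size` bytes fit.

    free_ranges is a sorted list of (start, end) pairs, end exclusive.
    """
    for start, end in free_ranges:
        s, e = max(start, lo), min(end, hi)
        if e - s >= size:
            return s
    return None


def find_hole_prefer_high(free_ranges, size, split=4 * GIB):
    """Search bottom-up from `split` first; fall back to [0, split)."""
    top = max((end for _, end in free_ranges), default=0)
    addr = find_hole(free_ranges, size, split, max(top, split))
    if addr is None:
        addr = find_hole(free_ranges, size, 0, split)
    return addr
```

The walk itself stays strictly bottom-up in both passes, so it would not need a doubly linked resource list; only the starting point changes.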
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/16/20 at 04:09pm, David Hildenbrand wrote: > >>> Sounds doable to me, and not complicated. > >>> > images. It would apply to > > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). > >>> > >>> Do you mean hot added memory after boot can't be recognized and added > >>> into system RAM on arm64? > >> > >> See patch #3 of this patch set, which wants to avoid placing kexec > >> binaries on hotplugged memory. But I have no idea what the current plan > >> regarding arm64 is (this thread exploded :) ). > >> > >> I would assume that we don't want to place kexec images on any > >> hotplugged (or rather: hot(un)pluggable) memory - on any architecture. > > > > Yes, noticed that and James replied to DaveY. > > > > Later, when I was considering to make a draft patch to do the picking of > > memory from normal zone, and add a notifier, as we discussed at above, I > > suddenly realized that kexec_file_load doesn't have this issue. It > > traverse system RAM bottom up to get an available region to put > > kernel/initrd/boot_param, etc. I can't think of a system where its > > low memory could be unavailable. > > kexec_walk_memblock() has the option for "kbuf->top_down". Only > kexec_walk_resources() seems to ignore it. Yeah, that top down searching is done in a found low mem area. Means firstly search an available region bottom up, then put kernel top down in that region. The reason is our iomem res is linked with singly linked list. So we can only search bottom up efficiently. kexec_load is doing the real top down searching, so kernel will be put at the top of system ram. I ever tried to change it to support top down searching for kexec_file_load too with patches, since QE and customers are often confused with this difference when debugging. Andrew may remeber this, he suggested me to change the singly linked list to doubly linked list for iomem res, then do the top down searching for kexec_file_load. 
I tried with some effort, the change introduced too much code change, I just gave up finally. http://archive.lwn.net:8080/devicetree/20180718024944.577-1-...@redhat.com/ I can see that top down searching for kexec can avoid the highly used low memory region, esp under 4G, for dma, kinds of firmware reserving, etc. And customers/QE of kexec get used to it. I can change kexec_file_load to top down too with a simple way if people really complain it. But now, seems bottom up is not bad too. > > So I think in case of memblocks (e.g., arm64), this still applies? Yeah, aren't you trying to remove it? I haven't read your patches carefully, maybe I got it wrong. And arm64 even can't support the hot added memory being able to recorded into firmware, seems it's not so ready, won't they change that design in the future? > > >> > >>> > >>> > - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). > > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). > >>> > >>> Let's make kexec_file_load work firstly. Since this work is only first > >>> step to make kexec-ed kernel not break memory hotplug. After kexec > >>> rebooting, the KASLR may locate kernel into hotpluggable area too. > >> > >> Can you elaborate how that would work? > > > > Well, boot memory can be hotplugged or not after boot, they are marked > > in uefi tables, the current kexec doesn't save and pass them into 2nd > > kenrel, when kexec kernel bootup, it need read them and avoid them to > > randomize kernel into. > > What about e.g., memory hotplugged by ACPI? 
I would assume, that the > kexec kernel will not make use of that (IOW detected that) until the > ACPI driver comes up and re-detects + adds that memory. > > Or how would that machinery work in case we have a DIMM hotplugged via ACPI? ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't pass the efi, it won't get the SRAT table correctly, if I remember correctly. Yeah, I remeber kvm guest can get memory hotplugged with ACPI only, this won't happen on bare metal though. Need check carefully. I have been using kvm guest with uefi firmwire recently.
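The behaviour described in this message (walk regions bottom-up, then place the image top-down inside the first region that fits) can be modelled with a short Python sketch. It is a simplified illustration, not the kernel's kexec_walk_resources() or locate_mem_hole_callback(); the function names and the `(start, end)` region format with exclusive end are assumptions made for the example.

```python
def align_down(x, align):
    """Round x down to a multiple of align (align must be a power of two)."""
    return x & ~(align - 1)


def place_in_region(start, end, size, align, top_down):
    """Place `size` bytes inside [start, end); return the address or None."""
    if top_down:
        addr = align_down(end - size, align)
    else:
        addr = (start + align - 1) & ~(align - 1)
    if addr < start or addr + size > end:
        return None
    return addr


def walk_bottom_up(regions, size, align, top_down):
    """Visit regions lowest-first and stop at the first one that fits.

    `top_down` only affects placement inside the chosen region, which is
    why the result still lands in low memory.
    """
    for start, end in sorted(regions):
        addr = place_in_region(start, end, size, align, top_down)
        if addr is not None:
            return addr
    return None
```

A true top-down search, as described for kexec_load, would instead visit the regions highest-first, which is what puts the kernel near the top of system RAM.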
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
>>> Sounds doable to me, and not complicated. >>> images. It would apply to - arm64 and filter out all hotadded memory (IIRC, only boot memory can be used). >>> >>> Do you mean hot added memory after boot can't be recognized and added >>> into system RAM on arm64? >> >> See patch #3 of this patch set, which wants to avoid placing kexec >> binaries on hotplugged memory. But I have no idea what the current plan >> regarding arm64 is (this thread exploded :) ). >> >> I would assume that we don't want to place kexec images on any >> hotplugged (or rather: hot(un)pluggable) memory - on any architecture. > > Yes, noticed that and James replied to DaveY. > > Later, when I was considering to make a draft patch to do the picking of > memory from normal zone, and add a notifier, as we discussed at above, I > suddenly realized that kexec_file_load doesn't have this issue. It > traverse system RAM bottom up to get an available region to put > kernel/initrd/boot_param, etc. I can't think of a system where its > low memory could be unavailable. kexec_walk_memblock() has the option for "kbuf->top_down". Only kexec_walk_resources() seems to ignore it. So I think in case of memblocks (e.g., arm64), this still applies? >> >>> >>> - powerpc to filter out all LMBs that can be removed (assuming not all memory corresponds to LMBs that can be removed, otherwise we're in trouble ... :) ) - virtio-mem to filter out all memory it added. - hyper-v to filter out partially backed memory blocks (esp. the last memory block it added and only partially backed it by memory). This would make it work for kexec_file_load(), however, I do wonder how we would want to approach that from userspace kexec-tools when handling it from kexec_load(). >>> >>> Let's make kexec_file_load work firstly. Since this work is only first >>> step to make kexec-ed kernel not break memory hotplug. After kexec >>> rebooting, the KASLR may locate kernel into hotpluggable area too. >> >> Can you elaborate how that would work? 
> > Well, boot memory can be hotplugged or not after boot, they are marked > in uefi tables, the current kexec doesn't save and pass them into 2nd > kenrel, when kexec kernel bootup, it need read them and avoid them to > randomize kernel into. What about e.g., memory hotplugged by ACPI? I would assume, that the kexec kernel will not make use of that (IOW detected that) until the ACPI driver comes up and re-detects + adds that memory. Or how would that machinery work in case we have a DIMM hotplugged via ACPI? -- Thanks, David / dhildenb
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/16/20 at 03:31pm, David Hildenbrand wrote:
> > Not sure if I get the notifier idea clearly. If you mean
> >
> > 1) Add a common function to pick memory in unmovable zone;
>
> Not strictly required IMHO. But, minor detail.
>
> > 2) Let DLPAR, balloon register with notifier;
>
> Yeah, or virtio-mem, or any other technology that adds/removes memory
> dynamically.
>
> > 3) In the common function, ask notified part to check if the picked
> >    unmovable memory is available for locating kexec kernel;
>
> Yeah.

These may not be needed, please see the comment below.

> >
> > Sounds doable to me, and not complicated.
> >
> >> images. It would apply to
> >>
> >> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
> >> be used).
> >
> > Do you mean hot added memory after boot can't be recognized and added
> > into system RAM on arm64?
>
> See patch #3 of this patch set, which wants to avoid placing kexec
> binaries on hotplugged memory. But I have no idea what the current plan
> regarding arm64 is (this thread exploded :) ).
>
> I would assume that we don't want to place kexec images on any
> hotplugged (or rather: hot(un)pluggable) memory - on any architecture.

Yes, noticed that, and James replied to DaveY.

Later, when I was considering making a draft patch to do the picking of
memory from the normal zone, and add a notifier, as we discussed above, I
suddenly realized that kexec_file_load doesn't have this issue. It
traverses system RAM bottom up to get an available region to put
kernel/initrd/boot_param, etc. I can't think of a system where its
low memory could be unavailable.

> >
> >> - powerpc to filter out all LMBs that can be removed (assuming not all
> >> memory corresponds to LMBs that can be removed, otherwise we're in
> >> trouble ... :) )
> >> - virtio-mem to filter out all memory it added.
> >> - hyper-v to filter out partially backed memory blocks (esp. the last
> >> memory block it added and only partially backed it by memory).
> >>
> >> This would make it work for kexec_file_load(), however, I do wonder how
> >> we would want to approach that from userspace kexec-tools when handling
> >> it from kexec_load().
> >
> > Let's make kexec_file_load work firstly. Since this work is only first
> > step to make kexec-ed kernel not break memory hotplug. After kexec
> > rebooting, the KASLR may locate kernel into hotpluggable area too.
>
> Can you elaborate how that would work?

Well, boot memory may or may not be hotpluggable after boot; this is
marked in the UEFI tables. The current kexec doesn't save that
information and pass it to the 2nd kernel. When the kexec kernel boots
up, it needs to read those marks and avoid randomizing the kernel into
those regions.
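The KASLR concern in the last paragraph can be illustrated with a small Python sketch: if the second kernel knows which ranges are marked hotpluggable, it can restrict its random choice to the remaining ranges. This is a toy model under stated assumptions, not the kernel's KASLR code: the `(start, end, hotpluggable)` tuples stand in for whatever the firmware tables report, and `candidate_slots` and `pick_kernel_base` are hypothetical names.

```python
import random


def candidate_slots(ranges, size, align):
    """List aligned start addresses located in non-hotpluggable ranges.

    ranges: iterable of (start, end, hotpluggable), end exclusive;
    align must be a power of two.
    """
    slots = []
    for start, end, hotpluggable in ranges:
        if hotpluggable:
            continue  # never randomize the kernel into removable memory
        addr = (start + align - 1) & ~(align - 1)
        while addr + size <= end:
            slots.append(addr)
            addr += align
    return slots


def pick_kernel_base(ranges, size, align, rng=random):
    """Pick one slot at random, or None if no safe slot exists."""
    slots = candidate_slots(ranges, size, align)
    return rng.choice(slots) if slots else None
```

The sketch only works if the hotpluggable marks actually reach the second kernel, which is the gap described above.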
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
> Not sure if I get the notifier idea clearly. If you mean
>
> 1) Add a common function to pick memory in unmovable zone;

Not strictly required IMHO. But, minor detail.

> 2) Let DLPAR, balloon register with notifier;

Yeah, or virtio-mem, or any other technology that adds/removes memory
dynamically.

> 3) In the common function, ask notified part to check if the picked
>    unmovable memory is available for locating kexec kernel;

Yeah.

>
> Sounds doable to me, and not complicated.
>
>> images. It would apply to
>>
>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
>> be used).
>
> Do you mean hot added memory after boot can't be recognized and added
> into system RAM on arm64?

See patch #3 of this patch set, which wants to avoid placing kexec
binaries on hotplugged memory. But I have no idea what the current plan
regarding arm64 is (this thread exploded :) ).

I would assume that we don't want to place kexec images on any
hotplugged (or rather: hot(un)pluggable) memory - on any architecture.

>
>
>> - powerpc to filter out all LMBs that can be removed (assuming not all
>> memory corresponds to LMBs that can be removed, otherwise we're in
>> trouble ... :) )
>> - virtio-mem to filter out all memory it added.
>> - hyper-v to filter out partially backed memory blocks (esp. the last
>> memory block it added and only partially backed it by memory).
>>
>> This would make it work for kexec_file_load(), however, I do wonder how
>> we would want to approach that from userspace kexec-tools when handling
>> it from kexec_load().
>
> Let's make kexec_file_load work firstly. Since this work is only first
> step to make kexec-ed kernel not break memory hotplug. After kexec
> rebooting, the KASLR may locate kernel into hotpluggable area too.

Can you elaborate how that would work?

--
Thanks,

David / dhildenb
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/14/20 at 04:49pm, David Hildenbrand wrote: > > The root cause is kexec-ed kernel is targeted at hotpluggable memory > > region. Just avoiding the movable area can fix it. In kexec_file_load(), > > just checking or picking those unmovable region to put kernel/initrd in > > function locate_mem_hole_callback() can fix it. The page or pageblock's > > zone is movable or not, it's easy to know. This fix doesn't need to > > bother other component. > > I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > does not imply that it cannot get offlined and removed e.g., this is > heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel data > >>> is avoided to be put in this area. Or ppc just freely uses it for kernel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (== single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. > >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (==section == LMB), offlines and removes them. No need for > >> the movable zone and all the involved issues. > > > > Yes, this is a different one, thanks for pointing it out. It sounds like > > balloon driver in virt platform, doesn't it? > > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. > > > > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. 
> > > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory(). > >> IIRC, we had BUGs related to that, so I think yes. If a section contains > >> no unmovable allocations (after boot), it can get removed. > > > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it. > > > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use a arch specific > > funciton to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. > > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to > > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64? > - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). > > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work firstly. 
Since this work is only first step to make kexec-ed kernel not break memory hotplug. After kexec rebooting, the KASLR may locate kernel into hotpluggable area too.
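The three-step notifier idea discussed in this message (a common picker, subsystems that register, and a check that lets each of them veto a candidate) can be sketched in Python as a chain of callbacks. It is a loose model and not a proposed kernel API: `KexecPlacementFilters`, `make_range_filter`, and `pick_range` are hypothetical names, and the blocked ranges stand in for whatever DLPAR, virtio-mem, or Hyper-V would report.

```python
class KexecPlacementFilters:
    """Registry of callbacks that may veto a candidate range (hypothetical)."""

    def __init__(self):
        self._filters = []

    def register(self, fn):
        """Step 2: a subsystem registers fn(start, end) -> bool."""
        self._filters.append(fn)
        return fn

    def allowed(self, start, end):
        """Step 3: a range is usable only if no registered filter objects."""
        return all(fn(start, end) for fn in self._filters)


def overlaps(a_start, a_end, b_start, b_end):
    """True if [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def make_range_filter(blocked):
    """Build a filter that rejects anything overlapping `blocked` ranges."""
    def fn(start, end):
        return not any(overlaps(start, end, s, e) for s, e in blocked)
    return fn


def pick_range(candidates, filters):
    """Step 1: common picker; returns the first candidate nobody vetoes."""
    for start, end in candidates:
        if filters.allowed(start, end):
            return (start, end)
    return None
```

Each technology would register its own filter once, so the common picker never needs to know why a range was rejected.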
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 14.04.20 16:39, Baoquan He wrote: > On 04/14/20 at 11:37am, David Hildenbrand wrote: >> On 14.04.20 11:22, Baoquan He wrote: >>> On 04/14/20 at 10:00am, David Hildenbrand wrote: On 14.04.20 08:40, Baoquan He wrote: > On 04/13/20 at 08:15am, Eric W. Biederman wrote: >> Baoquan He writes: >> >>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: The only benefit of kexec_file_load is that it is simple enough from a kernel perspective that signatures can be checked. >>> >>> We don't have this restriction any more with below commit: >>> >>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG >>> and KEXEC_SIG_FORCE") >>> >>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both >>> secure boot or legacy system for kexec/kdump. Being simple enough is >>> enough to astract and convince us to use it instead. And kexec_file_load >>> has been in use for several years on systems with secure boot, since >>> added in 2014, on x86_64. >> >> No. Actaully kexec_file_load is the less capable interface, and less >> flexible interface. Which is why it is appropriate for signature >> verification. > > Well, everyone has a stance and the corresponding view. You could have > wider view from long time maintenance and in upstrem position, and think > kexec_file_load is horrible. But I can only see from our work as a front > line engineer to maintain/develop kexec/kdump in RHEL, and think > kexec_file_load is easier to maintain. > > Surely except of multiple kernel image format support. No matter it is > kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. > This is produced from kerel building by default. We have no way to > support it in our distros and add it into kexec_file_load. > > [RFC PATCH] x86/boot: make ELF kernel multiboot-able > https://lkml.org/lkml/2017/2/15/654 > >> kexec_load in every other respect is the more capable and functional interface. It makes no sense to get rid of it. 
It does make sense to reload with a loaded kernel on memory hotplug. That is simple and easy. If we are going to handle something in the kernel it should simple an automated unloading of the kernel on memory hotplug. I think it would be irresponsible to deprecate kexec_load on any platform. I also suspect that kexec_file_load could be taught to copy the dtb on arm32 if someone wants to deal with signatures. We definitely can not even think of deprecating kexec_load until architecture that supports it also supports kexec_file_load and everyone is happy with that interface. That is Linus's no regression rule. >>> >>> I should pick a milder word to express our tendency and tell our plan >>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help >>> much. I didn't mean to say 'deprecate' at all when replied. >>> >>> The situation and trend I understand about kexec_load and >>> kexec_file_load >>> are: >>> >>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't >>> have yet, just as x86_64, arm64 and s390 have done; >>> >>> 2) kexec_file_load is suggested to use, and take precedence over >>> kexec_load in the future, if both are supported in one ARCH. >> >> The deep problem is that kexec_file_load is distinctly less expressive >> than kexec_load. >> >>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, >>> and by ARCHes for back compatibility w/ kexec_file_load support. >>> >>> For 1) and 2), I think the reason is obvious as Eric said, >>> kexec_file_load is simple enough. And currently, whenever we got a bug >>> report, we may need fix them twice, for kexec_load and kexec_file_load. >>> If kexec_file_load is made by default, e.g on x86_64, we will change it >>> in kernel space only, for kexec_file_load. This is what I meant about >>> 'obsolete gradually'. I think for arm64, s390, they will do these too. 
>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the >>> old kexec_load interface in old product. >> >> Maybe. The code that kexec_file_load sucked into the kernel is quite >> stable and rarely needs changes except during a port of kexec to >> another architecture. >> >> Last I looked the real maintenance effor of kexec and kexec on panic was >> in the drivers. So I don't think we can use maintenance to do anything. > > Not sure if I got it. But if check Lianbo's patches, a lot of effort has > been taken to make SEV work well on kexec_file_load. And we have > switched to use kexec_file_load in the newly
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/14/20 at 11:37am, David Hildenbrand wrote: > On 14.04.20 11:22, Baoquan He wrote: > > On 04/14/20 at 10:00am, David Hildenbrand wrote: > >> On 14.04.20 08:40, Baoquan He wrote: > >>> On 04/13/20 at 08:15am, Eric W. Biederman wrote: > Baoquan He writes: > > > On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > >> > >> The only benefit of kexec_file_load is that it is simple enough from a > >> kernel perspective that signatures can be checked. > > > > We don't have this restriction any more with below commit: > > > > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > > and KEXEC_SIG_FORCE") > > > > With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > > secure boot or legacy system for kexec/kdump. Being simple enough is > > enough to astract and convince us to use it instead. And kexec_file_load > > has been in use for several years on systems with secure boot, since > > added in 2014, on x86_64. > > No. Actaully kexec_file_load is the less capable interface, and less > flexible interface. Which is why it is appropriate for signature > verification. > >>> > >>> Well, everyone has a stance and the corresponding view. You could have > >>> wider view from long time maintenance and in upstrem position, and think > >>> kexec_file_load is horrible. But I can only see from our work as a front > >>> line engineer to maintain/develop kexec/kdump in RHEL, and think > >>> kexec_file_load is easier to maintain. > >>> > >>> Surely except of multiple kernel image format support. No matter it is > >>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. > >>> This is produced from kerel building by default. We have no way to > >>> support it in our distros and add it into kexec_file_load. > >>> > >>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able > >>> https://lkml.org/lkml/2017/2/15/654 > >>> > > >> kexec_load in every other respect is the more capable and functional > >> interface. 
It makes no sense to get rid of it. > >> > >> It does make sense to reload with a loaded kernel on memory hotplug. > >> That is simple and easy. If we are going to handle something in the > >> kernel it should simple an automated unloading of the kernel on memory > >> hotplug. > >> > >> > >> I think it would be irresponsible to deprecate kexec_load on any > >> platform. > >> > >> I also suspect that kexec_file_load could be taught to copy the dtb > >> on arm32 if someone wants to deal with signatures. > >> > >> We definitely can not even think of deprecating kexec_load until > >> architecture that supports it also supports kexec_file_load and > >> everyone > >> is happy with that interface. That is Linus's no regression rule. > > > > I should pick a milder word to express our tendency and tell our plan > > then 'obsolete'. Even though I added 'gradually', seems it doesn't help > > much. I didn't mean to say 'deprecate' at all when replied. > > > > The situation and trend I understand about kexec_load and > > kexec_file_load > > are: > > > > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > > have yet, just as x86_64, arm64 and s390 have done; > > > > 2) kexec_file_load is suggested to use, and take precedence over > > kexec_load in the future, if both are supported in one ARCH. > > The deep problem is that kexec_file_load is distinctly less expressive > than kexec_load. > > > 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > > and by ARCHes for back compatibility w/ kexec_file_load support. > > > > For 1) and 2), I think the reason is obvious as Eric said, > > kexec_file_load is simple enough. And currently, whenever we got a bug > > report, we may need fix them twice, for kexec_load and kexec_file_load. > > If kexec_file_load is made by default, e.g on x86_64, we will change it > > in kernel space only, for kexec_file_load. This is what I meant about > > 'obsolete gradually'. 
I think for arm64, s390, they will do these too. > > Unless there's some critical/blocker bug in kexec_load, to corrupt the > > old kexec_load interface in old product. > > Maybe. The code that kexec_file_load sucked into the kernel is quite > stable and rarely needs changes except during a port of kexec to > another architecture. > > Last I looked the real maintenance effor of kexec and kexec on panic was > in the drivers. So I don't think we can use maintenance to do anything. > >>> > >>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has > >>> been taken to make SEV work well on kexec_file_load. And we have > >>> switched to use kexec_file_load in the newly published Fedora release > >>> on x86_6
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 14.04.20 11:22, Baoquan He wrote:
> On 04/14/20 at 10:00am, David Hildenbrand wrote:
>> On 14.04.20 08:40, Baoquan He wrote:
>>> On 04/13/20 at 08:15am, Eric W. Biederman wrote:
>>>> Baoquan He writes:
>>>>
>>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
>>>>>>
>>>>>> The only benefit of kexec_file_load is that it is simple enough from a
>>>>>> kernel perspective that signatures can be checked.
>>>>>
>>>>> We don't have this restriction any more with below commit:
>>>>>
>>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
>>>>> and KEXEC_SIG_FORCE")
>>>>>
>>>>> With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
>>>>> secure boot or legacy system for kexec/kdump. Being simple enough is
>>>>> enough to attract and convince us to use it instead. And kexec_file_load
>>>>> has been in use for several years on systems with secure boot, since
>>>>> added in 2014, on x86_64.
>>>>
>>>> No. Actually kexec_file_load is the less capable interface, and less
>>>> flexible interface. Which is why it is appropriate for signature
>>>> verification.
>>>
>>> Well, everyone has a stance and the corresponding view. You could have
>>> wider view from long time maintenance and in upstream position, and think
>>> kexec_file_load is horrible. But I can only see from our work as a front
>>> line engineer to maintain/develop kexec/kdump in RHEL, and think
>>> kexec_file_load is easier to maintain.
>>>
>>> Surely except of multiple kernel image format support. No matter it is
>>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage.
>>> This is produced from kernel building by default. We have no way to
>>> support it in our distros and add it into kexec_file_load.
>>>
>>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able
>>> https://lkml.org/lkml/2017/2/15/654
>>>
>>>>
>>>>>> kexec_load in every other respect is the more capable and functional
>>>>>> interface. It makes no sense to get rid of it.
>>>>>>
>>>>>> It does make sense to reload with a loaded kernel on memory hotplug.
>>>>>> That is simple and easy. If we are going to handle something in the
>>>>>> kernel it should simply be an automated unloading of the kernel on memory
>>>>>> hotplug.
>>>>>>
>>>>>>
>>>>>> I think it would be irresponsible to deprecate kexec_load on any
>>>>>> platform.
>>>>>>
>>>>>> I also suspect that kexec_file_load could be taught to copy the dtb
>>>>>> on arm32 if someone wants to deal with signatures.
>>>>>>
>>>>>> We definitely can not even think of deprecating kexec_load until
>>>>>> architecture that supports it also supports kexec_file_load and everyone
>>>>>> is happy with that interface. That is Linus's no regression rule.
>>>>>
>>>>> I should pick a milder word to express our tendency and tell our plan
>>>>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help
>>>>> much. I didn't mean to say 'deprecate' at all when replied.
>>>>>
>>>>> The situation and trend I understand about kexec_load and kexec_file_load
>>>>> are:
>>>>>
>>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't
>>>>> have yet, just as x86_64, arm64 and s390 have done;
>>>>>
>>>>> 2) kexec_file_load is suggested to use, and take precedence over
>>>>> kexec_load in the future, if both are supported in one ARCH.
>>>>
>>>> The deep problem is that kexec_file_load is distinctly less expressive
>>>> than kexec_load.
>>>>
>>>>> 3) Kexec_load is kept being used by ARCHes w/o kexec_file_load support,
>>>>> and by ARCHes for back compatibility w/ kexec_file_load support.
>>>>>
>>>>> For 1) and 2), I think the reason is obvious as Eric said,
>>>>> kexec_file_load is simple enough. And currently, whenever we got a bug
>>>>> report, we may need fix them twice, for kexec_load and kexec_file_load.
>>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it
>>>>> in kernel space only, for kexec_file_load. This is what I meant about
>>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too.
>>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the
>>>>> old kexec_load interface in old product.
>>>>
>>>> Maybe. The code that kexec_file_load sucked into the kernel is quite
>>>> stable and rarely needs changes except during a port of kexec to
>>>> another architecture.
>>>>
>>>> Last I looked the real maintenance effort of kexec and kexec on panic was
>>>> in the drivers. So I don't think we can use maintenance to do anything.
>>>
>>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has
>>> been taken to make SEV work well on kexec_file_load. And we have
>>> switched to use kexec_file_load in the newly published Fedora release
>>> on x86_64 by default. Before this, Lianbo has investigated and done many
>>> experiments to make sure the switching is safe. We finally made this
>>> decision. Next we will do the switch in Enterprise distros. Once these
>>> are proved safe, we will suggest customers to use
Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
On 04/14/20 at 10:00am, David Hildenbrand wrote:
> On 14.04.20 08:40, Baoquan He wrote:
> > On 04/13/20 at 08:15am, Eric W. Biederman wrote:
> >> Baoquan He writes:
> >>
> >>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
> >>>>
> >>>> The only benefit of kexec_file_load is that it is simple enough from a
> >>>> kernel perspective that signatures can be checked.
> >>>
> >>> We don't have this restriction any more with below commit:
> >>>
> >>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
> >>> and KEXEC_SIG_FORCE")
> >>>
> >>> With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
> >>> secure boot or legacy system for kexec/kdump. Being simple enough is
> >>> enough to attract and convince us to use it instead. And kexec_file_load
> >>> has been in use for several years on systems with secure boot, since
> >>> added in 2014, on x86_64.
> >>
> >> No. Actually kexec_file_load is the less capable interface, and less
> >> flexible interface. Which is why it is appropriate for signature
> >> verification.
> >
> > Well, everyone has a stance and the corresponding view. You could have
> > wider view from long time maintenance and in upstream position, and think
> > kexec_file_load is horrible. But I can only see from our work as a front
> > line engineer to maintain/develop kexec/kdump in RHEL, and think
> > kexec_file_load is easier to maintain.
> >
> > Surely except of multiple kernel image format support. No matter it is
> > kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage.
> > This is produced from kernel building by default. We have no way to
> > support it in our distros and add it into kexec_file_load.
> >
> > [RFC PATCH] x86/boot: make ELF kernel multiboot-able
> > https://lkml.org/lkml/2017/2/15/654
> >
> >>
> >>>> kexec_load in every other respect is the more capable and functional
> >>>> interface. It makes no sense to get rid of it.
> >>>>
> >>>> It does make sense to reload with a loaded kernel on memory hotplug.
> >>>> That is simple and easy. If we are going to handle something in the
> >>>> kernel it should simply be an automated unloading of the kernel on memory
> >>>> hotplug.
> >>>>
> >>>>
> >>>> I think it would be irresponsible to deprecate kexec_load on any
> >>>> platform.
> >>>>
> >>>> I also suspect that kexec_file_load could be taught to copy the dtb
> >>>> on arm32 if someone wants to deal with signatures.
> >>>>
> >>>> We definitely can not even think of deprecating kexec_load until
> >>>> architecture that supports it also supports kexec_file_load and everyone
> >>>> is happy with that interface. That is Linus's no regression rule.
> >>>
> >>> I should pick a milder word to express our tendency and tell our plan
> >>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help
> >>> much. I didn't mean to say 'deprecate' at all when replied.
> >>>
> >>> The situation and trend I understand about kexec_load and kexec_file_load
> >>> are:
> >>>
> >>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't
> >>> have yet, just as x86_64, arm64 and s390 have done;
> >>>
> >>> 2) kexec_file_load is suggested to use, and take precedence over
> >>> kexec_load in the future, if both are supported in one ARCH.
> >>
> >> The deep problem is that kexec_file_load is distinctly less expressive
> >> than kexec_load.
> >>
> >>> 3) Kexec_load is kept being used by ARCHes w/o kexec_file_load support,
> >>> and by ARCHes for back compatibility w/ kexec_file_load support.
> >>>
> >>> For 1) and 2), I think the reason is obvious as Eric said,
> >>> kexec_file_load is simple enough. And currently, whenever we got a bug
> >>> report, we may need fix them twice, for kexec_load and kexec_file_load.
> >>> If kexec_file_load is made by default, e.g on x86_64, we will change it
> >>> in kernel space only, for kexec_file_load. This is what I meant about
> >>> 'obsolete gradually'. I think for arm64, s390, they will do these too.
> >>> Unless there's some critical/blocker bug in kexec_load, to corrupt the
> >>> old kexec_load interface in old product.
> >>
> >> Maybe. The code that kexec_file_load sucked into the kernel is quite
> >> stable and rarely needs changes except during a port of kexec to
> >> another architecture.
> >>
> >> Last I looked the real maintenance effort of kexec and kexec on panic was
> >> in the drivers. So I don't think we can use maintenance to do anything.
> >
> > Not sure if I got it. But if check Lianbo's patches, a lot of effort has
> > been taken to make SEV work well on kexec_file_load. And we have
> > switched to use kexec_file_load in the newly published Fedora release
> > on x86_64 by default. Before this, Lianbo has investigated and done many
> > experiments to make sure the switching is safe. We finally made this
> > decision. Next we will do the switch in Enterprise distros. Once these
> > are proved safe, we will suggest customers to use kexec_file_load for
> > kexec rebo