Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
From: Mike Rapoport (IBM) On Thu, 13 Jun 2024 11:55:06 -0400, Steven Rostedt wrote: > Reserve unspecified location of physical memory from kernel command line > > Background: > > In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract > dmesg output and some other information when a crash happens in the field. > (This is only done when the user selects "Allow Google to collect data for > improving the system"). But there are cases when there's a bug that > requires more data to be retrieved to figure out what is happening. We would > like to increase the pstore size, either temporarily, or maybe even > permanently. The pstore on these devices are at a fixed location in RAM (as > the RAM is not cleared on soft reboots nor crashes). The location is chosen > by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86. > There's a driver that queries for this to initialize the pstore for > ChromeOS: > > [...] Applied to for-next branch of memblock.git tree, thanks! [0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up commit: 1e4c64b71c9bf230b25fde12cbcceacfdc8b3332 [1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up commit: 1e4c64b71c9bf230b25fde12cbcceacfdc8b3332 [2/2] pstore/ramoops: Add ramoops.mem_name= command line option commit: d9d814eebb1ae9742e7fd7f39730653b16326bd4 tree: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock branch: for-next -- Sincerely yours, Mike.
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
On Tue, Jun 18, 2024 at 12:21:19PM +0200, Ard Biesheuvel wrote: > On Mon, 17 Jun 2024 at 23:01, Alexander Graf wrote: > > So I would personally prefer for this code not to go in at all. But if > it does go in (and Steven has already agreed to this), it needs a > giant disclaimer that it is best effort and may get broken > inadvertently by changes that may seem unrelated. I think that reserve_mem= is way less intrusive than memmap= we anyway have. It will reserve memory and the documentation adequately warns that the location of that memory might be at different physical addresses. -- Sincerely yours, Mike.
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
On Tue, 18 Jun 2024 13:47:49 +0200 Alexander Graf wrote: > IMHO the big fat disclaimer should be in the argument name. > "reserve_mem" to me sounds like it actually guarantees a reservation - > which it doesn't. Can we name it more along the lines of "debug" (to > indicate it's not for production data) or "phoenix" (usually gets reborn > out of ashes, but you can never know for sure): "debug_mem", / > "phoenix_mem"? I don't see any reason it will not reserve memory. That doesn't seem to be the issue. What is not guaranteed is that it will be in the same location as last time. So the name is correct. It's not "reserve_consistent_memory" ;-) -- Steve
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
On 18.06.24 12:21, Ard Biesheuvel wrote: On Mon, 17 Jun 2024 at 23:01, Alexander Graf wrote: On 17.06.24 22:40, Steven Rostedt wrote: On Mon, 17 Jun 2024 09:07:29 +0200 Alexander Graf wrote: Hey Steve, I believe we're talking about 2 different things :). Let me rephrase a bit and make a concrete example. Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command line option. The kernel now comes up and allocates a random chunk of memory that - by (admittedly good) chance - may be at the same physical location as before. Let's assume it deemed 0x100 as a good offset. Note, it's not random. It picks from the top of available memory every time. But things can mess with it (see below). Let's now assume you're running on a UEFI system. There, you always have non-volatile storage available to you even in the pre-boot phase. That means the kernel could create a UEFI variable that says "12M:4096:trace -> 0x100". The pre-boot phase takes all these UEFI variables and marks them as reserved. When you finally reach your command line parsing logic for reserve_mem=, you can flip all reservations that were not on the command line back to normal memory. That way you have pretty much guaranteed persistent memory regions, even with KASLR changing your memory layout across boots. The nice thing is that the above is an extension of what you've already built: Systems with UEFI simply get better guarantees that their regions persist. This could be an added feature, but it is very architecture specific, and would likely need architecture specific updates. It definitely would be an added feature, yes. But one that allows you to ensure persistence a lot more safely :). Requirement: Need a way to reserve memory that will be at a consistent location for every boot, if the kernel and system are the same. Does not need to work if rebooting to a different kernel, or if the system can change the memory layout between boots. The reserved memory can not be an hard coded address, as the same kernel / command line needs to run on several different machines. The picked memory reservation just needs to be the same for a given machine, but may be With KASLR is enabled, doesn't this approach break too often to be reliable enough for the data you want to extract? Picking up the idea above, with a persistent variable we could even make KASLR avoid that reserved pstore region in its search for a viable KASLR offset. I think I was hit by it once in all my testing. For our use case, the few times it fails to map is not going to affect what we need this for at all. Once is pretty good. Do you know why? Also once out of how many runs? Is the randomness source not as random as it should be or are the number of bits for KASLR maybe so few on your target architecture that the odds of hitting anything become low? Do these same constraints hold true outside of your testing environment? So I just ran it a hundred times in a loop. I added a patch to print the location of "_text". The loop was this: for i in `seq 100`; do ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot' >> text; grub-reboot '1>0'; sleep 0.5; reboot" sleep 25 done It searches dmesg for my added printk as well as the print of were the ring buffer was loaded in physical memory. It takes about 15 seconds to reboot, so I waited 25. The results are attached. I found that it was consistent 76 times, which means 1 out of 4 it's not. Funny enough, it broke whenever it loaded the kernel below 0x1. And then it would be off by a little. It was consistently at: 0x27d00 And when it failed, it was at 0x27ce0. Note, when I used the e820 tables to do this, I never saw a failure. My assumption is that when it is below 0x1, something else gets allocated causing this to get pushed down. Thinking about it again: What if you run the allocation super early (see arch/x86/boot/compressed/kaslr.c:handle_mem_options())? That code is not used by EFI boot anymore. In general, I would recommend (and have recommended) against these kinds of hacks in mainline, because -especially on x86- there is always someone that turns up with some kind of convoluted use case that gets broken if we try to change anything in the boot code. I spent considerable time over the past year making the EFI/x86 boot code compatible with the new MS imposed requirements on PC boot firmware (related to secure boot and NX restrictions on memory mappings). This involved some radical refactoring of the boot sequence, including the KASLR logic. Adding fragile code there that will result in regressions observable to end users when it gets broken is really not what I'd like to see. So I would personally prefer for this code not to go in at all. But if it does go in (and Steven has already agreed to this), it needs a giant disclaimer that it is best effort and may get broken inadvertently by changes that may
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
On Mon, 17 Jun 2024 at 23:01, Alexander Graf wrote: > > On 17.06.24 22:40, Steven Rostedt wrote: > > On Mon, 17 Jun 2024 09:07:29 +0200 > > Alexander Graf wrote: > > > >> Hey Steve, > >> > >> > >> I believe we're talking about 2 different things :). Let me rephrase a > >> bit and make a concrete example. > >> > >> Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command > >> line option. The kernel now comes up and allocates a random chunk of > >> memory that - by (admittedly good) chance - may be at the same physical > >> location as before. Let's assume it deemed 0x100 as a good offset. > > Note, it's not random. It picks from the top of available memory every > > time. But things can mess with it (see below). > > > >> Let's now assume you're running on a UEFI system. There, you always have > >> non-volatile storage available to you even in the pre-boot phase. That > >> means the kernel could create a UEFI variable that says "12M:4096:trace > >> -> 0x100". The pre-boot phase takes all these UEFI variables and > >> marks them as reserved. When you finally reach your command line parsing > >> logic for reserve_mem=, you can flip all reservations that were not on > >> the command line back to normal memory. > >> > >> That way you have pretty much guaranteed persistent memory regions, even > >> with KASLR changing your memory layout across boots. > >> > >> The nice thing is that the above is an extension of what you've already > >> built: Systems with UEFI simply get better guarantees that their regions > >> persist. > > This could be an added feature, but it is very architecture specific, > > and would likely need architecture specific updates. > > > It definitely would be an added feature, yes. But one that allows you to > ensure persistence a lot more safely :). > > > > Requirement: > > > > Need a way to reserve memory that will be at a consistent location for > > every boot, if the kernel and system are the same. Does not need to work > > if rebooting to a different kernel, or if the system can change the > > memory layout between boots. > > > > The reserved memory can not be an hard coded address, as the same > > kernel / > > command line needs to run on several different machines. The picked > > memory > > reservation just needs to be the same for a given machine, but may be > With KASLR is enabled, doesn't this approach break too often to be > reliable enough for the data you want to extract? > > Picking up the idea above, with a persistent variable we could even make > KASLR avoid that reserved pstore region in its search for a viable KASLR > offset. > >>> I think I was hit by it once in all my testing. For our use case, the > >>> few times it fails to map is not going to affect what we need this for > >>> at all. > >> Once is pretty good. Do you know why? Also once out of how many runs? Is > >> the randomness source not as random as it should be or are the number of > >> bits for KASLR maybe so few on your target architecture that the odds of > >> hitting anything become low? Do these same constraints hold true outside > >> of your testing environment? > > So I just ran it a hundred times in a loop. I added a patch to print > > the location of "_text". The loop was this: > > > >for i in `seq 100`; do > > ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e > > 'mapped boot' >> text; grub-reboot '1>0'; sleep 0.5; reboot" > > sleep 25 > >done > > > > It searches dmesg for my added printk as well as the print of were the > > ring buffer was loaded in physical memory. > > > > It takes about 15 seconds to reboot, so I waited 25. The results are > > attached. I found that it was consistent 76 times, which means 1 out of > > 4 it's not. Funny enough, it broke whenever it loaded the kernel below > > 0x1. And then it would be off by a little. > > > > It was consistently at: > > > >0x27d00 > > > > And when it failed, it was at 0x27ce0. > > > > Note, when I used the e820 tables to do this, I never saw a failure. My > > assumption is that when it is below 0x1, something else gets > > allocated causing this to get pushed down. > > > Thinking about it again: What if you run the allocation super early (see > arch/x86/boot/compressed/kaslr.c:handle_mem_options())? That code is not used by EFI boot anymore. In general, I would recommend (and have recommended) against these kinds of hacks in mainline, because -especially on x86- there is always someone that turns up with some kind of convoluted use case that gets broken if we try to change anything in the boot code. I spent considerable time over the past year making the EFI/x86 boot code compatible with the new MS imposed requirements on PC boot firmware (related to secure boot and NX restrictions on memory mappings). This involved some radical refactoring of the boot sequence, including the
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
On Mon, 17 Jun 2024 23:01:12 +0200 Alexander Graf wrote: > > This could be an added feature, but it is very architecture specific, > > and would likely need architecture specific updates. > > > It definitely would be an added feature, yes. But one that allows you to > ensure persistence a lot more safely :). Sure. > > Thinking about it again: What if you run the allocation super early (see > arch/x86/boot/compressed/kaslr.c:handle_mem_options())? If you stick to > allocating only from top, you're effectively kernel version independent > for your allocations because none of the kernel code ran yet and > definitely KASLR independent because you're running deterministically > before KASLR even gets allocated. > > > As this code relies on memblock_phys_alloc() being consistent, if > > something gets allocated before it differently depending on where the > > kernel is, it can also move the location. A plugin to UEFI would mean > > that it would need to reserve the memory, and the code here will need > > to know where it is. We could always make the function reserve_mem() > > global and weak so that architectures can override it. > > > Yes, the in-kernel UEFI loader (efi-stub) could simply populate a new > type of memblock with the respective reservations and you later call > memblock_find_in_range_node() instead of memblock_phys_alloc() to pass > in flags that you want to allocate only from the new > MEMBLOCK_RESERVE_MEM type. The same model would work for BIOS boots > through the handle_mem_options() path above. In fact, if the BIOS way > works fine, we don't even need UEFI variables: The same way allocations > will be identical during BIOS execution, they should stay identical > across UEFI launches. > > As cherry on top, kexec also works seamlessly with the special memblock > approach because kexec (at least on x86) hands memblocks as is to the > next kernel. So the new kernel will also automatically use the same > ranges for its allocations. I'm all for expanding this. But I would just want to get this in for now as is. It theoretically works on all architectures. If someone wants to make in more robust and accurate on a specific architecture, I'm all for it. Like I said, we could make the reserver_mem() function global and weak, and then if an architecture has a better way to handle this, it could use that. Hmm, x86 could do this with the e820 code like I did in my first versions. Like I said, it didn't fail at all with that. And we can have an UEFI version as well. -- Steve
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
[resend because Thunderbird decided to send the previous version as HTML :(] On 17.06.24 22:40, Steven Rostedt wrote: On Mon, 17 Jun 2024 09:07:29 +0200 Alexander Graf wrote: Hey Steve, I believe we're talking about 2 different things :). Let me rephrase a bit and make a concrete example. Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command line option. The kernel now comes up and allocates a random chunk of memory that - by (admittedly good) chance - may be at the same physical location as before. Let's assume it deemed 0x100 as a good offset. Note, it's not random. It picks from the top of available memory every time. But things can mess with it (see below). Let's now assume you're running on a UEFI system. There, you always have non-volatile storage available to you even in the pre-boot phase. That means the kernel could create a UEFI variable that says "12M:4096:trace -> 0x100". The pre-boot phase takes all these UEFI variables and marks them as reserved. When you finally reach your command line parsing logic for reserve_mem=, you can flip all reservations that were not on the command line back to normal memory. That way you have pretty much guaranteed persistent memory regions, even with KASLR changing your memory layout across boots. The nice thing is that the above is an extension of what you've already built: Systems with UEFI simply get better guarantees that their regions persist. This could be an added feature, but it is very architecture specific, and would likely need architecture specific updates. It definitely would be an added feature, yes. But one that allows you to ensure persistence a lot more safely :). Requirement: Need a way to reserve memory that will be at a consistent location for every boot, if the kernel and system are the same. Does not need to work if rebooting to a different kernel, or if the system can change the memory layout between boots. The reserved memory can not be an hard coded address, as the same kernel / command line needs to run on several different machines. The picked memory reservation just needs to be the same for a given machine, but may be With KASLR is enabled, doesn't this approach break too often to be reliable enough for the data you want to extract? Picking up the idea above, with a persistent variable we could even make KASLR avoid that reserved pstore region in its search for a viable KASLR offset. I think I was hit by it once in all my testing. For our use case, the few times it fails to map is not going to affect what we need this for at all. Once is pretty good. Do you know why? Also once out of how many runs? Is the randomness source not as random as it should be or are the number of bits for KASLR maybe so few on your target architecture that the odds of hitting anything become low? Do these same constraints hold true outside of your testing environment? So I just ran it a hundred times in a loop. I added a patch to print the location of "_text". The loop was this: for i in `seq 100`; do ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot' >> text; grub-reboot '1>0'; sleep 0.5; reboot" sleep 25 done It searches dmesg for my added printk as well as the print of were the ring buffer was loaded in physical memory. It takes about 15 seconds to reboot, so I waited 25. The results are attached. I found that it was consistent 76 times, which means 1 out of 4 it's not. Funny enough, it broke whenever it loaded the kernel below 0x1. And then it would be off by a little. It was consistently at: 0x27d00 And when it failed, it was at 0x27ce0. Note, when I used the e820 tables to do this, I never saw a failure. My assumption is that when it is below 0x1, something else gets allocated causing this to get pushed down. Thinking about it again: What if you run the allocation super early (see arch/x86/boot/compressed/kaslr.c:handle_mem_options())? If you stick to allocating only from top, you're effectively kernel version independent for your allocations because none of the kernel code ran yet and definitely KASLR independent because you're running deterministically before KASLR even gets allocated. As this code relies on memblock_phys_alloc() being consistent, if something gets allocated before it differently depending on where the kernel is, it can also move the location. A plugin to UEFI would mean that it would need to reserve the memory, and the code here will need to know where it is. We could always make the function reserve_mem() global and weak so that architectures can override it. Yes, the in-kernel UEFI loader (efi-stub) could simply populate a new type of memblock with the respective reservations and you later call memblock_find_in_range_node() instead of memblock_phys_alloc() to pass in flags that you want to allocate only from the new MEMBLOCK_RESERVE_MEM type. The same model would work for
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
On Mon, 17 Jun 2024 09:07:29 +0200 Alexander Graf wrote: > Hey Steve, > > > I believe we're talking about 2 different things :). Let me rephrase a > bit and make a concrete example. > > Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command > line option. The kernel now comes up and allocates a random chunk of > memory that - by (admittedly good) chance - may be at the same physical > location as before. Let's assume it deemed 0x100 as a good offset. Note, it's not random. It picks from the top of available memory every time. But things can mess with it (see below). > > Let's now assume you're running on a UEFI system. There, you always have > non-volatile storage available to you even in the pre-boot phase. That > means the kernel could create a UEFI variable that says "12M:4096:trace > -> 0x100". The pre-boot phase takes all these UEFI variables and > marks them as reserved. When you finally reach your command line parsing > logic for reserve_mem=, you can flip all reservations that were not on > the command line back to normal memory. > > That way you have pretty much guaranteed persistent memory regions, even > with KASLR changing your memory layout across boots. > > The nice thing is that the above is an extension of what you've already > built: Systems with UEFI simply get better guarantees that their regions > persist. This could be an added feature, but it is very architecture specific, and would likely need architecture specific updates. > > > > > >> > >>> Requirement: > >>> > >>> Need a way to reserve memory that will be at a consistent location for > >>> every boot, if the kernel and system are the same. Does not need to work > >>> if rebooting to a different kernel, or if the system can change the > >>> memory layout between boots. > >>> > >>> The reserved memory can not be an hard coded address, as the same kernel / > >>> command line needs to run on several different machines. The picked memory > >>> reservation just needs to be the same for a given machine, but may be > >> > >> With KASLR is enabled, doesn't this approach break too often to be > >> reliable enough for the data you want to extract? > >> > >> Picking up the idea above, with a persistent variable we could even make > >> KASLR avoid that reserved pstore region in its search for a viable KASLR > >> offset. > > I think I was hit by it once in all my testing. For our use case, the > > few times it fails to map is not going to affect what we need this for > > at all. > > > Once is pretty good. Do you know why? Also once out of how many runs? Is > the randomness source not as random as it should be or are the number of > bits for KASLR maybe so few on your target architecture that the odds of > hitting anything become low? Do these same constraints hold true outside > of your testing environment? So I just ran it a hundred times in a loop. I added a patch to print the location of "_text". The loop was this: for i in `seq 100`; do ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot' >> text; grub-reboot '1>0'; sleep 0.5; reboot" sleep 25 done It searches dmesg for my added printk as well as the print of were the ring buffer was loaded in physical memory. It takes about 15 seconds to reboot, so I waited 25. The results are attached. I found that it was consistent 76 times, which means 1 out of 4 it's not. Funny enough, it broke whenever it loaded the kernel below 0x1. And then it would be off by a little. It was consistently at: 0x27d00 And when it failed, it was at 0x27ce0. Note, when I used the e820 tables to do this, I never saw a failure. My assumption is that when it is below 0x1, something else gets allocated causing this to get pushed down. As this code relies on memblock_phys_alloc() being consistent, if something gets allocated before it differently depending on where the kernel is, it can also move the location. A plugin to UEFI would mean that it would need to reserve the memory, and the code here will need to know where it is. We could always make the function reserve_mem() global and weak so that architectures can override it. -- Steve text Description: Binary data
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
Hey Steve, On 13.06.24 19:12, Steven Rostedt wrote: On Thu, 13 Jun 2024 18:54:12 +0200 Alexander Graf wrote: Do you have a "real" pstore on these systems that you could store non-volatile variables in, such as persistent UEFI variables? If so, you could create an actually persistent mapping for your trace pstore even across kernel version updates as a general mechanism to create reserved memblocks at fixed offsets. After implementing all this, I don't think I can use pstore for my purpose. pstore is a generic interface for persistent storage, and requires an interface to access it. From what I understand, it's not the place to just ask for an area of RAM. For this, I have a single patch that allows the tracing instance to use an area reserved by reserve_mem. reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace I've already tested this on qemu and a couple of chromebooks. It works well. I believe we're talking about 2 different things :). Let me rephrase a bit and make a concrete example. Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command line option. The kernel now comes up and allocates a random chunk of memory that - by (admittedly good) chance - may be at the same physical location as before. Let's assume it deemed 0x100 as a good offset. Let's now assume you're running on a UEFI system. There, you always have non-volatile storage available to you even in the pre-boot phase. That means the kernel could create a UEFI variable that says "12M:4096:trace -> 0x100". The pre-boot phase takes all these UEFI variables and marks them as reserved. When you finally reach your command line parsing logic for reserve_mem=, you can flip all reservations that were not on the command line back to normal memory. That way you have pretty much guaranteed persistent memory regions, even with KASLR changing your memory layout across boots. The nice thing is that the above is an extension of what you've already built: Systems with UEFI simply get better guarantees that their regions persist. Requirement: Need a way to reserve memory that will be at a consistent location for every boot, if the kernel and system are the same. Does not need to work if rebooting to a different kernel, or if the system can change the memory layout between boots. The reserved memory can not be an hard coded address, as the same kernel / command line needs to run on several different machines. The picked memory reservation just needs to be the same for a given machine, but may be With KASLR is enabled, doesn't this approach break too often to be reliable enough for the data you want to extract? Picking up the idea above, with a persistent variable we could even make KASLR avoid that reserved pstore region in its search for a viable KASLR offset. I think I was hit by it once in all my testing. For our use case, the few times it fails to map is not going to affect what we need this for at all. Once is pretty good. Do you know why? Also once out of how many runs? Is the randomness source not as random as it should be or are the number of bits for KASLR maybe so few on your target architecture that the odds of hitting anything become low? Do these same constraints hold true outside of your testing environment? Alex Amazon Web Services Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597
[PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
Reserve unspecified location of physical memory from kernel command line Background: In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract dmesg output and some other information when a crash happens in the field. (This is only done when the user selects "Allow Google to collect data for improving the system"). But there are cases when there's a bug that requires more data to be retrieved to figure out what is happening. We would like to increase the pstore size, either temporarily, or maybe even permanently. The pstore on these devices are at a fixed location in RAM (as the RAM is not cleared on soft reboots nor crashes). The location is chosen by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86. There's a driver that queries for this to initialize the pstore for ChromeOS: See drivers/platform/chrome/chromeos_pstore.c Problem: The problem is that, even though there's a process to change the kernel on these systems, and is done regularly to install updates, the firmware is updated much less frequently. Choosing the place in RAM also takes special care, and may be in a different address for different boards. Updating the size via firmware is a large effort and not something that many are willing to do for a temporary pstore size change. Requirement: Need a way to reserve memory that will be at a consistent location for every boot, if the kernel and system are the same. Does not need to work if rebooting to a different kernel, or if the system can change the memory layout between boots. The reserved memory can not be an hard coded address, as the same kernel / command line needs to run on several different machines. The picked memory reservation just needs to be the same for a given machine, but may be different for different machines. Solution: The solution I have come up with is to introduce a new "reserve_mem=" kernel command line. This parameter takes the following format: reserve_mem=nn:align:label Where nn is the size of memory to reserve, the align is the alignment of that memory, and label is the way for other sub-systems to find that memory. This way the kernel command line could have: reserve_mem=12M:4096:oops ramoops.mem_name=oops At boot up, the kernel will search for 12 megabytes in usable memory regions with an alignment of 4096. It will start at the highest regions and work its way down (for those old devices that want access to lower address DMA). When it finds a region, it will save it off in a small table and mark it with the "oops" label. Then the pstore ramoops sub-system could ask for that memory and location, and it will map itself there. This prototype allows for 8 different mappings (which may be overkill, 4 is probably plenty) with 16 byte size to store the label. I have tested this and it works for us to solve the above problem. We can update the kernel and command line and increase the size of pstore without needing to update the firmware, or knowing every memory layout of each board. I only tested this locally, it has not been tested in the field. Changes since v5: https://lore.kernel.org/all/20240613003435.401549...@goodmis.org/ [ patch at bottom showing differences ] - Stressed more that this is a best effort use case - Updated ramoops.rst to document this new feature - Used a new variable "tmp" to use in reserve_mem_find_by_name() instead of using "size" and possibly corrupting it. Changes since v4: https://lore.kernel.org/all/20240611215610.548854...@goodmis.org/ - Add all checks about reserve_mem before allocation. This means reserved_mem_add() is now a void function. - Check for name duplications. - Fix compare of align to SMP_CACHE_BYTES ("<" instead of "<=") Changes since v3: https://lore.kernel.org/all/20240611144911.327227...@goodmis.org/ - Changed table type of start and size from unsigned long to phys_addr_t (as well as the parameters to the functions that use them) - Changed old reference to "early_reserve_mem" to "reserve_mem" - Check before reservering memory: o Size is non-zero o name has text in it - If align is less than SMP_CACHE_BYTES, make it SMP_CACHE_BYTES - Remove the silly check of testing *p == '\0' after a p += strlen(p) Changes since v2: https://lore.kernel.org/all/20240606150143.876469...@goodmis.org/ - Fixed typo of "reserver" - Added EXPORT_SYMBOL_GPL() for reserve_mem_find_by_name() - Removed "built-in" from module description that was changed from v1. Changes since v1: https://lore.kernel.org/all/2024060320.801075...@goodmis.org/ - Updated the change log of the first patch as well as added an entry into kernel-parameters.txt about how reserve_mem is for soft reboots and may not be reliable. Steven Rostedt (Google) (2): mm/memblock: Add "reserve_mem" to reserved named memory at boot up pstore/ramoops: Add ramoops.mem_name= command line option Documentation/admin-guide/kernel-parameters.txt | 22 +
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
On Thu, 13 Jun 2024 18:54:12 +0200 Alexander Graf wrote: > > Do you have a "real" pstore on these systems that you could store > non-volatile variables in, such as persistent UEFI variables? If so, you > could create an actually persistent mapping for your trace pstore even > across kernel version updates as a general mechanism to create reserved > memblocks at fixed offsets. After implementing all this, I don't think I can use pstore for my purpose. pstore is a generic interface for persistent storage, and requires an interface to access it. From what I understand, it's not the place to just ask for an area of RAM. For this, I have a single patch that allows the tracing instance to use an area reserved by reserve_mem. reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace I've already tested this on qemu and a couple of chromebooks. It works well. > > > > Requirement: > > > > Need a way to reserve memory that will be at a consistent location for > > every boot, if the kernel and system are the same. Does not need to work > > if rebooting to a different kernel, or if the system can change the > > memory layout between boots. > > > > The reserved memory can not be an hard coded address, as the same kernel / > > command line needs to run on several different machines. The picked memory > > reservation just needs to be the same for a given machine, but may be > > > With KASLR is enabled, doesn't this approach break too often to be > reliable enough for the data you want to extract? > > Picking up the idea above, with a persistent variable we could even make > KASLR avoid that reserved pstore region in its search for a viable KASLR > offset. I think I was hit by it once in all my testing. For our use case, the few times it fails to map is not going to affect what we need this for at all. -- Steve
Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
Hey Steve, On 13.06.24 17:55, Steven Rostedt wrote: Reserve unspecified location of physical memory from kernel command line Background: In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract dmesg output and some other information when a crash happens in the field. (This is only done when the user selects "Allow Google to collect data for improving the system"). But there are cases when there's a bug that requires more data to be retrieved to figure out what is happening. We would like to increase the pstore size, either temporarily, or maybe even permanently. The pstore on these devices are at a fixed location in RAM (as the RAM is not cleared on soft reboots nor crashes). The location is chosen by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86. There's a driver that queries for this to initialize the pstore for ChromeOS: See drivers/platform/chrome/chromeos_pstore.c Problem: The problem is that, even though there's a process to change the kernel on these systems, and is done regularly to install updates, the firmware is updated much less frequently. Choosing the place in RAM also takes special care, and may be in a different address for different boards. Updating the size via firmware is a large effort and not something that many are willing to do for a temporary pstore size change. (sorry for not commenting on earlier versions, I didn't see v1-v5 in my inbox) Do you have a "real" pstore on these systems that you could store non-volatile variables in, such as persistent UEFI variables? If so, you could create an actually persistent mapping for your trace pstore even across kernel version updates as a general mechanism to create reserved memblocks at fixed offsets. Requirement: Need a way to reserve memory that will be at a consistent location for every boot, if the kernel and system are the same. Does not need to work if rebooting to a different kernel, or if the system can change the memory layout between boots. The reserved memory can not be an hard coded address, as the same kernel / command line needs to run on several different machines. The picked memory reservation just needs to be the same for a given machine, but may be With KASLR is enabled, doesn't this approach break too often to be reliable enough for the data you want to extract? Picking up the idea above, with a persistent variable we could even make KASLR avoid that reserved pstore region in its search for a viable KASLR offset. Alex Amazon Web Services Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597