Re: mm/memblock: export memblock_{start/end}_of_DRAM
On Mon, Nov 02, 2020 at 06:51:25PM -0800, Sudarshan Rajagopalan wrote:
> On 2020-10-30 01:38, Mike Rapoport wrote:
> > On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> > > Hello all,
> > >
> > > We have a usecase where a module driver adds certain memory
> > > blocks using add_memory_driver_managed(), so that it can perform
> > > memory hotplug operations on these blocks. In general, these
> > > memory blocks aren’t something that gets physically added later,
> > > but is part of actual RAM that system booted up with. Meaning –
> > > we set the ‘mem=’ cmdline parameter to limit the memory and later
> > > add the remaining ones using add_memory*() variants.
> > >
> > > The basic idea is to have driver have ownership and manage
> > > certain memory blocks for hotplug operations.
> > >
> > > For the driver to be able to know how much memory was limited and
> > > how much actually present, we take the delta of ‘bootmem physical
> > > end address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical
> > > end address' is obtained by scanning the reg values in ‘memory’
> > > DT node and determining the max {addr,size}. Since our driver is
> > > getting modularized, we won’t have access to memblock_end_of_DRAM
> > > (i.e. end address of all memory blocks after ‘mem=’ is applied).
> > >
> > > So checking if memblock_{start/end}_of_DRAM() symbols can be
> > > exported? Also, this information can be obtained by userspace by
> > > doing ‘cat /proc/iomem’ and grepping for ‘System RAM’. So
> > > wondering if userspace can have access to such info, can we allow
> > > kernel module drivers have access by exporting
> > > memblock_{start/end}_of_DRAM().
> >
> > These functions cannot be exported not because we want to hide this
> > information from the modules but because it is unsafe to use them.
> > On most architectures these functions are __init so they are
> > discarded after boot anyway. Besides, the memory configuration known
> > to memblock might be not accurate in many cases as David explained
> > in his reply.
>
> I don't see how information contained in
> memblock_{start/end}_of_DRAM() is considered hidden if the information
> can be obtained using 'cat /proc/iomem'. The memory resource manager
> adds these blocks either in "System RAM", "reserved", "Kernel
> data/code" etc. Inspecting this, one could determine what's the start
> and end of memblocks.

I'm not saying that the memblock data is considered hidden. On most
systems it is simply not present after boot. And even if it is not
discarded, it might be not accurate on any arch except arm64.

> I agree on the part that it's __init annotated and could be removed
> after boot. This is something that the driver can be wary of too.
>
> > > Or are there any other ways where a module driver can get the end
> > > address of system memory block?
> >
> > What do you mean by "system memory block"? There could be a lot of
> > interpretations if you take into account memory hotplug, "mem="
> > option, reserved and firmware memory.
>
> I meant the physical end address of memblock. The equivalent of
> memblock_end_of_DRAM.
>
> > I'd suggest you to describe the entire use case in more detail.
> > Having the complete picture would help finding a proper solution.
>
> The usecase in general is have a way to add/remove and online/offline
> certain memory blocks which are part of boot. We do this by limiting
> the memory using "mem=" and later add the remaining blocks using
> add_memory_driver_managed().

I think such infrastructure should be a part of core mm rather than
external out-of-tree driver.

> Sudarshan
>

--
Sincerely yours,
Mike.
Re: mm/memblock: export memblock_{start/end}_of_DRAM
On 2020-10-30 01:38, Mike Rapoport wrote:
> On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> > Hello all,
> >
> > We have a usecase where a module driver adds certain memory
> > blocks using add_memory_driver_managed(), so that it can perform
> > memory hotplug operations on these blocks. In general, these
> > memory blocks aren’t something that gets physically added later,
> > but is part of actual RAM that system booted up with. Meaning –
> > we set the ‘mem=’ cmdline parameter to limit the memory and later
> > add the remaining ones using add_memory*() variants.
> >
> > The basic idea is to have driver have ownership and manage
> > certain memory blocks for hotplug operations.
> >
> > For the driver to be able to know how much memory was limited and
> > how much actually present, we take the delta of ‘bootmem physical
> > end address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical
> > end address' is obtained by scanning the reg values in ‘memory’
> > DT node and determining the max {addr,size}. Since our driver is
> > getting modularized, we won’t have access to memblock_end_of_DRAM
> > (i.e. end address of all memory blocks after ‘mem=’ is applied).
> >
> > So checking if memblock_{start/end}_of_DRAM() symbols can be
> > exported? Also, this information can be obtained by userspace by
> > doing ‘cat /proc/iomem’ and grepping for ‘System RAM’. So
> > wondering if userspace can have access to such info, can we allow
> > kernel module drivers have access by exporting
> > memblock_{start/end}_of_DRAM().
>
> These functions cannot be exported not because we want to hide this
> information from the modules but because it is unsafe to use them.
> On most architectures these functions are __init so they are
> discarded after boot anyway. Besides, the memory configuration known
> to memblock might be not accurate in many cases as David explained
> in his reply.

I don't see how information contained in memblock_{start/end}_of_DRAM()
is considered hidden if the information can be obtained using 'cat
/proc/iomem'. The memory resource manager adds these blocks either in
"System RAM", "reserved", "Kernel data/code" etc. Inspecting this, one
could determine what's the start and end of memblocks.

I agree on the part that it's __init annotated and could be removed
after boot. This is something that the driver can be wary of too.

> > Or are there any other ways where a module driver can get the end
> > address of system memory block?
>
> What do you mean by "system memory block"? There could be a lot of
> interpretations if you take into account memory hotplug, "mem="
> option, reserved and firmware memory.

I meant the physical end address of memblock. The equivalent of
memblock_end_of_DRAM.

> I'd suggest you to describe the entire use case in more detail.
> Having the complete picture would help finding a proper solution.

The usecase in general is have a way to add/remove and online/offline
certain memory blocks which are part of boot. We do this by limiting
the memory using "mem=" and later add the remaining blocks using
add_memory_driver_managed().

> > Sudarshan
>
> --
> Sincerely yours,
> Mike.

Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Re: mm/memblock: export memblock_{start/end}_of_DRAM
On 2020-10-29 23:41, David Hildenbrand wrote:
> On 29.10.20 22:29, Sudarshan Rajagopalan wrote:
> > Hello all,
>
> Hi!

Hi David.. thanks for the response as always.

> > We have a usecase where a module driver adds certain memory
> > blocks using add_memory_driver_managed(), so that it can perform
> > memory hotplug operations on these blocks. In general, these
> > memory blocks aren’t something that gets physically added later,
> > but is part of actual RAM that system booted up with. Meaning –
> > we set the ‘mem=’ cmdline parameter to limit the memory and later
> > add the remaining ones using add_memory*() variants.
> >
> > The basic idea is to have driver have ownership and manage
> > certain memory blocks for hotplug operations.
>
> So, in summary, you're still abusing the memory hot(un)plug
> infrastructure from your driver - just not in as severe a way as
> before. And I'll tell you why, so you might understand why exposing
> this API is not really a good idea and why your driver wouldn't - for
> example - be upstream material.
>
> Don't get me wrong, what you are doing might be ok in your context,
> but it's simply not universally applicable in our current model.
>
> Ordinary system RAM works differently than many other devices (like
> PCI devices) whereby *something* senses the device and exposes it to
> the system, and some available driver binds to it and owns the memory.
>
> Memory is detected by a driver and added to the system via e.g.,
> add_memory_driver_managed(). Memory devices are created and the memory
> is directly handed off to the system, to be used as system RAM as soon
> as memory devices are onlined. There is no driver that "binds" memory
> like other devices - it's rather the core (buddy) that uses/owns that
> memory immediately after device creation.

I see.. and I agree that drivers are meant to *sense* that something
changed or newly added, so that driver can check if it's the one
responsible or compatible for handling this entity and binds to it. So
I guess what it boils down to is - a driver that uses memory hotplug
_cannot_ add/remove or have ownership of memblock boot memory, but only
of RAM blocks newly added later on.

I was trying to mimic the detecting and adding of extra RAM by limiting
the System RAM with "mem=XGB" as though system booted with XGB of boot
memory and later add the remaining blocks (force detection and adding)
using add_memory_driver_managed(). These remaining blocks are calculated
by 'physical end addr of boot memory' - 'memblock_end_of_DRAM'. The
"physical end addr of boot memory" i.e. the actual RAM that bootloader
informs to kernel can be obtained by scanning the 'memory' DT node.

> > For the driver to be able to know how much memory was limited and
> > how much actually present, we take the delta of ‘bootmem physical
> > end address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical
> > end address' is obtained by scanning the reg values in ‘memory’
> > DT node and determining the max {addr,size}. Since our driver is
> > getting modularized, we won’t have access to memblock_end_of_DRAM
> > (i.e. end address of all memory blocks after ‘mem=’ is applied).
>
> What you do with "mem=" is force memory detection to ignore some of
> its detected memory.
>
> > So checking if memblock_{start/end}_of_DRAM() symbols can be
> > exported? Also, this information can be obtained by userspace by
> > doing ‘cat /proc/iomem’ and grepping for ‘System RAM’. So
> > wondering if userspace can
>
> Not correct: with "mem=", cat /proc/iomem only shows *detected* +
> added system RAM, not the unmodified detection.

That's correct - I meant 'memblock_end_of_DRAM' along with "mem=" can be
calculated using 'cat /proc/iomem' which shows "detected plus added"
System RAM, and not the remaining undetected one which got stripped off
due to "mem=XGB". Basically, 'memblock_end_of_DRAM' address with
'mem=XGB' is {end addr of boot RAM - XGB}.. which would be same as end
address of "System RAM" shown in /proc/iomem.

The reasoning for this is - if userspace can have access to such info
and calculate the memblock end address, why not let drivers have this
info using memblock_end_of_DRAM()?

> > have access to such info, can we allow
> > kernel module drivers have access by exporting
> > memblock_{start/end}_of_DRAM().
> >
> > Or are there any other ways where a module driver can get the end
> > address of system memory block?
>
> And here is our problem: You disabled *detection* of that memory by
> the responsible driver (here: core). Now your driver wants to know
> what would have been detected. Assume you have memory hole in that
> region - it would not work by simply looking at start/end. Your driver
> is not the one doing the detection.

Regarding the memory hole - the driver can inspect the 'memory' DT node
that kernel gets from ABL from RAM partition table if any such holes
exist or not. I agree that if such holes exist, hot adding will fail
since it needs block size to be added. The same issue will arise if a
RAM slot is added and a driver senses it and it only knows the start/end
of this RAM slo
Re: mm/memblock: export memblock_{start/end}_of_DRAM
On Sat, Oct 31, 2020 at 11:05:45AM +0100, David Hildenbrand wrote:
> On 31.10.20 10:18, Christoph Hellwig wrote:
> > On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
> > > What do you mean by "system memory block"? There could be a lot of
> > > interpretations if you take into account memory hotplug, "mem="
> > > option, reserved and firmware memory.
> > >
> > > I'd suggest you to describe the entire use case in more detail.
> > > Having the complete picture would help finding a proper solution.
> >
> > I think we need the code for the driver trying to do this as an RFC
> > submission. Everything else is rather pointless.
>
> Sharing RFCs is most probably not what people want when developing
> advanced hypervisor features :)

Well, if they can't even do that it really has no relevance for kernel
development.
Re: mm/memblock: export memblock_{start/end}_of_DRAM
On 31.10.20 10:18, Christoph Hellwig wrote:
> On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
> > What do you mean by "system memory block"? There could be a lot of
> > interpretations if you take into account memory hotplug, "mem="
> > option, reserved and firmware memory.
> >
> > I'd suggest you to describe the entire use case in more detail.
> > Having the complete picture would help finding a proper solution.
>
> I think we need the code for the driver trying to do this as an RFC
> submission. Everything else is rather pointless.

Sharing RFCs is most probably not what people want when developing
advanced hypervisor features :)

@Sudarshan, I recommend looking at the slides of the KVM Forum talk
from yesterday

https://kvmforum2020.sched.com/event/eE40/towards-an-alternative-memory-architecture-joao-martins-oracle?iframe=no

It contains a nice summary of the state of the art, and how "mem=",
devdax, and dax_hmat can be used to tackle the issue in a hypervisor.

--
Thanks,

David / dhildenb
Re: mm/memblock: export memblock_{start/end}_of_DRAM
On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
>
> What do you mean by "system memory block"? There could be a lot of
> interpretations if you take into account memory hotplug, "mem="
> option, reserved and firmware memory.
>
> I'd suggest you to describe the entire use case in more detail.
> Having the complete picture would help finding a proper solution.

I think we need the code for the driver trying to do this as an RFC
submission. Everything else is rather pointless.
Re: mm/memblock: export memblock_{start/end}_of_DRAM
On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> Hello all,
>
> We have a usecase where a module driver adds certain memory
> blocks using add_memory_driver_managed(), so that it can perform
> memory hotplug operations on these blocks. In general, these
> memory blocks aren’t something that gets physically added later,
> but is part of actual RAM that system booted up with. Meaning –
> we set the ‘mem=’ cmdline parameter to limit the memory and later
> add the remaining ones using add_memory*() variants.
>
> The basic idea is to have driver have ownership and manage
> certain memory blocks for hotplug operations.
>
> For the driver to be able to know how much memory was limited and
> how much actually present, we take the delta of ‘bootmem physical
> end address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical
> end address' is obtained by scanning the reg values in ‘memory’
> DT node and determining the max {addr,size}. Since our driver is
> getting modularized, we won’t have access to memblock_end_of_DRAM
> (i.e. end address of all memory blocks after ‘mem=’ is applied).
>
> So checking if memblock_{start/end}_of_DRAM() symbols can be
> exported? Also, this information can be obtained by userspace by
> doing ‘cat /proc/iomem’ and grepping for ‘System RAM’. So
> wondering if userspace can have access to such info, can we allow
> kernel module drivers have access by exporting
> memblock_{start/end}_of_DRAM().

These functions cannot be exported not because we want to hide this
information from the modules but because it is unsafe to use them.
On most architectures these functions are __init so they are discarded
after boot anyway. Besides, the memory configuration known to memblock
might be not accurate in many cases as David explained in his reply.

> Or are there any other ways where a module driver can get the end
> address of system memory block?

What do you mean by "system memory block"? There could be a lot of
interpretations if you take into account memory hotplug, "mem=" option,
reserved and firmware memory.

I'd suggest you to describe the entire use case in more detail. Having
the complete picture would help finding a proper solution.

> Sudarshan
>

--
Sincerely yours,
Mike.
Re: mm/memblock: export memblock_{start/end}_of_DRAM
On 29.10.20 22:29, Sudarshan Rajagopalan wrote:
> Hello all,

Hi!

> We have a usecase where a module driver adds certain memory
> blocks using add_memory_driver_managed(), so that it can perform
> memory hotplug operations on these blocks. In general, these
> memory blocks aren’t something that gets physically added later,
> but is part of actual RAM that system booted up with. Meaning –
> we set the ‘mem=’ cmdline parameter to limit the memory and later
> add the remaining ones using add_memory*() variants.
>
> The basic idea is to have driver have ownership and manage
> certain memory blocks for hotplug operations.

So, in summary, you're still abusing the memory hot(un)plug
infrastructure from your driver - just not in as severe a way as
before. And I'll tell you why, so you might understand why exposing this
API is not really a good idea and why your driver wouldn't - for
example - be upstream material.

Don't get me wrong, what you are doing might be ok in your context, but
it's simply not universally applicable in our current model.

Ordinary system RAM works differently than many other devices (like PCI
devices) whereby *something* senses the device and exposes it to the
system, and some available driver binds to it and owns the memory.

Memory is detected by a driver and added to the system via e.g.,
add_memory_driver_managed(). Memory devices are created and the memory
is directly handed off to the system, to be used as system RAM as soon
as memory devices are onlined. There is no driver that "binds" memory
like other devices - it's rather the core (buddy) that uses/owns that
memory immediately after device creation.

> For the driver to be able to know how much memory was limited and
> how much actually present, we take the delta of ‘bootmem physical
> end address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical
> end address' is obtained by scanning the reg values in ‘memory’
> DT node and determining the max {addr,size}. Since our driver is
> getting modularized, we won’t have access to memblock_end_of_DRAM
> (i.e. end address of all memory blocks after ‘mem=’ is applied).

What you do with "mem=" is force memory detection to ignore some of its
detected memory.

> So checking if memblock_{start/end}_of_DRAM() symbols can be
> exported? Also, this information can be obtained by userspace by
> doing ‘cat /proc/iomem’ and grepping for ‘System RAM’. So
> wondering if userspace can

Not correct: with "mem=", cat /proc/iomem only shows *detected* + added
system RAM, not the unmodified detection.

> have access to such info, can we allow
> kernel module drivers have access by exporting
> memblock_{start/end}_of_DRAM().
>
> Or are there any other ways where a module driver can get the end
> address of system memory block?

And here is our problem: You disabled *detection* of that memory by the
responsible driver (here: core). Now your driver wants to know what
would have been detected. Assume you have memory hole in that region -
it would not work by simply looking at start/end. Your driver is not the
one doing the detection.

Another issue is: when using such memory for KVM guests, there is no
mechanism that tracks ownership of that memory - imagine another driver
wanting to use that memory. This really only works in special
environments.

Yet another issue: you cannot assume that memblock data will stay around
after boot. While we do it right now for arm64, that might change at
some point. This is also one of the reasons why we don't export any real
memblock data to drivers.

When using "mem=" you have to know the exact layout of your system RAM
and manually communicate what that layout looks like to the right
places: here, to your driver.

The clean way of doing things today is to allocate RAM and use it for
guests - e.g., using hugetlb/gigantic pages. As I said, there are other
techniques coming up to deal with minimizing struct page overhead - if
that's what you're concerned with (I still don't know why you're
removing the memory from the host when giving it to the guest).

--
Thanks,

David / dhildenb
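
The memory-hole objection in the reply above can be illustrated with a small userspace sketch. It is not kernel code and not anything proposed in the thread: the bank layout and the helper names (holes, span_and_present, HIDDEN) are hypothetical, chosen only to show that a bare start/end pair overstates how much RAM is really present once a gap exists.

```python
def holes(ranges):
    """Return the gaps between half-open (start, end) ranges."""
    ordered = sorted(ranges)
    gaps = []
    if not ordered:
        return gaps
    prev_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start > prev_end:
            gaps.append((prev_end, start))
        prev_end = max(prev_end, end)
    return gaps


def span_and_present(ranges):
    """Return (bytes between lowest start and highest end, bytes backed by RAM)."""
    span = max(end for _, end in ranges) - min(start for start, _ in ranges)
    missing = sum(end - start for start, end in holes(ranges))
    return span, span - missing


# Hypothetical layout of the region hidden by "mem=": two banks with a hole.
HIDDEN = [(0xC0000000, 0xE0000000), (0xF0000000, 0x100000000)]

span, present = span_and_present(HIDDEN)
print(hex(span), hex(present), [(hex(s), hex(e)) for s, e in holes(HIDDEN)])
```

With this assumed layout, the start/end delta covers 0x40000000 bytes while only 0x30000000 bytes are backed by RAM, so a driver that trusted the delta alone would try to add the hole as well.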
mm/memblock: export memblock_{start/end}_of_DRAM
Hello all,

We have a usecase where a module driver adds certain memory
blocks using add_memory_driver_managed(), so that it can perform
memory hotplug operations on these blocks. In general, these
memory blocks aren’t something that gets physically added later,
but is part of actual RAM that system booted up with. Meaning –
we set the ‘mem=’ cmdline parameter to limit the memory and later
add the remaining ones using add_memory*() variants.

The basic idea is to have driver have ownership and manage
certain memory blocks for hotplug operations.

For the driver to be able to know how much memory was limited and
how much actually present, we take the delta of ‘bootmem physical
end address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical
end address' is obtained by scanning the reg values in ‘memory’
DT node and determining the max {addr,size}. Since our driver is
getting modularized, we won’t have access to memblock_end_of_DRAM
(i.e. end address of all memory blocks after ‘mem=’ is applied).

So checking if memblock_{start/end}_of_DRAM() symbols can be
exported? Also, this information can be obtained by userspace by
doing ‘cat /proc/iomem’ and grepping for ‘System RAM’. So
wondering if userspace can have access to such info, can we allow
kernel module drivers have access by exporting
memblock_{start/end}_of_DRAM().

Or are there any other ways where a module driver can get the end
address of system memory block?

Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
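
For readers following the arithmetic in this message, here is a minimal userspace sketch of the delta it describes. It assumes the usual "start-end : name" line format of /proc/iomem with inclusive end addresses. The sample text, the reg pairs and the helper names (system_ram_ranges, dt_memory_end, limited_span) are hypothetical and stand in for real /proc/iomem output and real 'memory' DT node contents. As the replies point out, the highest "System RAM" end only reflects detected plus added RAM, so this is an approximation rather than an equivalent of memblock_end_of_DRAM().

```python
import re

# One /proc/iomem entry: optional indentation, "start-end : name" (hex, inclusive end).
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def system_ram_ranges(iomem_text):
    """Return sorted half-open (start, end) ranges for every 'System RAM' entry."""
    ranges = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(3).strip() == "System RAM":
            start = int(m.group(1), 16)
            end = int(m.group(2), 16) + 1  # convert inclusive end to exclusive
            ranges.append((start, end))
    return sorted(ranges)


def dt_memory_end(reg_pairs):
    """Highest addr + size over the (addr, size) pairs of the 'memory' DT node."""
    return max(addr + size for addr, size in reg_pairs)


def limited_span(iomem_text, reg_pairs):
    """Return (end of visible System RAM, end of RAM described by the DT)."""
    ram_end = max(end for _, end in system_ram_ranges(iomem_text))
    return ram_end, dt_memory_end(reg_pairs)


# Hypothetical sample: 2 GiB of RAM at 0x80000000, booted with a 1 GiB "mem=" limit.
SAMPLE_IOMEM = """\
7f000000-7fffffff : reserved
80000000-bfffffff : System RAM
  80080000-80ffffff : Kernel code
  81100000-812fffff : Kernel data
"""
SAMPLE_REG = [(0x80000000, 0x80000000)]

ram_end, dt_end = limited_span(SAMPLE_IOMEM, SAMPLE_REG)
print(hex(ram_end), hex(dt_end), hex(dt_end - ram_end))
```

On a real system the text would come from reading /proc/iomem, which typically reports zeroed addresses to unprivileged users, so the sketch takes the text as a parameter instead of opening the file itself.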