Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-11-03 Thread Mike Rapoport
On Mon, Nov 02, 2020 at 06:51:25PM -0800, Sudarshan Rajagopalan wrote:
> On 2020-10-30 01:38, Mike Rapoport wrote:
> > On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> > > Hello all,
> > > 
> > > We have a usecase where a module driver adds certain memory blocks
> > > using add_memory_driver_managed(), so that it can perform memory
> > > hotplug operations on these blocks. In general, these memory blocks
> > > aren’t something that gets physically added later, but is part of
> > > actual RAM that system booted up with. Meaning – we set the ‘mem=’
> > > cmdline parameter to limit the memory and later add the remaining
> > > ones using add_memory*() variants.
> > >
> > > The basic idea is to have driver have ownership and manage certain
> > > memory blocks for hotplug operations.
> > >
> > > For the driver be able to know how much memory was limited and how
> > > much actually present, we take the delta of ‘bootmem physical end
> > > address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical end
> > > address' is obtained by scanning the reg values in ‘memory’ DT node
> > > and determining the max {addr,size}. Since our driver is getting
> > > modularized, we won’t have access to memblock_end_of_DRAM (i.e. end
> > > address of all memory blocks after ‘mem=’ is applied).
> > >
> > > So checking if memblock_{start/end}_of_DRAM() symbols can be
> > > exported? Also, this information can be obtained by userspace by
> > > doing ‘cat /proc/iomem’ and greping for ‘System RAM’. So wondering if
> > > userspace can have access to such info, can we allow kernel module
> > > drivers have access by exporting memblock_{start/end}_of_DRAM().
> > 
> > These functions cannot be exported not because we want to hide this
> > information from the modules but because it is unsafe to use them.
> > On most architectures these functions are __init so they are discarded
> > after boot anyway. Besides, the memory configuration known to memblock
> > might not be accurate in many cases as David explained in his reply.
> > 
> 
> I don't see how information contained in memblock_{start/end}_of_DRAM() is
> considered hidden if the information can be obtained using 'cat
> /proc/iomem'. The memory resource manager adds these blocks either in
> "System RAM", "reserved", "Kernel data/code" etc. Inspecting this, one could
> determine what's the start and end of memblocks.

I'm not saying that the memblock data is considered hidden. On most
systems it is simply not present after boot. And even if it is not
discarded, it might not be accurate on any arch except arm64.

> I agree on the part that it's __init annotated and could be removed after
> boot. This is something that the driver can be wary of too.
> 
> > > Or are there any other ways where a module driver can get the end
> > > address of
> > > system memory block?
> > 
> > What do you mean by "system memory block"? There could be a lot of
> > interpretations if you take into account memory hotplug, "mem=" option,
> > reserved and firmware memory.
> 
> I meant the physical end address of memblock. The equivalent of
> memblock_end_of_DRAM.

> > I'd suggest you to describe the entire use case in more detail. Having
> > the complete picture would help finding a proper solution.
> 
> The usecase in general is to have a way to add/remove and online/offline
> certain memory blocks which are part of boot. We do this by limiting the
> memory using "mem=" and later add the remaining blocks using
> add_memory_driver_managed().

I think such infrastructure should be a part of core mm rather than
external out-of-tree driver.

> Sudarshan
> 
-- 
Sincerely yours,
Mike.


Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-11-03 Thread Sudarshan Rajagopalan

On 2020-10-30 01:38, Mike Rapoport wrote:
> On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> > Hello all,
> >
> > We have a usecase where a module driver adds certain memory blocks
> > using add_memory_driver_managed(), so that it can perform memory
> > hotplug operations on these blocks. In general, these memory blocks
> > aren’t something that gets physically added later, but is part of
> > actual RAM that system booted up with. Meaning – we set the ‘mem=’
> > cmdline parameter to limit the memory and later add the remaining
> > ones using add_memory*() variants.
> >
> > The basic idea is to have driver have ownership and manage certain
> > memory blocks for hotplug operations.
> >
> > For the driver be able to know how much memory was limited and how
> > much actually present, we take the delta of ‘bootmem physical end
> > address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical end
> > address' is obtained by scanning the reg values in ‘memory’ DT node
> > and determining the max {addr,size}. Since our driver is getting
> > modularized, we won’t have access to memblock_end_of_DRAM (i.e. end
> > address of all memory blocks after ‘mem=’ is applied).
> >
> > So checking if memblock_{start/end}_of_DRAM() symbols can be
> > exported? Also, this information can be obtained by userspace by
> > doing ‘cat /proc/iomem’ and greping for ‘System RAM’. So wondering if
> > userspace can have access to such info, can we allow kernel module
> > drivers have access by exporting memblock_{start/end}_of_DRAM().


> These functions cannot be exported not because we want to hide this
> information from the modules but because it is unsafe to use them.
> On most architectures these functions are __init so they are discarded
> after boot anyway. Besides, the memory configuration known to memblock
> might not be accurate in many cases as David explained in his reply.



I don't see how information contained in memblock_{start/end}_of_DRAM()
is considered hidden if the information can be obtained using 'cat
/proc/iomem'. The memory resource manager adds these blocks either in
"System RAM", "reserved", "Kernel data/code" etc. Inspecting this, one
could determine what's the start and end of memblocks.

I agree on the part that it's __init annotated and could be removed after
boot. This is something that the driver can be wary of too.


> > Or are there any other ways where a module driver can get the end
> > address of system memory block?
>
> What do you mean by "system memory block"? There could be a lot of
> interpretations if you take into account memory hotplug, "mem=" option,
> reserved and firmware memory.


I meant the physical end address of memblock. The equivalent of 
memblock_end_of_DRAM.




> I'd suggest you to describe the entire use case in more detail. Having
> the complete picture would help finding a proper solution.


The usecase in general is to have a way to add/remove and online/offline
certain memory blocks which are part of boot. We do this by limiting the
memory using "mem=" and later add the remaining blocks using
add_memory_driver_managed().





> > Sudarshan
>
> --
> Sincerely yours,
> Mike.

Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-11-03 Thread Sudarshan Rajagopalan

On 2020-10-29 23:41, David Hildenbrand wrote:
> On 29.10.20 22:29, Sudarshan Rajagopalan wrote:
> > Hello all,
>
> Hi!



Hi David.. thanks for the response as always.

> > We have a usecase where a module driver adds certain memory blocks
> > using add_memory_driver_managed(), so that it can perform memory
> > hotplug operations on these blocks. In general, these memory blocks
> > aren’t something that gets physically added later, but is part of
> > actual RAM that system booted up with. Meaning – we set the ‘mem=’
> > cmdline parameter to limit the memory and later add the remaining
> > ones using add_memory*() variants.
> >
> > The basic idea is to have driver have ownership and manage certain
> > memory blocks for hotplug operations.
>
> So, in summary, you're still abusing the memory hot(un)plug
> infrastructure from your driver - just not in as severe a way as before.
> And I'll tell you why, so you might understand why exposing this API
> is not really a good idea and why your driver wouldn't - for example -
> be upstream material.
>
> Don't get me wrong, what you are doing might be ok in your context,
> but it's simply not universally applicable in our current model.
>
> Ordinary system RAM works differently than many other devices (like PCI
> devices) whereby *something* senses the device and exposes it to the
> system, and some available driver binds to it and owns the memory.
>
> Memory is detected by a driver and added to the system via e.g.,
> add_memory_driver_managed(). Memory devices are created and the memory
> is directly handed off to the system, to be used as system RAM as soon
> as memory devices are onlined. There is no driver that "binds" memory
> like other devices - it's rather the core (buddy) that uses/owns that
> memory immediately after device creation.



I see.. and I agree that drivers are meant to *sense* that something
changed or was newly added, so that a driver can check if it's the one
responsible or compatible for handling this entity and bind to it. So I
guess what it boils down to is - a driver that uses memory hotplug
_cannot_ add/remove or have ownership of memblock boot memory, only of
RAM blocks that are newly added later on.

I was trying to mimic the detecting and adding of extra RAM by limiting
the System RAM with "mem=XGB", as though the system booted with XGB of
boot memory, and later adding the remaining blocks (forcing detection and
adding) using add_memory_driver_managed(). These remaining blocks are
calculated as 'physical end addr of boot memory' - 'memblock_end_of_DRAM'.
The "physical end addr of boot memory", i.e. the actual RAM that the
bootloader reports to the kernel, can be obtained by scanning the 'memory'
DT node.




> > For the driver be able to know how much memory was limited and how
> > much actually present, we take the delta of ‘bootmem physical end
> > address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical end
> > address' is obtained by scanning the reg values in ‘memory’ DT node
> > and determining the max {addr,size}. Since our driver is getting
> > modularized, we won’t have access to memblock_end_of_DRAM (i.e. end
> > address of all memory blocks after ‘mem=’ is applied).
>
> What you do with "mem=" is force memory detection to ignore some of
> its detected memory.
>
> > So checking if memblock_{start/end}_of_DRAM() symbols can be exported?
> > Also, this information can be obtained by userspace by doing ‘cat
> > /proc/iomem’ and greping for ‘System RAM’. So wondering if userspace
> > can
>
> Not correct: with "mem=", cat /proc/iomem only shows *detected* +
> added system RAM, not the unmodified detection.



That's correct - I meant 'memblock_end_of_DRAM' along with "mem=" can be
calculated using 'cat /proc/iomem' which shows "detected plus added"
System RAM, and not the remaining undetected one which got stripped off
due to "mem=XGB". Basically, 'memblock_end_of_DRAM' address with
'mem=XGB' is {end addr of boot RAM - XGB}.. which would be the same as the
end address of "System RAM" shown in /proc/iomem.


The reasoning for this is - if userspace can have access to such info 
and calculate the memblock end address, why not let drivers have this 
info using memblock_end_of_DRAM()?
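
For reference, a minimal userspace sketch of the "cat /proc/iomem and grep for System RAM" step described above. It assumes the usual "start-end : name" line format with inclusive hexadecimal end addresses. The sample text and the function names are made up for illustration, and it only matches the exact name "System RAM". Note that unprivileged reads of the real file typically show zeroed addresses.

```python
def system_ram_ranges(iomem_text):
    """Return (start, end_inclusive) for every line named exactly 'System RAM'."""
    ranges = []
    for line in iomem_text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep or name.strip() != "System RAM":
            continue  # skips nested entries such as 'Kernel code'
        start, _, end = span.strip().partition("-")
        ranges.append((int(start, 16), int(end, 16)))
    return ranges


def end_of_system_ram(iomem_text):
    """Exclusive end address of the highest 'System RAM' range, or None."""
    ranges = system_ram_ranges(iomem_text)
    if not ranges:
        return None
    return max(end for _, end in ranges) + 1


# Hypothetical /proc/iomem excerpt for a system booted with a "mem=" limit.
SAMPLE = """\
80000000-bfffffff : System RAM
  80080000-80ffffff : Kernel code
  81200000-814fffff : Kernel data
c0000000-c00fffff : reserved
"""

print(hex(end_of_system_ram(SAMPLE)))
```

As pointed out in the quoted reply, this only reflects what was detected plus what was added afterwards, not the memory that "mem=" caused to be ignored.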


> > have access to such info, can we allow kernel module drivers have
> > access by exporting memblock_{start/end}_of_DRAM().
> >
> > Or are there any other ways where a module driver can get the end
> > address of system memory block?
>
> And here is our problem: You disabled *detection* of that memory by
> the responsible driver (here: core). Now your driver wants to know
> what would have been detected. Assume you have a memory hole in that
> region - it would not work by simply looking at start/end. Your
> driver is not the one doing the detection.



Regarding the memory hole - the driver can inspect the 'memory' DT node
that the kernel gets from ABL (from the RAM partition table) to see whether
any such holes exist. I agree that if such holes exist, hot adding will
fail since memory must be added in multiples of the block size.
The same issue will arise if a RAM slot is added and a driver senses it 
and it only knows the start/end of this RAM slo

Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-11-03 Thread Christoph Hellwig
On Sat, Oct 31, 2020 at 11:05:45AM +0100, David Hildenbrand wrote:
> On 31.10.20 10:18, Christoph Hellwig wrote:
> > On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
> > > What do you mean by "system memory block"? There could be a lot of
> > > interpretations if you take into account memory hotplug, "mem=" option,
> > > reserved and firmware memory.
> > > 
> > > I'd suggest you to describe the entire use case in more detail. Having
> > > the complete picture would help finding a proper solution.
> > 
> > I think we need the code for the driver trying to do this as an RFC
> > submission.  Everything else is rather pointless.
> 
> Sharing RFCs is most probably not what people want when developing advanced
> hypervisor features :)

Well, if they can't even do that it really has no relevance for kernel
development.


Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-10-31 Thread David Hildenbrand

On 31.10.20 10:18, Christoph Hellwig wrote:
> On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
> > What do you mean by "system memory block"? There could be a lot of
> > interpretations if you take into account memory hotplug, "mem=" option,
> > reserved and firmware memory.
> >
> > I'd suggest you to describe the entire use case in more detail. Having
> > the complete picture would help finding a proper solution.
>
> I think we need the code for the driver trying to do this as an RFC
> submission.  Everything else is rather pointless.


Sharing RFCs is most probably not what people want when developing 
advanced hypervisor features :)


@Sudarshan, I recommend looking at the slides of the KVM Forum talk from 
yesterday


https://kvmforum2020.sched.com/event/eE40/towards-an-alternative-memory-architecture-joao-martins-oracle?iframe=no

It contains a nice summary of the state of the art, and how "mem=", devdax,
and dax_hmat can be used to tackle the issue in a hypervisor.


--
Thanks,

David / dhildenb



Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-10-31 Thread Christoph Hellwig
On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
>  
> What do you mean by "system memory block"? There could be a lot of
> interpretations if you take into account memory hotplug, "mem=" option,
> reserved and firmware memory.
> 
> I'd suggest you to describe the entire use case in more detail. Having
> the complete picture would help finding a proper solution.

I think we need the code for the driver trying to do this as an RFC
submission.  Everything else is rather pointless.


Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-10-30 Thread Mike Rapoport
On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> Hello all,
> 
> We have a usecase where a module driver adds certain memory blocks using
> add_memory_driver_managed(), so that it can perform memory hotplug
> operations on these blocks. In general, these memory blocks aren’t something
> that gets physically added later, but is part of actual RAM that system
> booted up with. Meaning – we set the ‘mem=’ cmdline parameter to limit the
> memory and later add the remaining ones using add_memory*() variants.
> 
> The basic idea is to have driver have ownership and manage certain memory
> blocks for hotplug operations.
> 
> For the driver be able to know how much memory was limited and how much
> actually present, we take the delta of ‘bootmem physical end address’ and
> ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is obtained by
> scanning the reg values in ‘memory’ DT node and determining the max
> {addr,size}. Since our driver is getting modularized, we won’t have access
> to memblock_end_of_DRAM (i.e. end address of all memory blocks after ‘mem=’
> is applied).
> 
> So checking if memblock_{start/end}_of_DRAM() symbols can be exported? Also,
> this information can be obtained by userspace by doing ‘cat /proc/iomem’ and
> greping for ‘System RAM’. So wondering if userspace can have access to such
> info, can we allow kernel module drivers have access by exporting
> memblock_{start/end}_of_DRAM().

These functions cannot be exported not because we want to hide this
information from the modules but because it is unsafe to use them.
On most architectures these functions are __init so they are discarded
after boot anyway. Besides, the memory configuration known to memblock
might not be accurate in many cases as David explained in his reply.

> Or are there any other ways where a module driver can get the end address of
> system memory block?
 
What do you mean by "system memory block"? There could be a lot of
interpretations if you take into account memory hotplug, "mem=" option,
reserved and firmware memory.

I'd suggest you to describe the entire use case in more detail. Having
the complete picture would help finding a proper solution.

> Sudarshan
> 

--
Sincerely yours,
Mike.


Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-10-29 Thread David Hildenbrand

On 29.10.20 22:29, Sudarshan Rajagopalan wrote:
> Hello all,

Hi!

> We have a usecase where a module driver adds certain memory blocks using
> add_memory_driver_managed(), so that it can perform memory hotplug
> operations on these blocks. In general, these memory blocks aren’t
> something that gets physically added later, but is part of actual RAM
> that system booted up with. Meaning – we set the ‘mem=’ cmdline
> parameter to limit the memory and later add the remaining ones using
> add_memory*() variants.
>
> The basic idea is to have driver have ownership and manage certain
> memory blocks for hotplug operations.


So, in summary, you're still abusing the memory hot(un)plug
infrastructure from your driver - just not in as severe a way as before.
And I'll tell you why, so you might understand why exposing this API is
not really a good idea and why your driver wouldn't - for example - be
upstream material.

Don't get me wrong, what you are doing might be ok in your context, but
it's simply not universally applicable in our current model.

Ordinary system RAM works differently than many other devices (like PCI
devices) whereby *something* senses the device and exposes it to the
system, and some available driver binds to it and owns the memory.

Memory is detected by a driver and added to the system via e.g.,
add_memory_driver_managed(). Memory devices are created and the memory
is directly handed off to the system, to be used as system RAM as soon
as memory devices are onlined. There is no driver that "binds" memory
like other devices - it's rather the core (buddy) that uses/owns that
memory immediately after device creation.




> For the driver be able to know how much memory was limited and how much
> actually present, we take the delta of ‘bootmem physical end address’
> and ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is
> obtained by scanning the reg values in ‘memory’ DT node and determining
> the max {addr,size}. Since our driver is getting modularized, we won’t
> have access to memblock_end_of_DRAM (i.e. end address of all memory
> blocks after ‘mem=’ is applied).

What you do with "mem=" is force memory detection to ignore some of its
detected memory.

> So checking if memblock_{start/end}_of_DRAM() symbols can be exported?
> Also, this information can be obtained by userspace by doing ‘cat
> /proc/iomem’ and greping for ‘System RAM’. So wondering if userspace can

Not correct: with "mem=", cat /proc/iomem only shows *detected* + added
system RAM, not the unmodified detection.



> have access to such info, can we allow kernel module drivers have access
> by exporting memblock_{start/end}_of_DRAM().
>
> Or are there any other ways where a module driver can get the end
> address of system memory block?


And here is our problem: You disabled *detection* of that memory by the
responsible driver (here: core). Now your driver wants to know what
would have been detected. Assume you have a memory hole in that region -
it would not work by simply looking at start/end. Your driver is not
the one doing the detection.
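
To make the hole problem concrete, here is a small arithmetic sketch. The ranges are hypothetical (addr, size) pairs, assumed not to overlap, and the helper names are invented for illustration. It shows that the span between the lowest start and the highest end can be larger than the memory actually present.

```python
def span_and_present(ranges):
    """Return (end - start over all ranges, bytes actually present)."""
    start = min(addr for addr, _ in ranges)
    end = max(addr + size for addr, size in ranges)
    present = sum(size for _, size in ranges)
    return end - start, present


def holes(ranges):
    """Return (hole_start, hole_end) gaps between non-overlapping ranges."""
    ordered = sorted(ranges)
    gaps = []
    for (addr, size), (next_addr, _) in zip(ordered, ordered[1:]):
        if addr + size < next_addr:
            gaps.append((addr + size, next_addr))
    return gaps


# Hypothetical layout: 1 GiB at 2 GiB, then 1 GiB at 4 GiB, with a 1 GiB hole.
RANGES = [(0x8000_0000, 0x4000_0000), (0x1_0000_0000, 0x4000_0000)]

span, present = span_and_present(RANGES)
print(hex(span), hex(present), [(hex(a), hex(b)) for a, b in holes(RANGES)])
```

With this layout a driver that only looks at start/end would assume 3 GiB of contiguous memory, although only 2 GiB exists.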


Another issue is: when using such memory for KVM guests, there is no 
mechanism that tracks ownership of that memory - imagine another driver 
wanting to use that memory. This really only works in special environments.


Yet another issue: you cannot assume that memblock data will stay around 
after boot. While we do it right now for arm64, that might change at 
some point. This is also one of the reasons why we don't export any real 
memblock data to drivers.



When using "mem=" you have to know the exact layout of your system RAM
and manually communicate what that layout looks like to the right places:
here, to your driver.


The clean way of doing things today is to allocate RAM and use it for 
guests - e.g., using hugetlb/gigantic pages. As I said, there are other 
techniques coming up to deal with minimizing struct page overhead - if 
that's what you're concerned with (I still don't know why you're 
removing the memory from the host when giving it to the guest).


--
Thanks,

David / dhildenb



mm/memblock: export memblock_{start/end}_of_DRAM

2020-10-29 Thread Sudarshan Rajagopalan

Hello all,

We have a use case where a module driver adds certain memory blocks using
add_memory_driver_managed(), so that it can perform memory hotplug
operations on these blocks. In general, these memory blocks are not
physically added later, but are part of the actual RAM that the system
booted up with. Meaning – we set the ‘mem=’ cmdline parameter to limit
the memory and later add the remaining blocks using add_memory*()
variants.

The basic idea is to have the driver own and manage certain memory
blocks for hotplug operations.


For the driver to be able to know how much memory was limited and how
much is actually present, we take the delta of ‘bootmem physical end
address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical end address'
is obtained by scanning the reg values in the ‘memory’ DT node and
determining the max {addr,size}. Since our driver is getting modularized,
we won’t have access to memblock_end_of_DRAM (i.e. the end address of all
memory blocks after ‘mem=’ is applied).
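
A minimal sketch of the delta described above, written as plain arithmetic rather than kernel code. The (addr, size) pairs stand in for the reg values of the 'memory' DT node and the end-of-DRAM value stands in for what memblock_end_of_DRAM() would report. All numbers and function names here are hypothetical.

```python
def boot_memory_end(reg_entries):
    """Highest exclusive end address over (addr, size) pairs."""
    return max(addr + size for addr, size in reg_entries)


def limited_bytes(reg_entries, end_of_dram):
    """Bytes between the 'mem=' limited end of DRAM and the end of boot memory."""
    return max(boot_memory_end(reg_entries) - end_of_dram, 0)


# Hypothetical layout: two contiguous 1 GiB banks starting at 2 GiB.
REG = [(0x8000_0000, 0x4000_0000), (0xC000_0000, 0x4000_0000)]
# Hypothetical end of DRAM after booting with mem=1G.
END_OF_DRAM = 0xC000_0000

print(hex(limited_bytes(REG, END_OF_DRAM)))  # prints 0x40000000
```

The result is only meaningful if the region between the two end addresses is contiguous RAM, which this sketch simply assumes.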


So we are checking whether the memblock_{start/end}_of_DRAM() symbols can
be exported. Also, this information can be obtained by userspace by doing
‘cat /proc/iomem’ and grepping for ‘System RAM’. So we are wondering: if
userspace can have access to such info, can we allow kernel module drivers
to have access by exporting memblock_{start/end}_of_DRAM()?


Or are there any other ways where a module driver can get the end 
address of system memory block?



Sudarshan
