Re: RFC [PATCH v4 2/7] Enable balloon drivers to report inflated memory

2022-10-14 Thread David Hildenbrand


The other problem is that there are drivers that do not use
adjust_managed_page_count().


Which ones? Do we care?


The VMware and virtio balloon drivers. I recently proposed to unify them and
the objection was that it would break existing users - which is valid, so
we must care, I guess.


I'm confused; I think we care about the actual adjustment of the total pages
available here, which we want to notify the system about. These
approaches (vmware, virtio-balloon with deflate-on-oom) don't adjust
totalpages, because the assumption is that we can get back the inflated
memory automatically any time we really need it.


--
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: RFC [PATCH v4 2/7] Enable balloon drivers to report inflated memory

2022-10-14 Thread David Hildenbrand

On 14.10.22 14:50, Alexander Atanasov wrote:

Hello,

On 11.10.22 12:23, David Hildenbrand wrote:

Sounds to me that all you want is some notifier to be called from
adjust_managed_page_count(). What am I missing?


The notifier will act as an accumulator reporting the size of the change, and it
will make things easier for the drivers and users wrt locking.
The notifier is similar to the memory hotplug notifier.


Overall, I am not convinced that there is any value in separating the
value and the notifier. You can batch both or not batch both. In addition,
as I mentioned, having two values seems racy.


I have identified two users so far above - there may be more to come.
One type needs the value to adjust. Having the value is also necessary
to report it to users and to the OOM handling. There are options with callbacks
and so on, but they would complicate things with no real gain. You are right
about the atomicity, but I guess if that's a problem for some user it could
find a way to ensure it; I have yet to find such a place.



I haven't followed the whole discussion, but I just wanted to raise that
having a generic mechanism to notify on such changes could be valuable.

For example, virtio-mem also uses adjust_managed_page_count() and might
sometimes not trigger memory hotplug notifiers when adding more memory
(essentially, when it fake-adds memory part of an already added Linux
memory block).

What might make sense is to schedule some kind of deferred notification on
adjust_managed_page_count() changes. This way, we could notify without
caring about locking and would naturally batch notifications.

adjust_managed_page_count() users would not require changes.


Making it deferred will bring issues for both the users of
adjust_managed_page_count() and the receivers of the notification -
locking being the first. And it is hard to know when the adjustment will
finish; some of the drivers wait and retry in blocks. It will bring
complexity and it will not be possible to convert users in small steps.


What exactly is the issue with handling that deferred? Who needs an
immediate, 100% precise notification?


Locking from a separate workqueue shouldn't be too hard, or what am I
missing?




The other problem is that there are drivers that do not use
adjust_managed_page_count().


Which ones? Do we care?

--
Thanks,

David / dhildenb



Re: RFC [PATCH v4 2/7] Enable balloon drivers to report inflated memory

2022-10-11 Thread David Hildenbrand

Sounds to me that all you want is some notifier to be called from
adjust_managed_page_count(). What am I missing?


The notifier will act as an accumulator reporting the size of the change, and it
will make things easier for the drivers and users wrt locking.
The notifier is similar to the memory hotplug notifier.


Overall, I am not convinced that there is any value in separating the value
and the notifier. You can batch both or not batch both. In addition, as I
mentioned, having two values seems racy.


I have identified two users so far above - there may be more to come.
One type needs the value to adjust. Having the value is also necessary
to report it to users and to the OOM handling. There are options with callbacks
and so on, but they would complicate things with no real gain. You are right
about the atomicity, but I guess if that's a problem for some user it could
find a way to ensure it; I have yet to find such a place.



I haven't followed the whole discussion, but I just wanted to raise that 
having a generic mechanism to notify on such changes could be valuable.


For example, virtio-mem also uses adjust_managed_page_count() and might 
sometimes not trigger memory hotplug notifiers when adding more memory 
(essentially, when it fake-adds memory part of an already added Linux 
memory block).


What might make sense is to schedule some kind of deferred notification on
adjust_managed_page_count() changes. This way, we could notify without
caring about locking and would naturally batch notifications.


adjust_managed_page_count() users would not require changes.

--
Thanks,

David / dhildenb



Re: RFC [PATCH v4 2/7] Enable balloon drivers to report inflated memory

2022-10-10 Thread Nadav Amit
On Oct 10, 2022, at 12:24 AM, Alexander Atanasov 
 wrote:

> Hello,
> 
> On 10.10.22 9:18, Nadav Amit wrote:
>> On Oct 7, 2022, at 3:58 AM, Alexander Atanasov 
>>  wrote:
>>> So all balloon drivers give large amount of RAM on boot , then inflate the 
>>> balloon. But this places have already been initiallized and they know that 
>>> the system have given amount of totalram which is not true the moment they 
>>> start to operate. the result is that too much space gets used and it 
>>> degrades the userspace performance.
>>> example - fs/eventpoll.c:static int __init eventpoll_init(void) - 4% of ram 
>>> for eventpool - when you inflate half of the ram it becomes 8% of the ram - 
>>> do you really need 8% of your ram to be used for eventpool?
>>> 
>>> To solve this you need to register and, when notified, update the cache size,
>>> limits, and whatever else is calculated from the amount of memory.
>> Hmm.. Not sure about all of that. Most balloon drivers are manually managed,
>> and call adjust_managed_page_count(), and as a result might want to redo
>> all the calculations that are based on totalram_pages().
> 
> Yes, I will say that it looks mixed: manual for large changes and
> automatic for small changes. VMware and Hyper-V have automatic and manual (not
> sure exactly what you can change on a running VM, but I guess you can) -
> virtio is only manual. I do not know about dlpar / xen.
> 
> The scenario is like this: start a VM with 4GB RAM, reduce it to 2GB with the
> balloon - the VM can be upgraded later.
> 
> All we are talking about relates to memory hotplug/unplug (where unplug is
> close to nonexistent, hence the balloons are used).
> 
> All values should be recalculated on memory hotplug too, so you can use the 
> newly available RAM.
> 
> RAM is the most valuable resource of all, so I consider using it optimally to
> be of great importance.
> 
>> Side-note: That’s not the case for the VMware balloon. I actually considered
>> calling adjust_managed_page_count() just to conform with other balloon
>> drivers. But since we use totalram_pages() to communicate the total RAM to
>> the hypervisor, this would create an endless (and wrong) feedback loop. I am
>> not claiming it is not possible for the VMware balloon driver to call
>> adjust_managed_page_count(), but the chances are that it would create more
>> harm than good.
> 
> Virtio does both - depending on the deflate-on-OOM option. I already suggested
> unifying all drivers to inflate the used memory, as it seems more
> logical to me since nobody expects totalram_pages() to change, but the
> current state is that both ways are accepted and changing them can break
> existing users.
> See  discussion here 
> https://lore.kernel.org/lkml/20220809095358.2203355-1-alexander.atana...@virtuozzo.com/.

Thanks for the reminder. I wish you could somehow summarize all of that in the
cover letter and/or the commit messages for these patches.

> 
> 
>> Back to the matter at hand. It seems that you wish that the notifiers would
>> be called following any changes that would be reflected in totalram_pages().
>> So, doesn't it make more sense to call it from adjust_managed_page_count() ?
> 
> It will hurt performance - all drivers work page by page, i.e. they update
> by +1/-1, and they do so under locks which, as you already noted, can lead to
> bad things. The notifier will accumulate the change and let its user know how
> much changed, so they can decide if they have to recalculate - it can even do
> so async in order not to disturb the drivers.

So updating the counters by 1 is ok (using atomic operation, which is not
free)? And the reason it is (relatively) cheap is because nobody actually
looks at the value (i.e., nobody actually acts on the value)?

If nobody considers the value, then doesn’t it make sense just to update it
less frequently, and then call the notifiers?

>>> The difference is here:
>>> 
>>> mm/zswap.c: return totalram_pages() * zswap_max_pool_percent / 100 <
>>> mm/zswap.c: return totalram_pages() * zswap_accept_thr_percent / 100
>>> These use percentages, so you can recalculate easily with:
>>> 
>>> +static inline unsigned long totalram_pages_current(void)
>>> +{
>>> +   unsigned long inflated = 0;
>>> +#ifdef CONFIG_MEMORY_BALLOON
>>> +   extern atomic_long_t mem_balloon_inflated_free_kb;
>>> +   inflated = atomic_long_read(&mem_balloon_inflated_free_kb);
>>> +   inflated >>= (PAGE_SHIFT - 10);
>>> +#endif
>>> +   return (unsigned long)atomic_long_read(&_totalram_pages) - inflated;
>>> +}

So we have here two values, and it appears there is a hidden assumption that
they are both updated atomically. Otherwise, it appears, inflated
theoretically might be greater than _totalram_pages, we get a negative value,
and all hell breaks loose.

But _totalram_pages and mem_balloon_inflated_free_kb are not updated
atomically together (each one is, but not together).

>>> And you are good when you switch to the _current version -
>>> si_meminfo_current is similar.
>>> 

Re: RFC [PATCH v4 2/7] Enable balloon drivers to report inflated memory

2022-10-10 Thread Nadav Amit
On Oct 7, 2022, at 3:58 AM, Alexander Atanasov 
 wrote:

> On 7.10.22 0:07, Nadav Amit wrote:
 
 I was looking through the series and I did not see actual users of the
 notifier. Usually, it is not great to build an API without users.
>>> 
>>> 
>>> You are right. I hope to get some feedback/interest from potential users
>>> that I mentioned in the cover letter. I will probably split the notifier
>>> into a separate series. To make it useful it will require more changes.
>>> See below for more about them.
>> Fair, but this is something that is more suitable for RFC. Otherwise, more
>> likely than not - your patches would go in as is.
> 
> Yes, I will remove the notifier and resend both as RFC. I think that every
> patch is an RFC, and the RFC tag is used for more general changes that could
> affect unexpected areas, change functionality, change design, and in general
> can lead to a bigger impact. In this case it adds functionality that
> is missing and could hardly affect anything else.
> In essence it provides information that you cannot get without it.
> But I will take your advice and push everything through RFC from now on.

Just keep the version numbers as you had before. That’s fine and better to
prevent confusion.

 [snip]
> +
> +static int balloon_notify(unsigned long val)
> +{
> + return srcu_notifier_call_chain(&balloon_chain, val, NULL);
 Since you know the inflated_kb value here, why not use it as an argument
 to the callback? I think casting to (void *) and back is best. But you can
 also provide a pointer to the value. Doesn’t it sound better than having
 potentially different notifiers reading different values?
>>> 
>>> My current idea is to have a struct with the current and previous value,
>>> maybe the change in percent. The actual value does not matter to anyone,
>>> but the size of the change does. When a user gets notified it can act upon
>>> the change - if it is small it can ignore it, if it is above some
>>> threshold it can act - if it makes sense for some receiver, it can
>>> accumulate changes from several notifications. Another option/addition is to
>>> have si_meminfo_current(..) and totalram_pages_current(..) that return
>>> values adjusted with the balloon values.
>>> 
>>> Going further - there are a few places that calculate something based on
>>> available memory that do not have a sysfs/proc interface for setting limits.
>>> Most of them work in percent, so they can be converted to do calculations
>>> when they get a notification.
>>> 
>>> The ones that have an interface for configuration but use memory values can
>>> be handled in two ways - convert them to use percent of what is available,
>>> or extend the notifier to notify userspace, which in turn does the
>>> calculations and updates the configuration.
>> I really need to see code to fully understand what you have in mind.
> 
> Sure - you can check some of the users with git grep totalram_pages - it shows
> self-explanatory results of usage like:
> fs/f2fs/node.c: bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int
> type) - calculations in percent - one good example
> fs/ceph/super.h:congestion_kb = (16*int_sqrt(totalram_pages())) << 
> (PAGE_SHIFT-10);
> fs/fuse/inode.c:*limit = ((totalram_pages() << PAGE_SHIFT) >> 
> 13) / 392;
> fs/nfs/write.c: nfs_congestion_kb = (16*int_sqrt(totalram_pages())) << 
> (PAGE_SHIFT-10);
> fs/nfsd/nfscache.c: unsigned long low_pages = totalram_pages() - 
> totalhigh_pages()
> mm/oom_kill.c:  oc->totalpages = totalram_pages() + total_swap_pages;
> 
> 
> So all balloon drivers give a large amount of RAM on boot, then inflate the
> balloon. But these places have already been initialized and they know that
> the system has a given amount of totalram, which is no longer true the moment
> they start to operate. The result is that too much space gets used and it
> degrades userspace performance.
> Example - fs/eventpoll.c: static int __init eventpoll_init(void) - 4% of RAM
> for eventpoll - when you inflate half of the RAM it becomes 8% of the RAM -
> do you really need 8% of your RAM to be used for eventpoll?
> 
> To solve this you need to register and, when notified, update the cache size,
> limits, and whatever else is calculated from the amount of memory.

Hmm.. Not sure about all of that. Most balloon drivers are manually managed,
and call adjust_managed_page_count(), and as a result might want to redo
all the calculations that are based on totalram_pages().

Side-note: That’s not the case for the VMware balloon. I actually considered
calling adjust_managed_page_count() just to conform with other balloon
drivers. But since we use totalram_pages() to communicate the total RAM to the
hypervisor, this would create an endless (and wrong) feedback loop. I am not
claiming it is not possible for the VMware balloon driver to call
adjust_managed_page_count(), but the chances are that it would create more
harm than good.

Back to the matter at hand. It seems that you wish that the notifiers would
be called following any changes that would be reflected in totalram_pages().
So, doesn't it make more sense to call it from adjust_managed_page_count()?