Your message dated Sun, 22 Nov 2020 21:29:03 +0100
with message-id <[email protected]>
and subject line Re: #880554: max grant frames problem
has caused the Debian Bug report #880554,
regarding max grant frames problem (domu freeze with linux-image-4.9.0-4-amd64)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
880554: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: linux-image-4.9.0-4-amd64
Version: 4.9.51-1
Severity: critical

As far as I can tell right now, the domU system simply freezes. The logs simply end at some point, and nothing more appears until the output of the next reboot. Sometimes it's still possible to log on to the system, but nothing really works. It is as if all I/O to the virtual block devices is suspended indefinitely. Until this happens, the system seems to work without issues. As the new kernel hasn't been out that long, I can't tell how often this happens. The first time was the day before yesterday, and yesterday afternoon it happened twice within two hours.

Something like 'ls' on a directory that was listed before still returns a result, but anything 'new', e.g. 'vim somefile', causes the shell to stall. Sadly there is no visible error; services just fail to answer one by one (maybe when they try to read/write something new to the disk, they simply wait for I/O to happen).

For testing I installed the older kernel (the last linux-image-4.9.0-3-amd64 from security, 4.9.30-2+deb9u5) and immediately noticed that the system boot time with the old kernel is a fraction of what it is with the new one. For the time being, I'm staying with the old kernel on the production system.

To see if anything will be dumped on the console, I started one within a screen on a test machine. Now I have to generate some activity and IO and see if something happens there.

I haven't had the time to test the impact on the dom0 kernel yet; as far as I have observed, the dom0 seems to be unaffected by the kernel update.
--- End Message ---
--- Begin Message ---
Hi all,

On 11/28/19 4:21 PM, Hans van Kranenburg wrote:
> On 7/18/19 1:30 AM, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 10/23/18 7:34 PM, Ian Jackson wrote:
>>> Control: retitle -1 max grant frames problem (domu freeze with 
>>> linux-image-4.9.0-4-amd64)
>>> Control: severity -1 important
>>> Control: reassign -1 src:xen 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9
>>
>> my last comment in this bts bug was about:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=29d11cfd8698038b87458ba4d1329b9da81150a5
>>
>> ..which is in since linux 4.13-rc2, and buster has 4.19+
>>
>> Is there anyone who wants to try to reproduce the max grant frames
>> problem on buster with Xen 4.11 and Linux 4.19 dom0/domU?
>>
>> The 'xen/grant-table: max_grant_frames reached' should show up on the
>> serial console. I'd like to see a test report of it actually happening.
> 
> I actually just did this, by putting max_grant_frames = 4 in a domU
> config file and starting it (Linux 4.19 domU on Xen 4.11):
> 
> Welcome to Debian GNU/Linux 10 (buster)!
> 
> [    5.499058] systemd[1]: Set hostname to <debug-btrfs-buster>.
> [    5.552968] xen:grant_table: xen/grant-table: max_grant_frames
> reached cur=4 extra=1 limit=4 gnttab_free_count=3 req_entries=1
> [...]
> 
> Yay. Better info for the users!

So, this was already confirmed.
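
For anyone who wants to watch for this message in collected console logs,
here is a minimal sketch in Python that pulls the counters out of the line
quoted above. The field names come from that log line; the function name
and the idea of scanning logs this way are assumptions made for the
example, not something the kernel or Xen provides.

```python
import re

# Pattern for the domU kernel message quoted above. The field names are
# copied from that log line; everything else here is illustrative.
_GRANT_FRAMES_RE = re.compile(
    r"max_grant_frames reached"
    r"\s+cur=(?P<cur>\d+)"
    r"\s+extra=(?P<extra>\d+)"
    r"\s+limit=(?P<limit>\d+)"
    r"\s+gnttab_free_count=(?P<gnttab_free_count>\d+)"
    r"\s+req_entries=(?P<req_entries>\d+)"
)


def parse_grant_frames_message(text):
    """Return the counters from the message as a dict of ints, or None."""
    # Console output may wrap the message, so collapse all whitespace first.
    match = _GRANT_FRAMES_RE.search(" ".join(text.split()))
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    line = (
        "[    5.552968] xen:grant_table: xen/grant-table: max_grant_frames\n"
        "reached cur=4 extra=1 limit=4 gnttab_free_count=3 req_entries=1"
    )
    print(parse_grant_frames_message(line))
```

In the quoted line, cur plus extra is larger than limit, which is
presumably the condition that triggers the message.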

> Also, there's a patch in review that can improve the situation:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg01607.html
> 
> The biggest annoyance in our Xen 4.11 now is that the default value for
> the hypervisor command line of gnttab_max_frames is raised to 64 from 32
> a while ago, but the toolstack overwrites this again with a default of
> 32. The patch attempts to fix that.

That change was included in Xen 4.13. We're about to put Xen 4.14 in
Debian unstable now, which includes the improvement. In Xen 4.11 in
Debian stable, the situation is a bit more annoying, but that's not
going to change any more now. Whoever needs specific settings that are
non-default should have figured out how to set them at this point.

For reference (and for anyone who does not want to look it up), here's the
commit message of the final patch that went in, so, about the new Xen
4.14 behavior:

---- 8< ----

commit f2ae59bc4b9b5c3f12de86aa42cdf413d2c3ffbf
Author: George Dunlap <[email protected]>
Date:   Fri Nov 29 17:24:45 2019 +0000

Rationalize max_grant_frames and max_maptrack_frames handling

Xen used to have single, system-wide limits for the number of grant
frames and maptrack frames a guest was allowed to create. Increasing
or decreasing this single limit on the Xen command-line would change
the limit for all guests on the system.

Later, per-domain limits for these values were created. The system-wide
limits became strict limits: domains could not be created with higher
limits, but could be created with lower limits. However, that change
also introduced a range of different "default" values into various
places in the toolstack:

- The python libxc bindings hard-coded these values to 32 and 1024,
  respectively
- The libxl default values are 32 and 1024 respectively.
- xl will use the libxl default for maptrack, but does its own default
  calculation for grant frames: either 32 or 64, based on the max
  possible mfn.

These defaults interact poorly with the hypervisor command-line limit:

- The hypervisor command-line limit cannot be used to raise the limit
  for all guests anymore, as the default in the toolstack will
  effectively override this.
- If you use the hypervisor command-line limit to *reduce* the limit,
  then the "default" values generated by the toolstack are too high,
  and all guest creations will fail.

In other words, the toolstack defaults require any change to be
effected by having the admin explicitly specify a new value in every
guest.

In order to address this, have grant_table_init treat negative values
for max_grant_frames and max_maptrack_frames as instructions to use the
system-wide default, and have all the above toolstacks default to passing
-1 unless a different value is explicitly configured.

This restores the old behavior in that changing the hypervisor command-line
option can change the behavior for all guests, while retaining the ability
to set per-guest values.  It also removes the bug that reducing the
system-wide max will cause all domains without explicit limits to fail.

NOTE: - The Ocaml bindings require the caller to always specify a value,
  and the code to start a xenstored stubdomain hard-codes these to 4
  and 128 respectively; this behavior will not be modified.

---- >8 ----
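
To make the rule in that commit message concrete, here is a minimal sketch
in Python of how a per-guest value could be resolved against the
system-wide limit. It is not Xen's grant_table_init code. The function
name, the parameter names and the use of an exception are assumptions for
illustration. Only the two rules themselves (a negative value means "use
the system-wide default", and a value above the system-wide limit is
refused) are taken from the text above.

```python
def resolve_frame_limit(requested, system_limit):
    """Pick the effective per-domain frame limit.

    requested:    value passed by the toolstack; negative means that the
                  admin did not configure anything for this guest.
    system_limit: system-wide limit from the hypervisor command line.
    """
    if requested < 0:
        # No explicit per-guest value: fall back to the system-wide default.
        return system_limit
    if requested > system_limit:
        # The system-wide limit is strict, so a higher per-domain limit
        # means the domain cannot be created.
        raise ValueError(
            "requested limit %d exceeds system-wide limit %d"
            % (requested, system_limit)
        )
    # Lower (or equal) explicit per-guest values are accepted as they are.
    return requested


if __name__ == "__main__":
    print(resolve_frame_limit(-1, 64))  # guest without explicit setting
    print(resolve_frame_limit(32, 64))  # guest with a lower explicit limit
```

With the toolstack passing -1 by default, raising or lowering the
hypervisor command-line value changes the result for every guest that has
no explicit setting, which is the behavior the commit message describes.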

So, I'm closing this debian bug now, since there are no actionable items
left to do.

Hans

--- End Message ---
