[Sts-sponsors] Please review and sponsor LP1968805 ec2-hibinit-agent for Amazon AWS

2022-04-20 Thread Matthew Ruffell
Hi everyone,

Could you please review LP #1968805 [1], and sponsor the uploads if it looks
okay?

[1] https://bugs.launchpad.net/ubuntu/+source/ec2-hibinit-agent/+bug/1968805

I think bumping the priority to 32767 is the best solution for this particular
case, since it would be higher than priorities users would be using in the wild,
if they happen to have multiple swapfiles configured.

I have submitted the patches upstream, but have had no word back. Since the
upstream is AWS, I might be able to put the SLA on hold while waiting for
their feedback. Otherwise we have about one month left on the SLA to get this
fixed, so I wasn't going to wait.

I did ask the CPC team about their feelings on the patches, but I didn't get
much of a response other than from Chris Newcomer in the CPC channel.

If you want to test this yourself, it is best to stick with Focal for the
moment, as Jammy is broken on Xen instance types. That is being tracked
separately in https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1968062

Let me know if you have any feedback, or think this should be fixed in a
different way.

And yes, Dan, this is probably a duplicate of
https://bugs.launchpad.net/bugs/1910252
where the real root cause is that systemd blindly hibernates to the highest
priority swapfile, as it has no way to know what resume device and offset the
kernel is configured to resume from. SRUing such a change might be difficult
as those who accept the standard behaviour would have to manually update their
configuration on their systems to tell systemd what swapfile to hibernate to.
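
To illustrate why the priority bump works, here is a minimal sketch of
"pick the highest priority swap area". It is not systemd's actual code; the
struct and function names are hypothetical and only model the behaviour
described above. The one hard fact it relies on is that 32767 is the largest
priority the kernel accepts for a swap area, so nothing a user configures can
outrank it.

```c
#include <assert.h>
#include <stddef.h>

/* Largest priority the kernel accepts for a swap area. */
#define SWAP_PRIO_MAX 32767

/* Hypothetical record for one active swap area (think of a /proc/swaps row). */
struct swap_area {
    const char *path;
    int priority;
};

/*
 * Return the area with the highest priority, or NULL if the list is empty.
 * On a tie the earlier entry wins; that is an arbitrary choice for this sketch.
 */
static const struct swap_area *
pick_highest_priority(const struct swap_area *areas, size_t count)
{
    const struct swap_area *best = NULL;

    for (size_t i = 0; i < count; i++) {
        /* Priorities above the kernel maximum cannot occur. */
        assert(areas[i].priority <= SWAP_PRIO_MAX);
        if (best == NULL || areas[i].priority > best->priority)
            best = &areas[i];
    }
    return best;
}
```

With a user swapfile at priority 100 and the agent's swapfile at 32767, the
agent's swapfile is the one selected, however many other swapfiles exist.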

I think talking with upstream systemd will go past the one month left on the
SLA, so these straightforward patches to ec2-hibinit-agent are probably the
best low-risk way to work around the bug and fix it for AWS users.

PS, our ec2-hibinit-agent package diverged from upstream long ago, and the
foundations team appear to be happy carrying patches not upstreamed.

Thanks,
Matthew

-- 
Mailing list: https://launchpad.net/~sts-sponsors
Post to : sts-sponsors@lists.launchpad.net
Unsubscribe : https://launchpad.net/~sts-sponsors
More help   : https://help.launchpad.net/ListHelp


[Sts-sponsors] [Bug 1947099] Re: ipconfig does not honour user-requested timeouts in some cases

2022-04-20 Thread Fabio Augusto Miranda Martins
I've tested the new patch from ppa:mfo/lp1947099v2 and I can confirm it
resolves the problem:

- Without the patch:

https://pastebin.ubuntu.com/p/RksNcBGSzn/


It took 396,940865−220,447147 = 176,493718 seconds in the IP-Config section. 
Total boot time: 


ubuntu@gpu48-ubuntu18:~$ sudo systemd-analyze time
Startup finished in 4min 1.355s (firmware) + 2min 24.433s (loader) + 6min 8.464s (kernel) + 41.466s (userspace) = 13min 15.719s
graphical.target reached after 41.068s in userspace


- With the patch:

https://pastebin.ubuntu.com/p/46nVYCYyDZ/

It took 246,556749−212,019368 = 34,537381 seconds in the IP-Config
section. Total boot time:

ubuntu@gpu48-ubuntu18:~$ sudo systemd-analyze time
Startup finished in 4min 1.246s (firmware) + 2min 24.170s (loader) + 3min 42.915s (kernel) + 39.010s (userspace) = 10min 47.343s
graphical.target reached after 38.634s in userspace
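
For anyone wanting to double-check the numbers, the comparison can be
reproduced with a small sketch like the one below. The helper names are
hypothetical, and the timestamps are the ones quoted above, written with a
decimal point instead of a comma. The IP-Config section shrinks by roughly
141.96 seconds and the kernel phase by roughly 145.55 seconds.

```c
#include <assert.h>

/* Seconds between two kernel log timestamps; end must not precede start. */
static double elapsed_seconds(double start, double end)
{
    assert(end >= start);
    return end - start;
}

/* Convert an "Xmin Y.YYYs" figure from systemd-analyze into seconds. */
static double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0);
    return minutes * 60.0 + seconds;
}

/* Time saved in the IP-Config section, from the two pastebin logs. */
static double ipconfig_saving(void)
{
    double without_patch = elapsed_seconds(220.447147, 396.940865);
    double with_patch = elapsed_seconds(212.019368, 246.556749);
    return without_patch - with_patch;
}

/* Time saved in the kernel phase, from the two systemd-analyze runs. */
static double kernel_saving(void)
{
    return to_seconds(6, 8.464) - to_seconds(3, 42.915);
}
```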


ubuntu@gpu48-ubuntu18:~$ sudo apt-cache policy klibc-utils
klibc-utils:
  Installed: 2.0.4-9ubuntu2.18.04.1
  Candidate: 2.0.4-9ubuntu2.18.04.1
  Version table:
 *** 2.0.4-9ubuntu2.18.04.1 500
        500 http://ppa.launchpad.net/mfo/lp1947099v2/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.4-9ubuntu2.1 500
        500 http://me-dubai-1-ad-1.clouds.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     2.0.4-9ubuntu2 500
        500 http://me-dubai-1-ad-1.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1947099

Title:
  ipconfig does not honour user-requested timeouts in some cases

Status in klibc package in Ubuntu:
  New
Status in klibc source package in Bionic:
  Incomplete

Bug description:
  [Impact]
  In some cases, ipconfig can take a longer time than the user-specified
  timeouts, causing unexpected delays.

  [Test Plan]
  Whenever ipconfig encounters an error sending the DHCP packet, it
  automatically sets a delay of 10 seconds, which can be longer than the
  user-specified timeout. This can be reproduced by creating a dummy
  interface and running ipconfig on it with a timeout value of less than 10:

  # ip link add eth1 type dummy
  # date; /usr/lib/klibc/bin/ipconfig -t 2 eth1; date
  Thu Nov 18 04:46:13 EST 2021
  IP-Config: eth1 hardware address ae:e0:f5:9d:7e:00 mtu 1500 DHCP RARP
  IP-Config: no response after 2 secs - giving up
  Thu Nov 18 04:46:23 EST 2021

  ^ Notice above, ipconfig thinks that it waited 2 seconds, but the
  timestamps show an actual delay of 10 seconds.

  [Where problems could occur]
  Please see reproduction steps above. We are seeing this in production too
  (see comment #2).

  [Other Info]
  A patch to fix the issue is being proposed here. It is a safe fix: before
  going to sleep, it only checks that the timeout does not exceed the
  user-requested value.

  [Original Description]

  In some cases, ipconfig can take longer than the user-specified
  timeouts, causing unexpected delays.

  In main.c, in function loop(), the process can go into
  process_timeout_event() (or process_receive_event()) and, if it
  encounters an error, schedule a "try again later" attempt at
  now + 10 seconds by setting

  s->expire = now + 10;

  This can happen at any time during the main event loop, which can end
  up extending the user-specified timeout if "now + 10" is greater than
  "start_time + user-specified-timeout".

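  The overshoot can be modelled with a few lines of C. This is only a sketch
  of the arithmetic described above, not code from ipconfig; the function
  names are hypothetical, and times are whole seconds as in the report.

```c
#include <assert.h>

/* Retry delay applied after an error, as described in the report. */
#define ERROR_RETRY_DELAY 10

/*
 * Hypothetical model of one loop iteration: an error occurs at time `now`,
 * the retry is scheduled for now + ERROR_RETRY_DELAY, and the loop sleeps
 * until then. Returns the total time the loop has run when it wakes up.
 */
static long elapsed_at_wakeup(long start, long now)
{
    assert(now >= start);
    long expire = now + ERROR_RETRY_DELAY;
    return expire - start;
}

/* Nonzero when the wake-up lands after the user's deadline. */
static int overshoots(long start, long now, long user_timeout)
{
    return elapsed_at_wakeup(start, now) > user_timeout;
}
```

  With a 2 second timeout and an error on the very first send,
  elapsed_at_wakeup(0, 0) is 10, which matches the 10 second gap between the
  two timestamps in the test plan.
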
  I believe a patch like the following is needed to avoid this problem:

  --- a/usr/kinit/ipconfig/main.c
  +++ b/usr/kinit/ipconfig/main.c
  @@ -437,6 +437,13 @@ static int loop(void)

           if (timeout > s->expire - now.tv_sec)
                   timeout = s->expire - now.tv_sec;
  +
  +        /* Compensate for already-lost time */
  +        gettimeofday(&now, NULL);
  +        if (now.tv_sec + timeout > start + loop_timeout) {
  +                timeout = loop_timeout - (now.tv_sec - start);
  +                printf("Lowered timeout to match user request = (%d s) \n", timeout);
  +        }
   }
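
  The same clamping idea can be exercised on its own, outside ipconfig, with
  a standalone sketch like the one below. The function name and signature are
  hypothetical. It also adds two guards that are not part of the patch above
  and are assumptions of this sketch: a negative loop_timeout is treated as
  "no limit" (inferred from the `loop_timeout >= 0` check in ipconfig's
  give-up test quoted in this report), and the result is floored at zero.

```c
#include <assert.h>

/*
 * Clamp a sleep of `timeout` seconds so that it never ends later than
 * start + loop_timeout. All values are whole seconds.
 */
static long clamp_timeout(long now, long start, long loop_timeout, long timeout)
{
    assert(now >= start);

    /* Assumption: a negative loop_timeout means the user set no limit. */
    if (loop_timeout < 0)
        return timeout;

    /* Time left before the user's deadline, never negative. */
    long remaining = loop_timeout - (now - start);
    if (remaining < 0)
        remaining = 0;

    return timeout > remaining ? remaining : timeout;
}
```

  For the reproduction case (2 second limit, 10 second retry delay, error at
  the start), clamp_timeout(0, 0, 2, 10) yields 2, so the loop would wake up
  at the user's deadline instead of 8 seconds after it.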

  I believe the current behaviour is buggy. This is confirmed when the
  following line is executed:

  if (loop_timeout >= 0 &&
      now.tv_sec - start >= loop_timeout) {
          printf("IP-Config: no response after %d "
                 "secs - giving up\n", loop_timeout);
          rc = -1;
          goto bail;
  }

  'loop_timeout' is the user-specified time-out. With a value of 2, in
  case of error, this line prints:

  IP-Config: no response after 2 secs -