Public bug reported:

[Impact]

It is not uncommon for users to add a swapfile to their AWS instance, in
case they run short of memory. For users that optionally enable
Hibernation support, the swapfile generated by ec2-hibinit-agent,
/swap-hibinit, must always have the highest priority when the system
hibernates, since ec2-hibinit-agent configures /swap-hibinit as the
resume target via the resume=UUID=<uuid> and resume_offset=<offset>
kernel command line parameters.
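
As a quick way to see which resume target an instance is actually
booted with, here is a minimal sketch. The function name
show_resume_params is made up for illustration and is not part of
ec2-hibinit-agent; it only assumes the parameters are space separated,
as they appear in /proc/cmdline:

```shell
#!/bin/sh
# Hypothetical helper: print the resume= and resume_offset= parameters
# found in a kernel command line. Reads /proc/cmdline by default; another
# file can be passed as the first argument.
show_resume_params() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each parameter on its own line, then keep only the two
    # hibernation-related ones.
    tr ' ' '\n' < "$cmdline_file" | grep -E '^(resume|resume_offset)='
}
```

Calling show_resume_params with no arguments reads the running kernel's
command line; if nothing is printed, no resume target is configured.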

ec2-hibinit-agent keeps /swap-hibinit swapped off during normal instance
use. Right before Hibernation occurs, /etc/acpi/actions/sleep.sh runs
swapon on /swap-hibinit and then calls systemctl hibernate:

do_hibernate() {
    if [ -d /run/systemd/system ]; then
        systemctl hibernate

case "$2" in
    SBTN)
        swapon /swap-hibinit && do_hibernate

Something changed between 18.04 and 20.04, such that each new swapfile
is assigned a lower priority than the previously enabled swapfile when
swapon is run.

On Focal and later, if we simply swapon the /swap-hibinit file generated
by ec2-hibinit-agent, we see its priority is -2:

$ sudo swapon /swap-hibinit
$ swapon --show
NAME          TYPE SIZE USED PRIO
/swap-hibinit file 3.9G   0B   -2

Turning it off:
$ sudo swapoff /swap-hibinit
$ swapon --show
NAME          TYPE SIZE USED PRIO

Let's add /swapfile in:

$ sudo swapon /swapfile
$ swapon --show
NAME      TYPE SIZE USED PRIO
/swapfile file   4G   0B   -2

Now we enable /swap-hibinit again, and see it is -3:

$ sudo swapon /swap-hibinit
$ swapon --show
NAME          TYPE SIZE USED PRIO
/swapfile     file   4G   0B   -2
/swap-hibinit file 3.9G   0B   -3

Let's add in another swapfile, /swapfile-second, and we see -2, -3, -4:

$ sudo swapon /swapfile-second
$ swapon --show
NAME             TYPE SIZE USED PRIO
/swapfile        file   4G   0B   -2
/swap-hibinit    file 3.9G   0B   -3
/swapfile-second file   4G   0B   -4

What happens is that if we have a swapfile, say /swapfile, at the
default priority of -2, then when we go to hibernate, the swapon in
/etc/acpi/actions/sleep.sh enables /swap-hibinit at priority -3.
systemd / the kernel then selects the highest priority swapfile to
hibernate to, in this case /swapfile, which is NOT set up for resume= or
resume_offset= on the kernel command line, and hibernation fails.
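
To check which swap area currently has the highest priority, a rough
sketch along these lines can help. The helper name
highest_priority_swap is hypothetical, and the sketch assumes the usual
/proc/swaps layout with one header line and the priority in the fifth
column:

```shell
#!/bin/sh
# Hypothetical helper: print the name of the active swap area with the
# highest priority. Reads /proc/swaps by default; another file can be
# passed as the first argument.
highest_priority_swap() {
    swaps_file="${1:-/proc/swaps}"
    # Skip the header line, then remember the entry whose fifth column
    # (Priority) is numerically largest.
    awk 'NR > 1 && (best == "" || $5 + 0 > max) { max = $5 + 0; best = $1 }
         END { if (best != "") print best }' "$swaps_file"
}
```

If this prints anything other than /swap-hibinit once sleep.sh has run
its swapon, the instance is in the failing state described above.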

Apr 11 21:08:15 ip-172-31-84-225 kernel: [  240.990073] Adding 4095996k
swap on /swap-hibinit.  Priority:-3 extents:6 across:4644860k SSFS

This leaves the instance in the "Stopping" state on the EC2 console
until it hits the 20 minute timeout, at which point it is force stopped.

The fix is to set the priority when we swapon /swap-hibinit to something
higher than any other swapfile, to ensure we hibernate to /swap-hibinit.

[Testcase]

From the EC2 console, select "Launch Instance".

Create a:

- t2.medium
- Ubuntu 20.04, 21.04 or 22.04
- 20 GB storage space, advanced > enable encryption > yes.
- Advanced settings > Stop State (Hibernation) Support > Enabled

On boot, wait for hibinit-agent.service to complete, and check that
/swap-hibinit has been created and is swapped off.

$ ll /swap-hibinit

Add a swapfile, and switch it on:

$ sudo fallocate -l 4G /swapfile
$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=4194304
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ echo "/swapfile swap swap defaults 0 0" | sudo tee -a /etc/fstab
$ swapon --show
NAME      TYPE SIZE USED PRIO
/swapfile file   4G   0B   -2

Go back to EC2 console, "Instance State" > "Hibernate".

You will see this in journalctl:

Mar 15 11:41:54 ip-172-31-27-108 kernel: [ 520.121761] Adding 16095656k swap on 
/swap-hibinit. Priority:-3 extents:13 across:17611176k SSFS
Mar 15 11:41:54 ip-172-31-27-108 root: ACPI action undefined: LNXSLPBN:00

and the instance will not hibernate. EC2 console will report "Stopping"
for 20 minutes until it times out and is force stopped.
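
To pull just the assigned priority out of those kernel messages, a small
sketch such as the following can be used. The function name
hibinit_swap_priority is made up for illustration, and the pattern
assumes the "Adding ... swap on /swap-hibinit. Priority:N" message
format shown above:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the
# priority that was assigned to /swap-hibinit when it was enabled.
hibinit_swap_priority() {
    # Match "swap on /swap-hibinit." followed by any number of spaces
    # and "Priority:<number>", and print only the (possibly negative)
    # number.
    sed -n 's|.*swap on /swap-hibinit\. *Priority:\(-\{0,1\}[0-9][0-9]*\).*|\1|p'
}
```

For example, "journalctl -k | hibinit_swap_priority" prints one line per
time /swap-hibinit was enabled; a negative value while another swapfile
is active indicates the problem.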

If you enable the following PPA and install the test ec2-hibinit-agent
package:

https://launchpad.net/~mruffell/+archive/ubuntu/sf331069-test

Hibernation should succeed within a minute or two.

[Where problems could occur]

This change will only affect users of instances where Hibernation has
been explicitly enabled, either from the EC2 instance launch advanced
settings, or via the "--hibernation-options Configured=true" parameter
to the "aws ec2" command. For all other users, including those with
swapfiles enabled, this change will have no effect.

We are changing the /swap-hibinit file to be maximum priority right
before we hibernate, to ensure it is the swapfile selected to hibernate
to. Since we swapoff /swap-hibinit as soon as we resume, /swap-hibinit
is used solely for hibernation, and not for regular swap space, so it is
unlikely to cause any regressions to users with their own swapfiles
configured with various priorities.

A potential risk is users that do not use /swap-hibinit, use their own
swapfile for hibernation, and overwrite the changes ec2-hibinit-agent
makes to grub files to set the resume=UUID=<uuid> and
resume_offset=<offset> values. I believe such users would likely remove
or purge the ec2-hibinit-agent package, since hibinit-agent.service runs
at startup and re-adds the grub configuration for /swap-hibinit
regardless, and keeping /swap-hibinit around would waste disk space that
they are paying for. Because of this, I believe that this change will
not break users who hibernate to their own swapfiles, because they would
have removed ec2-hibinit-agent on instance creation.

[Other info]

Chris Newcomer came across the following upstream bug, which seems to be
the same issue:

https://github.com/aws/amazon-ec2-hibinit-agent/issues/20

The reporter, Ben Mares, suggests a patch to /etc/acpi/actions/sleep.sh
to either read the value of a bash environment variable swap_priority,
or default to 10.

https://github.com/aws/amazon-ec2-hibinit-agent/pull/21

I'm not exactly on board with the environment variable, or the default
magic number of 10, as we don't know how our users are setting up
swapfiles, and what priorities they set them to. I think we should
instead just set the priority to the maximum, 32767.
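
As a rough illustration of that idea (not the actual patch), the swapon
call in the SBTN branch could pass an explicit priority using swapon's
--priority option. The function name enable_hibinit_swap and the DRY_RUN
switch below are made up for this sketch:

```shell
#!/bin/sh
# Sketch only: enable /swap-hibinit at the maximum swap priority so it
# always outranks any user-configured swapfile.
MAX_SWAP_PRIO=32767

enable_hibinit_swap() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Hypothetical dry-run mode: show the command without running it.
        echo "swapon --priority $MAX_SWAP_PRIO /swap-hibinit"
    else
        # Requires root, like the existing swapon call in sleep.sh.
        swapon --priority "$MAX_SWAP_PRIO" /swap-hibinit
    fi
}
```

In sleep.sh this would take the place of the plain swapon, e.g.
"enable_hibinit_swap && do_hibernate", leaving every other swapfile's
priority untouched.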

** Affects: ec2-hibinit-agent (Ubuntu)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: ec2-hibinit-agent (Ubuntu Focal)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: ec2-hibinit-agent (Ubuntu Impish)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: ec2-hibinit-agent (Ubuntu Jammy)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress


** Tags: sts

** Also affects: ec2-hibinit-agent (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: ec2-hibinit-agent (Ubuntu Impish)
   Importance: Undecided
       Status: New

** Also affects: ec2-hibinit-agent (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: ec2-hibinit-agent (Ubuntu Focal)
       Status: New => In Progress

** Changed in: ec2-hibinit-agent (Ubuntu Impish)
       Status: New => In Progress

** Changed in: ec2-hibinit-agent (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: ec2-hibinit-agent (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: ec2-hibinit-agent (Ubuntu Impish)
   Importance: Undecided => Medium

** Changed in: ec2-hibinit-agent (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: ec2-hibinit-agent (Ubuntu Focal)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: ec2-hibinit-agent (Ubuntu Impish)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: ec2-hibinit-agent (Ubuntu Jammy)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)


** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1968805

Title:
  Hibernation fails when an additional swapfile is added due to priority
  mismatch

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ec2-hibinit-agent/+bug/1968805/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
