Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

Zhijian Li (Fujitsu) Wed, 20 Sep 2023 03:04:56 -0700


On 15/09/2023 19:31, William Roche wrote:
> On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote:
>>
>>
>> I'm okay with "RDMA isn't touched".
>> BTW, could you share the program or hack you use to poison the page, so that
>> I can take a look at the RDMA part later when I'm free?
>>
>> I'm not sure it's appropriate to ack a part that isn't touched. Anyway:
>> Acked-by: Li Zhijian <lizhij...@fujitsu.com> # RDMA
>>
> 
> Thanks.
> As you asked for a procedure to inject memory errors into a running VM,
> I've attached to this email the source code (mce_process_react.c) of a
> program that will help to target the error injection in the VM.



I just tried your hwpoison program with an RDMA migration. The migration
failed, but fortunately the source side is still alive :).

(qemu) Failed to register chunk!: Bad address
Chunk details: block: 0 chunk index 671 start 139955096518656 end 139955097567232 host 139955096518656 local 139954392924160 registrations: 636
qemu-system-x86_64: cannot get lkey
qemu-system-x86_64: rdma migration: write error! -22
qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): -22
qemu-system-x86_64: Early error. Sending error.


Since RDMA migration currently transfers guest memory in chunks (1 MiB by
default), and a chunk containing a poisoned page cannot be registered, we may
need to either:

option 1: reduce the chunk size to 1 page
option 2: handle the chunk containing the hwpoisoned page specially

However, since another migration protocol can be used instead, it is also
acceptable to leave this issue unfixed for now.
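
For reference, the chunk arithmetic behind the log above can be sketched as
follows. This is only an illustration, assuming 4 KiB host pages and the
1 MiB default chunk size; the names CHUNK_SHIFT, PAGE_SHIFT, chunk_index and
pages_per_chunk are made up for this sketch and are not taken from the QEMU
source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 1 MiB chunks and 4 KiB pages. */
#define CHUNK_SHIFT 20
#define PAGE_SHIFT  12

/* Index of the chunk that contains host address `host`, for a RAM block
 * whose local mapping starts at `block_start`. */
static inline uint64_t chunk_index(uint64_t block_start, uint64_t host)
{
    assert(host >= block_start);
    return (host - block_start) >> CHUNK_SHIFT;
}

/* Number of pages that are registered together as one chunk. */
static inline uint64_t pages_per_chunk(void)
{
    return UINT64_C(1) << (CHUNK_SHIFT - PAGE_SHIFT);
}
```

Plugging in the "local" and "host" values from the log gives chunk index 671,
and "end" minus "start" is exactly 1 MiB. With these assumptions one chunk
covers 256 pages, so a single poisoned page makes the registration of the
whole chunk fail.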

Tested-by: Li Zhijian <lizhij...@fujitsu.com>

Thanks
Zhijian




> 
> (Be careful: error injection is currently not working on AMD
> platforms -- this is a work in progress in a separate qemu thread)
> 
> 
> The general idea:
> We are going to target a process memory page running inside a VM to see
> what happens when we inject an error on the underlying physical page at
> the platform (hypervisor) level.
> To have a better view of what's going on, we'll use a process made for
> this: its goal is to allocate a memory page and install a SIGBUS
> handler that reports when it receives this signal. It will also wait
> before touching this page to see what happens next.
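
For readers without the attachment, the core of such a tool could look roughly
like the sketch below. This is not the attached mce_process_react.c, only a
minimal illustration under my own assumptions (Linux, glibc); the function
names are made up, and the -e and -w options are not implemented here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Map the SIGBUS si_code to the names used by the kernel headers. */
static const char *mce_code_name(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO: return "BUS_MCEERR_AO"; /* action optional */
    case BUS_MCEERR_AR: return "BUS_MCEERR_AR"; /* action required */
    default:            return "other SIGBUS";
    }
}

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    char buf[128];
    (void)ctx;
    /* snprintf is not formally async-signal-safe; tolerable in a test tool. */
    int n = snprintf(buf, sizeof(buf), "Signal %d received\n%s on vaddr: %p\n",
                     sig, mce_code_name(si->si_code), si->si_addr);
    if (n > (int)sizeof(buf) - 1)
        n = (int)sizeof(buf) - 1;
    if (n > 0 && write(STDERR_FILENO, buf, (size_t)n) < 0) {
        /* nothing useful can be done here */
    }
    /* Returning after a synchronous (_AR) error would fault again: exit. */
    if (si->si_code == BUS_MCEERR_AR)
        _exit(1);
}

/* Install the handler, ask for early (_AO) notification and allocate one
 * touched page. Returns the page address, or NULL on failure. */
static void *setup_poison_test_page(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return NULL;

    if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) != 0)
        perror("prctl(PR_MCE_KILL)");

    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);
    void *page = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memset(page, 0xAA, (size_t)psz); /* touch it so it is really backed */
    return page;
}
```

A caller would print the returned address, wait for the injection, and then
read the page to provoke the synchronous error.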
> 
>      Compiling this tool:
>      $ gcc -o mce_process_react_x86 mce_process_react.c
> 
> 
> Let's try that:
> This procedure shows the best case scenario, where an error injected at
> the platform level is reported up to the guest process using it.
> Note that qemu should be started with root privilege.
> 
>      1. Choose a process running in the VM, identify a memory page you
> want to target, and get its physical address (crash(8) vtop can help
> with that). Alternatively, run the attached mce_process_react example
> (compiled for your platform, mce_process_react_[x86|arm]) with the
> option to be informed early of _AO errors (-e) and the option to wait
> for ENTER before reading the allocated page (-w 0):
> 
> [root@VM ]# ./mce_process_react_x86 -e -w 0
> Setting Early kill... Ok
> 
> Data pages at 0x7fa0f9b25000  physically 0x200f2fa000
> 
> Press ENTER to continue with page reading
> 
> 
>      2. Go into the VM monitor to get the translation from "Guest
> Physical Address to Host Physical Address" or "Host Virtual Address":
> 
>   (qemu) gpa2hpa 0x200f2fa000
> Host physical address for 0x200f2fa000 (ram-node1) is 0x46f12fa000
> 
> 
>      3. Before we inject the error, we want to keep track of the VM
> console output (in a separate window).
> If you are using libvirt: # virsh console myvm
> 
> 
>      4. We now prepare for the error injection at the platform level to
> the address we found.  To do so, we'll need to use the hwpoison-inject
> module (x86).
> Be careful, as hwpoison takes Page Frame Numbers and a PFN is not the
> physical address: you need to drop the lowest 12 bits (the last 3 hex
> digits, here zeros, of the above address)!
> 
> [root@hv ]# modprobe hwpoison-inject
> [root@hv ]# echo 0x46f12fa > /sys/kernel/debug/hwpoison/corrupt-pfn
> 
>         If you see an "Operation not permitted" error when writing as
> root to corrupt-pfn, you may be hitting kernel_lockdown(7), which is
> enabled on SecureBoot systems (can be verified with
> "mokutil --sb-state"). In this case, turn SecureBoot off (at the UEFI
> level for example).
> 
>      5. Look at the qemu output, either on the terminal where qemu was
> started or, if you are using libvirt, with: tail /var/log/libvirt/qemu/myvm
> 
> 2022-08-31T13:52:25.645398Z qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr 0x7eeeace00000 and GUEST addr 0x200f200 of type BUS_MCEERR_AO injected
> 
>      6. On the guest console:
> We'll see the VM reaction to the injected error:
> 
> [  155.805149] Disabling lock debugging due to kernel taint
> [  155.806174] mce: [Hardware Error]: Machine check events logged
> [  155.807120] Memory failure: 0x200f200: Killing mce_process_rea:3548 due to hardware memory corruption
> [  155.808877] Memory failure: 0x200f200: recovery action for dirty LRU page: Recovered
> 
>      7. The Guest process that we started at the first step gives:
> 
> Signal 7 received
> BUS_MCEERR_AO on vaddr: 0x7fa0f9b25000
> 
> At this stage, the VM has a poisoned page, and the migration code
> needs to be fixed so that it avoids accessing the poisoned page.
> 
>      8. The process continues to run (as it handled the SIGBUS).
> Now if you press ENTER on this process terminal, it will try to read the
> page which will generate a new MCE (a synchronous one) at VM level which
> will be sent to this process:
> 
> Signal 7 received
> BUS_MCEERR_AR on vaddr: 0x7fa0f9b25000
> Exit from the signal handler on BUS_MCEERR_AR
> 
>      9. The VM console shows:
> [ 2520.895263] MCE: Killing mce_process_rea:3548 due to hardware memory corruption fault at 7f45e5265000
> 
>      10. The VM continues to run, with a poisoned page in its address
> space.
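
On step 4 above: the conversion from the host physical address to the PFN
expected by corrupt-pfn is just a right shift by the page shift. A minimal
sketch, assuming 4 KiB pages (a page shift of 12); the helper name
page_addr_to_pfn is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Convert a page-aligned physical address to its Page Frame Number. */
static inline uint64_t page_addr_to_pfn(uint64_t phys_addr)
{
    /* Dropping "the last 3 zeros" only makes sense for an aligned address. */
    assert((phys_addr & ((UINT64_C(1) << PAGE_SHIFT) - 1)) == 0);
    return phys_addr >> PAGE_SHIFT;
}
```

With the address from step 2, page_addr_to_pfn(0x46f12fa000) gives 0x46f12fa,
the value written to corrupt-pfn in step 4.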
> 
> HTH,
> William.
