On 15/09/2023 19:31, William Roche wrote:
> On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote:
>>
>>
>> I'm okay with "RDMA isn't touched".
>> BTW, could you share your reproducing program/hacking to poison the page,
>> so that I am able to take a look at the RDMA part later when I'm free.
>>
>> Not sure it's suitable to acknowledge a not touched part. Anyway
>> Acked-by: Li Zhijian <lizhij...@fujitsu.com> # RDMA
>>
>
> Thanks.
> As you asked for a procedure to inject memory errors into a running VM,
> I've attached to this email the source code (mce_process_react.c) of a
> program that will help to target the error injection in the VM.

I just tried your hwpoison program with RDMA migration. The migration failed,
but fortunately the source side is still alive :).

(qemu) Failed to register chunk!: Bad address
Chunk details: block: 0 chunk index 671 start 139955096518656 end 139955097567232 host 139955096518656 local 139954392924160 registrations: 636
qemu-system-x86_64: cannot get lkey
qemu-system-x86_64: rdma migration: write error! -22
qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): -22
qemu-system-x86_64: Early error. Sending error.

Since RDMA migration currently transfers guest memory in chunks of 1M by
default, we may need to either:
option 1: reduce all chunk sizes to 1 page
option 2: handle the hwpoison chunk specially

However, since another protocol may be used instead, it's also possible to
leave this issue unfixed for now.

Tested-by: Li Zhijian <lizhij...@fujitsu.com>

Thanks
Zhijian

>
> (Be careful that error injection is currently not working on AMD
> platforms -- this is a work in progress in a separate qemu thread)
>
>
> The general idea:
> We are going to target a process memory page running inside a VM to see
> what happens when we inject an error on the underlying physical page at
> the platform (hypervisor) level.
> To have a better view of what's going on, we'll use a process made for
> this: Its goal is to allocate a memory page, and create a SIGBUS
> handler to inform when it receives this signal. It will also wait before
> touching this page to see what happens next.
>
>     Compiling this tool:
>     $ gcc -o mce_process_react_x86 mce_process_react.c
>
>
> Let's try that:
> This procedure shows the best case scenario, where an error injected at
> the platform level is reported up to the guest process using it.
> Note that qemu should be started with root privilege.
>
>     1. Choose a process running in the VM (and identify a memory page
> you want to target, and get its physical address; crash(8) vtop can
> help with that) or run the attached mce_process_react example (compiled
> for your platform mce_process_react_[x86|arm]) with an option to be
> informed early of an _AO error (-e) and to wait for ENTER before reading
> the allocated page (-w 0):
>
> [root@VM ]# ./mce_process_react_x86 -e -w 0
> Setting Early kill... Ok
>
> Data pages at 0x7fa0f9b25000 physically 0x200f2fa000
>
> Press ENTER to continue with page reading
>
>
>     2. Go into the VM monitor to get the translation from "Guest
> Physical Address to Host Physical Address" or "Host Virtual Address":
>
>  (qemu) gpa2hpa 0x200f2fa000
> Host physical address for 0x200f2fa000 (ram-node1) is 0x46f12fa000
>
>
>     3. Before we inject the error, we want to keep track of the VM
> console output (in a separate window).
> If you are using libvirt: # virsh console myvm
>
>
>     4. We now prepare for the error injection at the platform level to
> the address we found. To do so, we'll need to use the hwpoison-inject
> module (x86).
> Be careful, as hwpoison takes Page Frame Numbers and this PFN is not the
> physical address: you need to remove the last 12 bits (the last 3 zeros
> of the above address)!
>
> [root@hv ]# modprobe hwpoison-inject
> [root@hv ]# echo 0x46f12fa > /sys/kernel/debug/hwpoison/corrupt-pfn
>
>         If you see an "Operation not permitted" error when writing as root
> on corrupt-pfn, you may be facing a "kernel_lockdown(7)" which is
> enabled on SecureBoot systems (can be verified with
> "mokutil --sb-state"). In this case, turn SecureBoot off (at the UEFI
> level for example).
>
>     5. Look at the qemu output (either on the terminal where qemu was
> started or, if you are using libvirt: tail /var/log/libvirt/qemu/myvm)
>
> 2022-08-31T13:52:25.645398Z qemu-system-x86_64: warning: Guest MCE Memory
> Error at QEMU addr 0x7eeeace00000 and GUEST addr 0x200f200 of type
> BUS_MCEERR_AO injected
>
>     6. On the guest console:
> We'll see the VM reaction to the injected error:
>
> [  155.805149] Disabling lock debugging due to kernel taint
> [  155.806174] mce: [Hardware Error]: Machine check events logged
> [  155.807120] Memory failure: 0x200f200: Killing mce_process_rea:3548 due to
> hardware memory corruption
> [  155.808877] Memory failure: 0x200f200: recovery action for dirty LRU page:
> Recovered
>
>     7. The Guest process that we started at the first step gives:
>
> Signal 7 received
> BUS_MCEERR_AO on vaddr: 0x7fa0f9b25000
>
> At this stage, the VM has a poisoned page, and a migration of this VM
> needs to be fixed in order to avoid accessing the poisoned page.
>
>     8. The process continues to run (as it handled the SIGBUS).
> Now if you press ENTER on this process terminal, it will try to read the
> page, which will generate a new MCE (a synchronous one) at VM level which
> will be sent to this process:
>
> Signal 7 received
> BUS_MCEERR_AR on vaddr: 0x7fa0f9b25000
> Exit from the signal handler on BUS_MCEERR_AR
>
>     9. The VM console shows:
> [ 2520.895263] MCE: Killing mce_process_rea:3548 due to hardware memory
> corruption fault at 7f45e5265000
>
>     10. The VM continues to run...
> With a poisoned page in its address space
>
> HTH,
> William.
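
As a side note on step 4 of the procedure: the conversion from host physical
address to PFN can be scripted instead of trimming the digits by hand. The
following is only a minimal sketch, assuming bash and 4 KiB pages (hence the
12-bit shift). The helper name pa_to_pfn is made up for illustration; it is
not part of hwpoison-inject or of the attached mce_process_react.c.

```shell
#!/bin/bash
# Hypothetical helper (not from the original mail): convert a host physical
# address into a page frame number. Assumes 4 KiB pages, so PFN = addr >> 12.
pa_to_pfn() {
    printf '0x%x\n' "$(( $1 >> 12 ))"
}

# Host physical address reported by "gpa2hpa" in step 2.
pa_to_pfn 0x46f12fa000    # prints 0x46f12fa
```

The printed value is what step 4 writes to
/sys/kernel/debug/hwpoison/corrupt-pfn.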