tags 988477 - moreinfo
found 988477 4.17.2+76-ge1f9cb16e2-1~deb12u1
affects 988477 src:linux
severity 988477 critical
quit
I am also observing #988477. This machine has an AMD Zen 4 processor. I first saw the problem after the motherboard and processor were swapped out; the older motherboard and processor were several generations old.

The pattern which is emerging is Linux MD RAID1 plus a recent AMD processor with full IOMMU functionality. The older machine was believed to have an IOMMU, but the BIOS wasn't creating the appropriate ACPI tables (IVRS), and thus Xen was unable to utilize it.

This seems to be occurring with a small percentage of write operations. Subsequent read operations appear to be fine.

I am not convinced this is a Xen bug. I suspect it is instead a bug in the Linux MD subsystem, which is to say it could be a kernel bug. In particular, the DMA interface may have been designed assuming only a single device would ever access any given page, while the MD RAID1 driver reuses the same page for both devices. IOMMU page release could be handled in two steps: marking the page unused in a device data structure, then later removing it by sweeping a table. In that case, if the MD RAID1 driver were to redirect the page to another device between these two steps, the entry for the subsequent device could be wiped out when trying to invalidate the entry for the prior device.

So far no crashes or confirmed data loss have occurred, but sweeping the mirror does turn up small numbers of inconsistencies.

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sig...@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
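
To make the suspected sequence concrete, here is a toy model of the two-step release described above. It is only a sketch of the hypothesis, not how Xen or the Linux DMA/IOMMU code is actually implemented: the class `LazyIommuTable`, the `key_by_device` switch, and the device names `disk0`/`disk1` are all made up for illustration. The assumption being tested is that release entries are identified by page alone, so a deferred sweep for one mirror member can remove the mapping a second member has since taken on the same page.

```python
class LazyIommuTable:
    """Toy mapping table with deferred invalidation (hypothetical model)."""

    def __init__(self, key_by_device):
        self.key_by_device = key_by_device
        self.entries = {}   # key -> device that currently owns the mapping
        self.pending = []   # keys marked unused, waiting for the sweep

    def _key(self, device, page):
        # Assumption under test: is an entry identified by page alone,
        # or by the (device, page) pair?
        return (device, page) if self.key_by_device else page

    def map(self, device, page):
        self.entries[self._key(device, page)] = device

    def unmap_lazy(self, device, page):
        # First release step: only mark the entry as unused.
        self.pending.append(self._key(device, page))

    def sweep(self):
        # Second release step: remove everything that was marked unused.
        for key in self.pending:
            self.entries.pop(key, None)
        self.pending.clear()

    def is_mapped(self, device, page):
        return self.entries.get(self._key(device, page)) == device


def mirror_write(table, page=0x1000):
    """Reuse one page for two mirror members, with the sweep arriving late."""
    table.map("disk0", page)
    table.unmap_lazy("disk0", page)  # disk0 write done, entry marked unused
    table.map("disk1", page)         # same page redirected to second member
    table.sweep()                    # deferred invalidation for disk0 runs now
    return table.is_mapped("disk1", page)


print(mirror_write(LazyIommuTable(key_by_device=False)))  # False: disk1 entry wiped
print(mirror_write(LazyIommuTable(key_by_device=True)))   # True: disk1 entry survives
```

If the real code behaved like the page-keyed variant, the second member's write would lose its mapping only when the sweep happened to land between the remap and the transfer, which would be consistent with only a small percentage of writes being affected.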