On 2/18/23 11:22, Gregory Price wrote:
> Breaking this off into a separate thread for archival sake.
>
> There's a bug with handling execution of instructions held in CXL
> memory - specifically when an instruction crosses a page boundary.
>
> The result of this is that type-3 devices cannot use KVM at all at the
> moment, and require the attached patch to run in TCG-only mode.
>
> CXL memory devices are presently emulated as MMIO, and MMIO has no
> coherency guarantees, so TCG doesn't cache the results of translating
> an instruction, meaning execution is incredibly slow (orders of
> magnitude slower than KVM).
>
> Request for comments:
>
> First there's the stability issue:
>
> 0) TCG cannot handle instructions across a page boundary spanning ram and
>    MMIO.  See attached patch for hotfix.  This basically solves the page
>    boundary issue by reverting the entire block to MMIO-mode if the
>    problem is detected.
>
> 1) KVM needs to be investigated.  It's likely the same/similar issue,
>    but it's not confirmed.

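To make the condition in 0) concrete: an instruction crosses a page
boundary when its first and last bytes fall in different pages. Below is
a minimal sketch in C, assuming 4 KiB pages. The macro and function names
are made up for illustration; this is not the code from the attached
patch, nor QEMU's actual check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (the standard x86-64 page size). */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: returns true if the byte range [addr, addr + len)
 * touches more than one page, i.e. the instruction straddles a boundary.
 */
static bool insn_crosses_page(uint64_t addr, size_t len)
{
    if (len == 0) {
        return false; /* an empty range cannot straddle anything */
    }
    uint64_t first_page = addr / SKETCH_PAGE_SIZE;
    uint64_t last_page = (addr + len - 1) / SKETCH_PAGE_SIZE;
    return first_page != last_page;
}
```

For example, a 4-byte instruction at 0x1ffe occupies 0x1ffe..0x2001 and
so crosses from page 1 into page 2. The problematic case described above
is the one where those two pages are backed differently (one RAM, one
MMIO).
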
I ran into an issue with KVM as well. However, it wasn't a page boundary
spanning issue, since I could hit it when using pure CXL-backed memory
for a given application. It turned out that (at least) certain AVX
instructions didn't handle execution from MMIO when using QEMU. This
generated an illegal instruction exception for the application. At that
point, I switched to TCG, so I didn't investigate whether running a
non-AVX system would work with KVM.

> Second there's the performance issue:
>
> 0) Do we actually care about performance?  How likely are users to
>    attempt to run software out of CXL memory?
>
> 1) If we do care, is there a potential for converting CXL away from the
>    MMIO design?  The issue is coherency for shared memory.  Emulating
>    coherency is a) hard, and b) a ton of work for little gain.
>
>    Presently marking CXL memory as MMIO basically enforces coherency by
>    preventing caching, though it's unclear how this is enforced
>    by KVM (or if it is, i have to imagine it is).

Having the option of doing device-specific processing of accesses to a
CXL type 3 device (which the MMIO-based access allows) is useful for
experimentation with device functionality, so I would be sad to see that
option go away. Emulating cache line access to a type 3 device would be
interesting, and could potentially be implemented in a way that would
allow caching of device memory in a shadow page in RAM, but that is a
rather large project.

> It might be nice to solve this for non-shared memory regions, but
> testing functionality >>> performance at this point so it might not
> worth the investment.

Thanks,
Jorgen