On 2/18/23 11:22, Gregory Price wrote:
> Breaking this off into a separate thread for archival sake.
> 
> There's a bug with handling execution of instructions held in CXL
> memory - specifically when an instruction crosses a page boundary.
> 
> The result of this is that type-3 devices cannot use KVM at all at the
> moment, and require the attached patch to run in TCG-only mode.
> 
> 
> CXL memory devices are presently emulated as MMIO, and MMIO has no
> coherency guarantees, so TCG doesn't cache the results of translating
> an instruction, meaning execution is incredibly slow (orders of
> magnitude slower than KVM).
> 
> 
> Request for comments:
> 
> 
> First there's the stability issue:
> 
> 0) TCG cannot handle instructions across a page boundary spanning ram and
>     MMIO. See attached patch for hotfix.  This basically solves the page
>     boundary issue by reverting the entire block to MMIO-mode if the
>     problem is detected.
> 
> 1) KVM needs to be investigated.  It's likely the same/similar issue,
>     but it's not confirmed.
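
For illustration, here is a minimal sketch of the condition in item 0: an instruction "crosses a page boundary" when its first and last bytes fall on different guest pages. That is the case where one part of the instruction can come from RAM and the rest from MMIO. This is not QEMU code and not the attached patch. The macro and function names are hypothetical, and the 4 KiB page size is an assumption. It only shows the detection step, not the fallback to MMIO mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB guest pages. These names are hypothetical and are
   not QEMU's own page-size macros. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~(((uint64_t)1 << SKETCH_PAGE_BITS) - 1))

/* Hypothetical helper: returns true when the instruction bytes
   [pc, pc + len) do not all lie on the same page. Requires len >= 1. */
static bool insn_crosses_page(uint64_t pc, unsigned len)
{
    uint64_t first_page = pc & SKETCH_PAGE_MASK;           /* page of the first byte */
    uint64_t last_page = (pc + len - 1) & SKETCH_PAGE_MASK; /* page of the last byte */
    return first_page != last_page;
}
```

With 4 KiB pages, a 4-byte instruction at 0x1FFE spans 0x1FFE..0x2001 and therefore crosses into the next page. The same instruction at 0x1FFC ends at 0x1FFF and stays on one page.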

I ran into an issue with KVM as well. However, it wasn't a page-boundary
spanning issue, since I could hit it when using purely CXL-backed memory
for a given application. It turned out that (at least) certain AVX
instructions didn't handle execution from MMIO when using QEMU. This
generated an illegal instruction exception for the application. At that
point I switched to TCG, so I didn't investigate whether a non-AVX
system would work with KVM.

> Second there's the performance issue:
> 
> 0) Do we actually care about performance? How likely are users to
>     attempt to run software out of CXL memory?
> 
> 1) If we do care, is there a potential for converting CXL away from the
>     MMIO design?  The issue is coherency for shared memory. Emulating
>     coherency is a) hard, and b) a ton of work for little gain.
> 
>     Presently marking CXL memory as MMIO basically enforces coherency by
>     preventing caching, though it's unclear how this is enforced
>     by KVM (or whether it is; I have to imagine it is).

Having the option of doing device-specific processing of accesses to a
CXL type 3 device (which the MMIO-based access allows) is useful for
experimenting with device functionality, so I would be sad to see that
option go away. Emulating cache-line access to a type 3 device would be
interesting, and could potentially be implemented in a way that would
allow caching of device memory in a shadow page in RAM, but that is a
rather large project.

> It might be nice to solve this for non-shared memory regions, but
> testing functionality >>> performance at this point, so it might not
> be worth the investment.

Thanks,
Jorgen
