On 02/20/2011 09:15 PM, wang sheng wrote:
> I can't understand CPUState's iotlb field , Why we need iotlb ?
A memory management unit (MMU) translates virtual to physical addresses using page tables. Page tables are a tree structure stored in memory, and can get REALLY BIG. If the chip had to go out to physical memory and walk the page tables each time it needed to translate a virtual address to a physical address, it would perform 3-4 extra memory accesses for every access an instruction made that needed translating. So the chip has a "Translation Lookaside Buffer", which is an address cache that records the virtual->physical mappings for the last few pages it had to look up. Each TLB entry has the virtual and physical start addresses, the length of the chunk of memory this represents, and the permission flags (can I read, can I write, can I execute) that apply to that memory (as seen through that mapping). Each memory access an instruction makes checks the TLB first to see if that virtual address lookup is cached, just like it checks the L1 and L2 caches to see if the data is there before going out to physical memory for it.

When the CPU tries to access a virtual address it hasn't got a TLB entry for (or has one with the wrong permissions), it generates a "soft page fault" which calls an interrupt handler to deal with it. The handler can resolve the fault by picking an existing TLB entry to discard (to make room), walking the page tables in physical memory to look up the virtual->physical mapping (and possibly doing fixups on the page tables, like allocating a new page of physical memory if there isn't currently one assigned; this happens for sparse mappings, copy on write, swapping...), loading the new TLB entry from the page tables, and then returning from the interrupt to let the program proceed. Or it can kill the program (with a signal) if it made an illegal access. (The program can set up a signal handler, normal unix semantics apply. For an illegal access that's usually SIGSEGV, sometimes SIGBUS.)
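If it helps to see that lookup-miss-refill loop as code, here's a rough sketch in C. The names, sizes, and the direct-mapped layout are all made up for illustration (this is not QEMU's actual tlb/iotlb code); it just shows the general shape of a software TLB sitting in front of a page table walk:

  /* Toy model of the flow described above: a tiny direct-mapped
   * software TLB in front of a (stubbed-out) page table walk. */

  #include <stdint.h>
  #include <stdio.h>

  #define PAGE_SHIFT 12           /* 4k pages */
  #define PAGE_MASK  (~((uint64_t)(1 << PAGE_SHIFT) - 1))
  #define TLB_SIZE   64           /* number of entries, power of two */

  struct tlb_entry {
      uint64_t vpage;             /* virtual page address (~0 = invalid) */
      uint64_t ppage;             /* physical page address */
      int      prot;              /* 1=read, 2=write, 4=exec */
  };

  static struct tlb_entry tlb[TLB_SIZE];

  /* Stand-in for the expensive part: walking the page table tree in
   * physical memory.  This stub just fabricates an identity mapping;
   * a real walk could also find nothing mapped and report a fault. */
  static int page_table_walk(uint64_t vpage, uint64_t *ppage, int *prot)
  {
      *ppage = vpage;             /* pretend virtual == physical */
      *prot = 1 | 2;              /* read + write */
      return 0;                   /* 0 = mapping exists, nonzero = fault */
  }

  /* Translate a virtual address, refilling the TLB on a miss. */
  static int translate(uint64_t vaddr, int want_prot, uint64_t *paddr)
  {
      uint64_t vpage = vaddr & PAGE_MASK;
      struct tlb_entry *e = &tlb[(vpage >> PAGE_SHIFT) & (TLB_SIZE - 1)];

      if (e->vpage != vpage || (e->prot & want_prot) != want_prot) {
          /* Miss (or wrong permissions): do the slow walk, then evict
           * whatever was in this slot to make room for the new entry. */
          uint64_t ppage;
          int prot;
          if (page_table_walk(vpage, &ppage, &prot))
              return -1;          /* no valid mapping: deliver a signal */
          e->vpage = vpage;
          e->ppage = ppage;
          e->prot = prot;
      }
      *paddr = e->ppage | (vaddr & ~PAGE_MASK);
      return 0;
  }

  int main(void)
  {
      uint64_t pa;
      for (int i = 0; i < TLB_SIZE; i++)
          tlb[i].vpage = ~(uint64_t)0;    /* mark every entry invalid */
      if (translate(0x400123, 1, &pa) == 0)
          printf("0x400123 -> %#llx\n", (unsigned long long)pa);
      return 0;
  }

A real TLB does that lookup in parallel in hardware, with smarter replacement than "overwrite whatever was in the slot", but the shape is the same: a hit is cheap, a miss means walking the tree and evicting something to make room.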
The Linux kernel actually reserves one or two TLB entries for itself, using a "huge page". This is one giant TLB entry (its length field covers a gigabyte or two) that maps all the kernel's memory to a big linear range of physical addresses containing the kernel's code and the kernel's data. When the CPU goes into userspace it removes the read/write/execute bits for these entries (so attempts to dereference those addresses will generate a page fault), but keeps the TLB entry in place. Then when you make a system call, it can just switch the access back on and the kernel data is still cached.

Note that cache entries attach to TLB entries. If you flush the TLB, you flush the L1 cache too. (Note: having a TLB entry doesn't mean you have the data in L1 or L2 cache. Normal sized TLB entries are page sized, I.E. 4k, and cache lines are much smaller (64 bytes or 128 bytes is fairly common), so one TLB entry may cover 32 or 64 cache lines. But if the TLB entry isn't there you need to do multiple physical memory lookups to descend through the page table tree to find it, each costing hundreds or even thousands of cycles of latency. When you flush a TLB entry its cache lines get flushed back and discarded too, because the CPU no longer knows where they live without the TLB entry.)

When you switch page tables (schedule a process), you flush the TLB entries belonging to the old page table, which means flushing the cache. Huge amounts of modern performance optimization involve TLB and cache management: only flushing what you actually NEED to, and keeping the rest around if you can to avoid reloading it. (I.E. defer TLB flushing in case this process swaps back to the other one quickly. If you can keep around cached data you currently can't access in this context, because these page tables have no mapping for it, you don't have to load it back in when you switch back to the other context that CAN access it.)

Rob