On 02/20/2011 09:15 PM, wang sheng wrote:
> I can't understand CPUState's iotlb field , Why we need iotlb ?
A memory management unit (MMU) translates virtual to physical addresses using page tables. Page tables are a tree structure stored in memory, and can get REALLY BIG. If the chip had to go out to physical memory and walk the page tables each time it needed to translate a virtual address to a physical address, it would perform 3-4 extra memory accesses for every access an instruction made that needed translating. So the chip has a "Translation Lookaside Buffer", which is an address cache that records the virtual->physical mappings for the last few pages it had to look up. Each TLB entry has the virtual and physical start addresses, the length of the chunk of memory this represents, and the permission flags (can I read, can I write, can I execute) that apply to that memory (as seen through that mapping). Each memory access an instruction makes checks the TLB first to see if that virtual address lookup is cached, just like it checks the L1 and L2 caches to see if the data is there before going out to physical memory for it.

When the CPU tries to access a virtual address it hasn't got a TLB entry for (or has one with the wrong permissions), it generates a "soft page fault" which calls an interrupt handler to deal with it. The handler can resolve the fault by picking an existing TLB entry to discard (to make room), walking the page tables in physical memory to look up the virtual->physical mapping (and possibly doing fixups on the page tables, like allocating a new page of physical memory if there isn't currently one assigned; this happens for sparse mappings, copy on write, swapping...), loading the new TLB entry from the page tables, and then returning from the interrupt to let the program proceed. Or it can kill the program (with a signal) if it made an illegal access. (The program can set up a signal handler, normal unix semantics apply. For an illegal access that's usually SIGSEGV, sometimes SIGBUS.)
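If it helps to see that lookup-miss-refill loop as code, here's a rough sketch in C. The names, sizes, and the direct-mapped layout are all made up for illustration (this is not QEMU's actual tlb/iotlb code); it just shows the general shape of a software TLB sitting in front of a page table walk:

  /* Toy model of the flow described above: a tiny direct-mapped
   * software TLB in front of a (stubbed-out) page table walk. */

  #include <stdint.h>
  #include <stdio.h>

  #define PAGE_SHIFT 12           /* 4k pages */
  #define PAGE_MASK  (~((uint64_t)(1 << PAGE_SHIFT) - 1))
  #define TLB_SIZE   64           /* number of entries, power of two */

  struct tlb_entry {
      uint64_t vpage;             /* virtual page address (~0 = invalid) */
      uint64_t ppage;             /* physical page address */
      int      prot;              /* 1=read, 2=write, 4=exec */
  };

  static struct tlb_entry tlb[TLB_SIZE];

  /* Stand-in for the expensive part: walking the page table tree in
   * physical memory.  This stub just fabricates an identity mapping;
   * a real walk could also find nothing mapped and report a fault. */
  static int page_table_walk(uint64_t vpage, uint64_t *ppage, int *prot)
  {
      *ppage = vpage;             /* pretend virtual == physical */
      *prot = 1 | 2;              /* read + write */
      return 0;                   /* 0 = mapping exists, nonzero = fault */
  }

  /* Translate a virtual address, refilling the TLB on a miss. */
  static int translate(uint64_t vaddr, int want_prot, uint64_t *paddr)
  {
      uint64_t vpage = vaddr & PAGE_MASK;
      struct tlb_entry *e = &tlb[(vpage >> PAGE_SHIFT) & (TLB_SIZE - 1)];

      if (e->vpage != vpage || (e->prot & want_prot) != want_prot) {
          /* Miss (or wrong permissions): do the slow walk, then evict
           * whatever was in this slot to make room for the new entry. */
          uint64_t ppage;
          int prot;
          if (page_table_walk(vpage, &ppage, &prot))
              return -1;          /* no valid mapping: deliver a signal */
          e->vpage = vpage;
          e->ppage = ppage;
          e->prot = prot;
      }
      *paddr = e->ppage | (vaddr & ~PAGE_MASK);
      return 0;
  }

  int main(void)
  {
      uint64_t pa;
      for (int i = 0; i < TLB_SIZE; i++)
          tlb[i].vpage = ~(uint64_t)0;    /* mark every entry invalid */
      if (translate(0x400123, 1, &pa) == 0)
          printf("0x400123 -> %#llx\n", (unsigned long long)pa);
      return 0;
  }

A real TLB does that lookup in parallel in hardware, with smarter replacement than "overwrite whatever was in the slot", but the shape is the same: a hit is cheap, a miss means walking the tree and evicting something to make room.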
The Linux kernel actually reserves one or two TLB entries for itself, using a "huge page". This is one giant TLB entry (its length field covers a gigabyte or two) that maps all the kernel's memory to a big linear range of physical addresses containing the kernel's code and the kernel's data. When the CPU goes into userspace it removes the read/write/execute bits for these entries (so attempts to dereference those addresses will generate a page fault), but keeps the TLB entry in place. Then when you make a system call, it can just switch the access back on and the kernel data is still cached.

Note that cache entries attach to TLB entries. If you flush the TLB, you flush the L1 cache too. (Note: having a TLB entry doesn't mean you have the data in L1 or L2 cache. Normal sized TLB entries are page sized, I.E. 4k, and cache lines are much smaller (64 bytes or 128 bytes is fairly common), so one TLB entry may cover 32 or 64 cache lines. But if the TLB entry isn't there you need to do multiple physical memory lookups to descend through the page table tree to find it, each costing hundreds or even thousands of cycles of latency. When you flush a TLB entry its cache lines get flushed back and discarded too, because the CPU no longer knows where they live without the TLB entry.)

When you switch page tables (schedule a process), you flush the TLB entries belonging to the old page table, which means flushing the cache. Huge amounts of modern performance optimization involve TLB and cache management: only flushing what you actually NEED to, and keeping the rest around if you can to avoid reloading it. (I.E. defer TLB flushing in case this process swaps back to the other one quickly. If you can keep around cached data you currently can't access in this context, because these page tables have no mapping for it, you don't have to load it back in when you switch back to the other context that CAN access it.)

Rob