On Tue, May 25, 2010 at 5:00 PM, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
> 2010/5/21 Blue Swirl <blauwir...@gmail.com>:
>> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>> 2010/5/10 Blue Swirl <blauwir...@gmail.com>:
>>>> On 5/10/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>>> 2010/5/10 Blue Swirl <blauwir...@gmail.com>:
>>>>>
>>>>> > On 5/10/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>>> >> 2010/5/9 Blue Swirl <blauwir...@gmail.com>:
>>>>> >> > On 5/9/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>>> >> >> 2010/5/9 Blue Swirl <blauwir...@gmail.com>:
>>>>> >> >>
>>>>> >> >> > On 5/8/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>>> >> >> >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased. Software shouldn't use aliased addresses, neither should it crash when it does (on the real hardware it wouldn't). Using empty_slot instead of aliasing can help with debugging such accesses.
>>>>> >> >> >
>>>>> >> >> > TurboSPARC Microprocessor User's Manual shows that there are additional pages after the main IOMMU for AFX registers. So this is not board specific, but depends on CPU/IOMMU versions.
>>>>> >> >>
>>>>> >> >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses. SS-20 doesn't have any aliasing.
>>>>> >> >
>>>>> >> > But are your machines equipped with TurboSPARC or some other CPU?
>>>>> >>
>>>>> >> Good point, I must confess I missed the word "Turbo" in your first answer. LX and SS-20 don't have one. But SS-5 must have a TurboSPARC CPU:
>>>>> >>
>>>>> >> ok cd /FMI,MB86904
>>>>> >> ok .attributes
>>>>> >> context-table         00 00 00 00 03 ff f0 00 00 00 10 00
>>>>> >> psr-implementation    00000000
>>>>> >> psr-version           00000004
>>>>> >> implementation        00000000
>>>>> >> version               00000004
>>>>> >> cache-line-size       00000020
>>>>> >> cache-nlines          00000200
>>>>> >> page-size             00001000
>>>>> >> dcache-line-size      00000010
>>>>> >> dcache-nlines         00000200
>>>>> >> dcache-associativity  00000001
>>>>> >> icache-line-size      00000020
>>>>> >> icache-nlines         00000200
>>>>> >> icache-associativity  00000001
>>>>> >> ncaches               00000002
>>>>> >> mmu-nctx              00000100
>>>>> >> sparc-version         00000008
>>>>> >> mask_rev              00000026
>>>>> >> device_type           cpu
>>>>> >> name                  FMI,MB86904
>>>>> >>
>>>>> >> and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>>>> >>
>>>>> >> ok 10000000 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 14000000 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 14000004 20 spacel@ .
>>>>> >> 23000
>>>>> >> ok 1f000004 20 spacel@ .
>>>>> >> 23000
>>>>> >> ok 10000008 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 14000028 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 1000000c 20 spacel@ .
>>>>> >> 23000
>>>>> >> ok 10000010 20 spacel@ .
>>>>> >> 4000009
>>>>> >>
>>>>> >> LX is the same except for the IOMMU version:
>>>>> >>
>>>>> >> ok 10000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 14000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 18000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 1f000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 1ff00000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 1fff0004 20 spacel@ .
>>>>> >> 1fe000
>>>>> >> ok 10000004 20 spacel@ .
>>>>> >> 1fe000
>>>>> >> ok 10000108 20 spacel@ .
>>>>> >> 41000005
>>>>> >> ok 10000040 20 spacel@ .
>>>>> >> 41000005
>>>>> >> ok 1fff0040 20 spacel@ .
>>>>> >> 41000005
>>>>> >> ok 1fff0044 20 spacel@ .
>>>>> >> 1fe000
>>>>> >> ok 1fff0024 20 spacel@ .
>>>>> >> 1fe000
>>>>> >>
>>>>> >> >> At what address are the additional AFX registers located?
>>>>> >> >
>>>>> >> > Here's the complete TurboSPARC IOMMU address map:
>>>>> >> > PA[30:0]  Register                    Access
>>>>> >> > 1000_0000 IOMMU Control               R/W
>>>>> >> > 1000_0004 IOMMU Base Address          R/W
>>>>> >> > 1000_0014 Flush All IOTLB Entries     W
>>>>> >> > 1000_0018 Address Flush               W
>>>>> >> > 1000_1000 Asynchronous Fault Status   R/W
>>>>> >> > 1000_1004 Asynchronous Fault Address  R/W
>>>>> >> > 1000_1010 SBus Slot Configuration 0   R/W
>>>>> >> > 1000_1014 SBus Slot Configuration 1   R/W
>>>>> >> > 1000_1018 SBus Slot Configuration 2   R/W
>>>>> >> > 1000_101C SBus Slot Configuration 3   R/W
>>>>> >> > 1000_1020 SBus Slot Configuration 4   R/W
>>>>> >> > 1000_1050 Memory Fault Status         R/W
>>>>> >> > 1000_1054 Memory Fault Address        R/W
>>>>> >> > 1000_2000 Module Identification       R/W
>>>>> >> > 1000_3018 Mask Identification         R
>>>>> >> > 1000_4000 AFX Queue Level             W
>>>>> >> > 1000_6000 AFX Queue Level             R
>>>>> >> > 1000_7000 AFX Queue Status            R
>>>>> >>
>>>>> >> But if I read it correctly, 0x12fff294 (which makes SunOS crash with -m 32) is well above this limit.
>>>>> >
>>>>> > Oh, so I also misread something. You are not talking about the adjacent pages, but 16MB increments.
>>>>> >
>>>>> > Earlier I sent a patch for a generic address alias device, would it be useful for this?
>>>>>
>>>>> Should do as well. But I thought empty_slot is less overhead and easier to debug.
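As a rough illustration of the alias-device idea, the 16MB aliasing could be expressed like this with QEMU's memory API. Note that memory_region_init_alias() postdates this thread, and the function name, "sysmem"/"iommu_mr" parameters and bounds below are assumptions for the sketch, not the actual patch under discussion (the LX probes above even suggest denser aliasing than the 16MB steps shown):

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    /* Sketch only: mirror the small IOMMU register window in 16MB steps
     * across the SBus control space; the real window sits in the first
     * step at 0x10000000. */
    static void map_iommu_aliases(MemoryRegion *sysmem, MemoryRegion *iommu_mr)
    {
        hwaddr base;

        for (base = 0x11000000; base < 0x20000000; base += 0x01000000) {
            MemoryRegion *alias = g_new0(MemoryRegion, 1);
            char *name = g_strdup_printf("iommu-alias-%x", (unsigned)base);

            /* Each alias forwards to offset 0 of the real window, so
             * e.g. 0x10000000 and 0x14000000 read back the same
             * registers, as in the probes above. */
            memory_region_init_alias(alias, NULL, name, iommu_mr,
                                     0, memory_region_size(iommu_mr));
            memory_region_add_subregion(sysmem, base, alias);
            g_free(name);
        }
    }

As the next message notes, this approach needs the size of the aliased area as an extra parameter.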
>>> Also the aliasing patch would require one more parameter: the size of the area which has to be aliased. Unless we implement stubs for all missing devices and do aliasing of the connected port ranges. And then again, SS-20 doesn't have aliasing in this area at all.
>>>
>>> What do you think about this (empty_slot) solution (except that I missed the SoB line)? Meanwhile it's tested with SunOS 4.1.3U1 too.
>>
>> I'm slightly against it; of course it would help for this, but I think we may be missing a bigger problem.
>>
>>>>>> Maybe we have a general design problem, perhaps unassigned access faults should only be triggered inside SBus slots and ignored elsewhere. If this is true, the generic Sparc32 unassigned access handler should just ignore the access and special fault-generating slots should be installed for empty SBus address ranges.
>>>
>>> Agreed that they should be special for SBus, because SS-20 OBP is not happy with the fault we are currently generating. But otherwise I think qemu does it correctly. On SS-5:
>>>
>>> ok f7ff0000 2f spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> f7ff0000
>>> ok 20000000 2f spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> 20000000
>>> ok 40000000 20 spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> 40000000
>>>
>>> Neither f7ff0000, nor 20000000, nor 40000000 are in SBus range, right?
>>
>> 40000000 is on SS-5.
>
> Ah. I was only aware of the control space. What ranges does SBus take?

On SS-5, 30000000 to 7fffffff, each slot taking 10000000. There's an AFX bus at 20000000. The OBP property '/iommu/sbus/ranges' shows these (and other ranges).

>
>> So is the SBus Control Space in 0x10000000 to 0x1fffffff the only area besides DRAM where the accesses won't trap?
>
> At least some area after the ROM is aliased too. Also, on SS-10 with a non-active frame buffer, writing to the SX registers has no visible effect and reading from them produces no fault but an NMI.

Then we should cover the whole area after the IOMMU with an empty slot device. The ROM probably doesn't matter.
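A minimal sketch of that, assuming the empty_slot_init(address, size) helper from hw/empty_slot.c and the SS-5 layout quoted above; the exact bounds and the IOMMU window size are assumptions, not a tested patch:

    /* In the sun4m machine init, after the IOMMU registers are mapped:
     * cover the rest of the SBus control space (up to, but not
     * including, the AFX area at 0x20000000, which faults on real SS-5
     * per the probes above) with an empty slot.  Reads return zero,
     * writes are ignored, and the accesses can be traced for debugging
     * instead of faulting. */
    empty_slot_init(0x10000000 + IOMMU_NREGS * 4,
                    0x20000000 - (0x10000000 + IOMMU_NREGS * 4));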
>>>>> My impression was that SS-5 and SS-20 do unassigned accesses a bit differently. The current IOMMU implementation fits SS-20, which has no aliasing.
>>>>
>>>> It's probably rather the board design than just the IOMMU.
>>>
>>> Agreed. That's why I bound the patch to the machine hwdef and not to the iommu.
>>>
>>>>> >> >> > One approach would be that IOMMU_NREGS would be increased to cover these registers (with a bump in the savevm version field) and iommu_init1() should check the version field to see how much MMIO to provide.
>>>>> >> >>
>>>>> >> >> The problem I see here is that we already have too many registers: we emulate the SS-20 IOMMU (I guess), while SS-5 and LX seem to have only 0x20 registers which are aliased all the way.
>>>>> >> >>
>>>>> >> >> > But in order to avoid the savevm version change, iommu_init1() could just install dummy MMIO (in the TurboSPARC case), if OBP does not care whether the read-back data matches what has been written earlier. Because from OBP's point of view this is identical to what your patch results in, I'd suppose this approach would also work.
>>>>> >> >>
>>>>> >> >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX" SunOS 4.1.4 kernel that does. The "MUNIX" kernel is the only kernel available during the installation, so it is currently not possible to install 4.1.4. Surprisingly, the "GENERIC" kernel which is on the disk after the installation doesn't try to access these address ranges either, so a disk image taken from a live system works.
>>>>> >> >>
>>>>> >> >> Actually, access to the non-connected/aliased addresses may also be a consequence of the phys_page_find bug I mentioned before. When I run the install with -m 64 and -m 256 it tries to access different non-connected addresses. May also be a SunOS bug of course; 256m used to be a lot back then.
>>>>> >> >
>>>>> >> > Perhaps with 256MB, memory probing advances blindly from memory to the IOMMU registers. Proll (used before OpenBIOS) did that once, with bad results :-). If this is true, 64M, 128M and 192M should show identical results and the accesses should happen only with sizes close or equal to 256M.
>>>>> >>
>>>>> >> 32m:  0x12fff294
>>>>> >> 64m:  0x14fff294
>>>>> >> 192m: 0x1cfff294
>>>>> >> 256m: 0x20fff294
>>>>> >>
>>>>> >> Memory probing? It would be strange for the OS to do it itself. The OS could just ask OBP how much it has. Here is the listing where it happens:
>>>>> >>
>>>>> >> _swift_vac_rgnflush:      rd %psr, %g2
>>>>> >> _swift_vac_rgnflush+4:    andn %g2, 0x20, %g5
>>>>> >> _swift_vac_rgnflush+8:    mov %g5, %psr       ! clear PSR.ET: disable traps
>>>>> >> _swift_vac_rgnflush+0xc:  nop
>>>>> >> _swift_vac_rgnflush+0x10: nop
>>>>> >> _swift_vac_rgnflush+0x14: mov 0x100, %g5
>>>>> >> _swift_vac_rgnflush+0x18: lda [%g5] 0x4, %g5  ! read SRMMU Context Table Pointer (ASI 0x4, reg 0x100)
>>>>> >> _swift_vac_rgnflush+0x1c: sll %o2, 0x2, %g1   ! table index from %o2, times 4 (entry size)
>>>>> >> _swift_vac_rgnflush+0x20: sll %g5, 0x4, %g5   ! CTP << 4 = physical address of the context table
>>>>> >> _swift_vac_rgnflush+0x24: add %g5, %g1, %g5   ! physical address of the table entry
>>>>> >> _swift_vac_rgnflush+0x28: lda [%g5] 0x20, %g5 ! MMU pass-through read (ASI 0x20) of that address
>>>>> >>
>>>>> >> _swift_vac_rgnflush+0x28: is the fatal one.
>>>>> >>
>>>>> >> kadb> $c
>>>>> >> _swift_vac_rgnflush(?)
>>>>> >> _vac_rgnflush() + 4
>>>>> >> _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>>>> >> _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>>>> >> _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>>>> >>
>>>>> >> Unfortunately (but not surprisingly) kadb doesn't allow debugging cache-flush code, so I can't check what is in [%g5] (aka sfar) on the real machine when this happens.
>>>>> >
>>>>> > The Linux code for the Swift/TurboSPARC VAC flush should be similar.
>>>
>>> Do you have an idea why anyone would try reading a value referenced in sfar? Especially during flushing? I can't imagine a case where it wouldn't produce a fault.
>>
>> No idea, the fault should be inevitable. An explanation of how the VAC (Virtually Addressed Cache?) works could help.
>
> Is it available somewhere? An explanation of how the PAC works is interesting too, because when emulating SS-20, the Solaris boot hangs where it normally says that the PAC is initialized.
>
>>>>> >> But the bug in phys_page_find would explain these accesses: sfar gets the wrong address, and then the secondary access happens on this wrong address instead of the original one.
>>>>> >
>>>>> > I doubt phys_page_find can be buggy; it is so vital for all architectures.
>>>>>
>>>>> But you've seen the example of buggy behaviour I posted last Friday, right? If it's not phys_page_find, it's either cpu_physical_memory_rw (which is also pretty generic), or the way SS-20 registers devices. Can it be that all the pages must be registered in the proper order?
>>>>
>>>> How about the unassigned access handler, could it be suspected?
>>>
>>> Doesn't look like it: it gets a physical address as a parameter. How would it know the address is wrong?
>>
>> It wouldn't, but IIRC Paul claimed earlier that the unassigned memory handling in QEMU could have problems.
>
> But I thought Paul also fixed the problems? There was a patch from him.
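Whatever the handler's history, the behaviour this thread converges on fits in a few lines. A sketch in plain C; the helper name is made up and the ranges are taken from the probes above, so this is an illustration rather than QEMU's actual handler:

    #include <stdbool.h>

    /* Should an unassigned access at physical address addr raise a data
     * access error?  Per the SS-5/LX probes above, the SBus control
     * space (0x10000000-0x1fffffff) aliases the IOMMU registers or can
     * be covered by empty slots, so it never faults; everywhere else,
     * including empty SBus slots, the real machines raise a Data Access
     * Error (e.g. f7ff0000, 20000000, 40000000 above). */
    static bool sun4m_unassigned_should_fault(unsigned long long addr)
    {
        if (addr >= 0x10000000ULL && addr <= 0x1fffffffULL) {
            return false;   /* covered by aliases or empty slots */
        }
        return true;        /* fault, as on real hardware */
    }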
>>>>> I think it's a pretty rare use case where you have a memory fault (not a translation fault) on an unknown address. You may have such a fault during device probing, but in that case you know what address you are probing, so you don't care about the sync fault address register.
>>>>>
>>>>> Besides, do all architectures have a sync fault address register?
>>>>
>>>> No, I think system-level checks like that and IOMMU-like controls on most architectures are very poor compared to Sparc32. Server and mainframe systems may be a bit better.
>>>
>>> And do we have any mainframe emulated well enough to have a user base and hence bug reports?
>>
>> The only IOMMU implemented is the Sparc32 one so far. I don't know about the S390x architecture; that should definitely be mainframe class. AMD IOMMU may be in QEMU one day.
>>
>> About bugs, IIRC the NetBSD 3.x crash could be related to the IOMMU.
>
> What indicates that? It happens where the disk sizes are normally reported, so it could be a scsi/dma/irq/fpu issue as well.

IIRC the DVMA address was 0xfc004000, but the mapped entries were for 0xfc000000 to 0xfc003fff.

>
>>>>> >> fwiw the routine is called only once on the real hardware. It sort of speaks for your hypothesis about the memory probing. Although it may not necessarily probe for memory...
>
> --
> Regards,
> Artyom Tarasenko
>
> solaris/sparc under qemu blog: http://tyom.blogspot.com/