Hi, resending to ML.
On Sat, Sep 28, 2013 at 1:25 PM, Beniamino Galvani wrote:
> Hi,
>
> I have a branch (lp:~bengal/helenos/raspberrypi) with support for
> Raspberry Pi, which uses an ARMv6 CPU. I'm trying to merge the latest
> changes from mainline into my branch, but I'm experiencing crashes
> like this at startup:
>
> Inflating components ... initrd fat logger vfs rd locsrv init loader ns
> kernel .
> Booting the kernel...
> SPARTAN kernel, release 0.5.0 (Fajtl), revision 1781M
> ([email protected])
> Built on 2013-09-28 10:02:31 for arm32
> Copyright (c) 2001-2013 HelenOS project
> BCM2835 framebuffer at 0x5c006000 (640x480)
> Detected 1 CPU(s), 458720 KiB free memory
> Program loader at 0xf0200000
>
> ######> Kernel panic on cpu0 due to a bad memory access while loading from
> address 0xf0240000. <######
>
> THE=0x80862000: pe=0 thr=0x80861000 task=0x80860000 cpu=0x80853800
> as=0x80245000 magic=0xfacefeed
> r0 =0xf0240000 r1 =0x80245078 r2 =0x00000000 r3 =0x464c457f
> r4 =0xf0240000 r5 =0xf0240000 r6 =0x80245078 r7 =0x80863e14
> r8 =0x80a62984 r9 =0x80a4bd0c r10=0x80a4bd0c fp =0x80863dd4
> r12=0x80863dd8 sp =0x80863d80 lr =0x80a15efc spsr=0xa0000053
>
> 0x80863c8c: generic/src/debug/stacktrace.o:stack_trace()+0x0000001c
> 0x80863cbc: generic/src/debug/panic.o:panic_common()+0x00000154
> 0x80863cfc: generic/src/mm/as.o:as_page_fault()+0x00000208
> 0x80863d34: generic/src/interrupt/interrupt.o:exc_dispatch()+0x00000100
> 0x80863dd4:
> arch/arm32/src/exc_handler.o:data_abort_exception_entry()+0x000000b4
> 0x80863df4: generic/src/proc/program.o:program_create_from_image()+0x00000038
> 0x80863fd4: generic/src/main/kinit.o:kinit()+0x00000264
> 0x80863ff4: generic/src/proc/thread.o:cushion()+0x0000006c
> cpu0: halted
>
> --
>
> Looking at the changes, I found that the commit that introduces this
> behaviour is:
>
> http://trac.helenos.org/changeset/mainline%2C1912
>
> and, in particular, the specific change that seems to make the difference is:
>
> - * turn it on for normal memory. */
> - p->shareable = 1;
> + * turn it off for normal memory. */
> + p->shareable = 0;
>
> in kernel/arch/arm32/include/arch/mm/page_armv6.h; setting the bit to
> 1 as it was before solves my problem.
>
> I've searched both the ARM v7-A ARM and the ARM1176JZF-S TRM, but I
> still don't understand the actual effect of that bit in the
> page descriptor, so I'm asking you whether 1 is the right value for an
> ARMv6 CPU too, and whether you have any clue what is causing the crash.

Thanks for keeping the Raspberry Pi branch alive!

HelenOS currently uses memory attributes without TEX remap, so the shareability bit indicates whether the mapped memory can be shared between multiple observers and requires the hardware to maintain memory coherency. While that sounds nice, CPU implementations are free to implement hardware-coherent memory as non-cacheable, so setting the shareable bit effectively disables caches on these implementations (like Cortex-A8). Dealing with shareability domains will be fun when we add support for multiprocessor extensions.

The commit you mention merges multiple ARM changes, among them improvements that enable caches and cacheable memory mappings. Enabling caches led to a more than 10x speedup (13x on BBxM). I'm sorry for the breakage, but cache-related things are rather tricky even if one has access to the hardware.

Your options are:
a) disable caches for armv6 (ifdef the shareability setting)
b) fix cache usage on armv6.
Obviously, it would be better if you could fix HelenOS to use caches.

I don't know armv6 and the BCM chip on the Raspberry Pi well enough to have definite answers, but my suspicion is that enabling caches causes incoherent page tables. Here are a few hints you can try.
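Option a) (ifdef the shareability setting) could look roughly like the following sketch. It assumes a heavily simplified, hypothetical descriptor struct (pte_sketch_t) and a hypothetical build macro (SKETCH_ARCH_ARMV6); the real definitions in page_armv6.h and the real ARMv6 build symbol will differ. The attribute values used are the ones for normal write-back write-allocate memory without TEX remap (TEX=001, C=1, B=1).

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for the ARMv6 small-page
 * descriptor; only the memory-attribute bits are modelled. */
typedef struct {
	unsigned bufferable : 1; /* B */
	unsigned cacheable : 1;  /* C */
	unsigned tex : 3;        /* TEX[2:0] */
	unsigned shareable : 1;  /* S */
} pte_sketch_t;

/* Mark a mapping as normal, outer and inner write-back write-allocate
 * memory (TEX=001, C=1, B=1 when TEX remap is disabled). */
static void pte_sketch_set_normal_memory(pte_sketch_t *p)
{
	p->tex = 1;
	p->cacheable = 1;
	p->bufferable = 1;
#ifdef SKETCH_ARCH_ARMV6 /* hypothetical macro, build with -DSKETCH_ARCH_ARMV6 */
	/* Option a): keep the old value on ARMv6. Implementations may treat
	 * shareable normal memory as non-cacheable, so this effectively runs
	 * with caches off and sidesteps the coherency problem. */
	p->shareable = 1;
#else
	/* Non-shareable: caches stay effective (e.g. on Cortex-A8). */
	p->shareable = 0;
#endif
}
```

This only hides the problem on ARMv6 rather than fixing it, which is why option b) is the preferred route.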
The pt_coherence macro (page_armv6.h) makes sure page table information is flushed to the Point of Unification (PoU) [0]:
* you can try changing the pt_coherence macro to flush to the Point of Coherence [1] (DCCMVAU -> DCCMVAC; this is probably the easiest)
* you can try using write-through caches instead of writeback-writeallocate [2] (the speedup is noticeable, so you should be able to tell; see page_armv6.h:259 for info on what to change and how)
* you can try flushing all caches to PoC before flushing the TLB on address space switch

If any of the above fixes the issue, it confirms that memory coherence is the cause. Note that none of the above is a real fix.

It might be that I enabled some of the necessary actions only for armv7. The basic requirements are that the page table information is present in page-walk-accessible memory (pt_coherence macro), and that the MMU is allowed to use caches for page table walks (set_ptl0_addr). There might be similar issues with the smc_* macros on armv6.

I have attached a patch that reuses more of the armv7 code for armv6. It looks like the ARM1176 uses virtually indexed caches, so it probably won't work. The other attached patch fixes page tables on architectures with disabled caches.

jermar: can you try the second patch (boot_flush_pt.diff) on gta02? Flushing page tables is something we should be doing; it might help with the crashes.

You can find detailed info about shareability/cacheability in chapter B3.8 of the ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition (rev. C), or in chapter B4 of the ARM Architecture Reference Manual (rev. I).

[0] ARM defines PoU as the point where the data and instruction caches unify; it is the coherence point for data accesses, instruction accesses, and page table walks.
[1] ARM defines PoC as the point in the cache hierarchy beyond which all accesses by all observers are coherent (i.e. main memory).
[2] The comment in page_armv6.h is wrong: 0x10101 is writeback-writeallocate.

Hope this helps,
Jan

>
> Thank you,
> Beniamino
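For the first hint (cleaning page table updates to PoC instead of PoU), the core of such a macro is a loop that applies a clean-by-MVA operation to every data cache line overlapping the modified entries. The host-side sketch below replaces the real instruction with a callback. The 32-byte line size is an assumption (on hardware, read it from the Cache Type Register), and for_each_dcache_line and sketch_record_line are hypothetical names, not HelenOS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 data cache line size; do not hard-code this on real hw. */
#define SKETCH_DCACHE_LINE 32u

/* Stand-in for a single-line maintenance operation. Per the ARMv7 ARM,
 * DCCMVAC (clean to PoC) is "mcr p15, 0, Rt, c7, c10, 1" and DCCMVAU
 * (clean to PoU) is "mcr p15, 0, Rt, c7, c11, 1"; check the ARM1176 TRM
 * for what ARMv6 actually provides. */
typedef void (*line_op_t)(uintptr_t mva);

/* Apply op to every cache line overlapping [addr, addr + size).
 * Returns the number of lines visited. */
static size_t for_each_dcache_line(uintptr_t addr, size_t size, line_op_t op)
{
	size_t count = 0;

	if (size == 0)
		return 0;

	uintptr_t line = addr & ~(uintptr_t) (SKETCH_DCACHE_LINE - 1);
	uintptr_t end = addr + size;

	for (; line < end; line += SKETCH_DCACHE_LINE) {
		op(line);
		count++;
	}
	/* On hardware a barrier (DSB) would follow so the cleaned entries
	 * are visible before the next translation table walk. */
	return count;
}

/* Host-only stand-in for the cache operation: records the last MVA. */
static uintptr_t sketch_last_mva;

static void sketch_record_line(uintptr_t mva)
{
	sketch_last_mva = mva;
}
```

For example, cleaning a single 4-byte entry at 0x80245078 visits exactly one line, 0x80245060, while an 8-byte range starting at 0x101c straddles two lines.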
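For the second hint (write-through instead of writeback-writeallocate), the relevant bits without TEX remap are TEX[2:0], C and B. The sketch below encodes the normal-memory types from the ARMv6/ARMv7 short-descriptor table into their small-page descriptor positions (B = bit 2, C = bit 3, TEX = bits 8:6, assuming ARMv6 with the extended page table format enabled). The enum and function names are hypothetical; the actual fields and constants in page_armv6.h may be laid out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Normal memory types as encoded in TEX/C/B with TEX remap disabled. */
typedef enum {
	MEM_NONCACHEABLE,      /* TEX=001 C=0 B=0 */
	MEM_WRITE_THROUGH,     /* TEX=000 C=1 B=0, no write-allocate */
	MEM_WRITE_BACK,        /* TEX=000 C=1 B=1, no write-allocate */
	MEM_WRITE_BACK_WALLOC  /* TEX=001 C=1 B=1 */
} mem_type_t;

/* Return the TEX/C/B bits shifted into small-page descriptor positions. */
static uint32_t small_page_attr_bits(mem_type_t type)
{
	uint32_t tex, c, b;

	switch (type) {
	case MEM_WRITE_THROUGH:
		tex = 0; c = 1; b = 0;
		break;
	case MEM_WRITE_BACK:
		tex = 0; c = 1; b = 1;
		break;
	case MEM_WRITE_BACK_WALLOC:
		tex = 1; c = 1; b = 1;
		break;
	case MEM_NONCACHEABLE:
	default:
		/* Fall back to non-cacheable normal memory. */
		tex = 1; c = 0; b = 0;
		break;
	}
	return (tex << 6) | (c << 3) | (b << 2);
}
```

Switching from writeback-writeallocate (0x4c in these positions) to write-through (0x08) changes both TEX and B, so editing only the C/B bits is not enough.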
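The requirement that the MMU be allowed to use caches for page table walks comes down to the attribute bits in TTBR0, which set_ptl0_addr presumably writes (on hardware via "mcr p15, 0, Rt, c2, c0, 0"). The sketch assumes the ARMv6/ARMv7 (non-multiprocessing) layout: C in bit 0 (inner cacheable walks), S in bit 1, and outer cacheability RGN in bits 4:3, with TTBCR.N = 0 so the table is 16 KiB aligned. The macro and function names are hypothetical. As far as I understand the architecture, if C is clear the walker bypasses the inner cache, and entries must then be cleaned all the way to PoC for the walker to see them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TTBR_C         (1u << 0) /* inner cacheable table walks */
#define TTBR_S         (1u << 1) /* shareable table walks (left clear here) */
#define TTBR_RGN_SHIFT 3         /* outer cacheability, bits 4:3 */
#define TTBR_RGN_WBWA  1u        /* outer write-back write-allocate */

/* Compose a TTBR0 value for a 16 KiB aligned level-0 table (TTBCR.N = 0),
 * optionally letting the MMU use the caches for table walks. */
static uint32_t ttbr0_value(uint32_t pt_phys, bool cacheable_walks)
{
	assert((pt_phys & 0x3fffu) == 0); /* 16 KiB alignment */

	uint32_t v = pt_phys;
	if (cacheable_walks)
		v |= TTBR_C | (TTBR_RGN_WBWA << TTBR_RGN_SHIFT);
	return v;
}
```

With a table at 0x80004000, cacheable walks give 0x80004009, while leaving them off gives 0x80004000 and makes the PoC clean from the first hint mandatory.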
caches_armv6.diff
boot_flush_pt.diff
_______________________________________________
HelenOS-devel mailing list
http://lists.modry.cz/listinfo/helenos-devel
