Hi,

Resending to the ML.

On Sat, Sep 28, 2013 at 1:25 PM, Beniamino Galvani <[email protected]> wrote:
> Hi,
>
> I have a branch (lp:~bengal/helenos/raspberrypi) with support for
> Raspberry PI which uses an ARMv6 CPU. I'm trying to merge the latest
> changes from mainline into my branch, but I'm experiencing crashes
> like this at startup:
>
> Inflating components ... initrd fat logger vfs rd locsrv init loader ns kernel .
> Booting the kernel...
> SPARTAN kernel, release 0.5.0 (Fajtl), revision 1781M ([email protected])
> Built on 2013-09-28 10:02:31 for arm32
> Copyright (c) 2001-2013 HelenOS project
> BCM2835 framebuffer at 0x5c006000 (640x480)
> Detected 1 CPU(s), 458720 KiB free memory
> Program loader at 0xf0200000
>
> ######> Kernel panic on cpu0 due to a bad memory access while loading from address 0xf0240000. <######
>
> THE=0x80862000: pe=0 thr=0x80861000 task=0x80860000 cpu=0x80853800 as=0x80245000 magic=0xfacefeed
> r0 =0xf0240000  r1 =0x80245078  r2 =0x00000000  r3 =0x464c457f
> r4 =0xf0240000  r5 =0xf0240000  r6 =0x80245078  r7 =0x80863e14
> r8 =0x80a62984  r9 =0x80a4bd0c  r10=0x80a4bd0c  fp =0x80863dd4
> r12=0x80863dd8  sp =0x80863d80  lr =0x80a15efc  spsr=0xa0000053
>
> 0x80863c8c: generic/src/debug/stacktrace.o:stack_trace()+0x0000001c
> 0x80863cbc: generic/src/debug/panic.o:panic_common()+0x00000154
> 0x80863cfc: generic/src/mm/as.o:as_page_fault()+0x00000208
> 0x80863d34: generic/src/interrupt/interrupt.o:exc_dispatch()+0x00000100
> 0x80863dd4: arch/arm32/src/exc_handler.o:data_abort_exception_entry()+0x000000b4
> 0x80863df4: generic/src/proc/program.o:program_create_from_image()+0x00000038
> 0x80863fd4: generic/src/main/kinit.o:kinit()+0x00000264
> 0x80863ff4: generic/src/proc/thread.o:cushion()+0x0000006c
> cpu0: halted
>
> --
>
> Looking at the changes, I found that the commit that introduces this 
> behaviour is:
>
> http://trac.helenos.org/changeset/mainline%2C1912
>
> and, in particular, the specific change that seems to make the difference is:
>
> -         * turn it on for normal memory. */
> -        p->shareable = 1;
> +         * turn it off for normal memory. */
> +        p->shareable = 0;
>
> in kernel/arch/arm32/include/arch/mm/page_armv6.h; setting the bit to
> 1 like it was before solves my problem.
>
> I've searched both the ARM v7-A ARM and the ARM1176JZF-S TRM, but I
> still don't understand the actual effect of that bit in the
> page descriptor, so I'm asking whether 1 is also the right value
> for an ARMv6 CPU and whether you have any clue about what is causing the crash.

Thanks for keeping the Raspberry Pi branch alive!

HelenOS currently uses memory attributes without TEX remap, so the
shareability bit indicates whether the mapped memory can be shared
between multiple observers, which requires the hw to maintain memory
coherency. While that sounds nice, CPU implementations are free to
implement hw-coherent memory as non-cacheable, so setting the
shareable bit effectively disables caches on such implementations
(like Cortex-A8). Dealing with shareability domains will be fun when
we add support for the multiprocessing extensions.

The commit you mention merges multiple ARM changes, among them
improvements that enable caches and cacheable memory mappings.
Enabling caches led to a more than 10x speedup (13x on BBxM). I'm sorry
for the breakage, but cache-related things are rather tricky even if
one has access to the hw.
Your options are:
a) disable caches for armv6 (ifdef the shareability setting)
b) fix cache usage on armv6.
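
Option (a) could look roughly like the sketch below. It is only an
illustration under stated assumptions: pte_sketch_t is a simplified
stand-in that models just the bit discussed in this thread (not the real
descriptor layout in page_armv6.h), and PROCESSOR_ARCH_armv6 is an
assumed name for whatever build-time macro identifies an ARMv6 target.

```c
/* Simplified stand-in for a page descriptor; only the shareable bit
 * discussed in this thread is modelled. Hypothetical type, not the
 * real layout from page_armv6.h. */
typedef struct {
    unsigned shareable : 1;
} pte_sketch_t;

/* Pick the shareability of normal memory per architecture.
 * PROCESSOR_ARCH_armv6 is an assumed configuration macro name. */
static void set_normal_memory_shareability(pte_sketch_t *p)
{
#ifdef PROCESSOR_ARCH_armv6
    /* Workaround: keep the pre-merge value until cache usage is
     * fixed on ARMv6. On some implementations this effectively
     * disables caching of the mapping. */
    p->shareable = 1;
#else
    /* Shareable memory may be implemented as non-cacheable, so turn
     * the bit off for normal memory. */
    p->shareable = 0;
#endif
}
```

Built without the assumed macro, the function clears the bit, which
matches the behaviour after the merge; built with it, the bit stays set
as it was before.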

Obviously, it would be better if you could fix HelenOS to use caches.
I don't know armv6 and the BCM chip on the Raspberry Pi well enough to
have definite answers. My suspicion is that enabling caches causes
incoherent page tables; here are a few hints you can try.

The pt_coherence macro (page_armv6.h) makes sure page table information
is flushed to the Point of Unification (PoU) [0]:
 * you can try changing the pt_coherence macro to flush to the Point
of Coherence [1] (DCCMVAU -> DCCMVAC; this is probably the easiest)
 * you can try using write-through caches instead of
writeback-writeallocate [2] (the speedup is noticeable so you should be
able to tell; see page_armv6.h:259 for info on what to change and how)
 * you can try flushing all caches to PoC before flushing the TLB on an
address space switch
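
The first hint (DCCMVAU -> DCCMVAC) can be sketched as below. This is a
sketch, not the actual pt_coherence macro: dcache_clean_line,
pt_clean_range and CACHE_LINE_SIZE are hypothetical names, the 32-byte
line size is an assumption (real code should take it from the Cache Type
Register), and the KERNEL guard is an assumed build macro so that the
privileged CP15 instructions are only emitted in a kernel build. The
CP15 encodings are the ones the ARM ARM gives for these operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed data cache line size in bytes (hypothetical constant). */
#define CACHE_LINE_SIZE 32u

/* Clean a single data cache line by MVA, to PoC or to PoU. */
static inline void dcache_clean_line(uintptr_t mva, int to_poc)
{
#if defined(__arm__) && defined(KERNEL)
    if (to_poc) {
        /* DCCMVAC: clean data cache line by MVA to PoC */
        __asm__ volatile ("mcr p15, 0, %0, c7, c10, 1" :: "r" (mva) : "memory");
    } else {
        /* DCCMVAU: clean data cache line by MVA to PoU */
        __asm__ volatile ("mcr p15, 0, %0, c7, c11, 1" :: "r" (mva) : "memory");
    }
#else
    /* Non-kernel or non-ARM build: nothing to clean. */
    (void) mva;
    (void) to_poc;
#endif
}

/* Clean every cache line overlapping [start, start + size) and return
 * the number of lines visited. */
static size_t pt_clean_range(const void *start, size_t size, int to_poc)
{
    if (size == 0)
        return 0;

    uintptr_t addr = (uintptr_t) start & ~(uintptr_t) (CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t) start + size;
    size_t lines = 0;

    for (; addr < end; addr += CACHE_LINE_SIZE) {
        dcache_clean_line(addr, to_poc);
        lines++;
    }

#if defined(__arm__) && defined(KERNEL)
    /* CP15 data synchronization barrier: wait for the cleans to
     * complete before the MMU may walk the tables. */
    __asm__ volatile ("mcr p15, 0, %0, c7, c10, 4" :: "r" (0) : "memory");
#endif
    return lines;
}
```

Switching the to_poc argument between 0 and 1 is the whole experiment:
if cleaning to PoC makes the crash go away while cleaning to PoU does
not, that points at page-walk coherence.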

If any of the above fixes the issue, it confirms that memory coherence
is the cause. Note that none of the above is a real fix.
It might be that I enabled some of the necessary actions only for armv7.
The basic requirements are that the info is present in memory that is
accessible to page walks (pt_coherence macro), and that the MMU is
allowed to use caches for pt walks (set_ptl0_addr). There might be
similar issues with the smc_* macros on armv6.

I have attached a patch that reuses more of the armv7 code for armv6. It
looks like arm1176 uses virtually indexed caches, so it probably won't
work.
The other attached patch fixes PTs on archs with disabled caches.

jermar: can you try the second patch (boot_flush_pt.diff) on gta02?
Flushing PTs is something we should be doing, it might help with the
crashes.

You can find detailed info about shareability/cacheability in chapter
B3.8 of the ARM Architecture Reference Manual, ARMv7-A and ARMv7-R
edition (rev. C), or in chapter B4 of the ARM Architecture Reference
Manual (rev. I).

[0] ARM defines PoU as the point where data and inst caches unite;
it's the coherence point for data accesses, inst accesses, and pt walks.
[1] ARM defines PoC as the point in the cache hierarchy beyond which all
accesses by all observers are coherent (i.e. main memory).
[2] The comment in page_armv6.h is wrong; 0x10101 is writeback-writeallocate.
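
To double-check encodings like the one in [2], a small decoder can help.
The sketch below assumes the value is meant as the bit pattern 0b10101,
i.e. {TEX[2:0], C, B} packed into five bits, and uses the ARM ARM
encoding for the TEX[2] = 1 case without TEX remap (TEX[1:0] = outer
policy, C/B = inner policy). The type and function names are
hypothetical.

```c
#include <stdint.h>

/* Cache policy encoding used for both the outer (TEX[1:0]) and the
 * inner (C, B) attribute when TEX[2] is set. */
typedef enum {
    CACHE_NONCACHEABLE = 0, /* 0b00: non-cacheable */
    CACHE_WB_WA = 1,        /* 0b01: write-back, write-allocate */
    CACHE_WT_NO_WA = 2,     /* 0b10: write-through, no write-allocate */
    CACHE_WB_NO_WA = 3      /* 0b11: write-back, no write-allocate */
} cache_policy_t;

/* Decode attrs = {TEX[2:0], C, B} in bits 4..0.
 * Returns 0 on success, or -1 if TEX[2] is clear, because those
 * encodings follow a different table that is not handled here. */
static int decode_cached_attrs(uint8_t attrs, cache_policy_t *outer,
    cache_policy_t *inner)
{
    if ((attrs & 0x10) == 0)
        return -1;

    *outer = (cache_policy_t) ((attrs >> 2) & 0x3); /* TEX[1:0] */
    *inner = (cache_policy_t) (attrs & 0x3);        /* C, B */
    return 0;
}
```

With this reading, 0b10101 (0x15) decodes to write-back write-allocate
for both outer and inner caches, and 0b11010 (0x1a) would be the
write-through variant for the second hint above.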


hope this helps,
Jan


>
> Thank you,
> Beniamino

Attachment: caches_armv6.diff
Attachment: boot_flush_pt.diff
