On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <ge...@linux-m68k.org> wrote:
> Hi Russell,
>
> On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux
> <li...@arm.linux.org.uk> wrote:
>> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
>>> FWIW, I have the feeling this has a slight influence on boot reliability on
>>> two of my boards:
>>>   - r8a7740/armadillo, which is known to suffer from a cache-related bug in
>>>     its bootloader, seems to have a higher chance of booting successfully on
>>>     cold boot,
>>>   - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
>>>     seems to have a lower chance of booting successfully.
>>>
>>> No time to spend all week turning this into a statistically significant test
>>> project... The reset button is my friend...
>>
>> Damn it, you sent this right after I merged and pushed out this change in
>> my for-arm-soc branch, and was just about to send it to the arm-soc people.
>> What excellent timing you have. :)
>
> Don't worry, I didn't send that email to make you postpone this change.
> Given the fuzziness of reproduction, and the flakiness (esp. on Armadillo)
> of the boot loader, and these are old SoCs, please go ahead.
>
>> What happens on the kzm9g if you revert the mach-shmobile changes?
>
> Seems to make no difference.
>
>> For armadillo, do you use the decompressor?  That should be doing all the
>> cache cleaning already, prior to the kernel being entered.
>
> I think so.
>
> Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported
> machine ID", to booting almost completely, but lacking a few devices due to
> a corrupted DTB. Been like that as long as I remember, i.e. since I got the
> board ca. 1 year ago. Boots fine (100%) with kexec.
>

It seems like this patch is causing the SoCFPGA to not boot with SMP
reliably. About 1 out of every 10 reboots, I'm seeing the boot failure
below. The error seems to only happen when I do a cold or warm reboot,
but never occurs during a power-up. If I revert this patch, or put
back the call to v7_invalidate_l1 in socfpga_secondary_startup, then
it's able to boot 100% of the time.
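
With a failure that shows up only about 1 in 10 reboots, a handful of clean
boots after a revert does not prove much. As a rough sketch (assuming each
reboot is an independent trial and the unfixed failure rate really is 10%;
the function name is made up for illustration), the number of consecutive
good boots needed before the result is unlikely to be luck can be estimated
like this:

```python
import math


def clean_boots_needed(failure_rate, confidence):
    """Smallest n such that n consecutive good boots would happen with
    probability below (1 - confidence) if the failure rate were unchanged.

    Assumes independent reboots with a constant failure rate.
    """
    # P(n good boots in a row | still broken) = (1 - failure_rate) ** n
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    for conf in (0.95, 0.99):
        n = clean_boots_needed(0.10, conf)
        print(f"{conf:.0%} confidence: {n} consecutive clean reboots")
```

Under those assumptions, roughly 29 clean reboots in a row are needed for 95%
confidence and 44 for 99%, so "boots 100% of the time" is only as strong as
the number of reboots behind it.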

Just wondering if anyone else is seeing something similar?

I am testing this on both linux-next and arm-soc/rmk/for-arm-soc. When
the failure happens, here's the log:

Booting Linux on physical CPU 0x0
Initializing cgroup subsys cpuset
Linux version 4.1.0-rc8-next-20150617-00002-gdd1f624
(dinguyen@linux-builds1) (gcc version 4.7.3 20130226 (prerelease)
(crosstool-NG linaro-1.13.1-4.7-2013.03-20130313 - Linaro GCC 2013.03)
) #1 SMP Wed Jun 17 14:22:59 CDT 2015
CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine model: Altera SOCFPGA Cyclone V SoC Development Kit
Truncating RAM at 0x00000000-0x40000000 to -0x2f800000
Memory policy: Data cache writealloc
On node 0 totalpages: 194560
free_area_init_node: node 0, pgdat c0692640, node_mem_map ef20b000
  Normal zone: 1520 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 194560 pages, LIFO batch:31
PERCPU: Embedded 12 pages/cpu @ef1e1000 s19648 r8192 d21312 u49152
pcpu-alloc: s19648 r8192 d21312 u49152 alloc=12*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 193040
Kernel command line: console=ttyS0,115200 root=/dev/mmcblk0p2 rw
rootwait ip=dhcp earlyprintk
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 764288K/778240K available (4782K kernel code, 286K rwdata,
1344K rodata, 304K init, 135K bss, 13952K reserved, 0K cma-reserved)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
    lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .text : 0xc0008000 - 0xc0603e78   (6128 kB)
      .init : 0xc0604000 - 0xc0650000   ( 304 kB)
      .data : 0xc0650000 - 0xc0697920   ( 287 kB)
       .bss : 0xc0697920 - 0xc06b976c   ( 136 kB)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Hierarchical RCU implementation.
    Additional per-CPU info printed with stalls.
    Build-time adjustment of leaf fanout to 32.
NR_IRQS:16 nr_irqs:16 16
L2C-310 enabling early BRESP for Cortex-A9
L2C-310 full line of zeros enabled for Cortex-A9
L2C-310 dynamic clock gating enabled, standby mode enabled
L2C-310 cache controller enabled, 8 ways, 512 kB
L2C-310: CACHE_ID 0x410030c9, AUX_CTRL 0x46060001
clocksource: timer1: mask: 0xffffffff max_cycles: 0xffffffff,
max_idle_ns: 19112604467 ns
sched_clock: 32 bits at 100MHz, resolution 10ns, wraps every 21474836475ns
Console: colour dummy device 80x30
Calibrating delay loop... 1836.64 BogoMIPS (lpj=9183232)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x8280 - 0x82d8
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted
4.1.0-rc8-next-20150617-00002-gdd1f624 #1
Hardware name: Altera SOCFPGA
task: eecaeac0 ti: eecce000 task.ti: eecce000
PC is at vfp_notifier+0x58/0x12c
LR is at notifier_call_chain+0x44/0x84
pc : [<c000a6bc>]    lr : [<c003d134>]    psr: 80000193
sp : eeccff48  ip : c06563c8  fp : eeccffd4
r10: eecaef80  r9 : ef1f1300  r8 : 00000002
r7 : eecd0000  r6 : c0656bc0  r5 : 00000000  r4 : eecd0000
r3 : c000a664  r2 : eecd0000  r1 : 00000002  r0 : c06563c8
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 0000404a  DAC: 00000015
Process swapper/1 (pid: 0, stack limit = 0xeecce218)
Stack: (0xeeccff48 to 0xeecd0000)
ff40:                   c000a664 ffffffff 00000000 c003d134 eecd0018 eecaeac0
ff60: c06648e0 0b52d2f9 c048cfa8 c003d18c 00000000 f0002100 00000001 c003d1ac
ff80: 00000000 eecaeac0 c064f300 c001369c c064b304 c0013140 00000000 ef1ed328
ffa0: eeccffe8 c001e760 c0486ec4 2eba2000 c06957c0 c06524dc 00000015 c06957c0
ffc0: c048c778 c064b304 c06957c0 00000000 eeccffdc c0486ec4 eeccffe4 c0487138
ffe0: 00000001 c00544e8 c0009494 c0697bc0 00000000 000094ac 7ef5bffd 3f39b3f8
[<c000a6bc>] (vfp_notifier) from [<c003d134>] (notifier_call_chain+0x44/0x84)
[<c003d134>] (notifier_call_chain) from [<c003d18c>]
(__atomic_notifier_call_chain+0x18/0x20)
[<c003d18c>] (__atomic_notifier_call_chain) from [<c003d1ac>]
(atomic_notifier_call_chain+0x18/0x20)
[<c003d1ac>] (atomic_notifier_call_chain) from [<c001369c>]
(__switch_to+0x34/0x58)
Code: e3a03002 e5843208 e3a00000 e8bd8038 (eef85a10)
---[ end trace 9eaea9661b3b550a ]---
Kernel panic - not syncing: Attempted to kill the idle task!
SMP: failed to stop secondary CPUs
---[ end Kernel panic - not syncing: Attempted to kill the idle task!
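
For reading the oops: the word in parentheses on the "Code:" line is the
instruction that faulted. A minimal sketch of decoding it by hand is below.
It only recognises the ARM-mode VMRS encoding, the register table is limited
to the common VFP system registers, and the helper name is invented for this
example, so treat it as an aid for this one log and not a general
disassembler.

```python
# VFP system register numbers as used in the VMRS/VMSR "reg" field.
VFP_SYSREGS = {
    0b0000: "FPSID",
    0b0001: "FPSCR",
    0b0110: "MVFR1",
    0b0111: "MVFR0",
    0b1000: "FPEXC",
}


def decode_vmrs(word):
    """Return 'VMRS rN, REG' if word is an ARM-mode VMRS, else None.

    Encoding: cond[31:28] 1110 1111 reg[19:16] Rt[15:12] 1010 0001 0000
    (Rt == 15 with FPSCR really means APSR_nzcv; not special-cased here.)
    """
    if (word & 0x0FF00FFF) != 0x0EF00A10:
        return None
    reg = (word >> 16) & 0xF
    rt = (word >> 12) & 0xF
    name = VFP_SYSREGS.get(reg)
    if name is None:
        return None
    return f"VMRS r{rt}, {name}"


if __name__ == "__main__":
    # Faulting word taken from the "Code:" line of the oops above.
    print(decode_vmrs(0xEEF85A10))
```

This decodes to a read of FPEXC into r5, i.e. the undefined instruction
exception on CPU1 was raised by a VFP system register access in vfp_notifier
and not by a corrupted opcode. What left the coprocessor access in that state
on the secondary CPU is not something the log alone shows.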

Dinh