Hey Sven,

thanks for your analysis!

On Mon, Sep 08, 2008 at 11:18:42PM +0200, Sven Eckelmann wrote:
> Ok, I got the /proc/modules file now. The current situation is as follows: it 
> crashes inside the batman module at position 0x00000aa4
> 
>      a60:     3c020000        lui     v0,0x0
>      a64:     8c500024        lw      s0,36(v0)
>      a68:     24420024        addiu   v0,v0,36
>      a6c:     12020014        beq     s0,v0,ac0 <cleanup_module+0x610>
>      a70:     3c040000        lui     a0,0x0
>      a74:     3c050000        lui     a1,0x0
>      a78:     3c020000        lui     v0,0x0
>      a7c:     24840000        addiu   a0,a0,0
>      a80:     24a50088        addiu   a1,a1,136
>      a84:     24420000        addiu   v0,v0,0
>      a88:     0040f809        jalr    v0
>      a8c:     24060283        li      a2,643
>      a90:     8e040004        lw      a0,4(s0)
>      a94:     8e030000        lw      v1,0(s0)
>      a98:     3c020010        lui     v0,0x10
>      a9c:     34420100        ori     v0,v0,0x100
>      aa0:     8e110008        lw      s1,8(s0)
>      aa4:     ac830000        sw      v1,0(a0)
>      aa8:     ae020000        sw      v0,0(s0)
>      aac:     3c020020        lui     v0,0x20
>      ab0:     34420200        ori     v0,v0,0x200
>      ab4:     ac640004        sw      a0,4(v1)
> 
> This is part of the compiled version of packet_recv_thread. Due to the 
> optimizations done I cannot say where exactly the problem lies.
> 
> I think the code of get_ip_addr() got inlined into packet_recv_thread and we 
> need to search for the crash inside of it at list_del(&entry->list);
> I would also say that the real crash is inside __list_del, where prev and 
> next are set. To check it, look at LIST_POISON1 and LIST_POISON2 inside 
> poison.h of the current linux kernel. You will notice that the values are 
> 0x00100100 and 0x00200200 == address of the failed paging request. The list 
> poisoning is done in list_del after calling __list_del (it is the 
> sequence lui, ori, sw in the asm snippet). So could it be that we have a 
> poisoned entry inside the list?
> This could for example happen when we get scheduled (please notice that the 
> optimizer reordered many instructions) while another part of the program is 
> deleting entries. I haven't checked the rest of the code to see whether that 
> really could happen, but that is my current idea.

Mhm, as far as I have looked into the issue, these are the 
places where free_client_list is accessed:

init_module() - INIT_LIST_HEAD()
* called on startup

get_ip_addr() - list_del():
* "secured" with a hash_lock spinlock

cleanup_module() - list_del():
* only called when unloading the module

batgat_ioctl() - list_del():
* from IOCREMDEV. This is called when batman shuts down.

packet_recv_thread() - list_add():
* also secured with a hash_lock spinlock.

So it seems there should be no concurrency without user interaction 
(module or batman shutdown).
But I don't have a good idea yet where the problem comes from ... :/
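
For reference, here is a minimal userspace sketch of the poisoning Sven 
describes. It is not the batgat code and not the kernel's list.h: the list 
helpers are simplified copies, and entry_is_poisoned() and 
list_del_checked() are hypothetical names made up for illustration. The 
poison constants are the 0x00100100 / 0x00200200 values quoted above. 
A second list_del() on the same entry would dereference the poisoned prev 
pointer, which would match a fault at 0x00200200.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Poison values as quoted in the mail above (poison.h). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert node right after head. */
static void list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink the entry, then poison its pointers (simplified list_del). */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next; /* faults if prev is already poison */
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Hypothetical helper: was this entry already removed with list_del()? */
static int entry_is_poisoned(const struct list_head *entry)
{
	return entry->next == LIST_POISON1 || entry->prev == LIST_POISON2;
}

/* Hypothetical helper: refuse to delete an entry twice. */
static int list_del_checked(struct list_head *entry)
{
	if (entry_is_poisoned(entry))
		return -1;
	list_del(entry);
	return 0;
}
```

Something like entry_is_poisoned() could be dropped in as a debug check 
right before the list_del() calls to see which path hits an entry that 
was already removed. It would only narrow down the culprit, it does not 
replace proper locking.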

best regards,
        Simon
