Both of these functions are pretty innocuous, don't work with shared
data, and shouldn't be architecture-specific. Furthermore, the fact that
the problem remains essentially the same but moves around between
versions indicates to me that the issue isn't with the code itself.

It sounds to me like there is a larger memory-corruption issue: either
something else is writing over this memory, something is running off
the end of the stack, or similar. That is obviously difficult to track
down, but it might explain why the problem appears to be specific to
your environment (I've never heard a report of this before).
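
To compare where each build faults, the symbol+offset on the "PC is at"
line of an oops can be turned into a GDB lookup. Below is a minimal
sketch in Python, assuming the oops text is available as a string. The
helper names pc_frame and gdb_list_command are made up for illustration,
and the generated command is meant to be pasted into a cross-GDB session
opened on an openvswitch.ko (or vmlinux) that was built with debug info.

```python
import re

# Matches "symbol+0xOFF/0xSIZE", optionally followed by " [module]".
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_][A-Za-z0-9_.]*)"
    r"\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?: \[(?P<mod>[^\]]+)\])?"
)


def pc_frame(oops_text):
    """Return (symbol, offset, size, module) from the 'PC is at' line.

    module is None for code built into the kernel image.
    Returns None if no 'PC is at' line can be parsed.
    """
    for line in oops_text.splitlines():
        if "PC is at" in line:
            match = FRAME_RE.search(line.split("PC is at", 1)[1])
            if match:
                return (
                    match.group("sym"),
                    int(match.group("off"), 16),
                    int(match.group("size"), 16),
                    match.group("mod"),
                )
    return None


def gdb_list_command(symbol, offset):
    """Build a GDB 'list' command for the source line at symbol+offset."""
    return "list *({}+{:#x})".format(symbol, offset)


if __name__ == "__main__":
    sample = "PC is at ovs_flow_tbl_alloc+0x4e/0x94"
    symbol, offset, _size, _module = pc_frame(sample)
    print(gdb_list_command(symbol, offset))
```

Running the same lookup for each build shows directly whether the
faulting line is stable or keeps shifting when unrelated code changes.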

On Thu, Oct 3, 2013 at 1:38 AM, Michele Bozier <mboz...@airspan.com> wrote:
> Jesse,
>
> Many thanks for your suggestions.
> For the openvswitch.ko module built from the Open vSwitch git repository, the 
> line of code causing the kernel oops appears to be the following in method 
> ovs_flow_to_nlattrs():
>         if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority))
>                 goto nla_put_failure;
> This is totally repeatable - happens every time.
>
> For the openvswitch.ko module built from the kernel 3.3 sources, the problem 
> is different, but again totally repeatable.
> The Kernel oops is as follows:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> pgd = de3e8000
> [00000000] *pgd=9e3c7831, *pte=00000000, *ppte=00000000
> Internal error: Oops: 817 [#1] PREEMPT
> Modules linked in:
> CPU: 0    Not tainted  (3.3.0 #1)
> PC is at ovs_flow_tbl_alloc+0x4e/0x94
> LR is at ovs_flow_tbl_alloc+0x4b/0x94
> pc : [<c027f5d6>]    lr : [<c027f5d3>]    psr: 80000033
> sp : de277c50  ip : 6c6c6c6c  fp : 00000000
> r10: 00000004  r9 : de33bf10  r8 : de32f280
> r7 : 00000000  r6 : de342000  r5 : 00000400  r4 : 00000002
> r3 : de342000  r2 : 00000000  r1 : 00000002  r0 : 00000000
> Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
> Control: 50c5387d  Table: 9e3e8019  DAC: 00000015
> Process ovs-vswitchd (pid: 454, stack limit = 0xde2762e8)
> Stack: (0xde277c50 to 0xde278000)
> …
> [<c027f5d6>] (ovs_flow_tbl_alloc+0x4e/0x94) from [<c027eda5>] 
> (ovs_dp_cmd_new+0x51/0x130)
> [<c027eda5>] (ovs_dp_cmd_new+0x51/0x130) from [<c01c9347>] 
> (genl_rcv_msg+0x15f/0x17c)
> [<c01c9347>] (genl_rcv_msg+0x15f/0x17c) from [<c01c8c39>] 
> (netlink_rcv_skb+0x65/0x70)
> [<c01c8c39>] (netlink_rcv_skb+0x65/0x70) from [<c01c91df>] 
> (genl_rcv+0x17/0x20)
> [<c01c91df>] (genl_rcv+0x17/0x20) from [<c01c888f>] 
> (netlink_unicast+0x117/0x150)
> [<c01c888f>] (netlink_unicast+0x117/0x150) from [<c01c8ab1>] 
> (netlink_sendmsg+0x185/0x1cc)
> [<c01c8ab1>] (netlink_sendmsg+0x185/0x1cc) from [<c018f08b>] 
> (sock_sendmsg+0x5f/0x74)
>
> In this case, the line of code causing the problem in ovs_flow_tbl_alloc() is
>     table->buckets = alloc_buckets(new_size);
>
> When I tried to put a printk to dump the new_size property in this second 
> scenario then the problem moved again.
> What else can I try?
> Regards
> Michele Bozier
>
>
> -----Original Message-----
> From: Jesse Gross [mailto:je...@nicira.com]
> Sent: 01 October 2013 20:35
> To: Michele Bozier
> Cc: discuss@openvswitch.org
> Subject: Re: [ovs-discuss] Kernel oops running Open vSwitch on 3.3 Kernel 
> (ARM)
>
> On Tue, Oct 1, 2013 at 2:25 AM, Michele Bozier <mboz...@airspan.com> wrote:
>> I am having trouble running Open vSwitch on the ARM platform after
>> cross-compiling on an i686 platform.  I am using the latest code from
>> master from the Open vSwitch git repository - commit Sept 26th
>> (6a8a8528acb05d6d0a520e09ad1ec67e62b99e5e) and the Arago Kernel 3.3.
>>
>>
>>
>> The problem I am seeing when running on the target and trying to
>> create a switch is as follows:
>>
>>
>>
>> insmod ./openvswitch.ko
>>
>> The module seems to install fine; on the console I get
>>
>> openvswitch: Open vSwitch switching datapath 2.0.90, built Sep 30 2013
>> 11:33:05
>>
>>
>>
>> ./ovsdb-tool create /usr/local/etc/openvswitch/conf.db ./vswitch.ovsschema
>>
>> ./ovsdb-server --remote=ptcp:6634 --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile=/home/opf/server.pid --detach
>>
>> ./ovs-vsctl --db=tcp:127.0.0.1:6634 --no-wait init
>>
>> ./ovs-vswitchd tcp:127.0.0.1:6634 --pidfile=/home/opf/switch.pid --log-file=/home/opf/switch.log --detach
>>
>>
>>
>> On the console I see the following:
>>
>> 1970-01-01T00:01:15Z|00001|vlog|INFO|opened log file
>> /home/opf/switch.log
>>
>> 1970-01-01T00:01:15Z|00002|reconnect|INFO|tcp:127.0.0.1:6634: connecting...
>>
>> 1970-01-01T00:01:15Z|00003|reconnect|INFO|tcp:127.0.0.1:6634:
>> connected
>>
>>
>>
>> I then enter the command to create a switch ./ovs-vsctl
>> --db=tcp:127.0.0.1:6634 add-br opfbr
>>
>>
>>
>> I get the following output to the console
>>
>> device: 'ovs-system': device_add
>>
>> device ovs-system entered promiscuous mode
>>
>> device: 'opfbr0': device_add
>>
>> device opfbr0 entered promiscuous mode
>>
>>
>>
>> Followed shortly afterwards by a kernel oops.
>>
>>
>>
>> [root@synergy opf]# Unable to handle kernel paging request at virtual address 8d10051d
>> pgd = dd840000
>> [8d10051d] *pgd=00000000
>> Internal error: Oops: 5 [#1] PREEMPT
>> Modules linked in: openvswitch(O)
>>
>> CPU: 0    Tainted: G           O  (3.3.0 #7)
>>
>> PC is at ovs_flow_to_nlattrs+0x5/0x430 [openvswitch]
>> LR is at ovs_flow_cmd_fill_info+0x114/0x208 [openvswitch]
>>
>> pc : [<bf80524e>]    lr : [<bf801669>]    psr: 80000033
>>
>> sp : de273c30  ip : 00000058  fp : 00000018
>>
>> r10: de36e540  r9 : 0001fffb  r8 : dd8b8000
>>
>> r7 : 00000013  r6 : 000001cd  r5 : dd8b8088  r4 : 00000070
>>
>> r3 : 00000000  r2 : de36e540  r1 : 8d100505  r0 : 0002001b
>>
>> Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
>>
>> Control: 50c5387d  Table: 9d840019  DAC: 00000015
>> Process ovs-vswitchd (pid: 461, stack limit = 0xde2722e8)
>>
>> Stack: (0xde273c30 to 0xde274000)
>>
>> ...
>>
>> [<bf80524e>] (ovs_flow_to_nlattrs+0x5/0x430 [openvswitch]) from [<bf801669>] (ovs_flow_cmd_fill_info+0x114/0x208 [openvswitch])
>> [<bf801669>] (ovs_flow_cmd_fill_info+0x114/0x208 [openvswitch]) from [<bf80179f>] (ovs_flow_cmd_dump+0x42/0x7c [openvswitch])
>> [<bf80179f>] (ovs_flow_cmd_dump+0x42/0x7c [openvswitch]) from [<c01c90fb>] (netlink_dump+0x3b/0x130)
>> [<c01c90fb>] (netlink_dump+0x3b/0x130) from [<c01c9983>] (netlink_dump_start+0xc7/0x108)
>> [<c01c9983>] (netlink_dump_start+0xc7/0x108) from [<c01cb069>] (genl_rcv_msg+0xc1/0x17c)
>> [<c01cb069>] (genl_rcv_msg+0xc1/0x17c) from [<c01ca9f9>] (netlink_rcv_skb+0x65/0x70)
>> [<c01ca9f9>] (netlink_rcv_skb+0x65/0x70) from [<c01caf9f>] (genl_rcv+0x17/0x20)
>> [<c01caf9f>] (genl_rcv+0x17/0x20) from [<c01ca64f>] (netlink_unicast+0x117/0x150)
>> [<c01ca64f>] (netlink_unicast+0x117/0x150) from [<c01ca871>] (netlink_sendmsg+0x185/0x1cc)
>> [<c01ca871>] (netlink_sendmsg+0x185/0x1cc) from [<c0190e4b>] (sock_sendmsg+0x5f/0x74)
>> [<c0190e4b>] (sock_sendmsg+0x5f/0x74) from [<c01921c1>] (sys_sendto+0x6d/0x80)
>> [<c01921c1>] (sys_sendto+0x6d/0x80) from [<c01921e3>] (sys_send+0xf/0x14)
>> [<c01921e3>] (sys_send+0xf/0x14) from [<c000c521>] (ret_fast_syscall+0x1/0x46)
>>
>> Code: bf00 e92d 47f0 b086 (698f) ab06
>>
>> ---[ end trace c6309ab77c3d706d ]---
>>
>>
>>
>> The process I followed to cross-compile the code base is as follows:
>>
>>
>>
>> ./boot.sh
>>
>>
>>
>> ./configure CC=arm-none-linux-gnueabi-gcc
>> --host=arm-none-linux-gnueabi --target=arm-none-linux-gnueabi
>> --build=i686-linux --with-linux=/home/mbozier/synergy/kernel/ti
>> KARCH=arm --disable-ssl
>> CPPFLAGS=-I/home/mbozier/tirootfs/usr/inc-L/home/mbozier/tirootfs/usr/
>> lib
>>
>>
>>
>> make CROSS_COMPILE="arm-none-linux-gnueabi-" ARCH="arm"
>> KCC="arm-none-linux-gnueabi-gcc" GCC="arm-none-linux-gnueabi-gcc"
>>
>>
>>
>> The kernel used on the target is built without Open vSwitch support
>> and the 802.1d bridging support is configured to be loaded as a module.
>>
>>
>>
>> I also tried running the Open vSwitch kernel module built from the
>> sources distributed with the 3.3 kernel, but with no success either.
>
> Is it the exact same problem on this kernel or is it a different one?
>
> Probably the place to start is to use GDB to find exactly where it is 
> faulting, based on the address in the stack trace. Is the problem 
> reproducible?
> _______________________________________________
> discuss mailing list
> discuss@openvswitch.org
> http://openvswitch.org/mailman/listinfo/discuss