Dear NOX friends, we are facing a nasty bug and we would very much appreciate any help in debugging and understanding the root cause. We have been struggling for some time now... :(
The code base is fairly simple and has worked well for some time, but for some reason it has started to crash:
https://github.com/chesteve/RouteFlow/blob/master/rf-controller/src/nox/netapps/routeflowc/routeflowc.cc

As far as I can tell, the code has remained unchanged; only the datapath and the application traffic (i.e., the payload of packet-in and packet-out packets) have changed.

This is the error we are seeing in NOX, a failed assertion:

/usr/include/c++/4.5/backward/auto_ptr.h:194: element_type* std::auto_ptr<_Tp>::operator->() const [with _Tp = vigil::Buffer, element_type = vigil::Buffer]: Assertion '_M_ptr != 0' failed.
Caught signal 6.
0xb74ae2be 64 (vigil::fault_handler(int)+0x4e)
0xb7748400 3068602152 (__kernel_sigreturn+0x0)
0xb71dc34e 296 (abort+0x17e)
0xb74ecc11 80 (vigil::Openflow_stream_connection::send_tx_buf()+0x121)
0xb74ece21 80 (vigil::Openflow_stream_connection::do_send_openflow(ofp_header const*)+0xc1)
0xb74ed7cf 80 (vigil::Openflow_connection::call_send_openflow(ofp_header const*)+0x2f)
0xb74ee14f 64 (vigil::Openflow_connection::send_openflow(ofp_header const*, bool)+0x5f)
0xb74eedae 96 (vigil::Openflow_connection::send_packet(vigil::Buffer const&, ofp_action_header const*, unsigned int, unsigned short, bool)+0xfe)
0xb74eeeb9 96 (vigil::Openflow_connection::send_packet(vigil::Buffer const&, unsigned short, unsigned short, bool)+0x79)
0xb75f18b4 96 (vigil::nox::send_openflow_packet_out(vigil::datapathid const&, vigil::Buffer const&, unsigned short, unsigned short, bool)+0x74)
0xb75ce7cc 48 (vigil::container::Component::send_openflow_packet(vigil::datapathid const&, vigil::Buffer const&, unsigned short, unsigned short, bool) const+0x3c)

Using gdb, the backtrace is as follows:

(gdb) bt
#0 0xb772c367 in ?? () from /lib/ld-linux.so.2
#1 0xb772c979 in ?? () from /lib/ld-linux.so.2
#2 0xb7730a31 in ?? () from /lib/ld-linux.so.2
#3 0xb7736c40 in ?? () from /lib/ld-linux.so.2
#4 0xb7487dc2 in fgets () at /usr/include/bits/stdio2.h:255
#5 read_mem_map () at ../../../src/lib/fault.cc:79
#6 vigil::dump_backtrace () at ../../../src/lib/fault.cc:180
#7 0xb74882be in vigil::fault_handler (sig_nr=6) at ../../../src/lib/fault.cc:280
#8 <signal handler called>
#9 0xb7722424 in __kernel_vsyscall ()
#10 0xb71b2e71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#11 0xb71b634e in abort () at abort.c:92
#12 0xb74c6c11 in __replacement_assert (this=0x93d7830) at /usr/include/c++/4.5/i686-linux-gnu/bits/c++config.h:326
#13 operator-> (this=0x93d7830) at /usr/include/c++/4.5/backward/auto_ptr.h:194
#14 vigil::Openflow_stream_connection::send_tx_buf (this=0x93d7830) at ../../../src/lib/openflow.cc:824
#15 0xb74c6e21 in vigil::Openflow_stream_connection::do_send_openflow (this=0x93d7830, oh=0x9104c50) at ../../../src/lib/openflow.cc:844
#16 0xb74c77cf in vigil::Openflow_connection::call_send_openflow (this=0x93d7830, oh=0x9104c50) at ../../../src/lib/openflow.cc:248
#17 0xb74c814f in vigil::Openflow_connection::send_openflow (this=0x93d7830, oh=0x9104c50, block=true) at ../../../src/lib/openflow.cc:232
#18 0xb74c8dae in vigil::Openflow_connection::send_packet (this=0x93d7830, packet=..., actions=0xb64a6618, actions_len=8, in_port=65535, block=true) at ../../../src/lib/openflow.cc:445
#19 0xb74c8eb9 in vigil::Openflow_connection::send_packet (this=0x93d7830, packet=..., out_port=1, in_port=65535, block=true) at ../../../src/lib/openflow.cc:413
#20 0xb75cb8b4 in vigil::nox::send_openflow_packet_out (datapath_id=..., packet=..., out_port=1, in_port=65535, block=true) at ../../../src/builtin/nox.cc:435
#21 0xb75a87cc in vigil::container::Component::send_openflow_packet (this=0x92232f8, datapath_id=..., packet=..., out_port=1, in_port=65535, block=true) at ../../../src/builtin/component.cc:83
#22 0xb64b2876 in process_message (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
#23 (anonymous namespace)::RouteFlowC::server (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:470
#24 0xb64b034d in operator() (function_obj_ptr=...) at /usr/include/boost/bind/mem_fn_template.hpp:49
#25 operator()<boost::_mfi::mf0<void, <unnamed>::RouteFlowC>, boost::_bi::list0> (function_obj_ptr=...) at /usr/include/boost/bind/bind.hpp:253
#26 operator() (function_obj_ptr=...) at /usr/include/boost/bind/bind_template.hpp:20
#27 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, <unnamed>::RouteFlowC>, boost::_bi::list1<boost::_bi::value<<unnamed>::RouteFlowC*> > >, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:153
#28 0xb74f9f35 in operator() (thread_=0x9352cf0) at /usr/include/boost/function/function_template.hpp:1013
#29 vigil::thread_main (thread_=0x9352cf0) at ../../../src/lib/threads/impl.cc:1359
#30 0xb7174e99 in start_thread (arg=0xb64a8b70) at pthread_create.c:304
#31 0xb725873e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

(gdb) f 24
#24 0xb64b2876 in process_message (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
195 (Buffer &) buff, pack_msg.port_out, OFPP_NONE, true) )
(gdb) display buff
12: buff = {<vigil::Buffer> = {_vptr.Buffer = 0xb755ba98, m_data = 0x939a980 "\001", m_size = 78}, base = 0x939a980 "\001", capacity = 78}

Inspecting frame 22, which seems to be the starting point of the issue according to the NOX log, the variable values look fine:

(gdb) frame 22
#22 0xb64b2876 in process_message (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
195 (Buffer &) buff, pack_msg.port_out, OFPP_NONE, true) )
(gdb) info args
msg = 0xb64a72e0
this = 0x92232f8
(gdb) info locals
buff = {<vigil::Buffer> = {_vptr.Buffer = 0xb755ba98, m_data = 0x939a980 "\001", m_size = 78}, base = 0x939a980 "\001", capacity = 78}
pack_msg = {datapath_id = 7, port_out = 1, pkt_id = 2697}

Now inspecting frame 14, where the problem arises: at line 824 of openflow.cc, tx_buf holds _M_ptr = 0x0, so the call to tx_buf->size() dereferences a null auto_ptr and triggers the assertion. Does anyone know why this may be happening, or how to prevent it?

(gdb) f 14
#14 vigil::Openflow_stream_connection::send_tx_buf (this=0x93d7830) at ../../../src/lib/openflow.cc:824
824 if (!tx_buf->size()) {
(gdb) info args
this = 0x93d7830
(gdb) info locals
bytes_written = 102
error = <value optimized out>
(gdb) print tx_buf
$9 = {_M_ptr = 0x0}

Thanks in advance for any hint!
Christian

--
Christian Esteve Rothenberg, Ph.D.
Converged Networks Division (DRC)
Tel.:+55 19-3705-4479 / Cel.: +55 19-8193-7087
est...@cpqd.com.br
www.cpqd.com.br
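
P.S. For anyone not familiar with this assertion, below is a minimal sketch of the state seen in frame 14. It is not NOX code: TxBuffer, Connection, queue() and flush() are hypothetical names made up for illustration, and std::unique_ptr stands in for std::auto_ptr (which is removed in C++17). It only shows that once the owning pointer has been released, any later tx_buf->size() is a null dereference, and what a defensive check could look like. How NOX actually reaches that state is not established by the trace above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for vigil::Buffer: it only tracks unsent bytes.
struct TxBuffer {
    std::vector<unsigned char> bytes;
    std::size_t size() const { return bytes.size(); }
};

// Hypothetical stand-in for the connection object; not the NOX class.
struct Connection {
    // std::unique_ptr is used here in place of std::auto_ptr.
    std::unique_ptr<TxBuffer> tx_buf;

    // Queue n bytes for sending by taking ownership of a fresh buffer.
    void queue(std::size_t n) {
        tx_buf.reset(new TxBuffer{std::vector<unsigned char>(n, 0)});
    }

    // Pretend the whole buffer was written, then release it.
    // Returns false instead of dereferencing when no buffer is pending,
    // which is the state reported by `print tx_buf` in frame 14.
    bool flush() {
        if (!tx_buf) {
            return false;  // calling tx_buf->size() here would be the failing access
        }
        tx_buf->bytes.clear();  // simulate a complete write
        if (tx_buf->size() == 0) {
            tx_buf.reset();  // ownership released; the pointer is now null
        }
        return true;
    }
};
```

With this sketch, a second flush() after a completed send returns false rather than aborting. Such a check would only hide the symptom, so it is meant as a debugging aid for finding out who empties the pointer, not as a fix.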
_______________________________________________ nox-dev mailing list nox-dev@noxrepo.org http://noxrepo.org/mailman/listinfo/nox-dev