Dear NOX friends,

We are facing a nasty bug and would very much appreciate any help
in debugging it and understanding the root cause. We have been
struggling with it for some time now... :(

The code base is fairly simple and has worked well for some time,
but for some reason it has started to crash:
https://github.com/chesteve/RouteFlow/blob/master/rf-controller/src/nox/netapps/routeflowc/routeflowc.cc

As far as I can tell, the code has remained unchanged and only the datapath
and application traffic (i.e., payload of packet-in and packet-out packets)
have changed.


This is the error we are seeing in NOX, a failed assertion:

/usr/include/c++/4.5/backward/auto_ptr.h:194: element_type*
std::auto_ptr<_Tp>::operator->() const [with _Tp = vigil::Buffer,
element_type = vigil::Buffer]: Assertion '_M_ptr != 0' failed.
Caught signal 6.
  0xb74ae2be   64 (vigil::fault_handler(int)+0x4e)
  0xb7748400 3068602152 (__kernel_sigreturn+0x0)
  0xb71dc34e  296 (abort+0x17e)
  0xb74ecc11   80 (vigil::Openflow_stream_connection::send_tx_buf()+0x121)
  0xb74ece21   80
(vigil::Openflow_stream_connection::do_send_openflow(ofp_header
const*)+0xc1)
  0xb74ed7cf   80
(vigil::Openflow_connection::call_send_openflow(ofp_header
const*)+0x2f)
  0xb74ee14f   64
(vigil::Openflow_connection::send_openflow(ofp_header const*,
bool)+0x5f)
  0xb74eedae   96
(vigil::Openflow_connection::send_packet(vigil::Buffer const&,
ofp_action_header const*, unsigned int, unsigned short, bool)+0xfe)
  0xb74eeeb9   96
(vigil::Openflow_connection::send_packet(vigil::Buffer const&,
unsigned short, unsigned short, bool)+0x79)
  0xb75f18b4   96
(vigil::nox::send_openflow_packet_out(vigil::datapathid const&,
vigil::Buffer const&, unsigned short, unsigned short, bool)+0x74)
  0xb75ce7cc   48
(vigil::container::Component::send_openflow_packet(vigil::datapathid
const&, vigil::Buffer const&, unsigned short, unsigned short, bool)
const+0x3c)


Using gdb, the backtrace is as follows:


(gdb) bt
#0  0xb772c367 in ?? () from /lib/ld-linux.so.2
#1  0xb772c979 in ?? () from /lib/ld-linux.so.2
#2  0xb7730a31 in ?? () from /lib/ld-linux.so.2
#3  0xb7736c40 in ?? () from /lib/ld-linux.so.2
#4  0xb7487dc2 in fgets () at /usr/include/bits/stdio2.h:255
#5  read_mem_map () at ../../../src/lib/fault.cc:79
#6  vigil::dump_backtrace () at ../../../src/lib/fault.cc:180
#7  0xb74882be in vigil::fault_handler (sig_nr=6) at
../../../src/lib/fault.cc:280
#8  <signal handler called>
#9  0xb7722424 in __kernel_vsyscall ()
#10 0xb71b2e71 in raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#11 0xb71b634e in abort () at abort.c:92
#12 0xb74c6c11 in __replacement_assert (this=0x93d7830) at
/usr/include/c++/4.5/i686-linux-gnu/bits/c++config.h:326
#13 operator-> (this=0x93d7830) at
/usr/include/c++/4.5/backward/auto_ptr.h:194
#14 vigil::Openflow_stream_connection::send_tx_buf (this=0x93d7830) at
../../../src/lib/openflow.cc:824
#15 0xb74c6e21 in vigil::Openflow_stream_connection::do_send_openflow
(this=0x93d7830, oh=0x9104c50) at ../../../src/lib/openflow.cc:844
#16 0xb74c77cf in vigil::Openflow_connection::call_send_openflow
(this=0x93d7830, oh=0x9104c50) at ../../../src/lib/openflow.cc:248
#17 0xb74c814f in vigil::Openflow_connection::send_openflow
(this=0x93d7830, oh=0x9104c50, block=true) at
../../../src/lib/openflow.cc:232
#18 0xb74c8dae in vigil::Openflow_connection::send_packet
(this=0x93d7830, packet=..., actions=0xb64a6618, actions_len=8,
in_port=65535, block=true) at ../../../src/lib/openflow.cc:445
#19 0xb74c8eb9 in vigil::Openflow_connection::send_packet
(this=0x93d7830, packet=..., out_port=1, in_port=65535, block=true) at
../../../src/lib/openflow.cc:413
#20 0xb75cb8b4 in vigil::nox::send_openflow_packet_out
(datapath_id=..., packet=..., out_port=1, in_port=65535, block=true)
at ../../../src/builtin/nox.cc:435
#21 0xb75a87cc in vigil::container::Component::send_openflow_packet
(this=0x92232f8, datapath_id=..., packet=..., out_port=1,
in_port=65535, block=true) at ../../../src/builtin/component.cc:83
#22 0xb64b2876 in process_message (this=0x92232f8) at
../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
#23 (anonymous namespace)::RouteFlowC::server (this=0x92232f8) at
../../../../../src/nox/netapps/routeflowc/routeflowc.cc:470
#24 0xb64b034d in operator() (function_obj_ptr=...) at
/usr/include/boost/bind/mem_fn_template.hpp:49
#25 operator()<boost::_mfi::mf0<void, <unnamed>::RouteFlowC>,
boost::_bi::list0> (function_obj_ptr=...) at
/usr/include/boost/bind/bind.hpp:253
#26 operator() (function_obj_ptr=...) at
/usr/include/boost/bind/bind_template.hpp:20
#27
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, <unnamed>::RouteFlowC>,
boost::_bi::list1<boost::_bi::value<<unnamed>::RouteFlowC*> > >,
void>::invoke(boost::detail::function::function_buffer &)
(function_obj_ptr=...) at
/usr/include/boost/function/function_template.hpp:153
#28 0xb74f9f35 in operator() (thread_=0x9352cf0) at
/usr/include/boost/function/function_template.hpp:1013
#29 vigil::thread_main (thread_=0x9352cf0) at
../../../src/lib/threads/impl.cc:1359
#30 0xb7174e99 in start_thread (arg=0xb64a8b70) at pthread_create.c:304
#31 0xb725873e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

(gdb) f 24
#24 0xb64b2876 in process_message (this=0x92232f8) at
../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
195                                             (Buffer &) buff,
pack_msg.port_out, OFPP_NONE, true) )
(gdb) display buff
12: buff = {<vigil::Buffer> = {_vptr.Buffer = 0xb755ba98, m_data =
0x939a980 "\001", m_size = 78}, base = 0x939a980 "\001", capacity =
78}



In frame 22, which according to the NOX log seems to be the starting
point of the issue, the variable values look fine:


(gdb) frame 22
#22 0xb64b2876 in process_message (this=0x92232f8) at
../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
195 (Buffer &) buff, pack_msg.port_out, OFPP_NONE, true) )
(gdb) info args
msg = 0xb64a72e0
this = 0x92232f8
(gdb) info locals
buff = {<vigil::Buffer> = {_vptr.Buffer = 0xb755ba98, m_data =
0x939a980 "\001", m_size = 78}, base = 0x939a980 "\001", capacity =
78}
pack_msg = {datapath_id = 7, port_out = 1, pkt_id = 2697}



Now inspecting frame 14, where the problem arises: at line 824 of
openflow.cc, tx_buf has _M_ptr = 0x0, which triggers the failed assertion.

Does anyone know why this may be happening or how to prevent it?


(gdb) f 14
#14 vigil::Openflow_stream_connection::send_tx_buf (this=0x93d7830) at
../../../src/lib/openflow.cc:824
824    if (!tx_buf->size()) {
(gdb) info args
this = 0x93d7830
(gdb) info locals
bytes_written = 102
error = <value optimized out>
(gdb) print tx_buf
$9 = {_M_ptr = 0x0}
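
To make the failure mode concrete, here is a minimal sketch of a transmit
buffer held by an owning smart pointer that becomes null once the buffer has
been fully written, so any later dereference has to be guarded. Everything in
it is an assumption for illustration: TxQueue, queue(), flush() and pending()
are hypothetical names, not NOX code, and std::unique_ptr stands in for
std::auto_ptr (removed in C++17). The mutex reflects one possible
explanation only, namely that server() runs in its own thread (frames 23-30
of the backtrace) and could touch the buffer concurrently with another
thread; nothing above confirms that this is the actual cause.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a connection that owns a pending transmit buffer.
// None of these names come from NOX; they only mirror the shape of the crash.
class TxQueue {
public:
    // Queue data for sending; the queue takes ownership of a fresh buffer.
    void queue(const std::string& data) {
        std::lock_guard<std::mutex> lock(mu_);
        tx_buf_.reset(new std::string(data));
    }

    // Pretend to write up to max_bytes of the pending buffer.
    // Returns the number of bytes "written", or 0 if nothing is pending.
    std::size_t flush(std::size_t max_bytes) {
        std::lock_guard<std::mutex> lock(mu_);
        if (!tx_buf_) {
            return 0;  // guard: never dereference a null owning pointer
        }
        const std::size_t n = std::min(max_bytes, tx_buf_->size());
        tx_buf_->erase(0, n);
        if (tx_buf_->empty()) {
            tx_buf_.reset();  // fully sent: the pointer is null from here on
        }
        return n;
    }

    // True while some data is still waiting to be written.
    bool pending() const {
        std::lock_guard<std::mutex> lock(mu_);
        return tx_buf_ != nullptr;
    }

private:
    mutable std::mutex mu_;                // serializes access across threads
    std::unique_ptr<std::string> tx_buf_;  // null when nothing is queued
};
```

Without the null check in flush(), a second call after the buffer has been
fully written would dereference a null pointer, which is the same kind of
failure as the '_M_ptr != 0' assertion above.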


Thanks in advance for any hint!

Christian

--
Christian Esteve Rothenberg, Ph.D.
Converged Networks Division (DRC)
Tel.:+55 19-3705-4479 / Cel.: +55 19-8193-7087
est...@cpqd.com.br
www.cpqd.com.br


