Hi Ben, I’ve included the traces for an ARP request, and a ping. ovs-vswitchd also logs errors for the return traffic, but I didn’t include that as it seems redundant.
root@ip-172-31-2-45:/# ovs-appctl ofproto/trace kelda-int 'arp,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5 :34:1e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.197.52.30,arp_tpa=10.203.4.66,arp_op=1,arp_sha=02:00:0a:c5:34:1e ,arp_tha=ff:ff:ff:ff:ff:ff' Flow: arp,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.197.52.30,arp_tpa=10.203.4.66,arp_op=1,arp_sha=02:00:0a:c5:34:1e,arp_tha=ff:ff:ff:ff:ff:ff bridge("kelda-int") ------------------- 0. in_port=1,dl_src=02:00:0a:c5:34:1e, priority 32768 load:0x2->NXM_NX_REG0[] resubmit(,1) 1. arp,dl_dst=ff:ff:ff:ff:ff:ff, priority 1000 LOCAL output:NXM_NX_REG0[] -> output port is 2 >>>> Cannot truncate output to patch port <<<< bridge("br-int") ---------------- 0. in_port=1, priority 100 set_field:0x1->reg13 set_field:0x3->reg11 set_field:0x2->reg12 set_field:0x1->metadata set_field:0x1->reg14 resubmit(,8) 8. reg14=0x1,metadata=0x1, priority 50, cookie 0x1d48851d resubmit(,9) 9. metadata=0x1, priority 0, cookie 0xb38c4d37 resubmit(,10) 10. metadata=0x1, priority 0, cookie 0xa1f73f3a resubmit(,11) 11. metadata=0x1, priority 0, cookie 0x8515cdd5 resubmit(,12) 12. metadata=0x1, priority 0, cookie 0x1163ca42 resubmit(,13) 13. metadata=0x1, priority 0, cookie 0xbbb6e5e5 resubmit(,14) 14. metadata=0x1, priority 0, cookie 0x98e3172d resubmit(,15) 15. metadata=0x1, priority 0, cookie 0x10496832 resubmit(,16) 16. metadata=0x1, priority 0, cookie 0xe88c85a resubmit(,17) 17. metadata=0x1, priority 0, cookie 0x6516996b resubmit(,18) 18. arp,metadata=0x1,arp_tpa=10.203.4.66,arp_op=1, priority 50, cookie 0xb33c89ed move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[] -> NXM_OF_ETH_DST[] is now 02:00:0a:c5:34:1e set_field:02:00:0a:cb:04:42->eth_src set_field:2->arp_op move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[] -> NXM_NX_ARP_THA[] is now 02:00:0a:c5:34:1e set_field:02:00:0a:cb:04:42->arp_sha move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[] -> NXM_OF_ARP_TPA[] is now 10.197.52.30 set_field:10.203.4.66->arp_spa move:NXM_NX_REG14[]->NXM_NX_REG15[] -> NXM_NX_REG15[] is now 0x1 load:0x1->NXM_NX_REG10[0] resubmit(,32) 32. priority 0 resubmit(,33) 33. reg15=0x1,metadata=0x1, priority 100 set_field:0x1->reg13 set_field:0x3->reg11 set_field:0x2->reg12 resubmit(,34) 34. priority 0 set_field:0->reg0 set_field:0->reg1 set_field:0->reg2 set_field:0->reg3 set_field:0->reg4 set_field:0->reg5 set_field:0->reg6 set_field:0->reg7 set_field:0->reg8 set_field:0->reg9 resubmit(,40) 40. metadata=0x1, priority 0, cookie 0x69ea0511 resubmit(,41) 41. metadata=0x1, priority 0, cookie 0xe98a052d resubmit(,42) 42. metadata=0x1, priority 0, cookie 0xb9ce92f6 resubmit(,43) 43. metadata=0x1, priority 0, cookie 0x95c9e574 resubmit(,44) 44. metadata=0x1, priority 0, cookie 0xd46c593f resubmit(,45) 45. metadata=0x1, priority 0, cookie 0x2b14b7f8 resubmit(,46) 46. metadata=0x1, priority 0, cookie 0xacf215be resubmit(,47) 47. metadata=0x1, priority 0, cookie 0x390a9bed resubmit(,48) 48. reg15=0x1,metadata=0x1, priority 50, cookie 0x9c81921d resubmit(,64) 64. reg10=0x1/0x1,reg15=0x1,metadata=0x1, priority 100 push:NXM_OF_IN_PORT[] set_field:0->in_port resubmit(,65) 65. reg15=0x1,metadata=0x1, priority 100 output:1 bridge("kelda-int") ------------------- 0. in_port=2, priority 32768 output:1 pop:NXM_OF_IN_PORT[] -> NXM_OF_IN_PORT[] is now 1 Final flow: arp,reg0=0x2,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.197.52.30,arp_tpa=10.203.4.66,arp_op=1,arp_sha=02:00:0a:c5:34:1e,arp_tha=ff:ff:ff:ff:ff:ff Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_label=0/0x1,eth,arp,in_port=1,vlan_tci=0x0000/0x1000,dl_src=02:00:0a:c5:34:1e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.197.52.30,arp_tpa=10.203.4.66,arp_op=1,arp_sha=02:00:0a:c5:34:1e,arp_tha=ff:ff:ff:ff:ff:ff Datapath actions: 1,set(eth(src=02:00:0a:cb:04:42,dst=02:00:0a:c5:34:1e)),set(arp(sip=10.203.4.66,tip=10.197.52.30,op=2/0xff,sha=02:00:0a:cb:04:42,tha=02:00:0a:c5:34:1e)),3 This flow is handled by the userspace slow path because it: - Uses action(s) not supported by datapath. root@ip-172-31-2-45:/# ovs-appctl ofproto/trace kelda-int 'icmp,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0' Flow: icmp,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 bridge("kelda-int") ------------------- 0. in_port=1,dl_src=02:00:0a:c5:34:1e, priority 32768 load:0x2->NXM_NX_REG0[] resubmit(,1) 1. priority 800 output:NXM_NX_REG0[] -> output port is 2 >>>> Cannot truncate output to patch port <<<< bridge("br-int") ---------------- 0. in_port=1, priority 100 set_field:0x1->reg13 set_field:0x3->reg11 set_field:0x2->reg12 set_field:0x1->metadata set_field:0x1->reg14 resubmit(,8) 8. reg14=0x1,metadata=0x1, priority 50, cookie 0x1d48851d resubmit(,9) 9. metadata=0x1, priority 0, cookie 0xb38c4d37 resubmit(,10) 10. metadata=0x1, priority 0, cookie 0xa1f73f3a resubmit(,11) 11. ip,metadata=0x1, priority 100, cookie 0xca11a09b load:0x1->NXM_NX_XXREG0[96] resubmit(,12) 12. metadata=0x1, priority 0, cookie 0x1163ca42 resubmit(,13) 13. ip,reg0=0x1/0x1,metadata=0x1, priority 100, cookie 0x3790c287 ct(table=14,zone=NXM_NX_REG13[0..15]) drop -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 14. Final flow: icmp,reg0=0x2,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_frag=no Datapath actions: ct(zone=1),recirc(0xd) =============================================================================== recirc(0xd) - resume conntrack with default ct_state=trk|new (use --ct-next to customize) =============================================================================== Flow: recirc_id=0xd,ct_state=new|trk,ct_zone=1,eth,icmp,reg0=0x1,reg11=0x3,reg12=0x2,reg13=0x1,reg14=0x1,metadata=0x1,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 bridge("kelda-int") ------------------- thaw Resuming from table 14 14. ct_state=+new-est+trk,icmp,metadata=0x1,nw_src=10.197.52.30,nw_dst=10.203.4.66, priority 1001, cookie 0x43c716ed load:0x1->NXM_NX_XXREG0[97] resubmit(,15) 15. metadata=0x1, priority 0, cookie 0x10496832 resubmit(,16) 16. metadata=0x1, priority 0, cookie 0xe88c85a resubmit(,17) 17. ip,reg0=0x2/0x2,metadata=0x1, priority 100, cookie 0x1294b35 ct(commit,zone=NXM_NX_REG13[0..15],exec(load:0->NXM_NX_CT_LABEL[0])) load:0->NXM_NX_CT_LABEL[0] resubmit(,18) 18. metadata=0x1, priority 0, cookie 0xd7f52407 resubmit(,19) 19. metadata=0x1, priority 0, cookie 0x81041aca resubmit(,20) 20. metadata=0x1, priority 0, cookie 0x2ba72f51 resubmit(,21) 21. metadata=0x1, priority 0, cookie 0x3acfedb2 resubmit(,22) 22. metadata=0x1, priority 0, cookie 0xf4a7aa32 resubmit(,23) 23. metadata=0x1,dl_dst=02:00:0a:cb:04:42, priority 50, cookie 0x5d36571c set_field:0x3->reg15 resubmit(,32) 32. priority 0 resubmit(,33) 33. reg15=0x3,metadata=0x1, priority 100 set_field:0x6->reg13 set_field:0x3->reg11 set_field:0x2->reg12 resubmit(,34) 34. priority 0 set_field:0->reg0 set_field:0->reg1 set_field:0->reg2 set_field:0->reg3 set_field:0->reg4 set_field:0->reg5 set_field:0->reg6 set_field:0->reg7 set_field:0->reg8 set_field:0->reg9 resubmit(,40) 40. metadata=0x1, priority 0, cookie 0x69ea0511 resubmit(,41) 41. ip,metadata=0x1, priority 100, cookie 0x6f45d6f5 load:0x1->NXM_NX_XXREG0[96] resubmit(,42) 42. ip,reg0=0x1/0x1,metadata=0x1, priority 100, cookie 0x5da27571 ct(table=43,zone=NXM_NX_REG13[0..15]) drop -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 43. Final flow: recirc_id=0xd,eth,icmp,reg0=0x1,reg11=0x3,reg12=0x2,reg13=0x6,reg14=0x1,reg15=0x3,metadata=0x1,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 Megaflow: recirc_id=0xd,ct_state=+new-est-rel-rpl-inv+trk,ct_label=0/0x1,eth,icmp,in_port=1,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_frag=no Datapath actions: ct(commit,zone=1,label=0/0x1),ct(zone=6),recirc(0xe) =============================================================================== recirc(0xe) - resume conntrack with default ct_state=trk|new (use --ct-next to customize) =============================================================================== Flow: recirc_id=0xe,ct_state=new|trk,ct_zone=6,eth,icmp,reg0=0x1,reg11=0x3,reg12=0x2,reg13=0x6,reg14=0x1,reg15=0x3,metadata=0x1,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 bridge("kelda-int") ------------------- thaw Resuming from table 43 43. metadata=0x1, priority 0, cookie 0x95c9e574 resubmit(,44) 44. ct_state=+new-est+trk,icmp,metadata=0x1,nw_src=10.197.52.30,nw_dst=10.203.4.66, priority 1001, cookie 0x6468c276 load:0x1->NXM_NX_XXREG0[97] resubmit(,45) 45. metadata=0x1, priority 0, cookie 0x2b14b7f8 resubmit(,46) 46. ip,reg0=0x2/0x2,metadata=0x1, priority 100, cookie 0xac371eb4 ct(commit,zone=NXM_NX_REG13[0..15],exec(load:0->NXM_NX_CT_LABEL[0])) load:0->NXM_NX_CT_LABEL[0] resubmit(,47) 47. metadata=0x1, priority 0, cookie 0x390a9bed resubmit(,48) 48. reg15=0x3,metadata=0x1, priority 50, cookie 0x8ebb8a35 resubmit(,64) 64. priority 0 resubmit(,65) 65. reg15=0x3,metadata=0x1, priority 100 output:2 bridge("kelda-int") ------------------- 0. in_port=4, priority 32768 output:3 Final flow: recirc_id=0xe,eth,icmp,reg0=0x3,reg11=0x3,reg12=0x2,reg13=0x6,reg14=0x1,reg15=0x3,metadata=0x1,in_port=1,vlan_tci=0x0000,dl_src=02:00:0a:c5:34:1e,dl_dst=02:00:0a:cb:04:42,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 Megaflow: recirc_id=0xe,ct_state=+new-est-rel-rpl-inv+trk,ct_label=0/0x1,eth,icmp,in_port=1,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00,nw_src=10.197.52.30,nw_dst=10.203.4.66,nw_frag=no Datapath actions: ct(commit,zone=6,label=0/0x1),4 —Kevin > On Dec 7, 2017, at 1:03 PM, Ben Pfaff <b...@ovn.org> wrote: > > On Thu, Dec 07, 2017 at 09:26:14AM -0800, Kevin Lin wrote: >> Hi, >> >> I work on Kelda (kelda.io <http://kelda.io/>) with Ethan Jackson. We run a >> containerized, distributed version of OVN. The master branch of >> openvswitch/ovs (commit 07754b23ee5027508d64804d445e617b017cc2d1) fails with >> the following assertion in ovs-vswitchd: >> >> ovs-vswitchd(handler2): ofproto/ofproto-dpif-xlate.c:3704: assertion >> !truncate failed in compose_output_action__() >> >> whenever we try to use the OVN network. A little background on our setup: >> We’re a container orchestrator that uses OVN for the container network. >> One machine in our cluster runs ovn-northd and ovsdb-server. The network is >> mostly configured from here (creating the logical ports, creating ACLs etc). >> Another machine runs ovn-controller, ovs-vswitchd, and ovsdb-server. We >> install some container-specific OpenFlow rules by connecting directly to >> ovs-vswitchd, and of course ovs-vswitchd also receives rules from OVN. >> ovs-vswitchd does not crash immediately after the rules are installed. But >> it crashes as soon as the network is used (e.g. a ping from one container to >> another). >> >> The commit before the commit that introduced the assertion works for us >> (https://github.com/openvswitch/ovs/commit/48f704f4768d13f85252bac4f93c8d45d8ab3eea >> >> <https://github.com/openvswitch/ovs/commit/48f704f4768d13f85252bac4f93c8d45d8ab3eea>). >> >> I’ve attached the ovs-vswitchd logs. I’m not sure how helpful the output of >> ovs-bugtool will be given our containerized setup, but I’ve also attached >> the output of running that from within the ovs-vswitchd container from >> before and after the crash. Note, because the ovs-vswitchd container >> crashed, the “after” tarball was generated after restarting the container, >> so I’m not sure if any of the commands it ran actually succeeded. >> >> The crash is trivial for me to reproduce, so please let me know if there’s >> anymore information I can give you. > > Thank you for the report. > > I don't see a good reason that this should be a condition that kills > ovs-vswitchd. I think that it will be both easier to debug and less > inherently harmful if we change it to an error message. I sent out a > patch that does that: > https://patchwork.ozlabs.org/patch/845845/ > > It would be great if you could apply the patch and then try to track > down the activity that triggers the error. "ofproto/trace" is the best > way to do that, if you can find the right packet or flow, because it > will give us all the details on how the problem gets triggered. > > Thanks, > > Ben. _______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss