Hi Simon,

> Hi Francesco,
>
> On Wednesday, September 12, 2018 10:44:36 AM CEST Francesco Salvatore
> [fabbricadigitale] wrote:
> > Hi Simon,
> >
> > > Hi Francesco,
> > >
> > > On Tuesday, September 11, 2018 4:38:13 PM CEST Francesco Salvatore
> > > [fabbricadigitale] wrote:
> > > > Hi all,
> > > > We're running a mesh network made of a cloud of clients and
> > > > multiple gateways on two separate VLANs (on eth0, not on top of
> > > > BATMAN). The setup is similar to the one described in the figure:
> > > > https://www.open-mesh.org/attachments/download/132/Test_2xLAN.dia.png
> > > >
> > > > We noticed that, sometimes, when new gateways are added to the
> > > > already running infrastructure, network loops appear on the VLANs.
> > > > We dumped the VLANs' network traffic during one of these loops and
> > > > saw a storm of BLA frames that collapsed the network. It seems that
> > > > the frame (an ANNOUNCE one, in this case) was first generated by a
> > > > gateway and started to loop inside the LAN, and then the other
> > > > gateways propagated the same frame as well. After a few seconds,
> > > > other frames (coming from different gateways) started to loop too.
> > > >
> > > > Our hypothesis is that one of the gateways directly injects BLA
> > > > frames into the mesh and that this leads to an unmanageable loop.
> > > > So, we have 2 questions:
> > > > - Are BLA frames (except for LOOP DETECT) allowed to flow only on
> > > >   the LAN?
> > >
> > > Yes, all frames except LOOP DETECT are blocked in BATMAN.
> > >
> > > > - If so, is our hypothesis reasonable?
> > > >
> > > > You can see the situation described above in the screenshot below.
> > > > http://oi63.tinypic.com/v7wl1w.jpg
> > >
> > > Unfortunately the screenshot doesn't show exactly which packets
> > > looped.
> > >
> > > Are you sure it's an announce frame? It could also be a claim frame
> > > where two hosts try to claim hosts from each other.
> >
> > As you can see here (http://oi66.tinypic.com/ofo5jn.jpg), the frame
> > that's looping is an ANNOUNCE one, and so are the ones coming from
> > Legra_55:3c:dc. The last ANNOUNCE frames from those MACs were sent 10
> > seconds before they started looping, so it seems that at a certain
> > time one of the gateways started to forward BLA broadcast traffic from
> > the LAN to the mesh.
>
> That certainly looks like an announce frame. Do you see any other frames
> in between, like claim frames?
>
> Announces are also sent after a couple of claim frames upon a request
> (check batadv_bla_answer_request). We actually had a bug where
> inconsistencies among the BLA tables could happen, but that was fixed
> before 2017.3 ...
BLA traffic seems regular. This
(https://mega.nz/#!9ZkmharA!S9mFxvpnnnseu_l8H7MPfoZ7X1Ef0lNrJLVQOpgTg4w)
is a dump of the broadcast traffic captured from the LAN ports of four
gateways (on two separate VLANs). As you can see, the loop starts at
packet 2660. The four gateways are:

- 00:0f:00:68:97:e4 (Bridge IP 10.140.0.61)
- 00:0f:00:68:9f:4b (Bridge IP 10.140.0.17)
- 00:0f:00:68:96:66 (Bridge IP 10.140.16.19)
- 00:0f:00:55:3c:dc (Bridge IP 10.140.16.61)

> > > BATMAN has a grace period to allow broadcasts from the LAN only
> > > after 1 minute of operation. This is done to make sure that the mesh
> > > is properly established and other gateways and their claims are
> > > detected before traffic, or at least potentially looping traffic, is
> > > allowed on it. Therefore, you should make sure (e.g. in your firmware
> > > or setup scripts) that the LAN is operational once batman is brought
> > > up.
> > >
> > > If the mesh isn't fully established, or it's actually split due to
> > > different channels or similar, then you may run into an unresolved
> > > limitation of BLA:
> > >
> > > https://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II#Limitations
> > >
> > > For this reason we have the loop detect packets. If a loop is
> > > detected, a uevent is sent to userspace, and the firmware should
> > > react appropriately, e.g. by shutting down batman-adv.
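One way to make sure "the LAN is operational once batman is brought up" in a setup script is to wait until the LAN port actually reports link before batman-adv is started. A minimal sketch, assuming the standard Linux sysfs attribute `/sys/class/net/<iface>/carrier`; `wait_for_carrier` and the `SYSFS_NET` override are hypothetical names introduced here, and calling it before `batctl if add wlan0` is one reading of the advice above, not a tested recipe.

```shell
#!/bin/sh
# Hypothetical helper: block until an interface reports link (carrier = 1)
# or a timeout expires. SYSFS_NET is overridable only to ease testing.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# wait_for_carrier IFACE TIMEOUT_SECONDS
# Returns 0 as soon as $SYSFS_NET/IFACE/carrier reads 1, 1 on timeout.
# (POSIX sh has no "local", so iface/remaining are plain globals.)
wait_for_carrier() {
    iface=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        # carrier is unreadable while the interface is administratively
        # down, so read errors are discarded and treated as "no link yet".
        if [ "$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null)" = 1 ]; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

Usage, e.g. right before `batctl if add wlan0`: `ip link set dev eth0 up; wait_for_carrier eth0 30 || echo "eth0 has no link" >&2`. The interface has to be administratively up for `carrier` to be readable at all; the 30-second timeout is arbitrary.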
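A minimal sketch of such a reaction, written as a shell function that whatever hotplug or udev hook the firmware uses could call (the wiring itself is not shown). It assumes batman-adv exports `BATTYPE=bla` and `BATACTION=loopdetect`, with the VLAN id in `BATDATA`, in the uevent environment, following the `BATTYPE`/`BATACTION`/`BATDATA` scheme of its gateway uevents; check this against the batman-adv version you run. The function name, the `bla-loop` log tag and the choice to detach and down `bat0` are illustrative only. `DRY_RUN` defaults to 1, so commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical loop-detect handler for batman-adv uevents.
# ASSUMPTION: the uevent environment carries BATTYPE=bla and
# BATACTION=loopdetect, with BATDATA holding the VLAN id.
: "${DRY_RUN:=1}"   # 1 = only print commands; 0 = execute (as root)

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

handle_bat_uevent() {
    # Ignore every event that is not a BLA loop detection.
    [ "$BATTYPE" = "bla" ] || return 0
    [ "$BATACTION" = "loopdetect" ] || return 0
    run logger -t bla-loop "loop detected on VLAN ${BATDATA:-unknown}, shutting down bat0"
    # Take the mesh off the LAN bridge, then stop the soft interface.
    run ip link set dev bat0 nomaster
    run ip link set dev bat0 down
}
```

Detaching `bat0` from `br-lan` first cuts the LAN/mesh path immediately even if bringing the interface down were to fail.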
> >
> > We start the gateways with this script placed in rc.local:
> >
> > sudo pkill wpa_supplicant
> > sudo modprobe batman-adv
> > sudo ip link set wlan0 down
> > sleep 2s
> > sudo iwconfig wlan0 mode ad-hoc
> > sudo iwconfig wlan0 essid mesh-network
> > sudo iwconfig wlan0 ap any
> > sudo iwconfig wlan0 channel 44
> > sudo ip link set wlan0 up
> > sudo batctl if add wlan0
> > sleep 1s
> > sudo ip addr flush dev eth0
> > sudo ip link add name br-lan type bridge
> > sudo ip link set dev eth0 master br-lan
> > sudo ip link set dev bat0 master br-lan
> > sudo ip link set up dev br-lan
> > sudo batctl bl 1
> > sudo batctl gw server
> >
> > As far as I can see, the bridge interface gets IP/connectivity from the
> > LAN a few seconds after the script exits. Are these steps correct, or
> > are there possible timing issues?
> > We're using the same essid/channel for all originators.
>
> It would be good to do "batctl bl 1" before adding bat0 to the bridge,
> otherwise you are not protected. Other than that, it looks fine to me.

Am I wrong, or is "batctl bl 1" redundant? As far as I can see, according
to batctl, BLA is turned on by default in gw mode.

> Cheers,
> Simon

Regards,
Francesco
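A sketch of the rc.local start-up script with Simon's suggestion applied: `batctl bl 1` moves ahead of `ip link set dev bat0 master br-lan`, and every other command keeps its original order. `sudo` is dropped because rc.local normally already runs as root, and `sleep 2s` becomes `sleep 2` to stay POSIX. The `run` wrapper, the `DRY_RUN` switch and the function names are additions for illustration, not part of the original setup. Whether `bl 1` is redundant is the open question above; setting it explicitly does no harm either way.

```shell
#!/bin/sh
# Gateway start-up from the thread, reordered so that bridge loop
# avoidance is enabled before bat0 joins br-lan.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to execute as root.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_mesh() {
    run pkill wpa_supplicant
    run modprobe batman-adv
    run ip link set wlan0 down
    run sleep 2
    run iwconfig wlan0 mode ad-hoc
    run iwconfig wlan0 essid mesh-network
    run iwconfig wlan0 ap any
    run iwconfig wlan0 channel 44
    run ip link set wlan0 up
    run batctl if add wlan0
    run sleep 1
}

setup_bridge() {
    run ip addr flush dev eth0
    run ip link add name br-lan type bridge
    run ip link set dev eth0 master br-lan
    # Enable BLA while bat0 is still outside the bridge, so the mesh is
    # never bridged to the LAN without protection.
    run batctl bl 1
    run ip link set dev bat0 master br-lan
    run ip link set up dev br-lan
    run batctl gw server
}

setup_gateway() {
    setup_mesh
    setup_bridge
}
```

Usage: source the file (e.g. `. ./gateway-setup.sh`, a hypothetical name) and call `setup_gateway` to print the sequence; `DRY_RUN=0 setup_gateway` as root actually runs it.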