Hi Simon,

> Hi Francesco,
> 
> On Wednesday, September 12, 2018 10:44:36 AM CEST Francesco Salvatore
> [fabbricadigitale] wrote:
> > Hi Simon,
> >
> > > Hi Francesco,
> > >
> > > On Tuesday, September 11, 2018 4:38:13 PM CEST Francesco Salvatore
> > >
> > > [fabbricadigitale] wrote:
> > > > Hi all,
> > > > We're running a mesh network made of a cloud of clients and
> > > > multiple gateways on two separate VLANs (on eth0, not on top of
> > > > BATMAN).
> > > > The setup is similar to the one described in the figure.
> > > > https://www.open-mesh.org/attachments/download/132/Test_2xLAN.dia.png
> > >
> > > > We noticed that, sometimes, when new gateways are added to the
> > > > already running infrastructure, network loops appear on the VLANs. We
> > > > dumped the VLANs' network traffic during one of these loops and we saw
> > > > a storm of BLA frames that collapsed the network. It seems that
> > > > the frame (an ANNOUNCE one, in this case) was first generated by
> > > > a gateway and started to loop inside the LAN, and then even the
> > > > other gateways propagated the same frame. After a few seconds,
> > > > frames coming from different gateways also started to loop.
> > > >
> > > > Our hypothesis is that one of the gateways directly injects BLA
> > > > frames inside the mesh and that leads to an unmanageable loop. So,
> > > > we have 2 questions:
> > > >         - Are BLA frames (except for LOOP DETECT) allowed to flow
> > > >           only on LAN?
> > >
> > > Yes, all frames except LOOP DETECT are blocked in BATMAN
> > >
> > > >         - If so, is our hypothesis reasonable?
> > > >
> > > > You can see the situation described above in the screenshot below.
> > > > http://oi63.tinypic.com/v7wl1w.jpg
> > >
> > > Unfortunately the screenshot doesn't describe which packets looped
> > > exactly.
> > >
> > > Are you sure it's an announce frame? It could also be a claim frame
> > > where two hosts try to claim hosts from each other.
> >
> > As you can see here (http://oi66.tinypic.com/ofo5jn.jpg) the frame
> > that's looping is an ANNOUNCE one, and so are the ones coming from
> > Legra_55:3c:dc. The last ANNOUNCE frames from those MACs were sent 10
> > seconds before they started looping, so it seems that at a certain
> > time one of the gateways started to forward BLA broadcast traffic from
> > LAN to mesh.
> 
> That certainly looks like an announce frame. Do you see any other frames
> in between, like claim frames?
> 
> Announces are also sent after a couple of claim frames upon a request
> (check batadv_bla_answer_request). We actually had a bug where
> inconsistencies among the BLA tables could happen, but that was fixed
> before 2017.3 ...

BLA traffic seems regular. This
(https://mega.nz/#!9ZkmharA!S9mFxvpnnnseu_l8H7MPfoZ7X1Ef0lNrJLVQOpgTg4w) is
a dump of the broadcast traffic captured from the LAN ports of four gateways
(on two separate VLANs). As you can see, the loop starts at packet 2660.
The four gateways are:
-       00:0f:00:68:97:e4 (Bridge IP 10.140.0.61)
-       00:0f:00:68:9f:4b (Bridge IP 10.140.0.17)
-       00:0f:00:68:96:66 (Bridge IP 10.140.16.19)
-       00:0f:00:55:3c:dc (Bridge IP 10.140.16.61)

> > > BATMAN has a grace period to allow broadcasts from the LAN only
> > > after 1 minute of operation. This is done to make sure that the mesh
> > > is properly established and other gateways and their claims are
> > > detected before traffic is allowed on it, at least potentially
> > > looping traffic. Therefore, you should make sure (e.g. in your
> > > firmware or setup scripts) that the LAN is operational once
> > > batman is brought up.
> > >
> > > If the mesh isn't fully established or it's actually split due to
> > > different channels or similar, then you may run into an unresolved
> > > limitation of BLA:
> > >
> > > https://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II#Limitations
> > >
> > > For this reason we have the loop detect packets. If a loop is
> > > detected, a uevent is sent to userspace, and the firmware should
> > > react appropriately, e.g. by shutting down batman-adv.
> >
> > We start gateways with this script placed in rc.local
> >
> > sudo pkill wpa_supplicant
> > sudo modprobe batman-adv
> > sudo ip link set wlan0 down
> > sleep 2s
> > sudo iwconfig wlan0 mode ad-hoc
> > sudo iwconfig wlan0 essid mesh-network
> > sudo iwconfig wlan0 ap any
> > sudo iwconfig wlan0 channel 44
> > sudo ip link set wlan0 up
> > sudo batctl if add wlan0
> > sleep 1s
> > sudo ip addr flush dev eth0
> > sudo ip link add name br-lan type bridge
> > sudo ip link set dev eth0 master br-lan
> > sudo ip link set dev bat0 master br-lan
> > sudo ip link set up dev br-lan
> > sudo batctl bl 1
> > sudo batctl gw server
> >
> >
> > As far as I can see, the bridge interface gets IP/connectivity from LAN
> > a few seconds after the script quits. Are these steps correct, or are
> > there possible timing issues?
> > We're using the same essid/channel for all originators.
> 
> It would be good to do "batctl bl 1" before adding bat0 to the bridge,
> otherwise you are not protected. Other than that, it looks fine to me.
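
For reference, here is a minimal sketch of the start-up script with that change applied, i.e. "batctl bl 1" moved so that it runs before bat0 is added to br-lan. The commands are the same ones as in our rc.local script; only the order differs. The DRY_RUN switch and the run/pause helpers are illustrative additions (not part of our real script) so the sequence can be printed and checked without touching the interfaces.

```shell
#!/bin/sh
# Sketch only: same commands as the rc.local script, with BLA enabled
# before bat0 joins the bridge. DRY_RUN, run() and pause() are
# hypothetical helpers added for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"         # print the command instead of running it
    else
        sudo "$@"         # run the command as root, as in the original script
    fi
}

pause() {
    # Only wait when the commands are really executed.
    [ "$DRY_RUN" = "1" ] || sleep "$1"
}

bring_up_gateway() {
    run pkill wpa_supplicant
    run modprobe batman-adv
    run ip link set wlan0 down
    pause 2
    run iwconfig wlan0 mode ad-hoc
    run iwconfig wlan0 essid mesh-network
    run iwconfig wlan0 ap any
    run iwconfig wlan0 channel 44
    run ip link set wlan0 up
    run batctl if add wlan0
    pause 1
    # Enable bridge loop avoidance while bat0 is not yet part of the bridge.
    run batctl bl 1
    run ip addr flush dev eth0
    run ip link add name br-lan type bridge
    run ip link set dev eth0 master br-lan
    run ip link set dev bat0 master br-lan
    run ip link set up dev br-lan
    run batctl gw server
}

bring_up_gateway
```

With the default DRY_RUN=1 this only prints the command sequence; setting DRY_RUN=0 would execute it through sudo.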


Am I wrong, or is "batctl bl 1" redundant? As far as I can see, according to
batctl, BLA is turned on by default in gw mode.

> Cheers,
>       Simon

Regards,
Francesco
