Re: Restarting MeshPoint – seeking advice on routing for crisis/disaster scenarios

Simon Wunderlich Mon, 05 Jan 2026 05:14:04 -0800

On Monday, January 5, 2026 1:12:43 PM Central European Standard Time 
Valent@MeshPoint wrote:
> Hi Simon,
> 
> Thank you very much for the detailed reply and the practical
> suggestions. The timing is no problem at all – I completely understand
> how busy things get around the holidays.
> 
> I should mention some background: I was an active member of wlan
> slovenija and the Otvorena mreža (Open Network) project in Croatia. We
> are now restarting nodewatcher – our system for node monitoring and
> firmware generation for community networks in Croatia and Slovenia.


Ah nice! I remember the name nodewatcher from WLAN Slovenija, although I must
admit I haven't used nodewatcher so far. I think Mitar and others were talking
about it in the past ... Since Battlemesh in Croatia is coming up in 2026,
now is a good time to revive it. :) We might have met at Battlemesh v8 in
Maribor, but I must admit, my memory is a bit blurry ...

> 
> Our approach is a bit different from Gluon. Instead of a unified
> firmware image, nodewatcher generates custom firmware per node with all
> parameters pre-configured: subnets, channel assignments, interface
> roles, etc. This lets us handle complexity on the backend so that end
> users just flash the image and everything works – no wizard, no
> configuration choices that might confuse home users. We will certainly
> look at Gluon's technical choices for batman-adv tuning, but we prefer
> this 'keep it simple' deployment model.

OK, that's interesting. Gluon is also made for simplicity: you can get away
with just flashing the firmware and no further configuration. However, you
may typically want to enter the setup mode to set the coordinates and a name
for your AP. IPs/subnets are automatically assigned from a centralized VPN
server. Either way, keeping it simple is definitely a good idea.

> 
> Some history: when building the mesh networks in Slovenia and Croatia,
> we started with OLSR. It worked well initially, but once we crossed ~300
> nodes we hit serious scaling limits. Around that time (6-7 years ago) we
> were aware that Freifunk communities in Germany were also experiencing
> scaling issues with batman-adv, so we chose to migrate to Babel instead.
> Babel served us well and we never looked back.
> 
> Now, as we restart MeshPoint and consider protocol options again, I am
> genuinely curious:
> 
> 1. How has batman-adv addressed the scaling problem over the past 7
> years? Since it operates at L2, there is inherently more broadcast
> traffic. Do larger Freifunk networks segment into smaller batman-adv
> domains connected via something else, or has the protocol itself
> improved to handle hundreds of nodes in a single domain?

There are a few mechanisms in batman-adv, such as DAT (Distributed ARP Table)
and multicast extensions, to keep traffic at an acceptable level. There are
also some changes to send broadcasts only once on VPN links, or to omit
rebroadcasts entirely on those links.

However, the general scaling issues still apply. Therefore, Gluon applies a
few parameters as outlined in my last mail. Hundreds of nodes are possible 
(we've operated ~300 in our local Freifunk community), but with those numbers 
there is quite a lot of "background" noise. Since we had a few slow DSL links 
within our VPN servers, we saw a bottleneck there already ...

In our Freifunk community, we segment into smaller domains (per city/town),
and I think others do the same - we are targeting up to ~100-150 nodes per
segment.
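
To put rough numbers on why segmenting helps, here is a small back-of-the-envelope
sketch. It only uses the O(N^2) estimate from my last mail (every node repeats
every other node's OGMs) and assumes the segments are fully separate batman-adv
domains. The function names are made up for illustration, and real traffic
depends on topology and packet loss, so treat the output as an upper-bound
comparison, not a measurement.

```python
def ogm_messages_per_interval(nodes: int) -> int:
    """Rough upper bound: each node originates one OGM per interval,
    and every node in the domain (re)broadcasts it once -> N * N."""
    return nodes * nodes


def ogm_rate_per_second(segment_sizes, ogm_interval_s: float) -> float:
    """Total OGM transmissions per second over all (separate) segments."""
    total = sum(ogm_messages_per_interval(n) for n in segment_sizes)
    return total / ogm_interval_s


# One flat 300-node domain with a 1 second OGM interval
single = ogm_rate_per_second([300], 1.0)

# The same 300 nodes split into three 100-node domains
split = ogm_rate_per_second([100, 100, 100], 1.0)

# Split domains plus a 5 second OGM interval
split_slow = ogm_rate_per_second([100, 100, 100], 5.0)

print(single, split, split_slow)
```

Under this simple model, splitting into three segments cuts the OGM load to a
third, and the longer OGM interval reduces it by another factor of five, at the
cost of slower reaction to topology changes.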

Gluon also applies various firewall rules to keep service discovery (e.g.
Avahi, Multicast DNS, etc.) off the mesh network, since these protocols are
often way too chatty with a few hundred or even thousands of users.
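
As an illustration of what such a filter decides (this is not Gluon's actual
rule set, just a sketch of the idea), the following checks whether a UDP
packet is addressed to one of the well-known service discovery groups. The
mDNS and SSDP addresses and ports are the standard ones. The table name and
the function name are made up for this example.

```python
import ipaddress

# Well-known service discovery destinations (multicast group, UDP port).
# Illustrative list only; a real firmware may filter more or different traffic.
CHATTY_SERVICES = {
    ("224.0.0.251", 5353),      # Multicast DNS over IPv4 (used by Avahi)
    ("ff02::fb", 5353),         # Multicast DNS over IPv6
    ("239.255.255.250", 1900),  # SSDP (UPnP discovery)
}


def should_drop(dst_ip: str, udp_port: int) -> bool:
    """Return True if the packet is service discovery that should stay off the mesh."""
    addr = ipaddress.ip_address(dst_ip)
    # str(addr) yields the canonical form, so differently written IPv6
    # addresses still match the table above.
    return addr.is_multicast and (str(addr), udp_port) in CHATTY_SERVICES


print(should_drop("224.0.0.251", 5353))
print(should_drop("10.0.0.1", 5353))
```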

> 2. Are there established patterns for combining batman-adv with overlay
> networks or L3 routing? For example, batman-adv for local mesh segments
> with BGP or Babel connecting segments at gateways?

We use a couple of VPN servers which run DHCP with a different subnet per
segment (e.g. per town/city). Subnets are connected with each other using
layer 3 routing on those VPN servers. We use bird for the IP routing part, and
GRE tunnels to connect to our upstream Internet.
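
A minimal sketch of the "one subnet per segment" idea, in case it helps with
planning. The supernet, the prefix length and the segment names below are
placeholders and not our real addressing plan. The function name is made up
as well.

```python
import ipaddress


def assign_segment_subnets(supernet: str, segments, new_prefix: int):
    """Carve one non-overlapping subnet per mesh segment out of a supernet."""
    pool = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
    plan = {}
    for name in segments:
        try:
            plan[name] = next(pool)
        except StopIteration:
            raise ValueError(f"{supernet} has no /{new_prefix} left for {name}") from None
    return plan


# Hypothetical example: three towns, one /20 (4096 addresses) each
plan = assign_segment_subnets("10.204.0.0/16", ["town-a", "town-b", "town-c"], 20)

for name, net in plan.items():
    print(f"{name}: {net} ({net.num_addresses} addresses)")
```

Each resulting subnet would then be served by the DHCP server for that segment
and announced to the other VPN servers by the routing daemon.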

> 
> 3. For mobile crisis deployments where topology changes constantly, is
> pure batman-adv still recommended, or do experienced operators use
> hybrid approaches?

That I can't answer, maybe someone else has experience. :) Setting up the 
layer 3 network requires quite a bit of engineering I would say.

> 
> These are deeper architectural questions – I understand if the answers
> are 'it depends' or require longer discussion. Any pointers to
> documentation, mailing list threads, or real-world deployment writeups
> would be very helpful.
> 
> Thank you again for your time and the work you and the team have put
> into batman-adv over the years.

Those answers were mostly around Gluon and our local Freifunk network
(Freifunk Vogtland [1]). Since this is open source, you can easily review and
perhaps adopt some parts for your case. Other Gluon and Freifunk networks may
operate differently. Perhaps there are other mailing list readers who want to 
chime in with other projects. :)

Cheers,
       Simon

[1] https://github.com/FreifunkVogtland

> 
> Best regards,
> Valent
> 
> 
> ------ Original Message ------
> From "Simon Wunderlich" <[email protected]>
> To [email protected]
> Cc "Valent@MeshPoint" <[email protected]>
> Date 5.1.2026. 10:07:44
> Subject Re: Restarting MeshPoint – seeking advice on routing for
> crisis/disaster scenarios
> 
> >Hi Valent,
> >
> >thank you for your interest and sorry for the late reply. The time before
> >Christmas is usually a bit hectic ...
> >
> >I would suggest to look into the "gluon" Freifunk Firmware [1], including
> >the batman-adv parameters made there. There are setups with a couple of
> >hundred nodes, although some sparsely connected over cities. Those setups
> >have been used and tested for a long time on different types of hardware.
> >
> >A few general suggestions for tuning for those scenarios are:
> >
> >* set up a high multicast rate, at least 12 MBit/s, perhaps 24 or more. You
> >will trade scalability with range
> >
> >* choose a higher than default OGM interval, e.g. 5 seconds instead of 1
> >second. This makes batman-adv reaction time slower, but helps scaling with
> >many nodes. Each node would repeat every other node's OGM messages, which
> >results in O(N^2) OGM messages per interval.
> >
> >* if you don't need encryption (SAE), turn it off. SAE by default does a
> >peer-to-peer handshake, which can kill a dense network with many
> >participants in one place, if everyone wants to handshake with everyone
> >else at the same time.
> >
> >There are a few more things (e.g. reducing basic rates) which you may find
> >in the gluon firmware and other places.
> >
> >As you can see, some of those suggestions are more Wi-Fi layer specific
> >than batman-adv specific, and would help with other protocols (e.g. babel)
> >as well. From my experience with network simulators/emulators, you may
> >verify protocol specific behavior (e.g. number of messages, failover time)
> >to some extent. But testing Wi-Fi specific scaling effects like failing
> >SAE handshakes, effects of multicast rates, etc is rather hard - even if
> >you use emulators based on mac80211_hwsim or so which partially emulate
> >802.11. For those experiments, it's best to actually set up 10-20 devices
> >...
> >
> >Cheers,
> >
> >         Simon
> >
> >[1] https://gluon.readthedocs.io/en/latest/
> >
> >On Saturday, December 20, 2025 11:43:20 PM Central European Standard Time
> >
> >Valent@MeshPoint wrote:
> >>  Hello,
> >>  
> >>  I wanted to follow up on my previous message. I did not see any replies,
> >>  so I hope it is ok to share one concrete finding from recent testing in
> >>  case it helps the discussion.
> >>  
> >>  To move beyond purely theoretical arguments, I have been running large
> >>  scale tests using meshnet lab
> >>  https://github.com/mwarning/meshnet-lab
> >>  
> >>  The main reason for choosing it is that it allows replaying real world
> >>  community network topologies, including Freifunk graphs, instead of
> >>  relying on synthetic grids or ideal meshes. This makes it easier to
> >>  observe behaviour under sparse, asymmetric, and imperfect conditions
> >>  that are closer to what actually gets deployed.
> >>  
> >>  One interesting observation so far is that results can vary
> >>  significantly depending on how nodes are brought up and how control
> >>  plane load interacts with the topology. In other words, the same
> >>  protocol on the same topology can behave very differently depending on
> >>  timing, churn, and scale effects, even when the underlying links are
> >>  identical. This was not obvious to me before testing at this scale.
> >>  
> >>  I am curious whether others here have used meshnet lab or similar
> >>  namespace based emulation tools for BATMAN adv testing, and if so,
> >>  whether your observations matched real deployments closely, or if there
> >>  are known caveats when interpreting the results.
> >>  
> >>  Any feedback or pointers would be appreciated.
> >>  
> >>  Best regards,
> >>  Valent
> >>  
> >>  
> >>  ------ Original Message ------
> >>  
> >>  From "Valent Turkovic" <[email protected]>
> >>  
> >>  To [email protected]
> >>  Date 16.12.2025. 16:37:01
> >>  Subject Restarting MeshPoint – seeking advice on routing for
> >>  crisis/disaster scenarios
> >>  
> >>  >Hi everyone,
> >>  >
> >>  >My name is Valent Turkovic.
> >>  >
> >>  >Between 2015 and 2018 I ran the MeshPoint project – a simple, rugged
> >>  >Wi-Fi
> >>  >hotspot designed to work in very tough conditions.
> >>  >
> >>  >During the refugee crisis in Croatia we deployed these devices in camps
> >>  >and
> >>  >transit centers, providing internet connectivity for humanitarian
> >>  >organizations such as the Red Cross, UNICEF, IOM, Greenpeace, and many
> >>  >smaller NGOs. Through these deployments, more than 500,000 people were
> >>  >able to stay connected. The same system was also used in flood response
> >>  >and other emergency situations. The project received the “Best
> >>  >Humanitarian Tech of the Year” award at The Europas in 2016.
> >>  >
> >>  >Unfortunately, financial constraints forced me to pause the project
> >>  >after
> >>  >2018. It was entirely self-funded, and the prolonged stress eventually
> >>  >led
> >>  >to long-term health issues.
> >>  >
> >>  >Over the years I stayed in contact with first responders and field
> >>  >teams
> >>  >from organizations such as WFP, UNICEF, the Red Cross, and various
> >>  >NGOs.
> >>  >The feedback has remained consistent: when disasters strike, whether
> >>  >earthquakes, floods, or large-scale displacement, teams still struggle
> >>  >to
> >>  >bring up reliable communications quickly. What they need most is a mesh
> >>  >network that works within minutes, not hours or days, and that
> >>  >continues
> >>  >operating on battery power when infrastructure is down.
> >>  >
> >>  >I am fully aware that in active conflict zones Wi-Fi can be jammed or
> >>  >restricted, for example due to drone countermeasures. However, there
> >>  >are
> >>  >many other scenarios where Wi-Fi mesh remains extremely valuable:
> >>  >evacuation centers, field hospitals, temporary shelters, flood-affected
> >>  >villages, and coordination points for responders. In these
> >>  >environments,
> >>  >fast, robust, and easy-to-deploy networking makes a very real
> >>  >difference
> >>  >for coordination, family contact, and medical or logistical data
> >>  >sharing.
> >>  >
> >>  >Because of this, I am now restarting the project as MeshPoint V2. The
> >>  >focus
> >>  >is updated hardware, improved battery life, and even simpler
> >>  >deployment,
> >>  >while keeping the original goal: crisis response and off-grid or
> >>  >underserved communities.
> >>  >
> >>  >In the original MeshPoint we used Babel. This was largely driven by
> >>  >practical constraints at the time: our deployment tooling was based on
> >>  >Nodewatcher, which was Babel-only. Technically, Babel served us very
> >>  >well.
> >>  >It converged fast, was reliable, and worked nicely for small to
> >>  >medium-sized networks.
> >>  >
> >>  >At the same time, I am well aware that many community networks and
> >>  >real-world mesh deployments successfully used batman-adv, often through
> >>  >Gluon or custom firmware builds. In larger, more dynamic, or highly
> >>  >mobile
> >>  >topologies typical for crisis scenarios, the layer-2 approach and
> >>  >seamless
> >>  >mobility properties of batman-adv are very attractive, especially when
> >>  >nodes are frequently moved, powered on and off, or replaced in the
> >>  >field.
> >>  >
> >>  >For MeshPoint V2 I am evaluating batman-adv and would appreciate
> >>  >insights
> >>  >on the following aspects, specifically in the context of crisis and
> >>  >emergency deployments:
> >>  >
> >>  >Behaviour at larger scale in real deployments
> >>  >In crisis scenarios networks often start small but can grow quickly as
> >>  >more
> >>  >nodes are deployed by different teams or organizations. We are
> >>  >interested
> >>  >in how batman-adv behaves when scaling to hundreds or more nodes in
> >>  >non-ideal, real-world conditions, without centralized planning and with
> >>  >limited ability for on-site tuning.
> >>  >
> >>  >Performance in sparse or highly mobile topologies
> >>  >Nodes in the field are frequently moved, turned off, replaced, or
> >>  >temporarily isolated. Vehicles, backpacks, and mobile command posts
> >>  >constantly change network topology. We are looking for practical
> >>  >experience with how well batman-adv handles frequent topology changes,
> >>  >intermittent links, and sparse node placement without requiring
> >>  >constant
> >>  >manual intervention.
> >>  >
> >>  >Suitability for battery-powered and intermittently connected nodes
> >>  >Many nodes run on battery for long periods and may sleep, reboot, or
> >>  >disappear entirely when power is lost. Low overhead, predictable
> >>  >behaviour, and fast recovery after reconnect are essential. We are
> >>  >particularly interested in any known trade-offs between routing
> >>  >performance, control traffic, and power consumption in such
> >>  >environments.
> >>  >
> >>  >If there is existing work, documented limitations, field experience, or
> >>  >design guidance relevant to these constraints, pointers would be
> >>  >greatly
> >>  >appreciated. The goal is to build a system that field teams can deploy
> >>  >and
> >>  >rely on under stress, without requiring deep networking expertise on
> >>  >site.
> >>  >
> >>  >Thank you for your time, and thank you to everyone who has contributed
> >>  >to
> >>  >making mesh networking viable outside of labs and into real-world,
> >>  >high-stakes situations.
> >>  >
> >>  >Best regards,
> >>  >Valent Turkovic
> >>  >https://www.meshpointone.com/
> >>  >
> >>  >Technical specifications of the original MeshPoint (for reference):
> >>  >https://www.meshpointone.com/technical-specifications/
