Hi Simon,

Thank you very much for the detailed reply and the practical suggestions. The timing is no problem at all – I completely understand how busy things get around the holidays.

I should mention some background: I was an active member of wlan slovenija and the Otvorena mreža (Open Network) project in Croatia. We are now restarting nodewatcher – our system for node monitoring and firmware generation for community networks in Croatia and Slovenia.

Our approach is a bit different from Gluon. Instead of a unified firmware image, nodewatcher generates custom firmware per node with all parameters pre-configured: subnets, channel assignments, interface roles, etc. This lets us handle complexity on the backend so that end users just flash the image and everything works – no wizard, no configuration choices that might confuse home users. We will certainly look at Gluon's technical choices for batman-adv tuning, but we prefer this 'keep it simple' deployment model.
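
To make the per-node model concrete, here is a minimal sketch of the idea. It is not nodewatcher's actual code: the names NodeConfig and plan_nodes, the address pool, the prefix length, and the channel list are all hypothetical and chosen only for illustration.

```python
import ipaddress
from typing import NamedTuple


class NodeConfig(NamedTuple):
    """Parameters baked into one node's image (hypothetical structure)."""
    hostname: str
    subnet: ipaddress.IPv4Network
    channel: int


def plan_nodes(hostnames, pool="10.254.0.0/16", prefix=28, channels=(1, 6, 11)):
    """Give every node its own subnet and a channel, assigned round-robin.

    The pool, prefix length, and channel list are illustrative assumptions.
    """
    hostnames = list(hostnames)
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=prefix)
    plan = [
        NodeConfig(name, subnet, channels[i % len(channels)])
        for i, (name, subnet) in enumerate(zip(hostnames, subnets))
    ]
    # zip() stops silently when the pool runs out, so check explicitly.
    if len(plan) < len(hostnames):
        raise ValueError("address pool too small for the number of nodes")
    return plan


if __name__ == "__main__":
    for cfg in plan_nodes(["node-a", "node-b", "node-c", "node-d"]):
        print(cfg.hostname, cfg.subnet, cfg.channel)
```

The point of the sketch is only that all decisions are made on the backend before the image is built, so nothing is left for the person flashing the device to choose.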

Some history: when building the mesh networks in Slovenia and Croatia, we started with OLSR. It worked well initially, but once we crossed ~300 nodes we hit serious scaling limits. Around that time (6-7 years ago) we were aware that Freifunk communities in Germany were also experiencing scaling issues with batman-adv, so we chose to migrate to Babel instead. Babel served us well and we never looked back.

Now, as we restart MeshPoint and consider protocol options again, I am genuinely curious:

1. How has batman-adv addressed the scaling problem over the past 7 years? Since it operates at L2, there is inherently more broadcast traffic. Do larger Freifunk networks segment into smaller batman-adv domains connected via something else, or has the protocol itself improved to handle hundreds of nodes in a single domain?

2. Are there established patterns for combining batman-adv with overlay networks or L3 routing? For example, batman-adv for local mesh segments with BGP or Babel connecting segments at gateways?

3. For mobile crisis deployments where topology changes constantly, is pure batman-adv still recommended, or do experienced operators use hybrid approaches?

These are deeper architectural questions – I understand if the answers are 'it depends' or require longer discussion. Any pointers to documentation, mailing list threads, or real-world deployment writeups would be very helpful.

Thank you again for your time and the work you and the team have put into batman-adv over the years.

Best regards,
Valent


------ Original Message ------
From "Simon Wunderlich" <[email protected]>
To [email protected]
Cc "Valent@MeshPoint" <[email protected]>
Date 5.1.2026. 10:07:44
Subject Re: Restarting MeshPoint – seeking advice on routing for crisis/disaster scenarios

Hi Valent,

thank you for your interest and sorry for the late reply. The time before
Christmas is usually a bit hectic ...

I would suggest looking into the "gluon" Freifunk firmware [1], including the
batman-adv parameters set there. There are setups with a couple of hundred
nodes, although some are sparsely connected across cities. Those setups have
been used and tested for a long time on different types of hardware.

A few general suggestions for tuning for those scenarios are:

* set a high multicast rate, at least 12 Mbit/s, perhaps 24 or more. You
will trade range for scalability.

* choose a higher than default OGM interval, e.g. 5 seconds instead of 1
second. This makes batman-adv react more slowly, but helps scaling with
many nodes. Each node repeats every other node's OGM messages, which results
in O(N^2) OGM messages per interval.

* if you don't need encryption (SAE), turn it off. SAE by default does a peer-
to-peer handshake, which can kill a dense network with many participants in
one place, if everyone wants to handshake with everyone else at the same time.

There are a few more things (e.g. reducing basic rates) which you may find in
the gluon firmware and other places.
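
The O(N^2) estimate for OGM traffic can be turned into rough numbers. The sketch below is a worst-case model under one assumption: every node originates one OGM per interval and every other node rebroadcasts it exactly once. It ignores any aggregation or suppression of OGMs, so real traffic should be lower. The function name is made up for this example.

```python
def ogm_transmissions_per_second(nodes: int, ogm_interval_s: float) -> float:
    """Worst-case mesh-wide OGM transmissions per second.

    Assumed model: each of the N nodes originates one OGM per interval and
    each of the other N - 1 nodes repeats it once, giving
    N + N * (N - 1) = N * N transmissions per interval.
    """
    if nodes < 0 or ogm_interval_s <= 0:
        raise ValueError("nodes must be >= 0 and the interval must be > 0")
    return nodes * nodes / ogm_interval_s


if __name__ == "__main__":
    for interval in (1.0, 5.0):
        for n in (50, 100, 300):
            rate = ogm_transmissions_per_second(n, interval)
            print(f"{n:4d} nodes, {interval:.0f} s interval: {rate:8.0f} OGM tx/s")
```

In this model, going from a 1 second to a 5 second interval cuts the OGM load by a factor of five at any network size, while doubling the node count quadruples it.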

As you can see, some of those suggestions are more Wi-Fi layer specific than
batman-adv specific, and would help with other protocols (e.g. babel) as well.
From my experience with network simulators/emulators, you may verify protocol
specific behavior (e.g. number of messages, failover time) to some extent. But
testing Wi-Fi specific scaling effects like failing SAE handshakes, effects of
multicast rates, etc. is rather hard, even if you use emulators based on
mac80211_hwsim or similar, which partially emulate 802.11. For those
experiments, it's best to actually set up 10-20 devices ...

Cheers,
        Simon

[1] https://gluon.readthedocs.io/en/latest/

On Saturday, December 20, 2025 11:43:20 PM Central European Standard Time
Valent@MeshPoint wrote:
 Hello,

 I wanted to follow up on my previous message. I did not see any replies,
 so I hope it is ok to share one concrete finding from recent testing in
 case it helps the discussion.

 To move beyond purely theoretical arguments, I have been running
 large-scale tests using meshnet-lab:
 https://github.com/mwarning/meshnet-lab

 The main reason for choosing it is that it allows replaying real world
 community network topologies, including Freifunk graphs, instead of
 relying on synthetic grids or ideal meshes. This makes it easier to
 observe behaviour under sparse, asymmetric, and imperfect conditions
 that are closer to what actually gets deployed.

 One interesting observation so far is that results can vary
 significantly depending on how nodes are brought up and how control
 plane load interacts with the topology. In other words, the same
 protocol on the same topology can behave very differently depending on
 timing, churn, and scale effects, even when the underlying links are
 identical. This was not obvious to me before testing at this scale.

 I am curious whether others here have used meshnet-lab or similar
 namespace-based emulation tools for batman-adv testing, and if so,
 whether your observations matched real deployments closely, or if there
 are known caveats when interpreting the results.

 Any feedback or pointers would be appreciated.

 Best regards,
 Valent


 ------ Original Message ------

 From "Valent Turkovic" <[email protected]>

 To [email protected]
 Date 16.12.2025. 16:37:01
 Subject Restarting MeshPoint – seeking advice on routing for
 crisis/disaster scenarios

 >Hi everyone,
 >
 >My name is Valent Turkovic.
 >
 >Between 2015 and 2018 I ran the MeshPoint project – a simple, rugged Wi-Fi
 >hotspot designed to work in very tough conditions.
 >
 >During the refugee crisis in Croatia we deployed these devices in camps and
 >transit centers, providing internet connectivity for humanitarian
 >organizations such as the Red Cross, UNICEF, IOM, Greenpeace, and many
 >smaller NGOs. Through these deployments, more than 500,000 people were
 >able to stay connected. The same system was also used in flood response
 >and other emergency situations. The project received the “Best
 >Humanitarian Tech of the Year” award at The Europas in 2016.
 >
 >Unfortunately, financial constraints forced me to pause the project after
 >2018. It was entirely self-funded, and the prolonged stress eventually led
 >to long-term health issues.
 >
 >Over the years I stayed in contact with first responders and field teams
 >from organizations such as WFP, UNICEF, the Red Cross, and various NGOs.
 >The feedback has remained consistent: when disasters strike, whether
 >earthquakes, floods, or large-scale displacement, teams still struggle to
 >bring up reliable communications quickly. What they need most is a mesh
 >network that works within minutes, not hours or days, and that continues
 >operating on battery power when infrastructure is down.
 >
 >I am fully aware that in active conflict zones Wi-Fi can be jammed or
 >restricted, for example due to drone countermeasures. However, there are
 >many other scenarios where Wi-Fi mesh remains extremely valuable:
 >evacuation centers, field hospitals, temporary shelters, flood-affected
 >villages, and coordination points for responders. In these environments,
 >fast, robust, and easy-to-deploy networking makes a very real difference
 >for coordination, family contact, and medical or logistical data sharing.
 >
 >Because of this, I am now restarting the project as MeshPoint V2. The focus
 >is updated hardware, improved battery life, and even simpler deployment,
 >while keeping the original goal: crisis response and off-grid or
 >underserved communities.
 >
 >In the original MeshPoint we used Babel. This was largely driven by
 >practical constraints at the time: our deployment tooling was based on
 >nodewatcher, which was Babel-only. Technically, Babel served us very well.
 >It converged fast, was reliable, and worked nicely for small to
 >medium-sized networks.
 >
 >At the same time, I am well aware that many community networks and
 >real-world mesh deployments successfully used batman-adv, often through
 >Gluon or custom firmware builds. In larger, more dynamic, or highly mobile
 >topologies typical for crisis scenarios, the layer-2 approach and seamless
 >mobility properties of batman-adv are very attractive, especially when
 >nodes are frequently moved, powered on and off, or replaced in the field.
 >
 >For MeshPoint V2 I am evaluating batman-adv and would appreciate insights
 >on the following aspects, specifically in the context of crisis and
 >emergency deployments:
 >
 >Behaviour at larger scale in real deployments
 >In crisis scenarios networks often start small but can grow quickly as more
 >nodes are deployed by different teams or organizations. We are interested
 >in how batman-adv behaves when scaling to hundreds or more nodes in
 >non-ideal, real-world conditions, without centralized planning and with
 >limited ability for on-site tuning.
 >
 >Performance in sparse or highly mobile topologies
 >Nodes in the field are frequently moved, turned off, replaced, or
 >temporarily isolated. Vehicles, backpacks, and mobile command posts
 >constantly change network topology. We are looking for practical
 >experience with how well batman-adv handles frequent topology changes,
 >intermittent links, and sparse node placement without requiring constant
 >manual intervention.
 >
 >Suitability for battery-powered and intermittently connected nodes
 >Many nodes run on battery for long periods and may sleep, reboot, or
 >disappear entirely when power is lost. Low overhead, predictable
 >behaviour, and fast recovery after reconnect are essential. We are
 >particularly interested in any known trade-offs between routing
 >performance, control traffic, and power consumption in such environments.
 >
 >If there is existing work, documented limitations, field experience, or
 >design guidance relevant to these constraints, pointers would be greatly
 >appreciated. The goal is to build a system that field teams can deploy and
 >rely on under stress, without requiring deep networking expertise on site.
 >
 >Thank you for your time, and thank you to everyone who has contributed to
 >making mesh networking viable outside of labs and into real-world,
 >high-stakes situations.
 >
 >Best regards,
 >Valent Turkovic
 >https://www.meshpointone.com/
 >
 >Technical specifications of the original MeshPoint (for reference):
 >https://www.meshpointone.com/technical-specifications/



