Optimizing the protocol settings for an artificially stable testbed
sounds like a bad idea...

Do you expect everything to be perfectly stable and static?

Henning Rogge

On Sat, Dec 20, 2025 at 9:25 AM Benjamin Henrion <[email protected]> wrote:
>
> Hi,
>
> If your network consists of pretty static nodes (fixed routers on the roof), 
> you can tune your settings to update the routing less frequently.
>
> Adding and removing nodes can take time (30 minutes, say); it does not
> need to be instant.
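For babeld, the tuning Benjamin describes would look roughly like the sketch below. The interval values are illustrative only, and the directive names should be checked against babeld(8) for your version:

```conf
# /etc/babeld.conf -- illustrative values for a static rooftop network.
# Slower hellos and updates mean less control traffic, at the cost of
# reacting more slowly when a node appears or disappears.
default hello-interval 60      # the wireless default is a few seconds
default update-interval 300    # full routing updates every 5 minutes
interface eth0 wired true      # treat the backhaul as a stable wired link
```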
>
> Best,
>
> --
> Benjamin Henrion (zoobab)
> Email: zoobab at gmail.com
> Mobile: +32-484-566109
> Web: http://www.zoobab.com
> FFII.org Brussels
> "In July 2005, after several failed attempts to legalise software patents in 
> Europe, the patent establishment changed its strategy. Instead of explicitly 
> seeking to sanction the patentability of software, they are now seeking to 
> create a central European patent court, which would establish and enforce 
> patentability rules in their favor, without any possibility of correction by 
> competing courts or democratically elected legislators."
>
> On Fri, Dec 19, 2025 at 18:18, Valent@MeshPoint <[email protected]>
> wrote:
>>
>> Hi everyone,
>> I'm working on a fair, reproducible benchmark methodology for comparing
>> mesh routing protocols (Babel, BATMAN-adv, Yggdrasil, and others).
>> Before running the full benchmark, I'd like to get feedback from the
>> Babel community on the methodology.
>>
>> BACKGROUND
>> ----------
>> We're using meshnet-lab (https://github.com/mwarning/meshnet-lab) for
>> testing, which creates virtual mesh networks using Linux network
>> namespaces on a single host. This approach has limitations that we've
>> documented, and I'd appreciate input on whether our methodology properly
>> accounts for them.
>> TEST ENVIRONMENT
>> ----------------
>>    Hardware: ThinkPad T14 laptop (12 cores, 16GB RAM)
>>    Software: meshnet-lab with network namespaces
>>    Protocols: babeld 1.13.x, batctl/batman-adv, yggdrasil 0.5.x
>> INFRASTRUCTURE LIMITATIONS DISCOVERED
>> -------------------------------------
>> During development, we found significant limitations when testing larger
>> networks:
>> 1. Supernode/Hub Bottleneck
>> When testing real Freifunk topologies (e.g., Bielefeld with 246 nodes),
>> we discovered that star topologies cause test infrastructure failures,
>> not protocol failures.
>> The issue: if a topology has a supernode (hub) connected to 200+ other
>> nodes, the meshnet-lab bridge for that hub receives ~60 hello
>> packets/second from all neighbors. This causes:
>>    - UDP packet loss at the bridge level
>>    - Apparent "connectivity failures" that are actually infrastructure
>>      artifacts
>>    - False negatives that make protocols look broken when they're not
>> Our solution: Cap maximum node degree at 20 and avoid pure star
>> topologies.
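That degree cap can be sketched in a few lines, assuming the topology is a plain edge list (the function and names here are hypothetical, not meshnet-lab's actual API):

```python
from collections import defaultdict

MAX_DEGREE = 20  # cap found to avoid bridge-level hello flooding

def cap_degree(edges, max_degree=MAX_DEGREE):
    """Drop edges that would push either endpoint past max_degree."""
    degree = defaultdict(int)
    kept = []
    for a, b in edges:
        if degree[a] < max_degree and degree[b] < max_degree:
            kept.append((a, b))
            degree[a] += 1
            degree[b] += 1
    return kept

# A pure star with 200 leaves collapses to 20 edges at the hub.
star = [("hub", f"leaf{i}") for i in range(200)]
print(len(cap_degree(star)))  # -> 20
```

Note this greedy pass keeps the first edges it sees; a real sampler would probably prefer edges that preserve connectivity.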
>> 2. Scale Limitations
>> We've validated that 100 nodes is a safe limit where:
>>    - CPU stays under 80%
>>    - Memory is not a bottleneck
>>    - Results are reproducible (variance < 10%)
>> For networks larger than ~250 nodes, single-host simulation becomes
>> unreliable regardless of available RAM. The bottleneck is CPU context
>> switching between namespaces and multicast flooding overhead.
>> 3. 1000+ Node Networks
>> We cannot reliably test 1000+ node networks with this methodology.
>> Any attempt would produce infrastructure artifacts, not protocol
>> measurements. For such scales, distributed testing across multiple
>> physical hosts would be needed.
>> PROPOSED TEST SUITE
>> -------------------
>> We've documented a methodology with:
>> 6 Topologies:
>>    T1: Grid 10x10 (100 nodes, max degree 4)
>>    T2: Random mesh (100 nodes, max degree ~10)
>>    T3: Clustered/federated (100 nodes, 4 clusters)
>>    T4: Linear chain (50 nodes, diameter 49)
>>    T5: Small-world Watts-Strogatz (100 nodes)
>>    T6: Sampled real Freifunk (80 nodes, degree capped)
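For reference, the grid topology (T1) is small enough to generate with stdlib Python; this is a sketch, not meshnet-lab's own generator:

```python
def grid_edges(width, height):
    """4-connected grid: each node links to its right and lower neighbour."""
    edges = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                edges.append(((x, y), (x + 1, y)))
            if y + 1 < height:
                edges.append(((x, y), (x, y + 1)))
    return edges

# 10x10 grid: 100 nodes, 2 * 10 * 9 = 180 edges, maximum degree 4.
edges = grid_edges(10, 10)
print(len(edges))  # -> 180
```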
>> 5 Validation Tests (before benchmarks):
>>    V1: 3-node sanity check
>>    V2: Scaling ladder (find breaking point)
>>    V3: Consistency check (reproducibility)
>>    V4: Resource monitoring
>>    V5: Bridge port audit
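The consistency check (V3) presumably reduces to a coefficient-of-variation test against the 10% threshold mentioned above; a minimal sketch, with illustrative sample values:

```python
import statistics

def is_reproducible(samples, max_cv=0.10):
    """Accept a set of runs if stdev/mean stays under the 10% threshold."""
    mean = statistics.mean(samples)
    cv = statistics.stdev(samples) / mean
    return cv < max_cv

convergence_times = [13.8, 14.2, 14.0, 13.5, 14.5]  # seconds, illustrative
print(is_reproducible(convergence_times))  # -> True
```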
>> 8 Benchmark Scenarios:
>>    S1: Steady-state convergence
>>    S2: Node failure recovery
>>    S3: Lossy link handling (tc netem)
>>    S4: Mobility/roaming simulation
>>    S5: Network partition and merge
>>    S6: High churn (10% nodes cycling)
>>    S7: Traffic under load (iperf3)
>>    S8: Administrative complexity (subjective)
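The high-churn scenario (S6) could be driven by something like the following; the helper name is hypothetical and meshnet-lab's real interface may differ:

```python
import random

def pick_churn_set(nodes, fraction=0.10, seed=42):
    """Deterministically pick the 10% of nodes that will cycle up/down."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    count = max(1, int(len(nodes) * fraction))
    return rng.sample(nodes, count)

nodes = [f"n{i}" for i in range(100)]
churning = pick_churn_set(nodes)
print(len(churning))  # -> 10
```

Fixing the seed matters here: without it, each protocol under test would see a different churn pattern and the comparison would not be fair.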
>> QUESTIONS FOR THE COMMUNITY
>> ---------------------------
>> 1. Missing tests?
>>     Are there scenarios important for Babel that we should add?
>> 2. Unrealistic tests?
>>     Should we skip any tests that don't make sense for real-world
>>     evaluation?
>> 3. Babel-specific considerations?
>>     Any configuration parameters or behaviors we should specifically
>>     measure?
>> 4. Large-scale alternatives?
>>     Does anyone have experience with distributed mesh testing across
>>     multiple hosts? How do you handle the coordination and measurement?
>> 5. Known limitations?
>>     Are there known Babel behaviors at scale that we should document
>>     upfront?
>> INITIAL RESULTS
>> ---------------
>> Our initial tests with babeld show:
>>    Grid 100 nodes:       100% connectivity, ~14s convergence
>>    Chain 50 nodes:       100% connectivity, ~5s convergence
>>    Small-world 100 nodes: 100% connectivity, ~12s convergence
>> These results validate that the test infrastructure works correctly
>> for Babel at this scale.
>> FULL METHODOLOGY DOCUMENT
>> -------------------------
>> The complete methodology document is attached.
>> I'd appreciate any feedback, suggestions, or concerns before we proceed
>> with the full benchmark.
>> Thanks,
>> Valent.
>>
>>
>> ------ Original Message ------
>> From "Juliusz Chroboczek" <[email protected]>
>> To "Linus Lüssing" <[email protected]>
>> Cc "Valent Turkovic" <[email protected]>;
>> [email protected]
>> Date 19.12.2025. 12:45:16
>> Subject Re: [Babel-users] Restarting MeshPoint – seeking advice on
>> routing for crisis/disaster scenarios
>>
>> >>  There's also l3roamd, predating sroamd:
>> >>
>> >>  https://github.com/freifunk-gluon/l3roamd
>> >
>> >That's right, I should have mentioned it.  I'll be sure to give proper
>> >credit if I ever come back to sroamd.
>> >
>> >For the record, sroamd is based on a combination of the ideas in l3roamd
>> >and in the PMIPv6 protocol, plus a fair dose of IS-IS.
>> >
>> >-- Juliusz
>> _______________________________________________
>> Babel-users mailing list
>> [email protected]
>> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users
>
