Hi,

This is a copy from the bugtracker to put it up for discussion here on
the mailing list. I will post it in two parts, corresponding to two
comments. First part:

Structured writeup of the basic algorithm with a calculation of the
upper bound of the load.

(from https://bugs.freenetproject.org/view.php?id=3816#c12182 )

Goal: Deterministic network load for WoT, bounded by a low, constant number
of subscriptions and fetches.


== Variables ==

- N: the number of identities to which the OwnID gave positive trust. Can be
assumed to be bounded by 150 active IDs (per Dunbar’s number).⁰
- M: a small constant for additional subscriptions, e.g. 10.
- F: a small constant for fetches per update, e.g. 10.

⁰: https://en.wikipedia.org/wiki/Dunbar's_number


== Process ==

Subscribe to all rank 1 IDs (which have direct trust from your OwnID). These 
are the primary subscriptions. There are N primary subscriptions.

Also subscribe to the M rank2 IDs which were most recently updated (the high
frequency subscriptions). These have the highest probability of being updated
again. This list must be re-checked whenever a rank2 ID is fetched
successfully.

Finally subscribe to M rank2 IDs chosen at random (secondary subscriptions) and 
M IDs of rank 3 or higher chosen at random (random subscriptions). When a 
secondary or random subscription yields an update, replace it with another ID 
of the same type (rank2 or rank3+) chosen at random. Also replace one secondary 
and one random subscription every hour. This ensures that WoT will always 
eventually see every update.
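
A minimal sketch in Python of the subscription setup and replacement
described above. The Identity record, its fields and the function names are
illustrative assumptions, not WoT's actual data model or API.

    import random
    from dataclasses import dataclass

    M = 10  # additional subscriptions per category

    @dataclass(frozen=True)
    class Identity:
        id: str
        rank: int            # 1 = directly trusted by the OwnID
        last_updated: float  # timestamp of the last seen edition

    def select_subscriptions(identities):
        """Pick primary, high frequency, secondary and random subscriptions."""
        rank1 = [i for i in identities if i.rank == 1]
        rank2 = [i for i in identities if i.rank == 2]
        rank3plus = [i for i in identities if i.rank >= 3]

        primary = set(rank1)  # all rank1 IDs: N primary subscriptions
        # the M most recently updated rank2 IDs (re-checked on rank2 fetches)
        high_freq = set(sorted(rank2, key=lambda i: i.last_updated)[-M:])
        # M random rank2 IDs (secondary) and M random rank3+ IDs, deduplicated
        secondary_pool = [i for i in rank2 if i not in high_freq]
        secondary = set(random.sample(secondary_pool, min(M, len(secondary_pool))))
        randoms = set(random.sample(rank3plus, min(M, len(rank3plus))))
        return primary, high_freq, secondary, randoms

    def replace_subscription(current, pool, updated=None):
        """Replace the updated ID (or a random one, for the hourly rotation)
        with another ID from the same pool chosen at random."""
        candidates = [i for i in pool if i not in current]
        if not current or not candidates:
            return
        current.remove(updated if updated in current else random.choice(tuple(current)))
        current.add(random.choice(candidates))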

If one of the subscriptions yields an update, process all version hints.
Queue these as fetches in separate queues for rank2 and rank3+.

At every update of a subscription (primary, high frequency, secondary or
random), choose F fetches from the respective fetch queue at random and
process them. This bounds the network load to
((N × F) + (3M × F)) × update frequency.
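
A sketch of the fetch side under the same assumptions: version hints go into
separate queues per rank, hints for already-subscribed IDs are skipped (this
anticipates the deduplication note from the review below), and every update
processes F hints drawn at random. The function names are illustrative.

    import random

    F = 10  # fetches processed per subscription update

    def queue_version_hints(hints, subscribed, queues):
        """hints: (identity, rank) pairs taken from a freshly fetched ID;
        queues: {"rank2": set(), "rank3plus": set()}."""
        for identity, rank in hints:
            if identity in subscribed:
                continue  # already subscribed, the update will be seen anyway
            queues["rank2" if rank == 2 else "rank3plus"].add(identity)

    def process_fetches(queue, fetch):
        """On a subscription update, draw F queued IDs at random and fetch
        them, bounding the load to ((N × F) + (3M × F)) × update frequency."""
        batch = random.sample(tuple(queue), min(F, len(queue)))
        for identity in batch:
            queue.discard(identity)
            fetch(identity)  # placeholder for the actual USK fetch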


(from the review by bertm: It might be useful to add a queue of frequently
updating rank3+ IDs, too, to offset the bias of the random queue towards
inactive IDs. This could increase the cost by about 30%, which would still
keep it within reasonable bounds.)

(also from the review: it’s important to deduplicate fetches and subscriptions: 
If we already have a subscription, there’s no use in starting a fetch, since 
the update will already have been seen)


== Calculating the upper bound of the cost ==

To estimate an upper bound for the fetch frequency, we can use the Twitter
posting frequency, which is about 5 tweets per day on average and 10 to 50
per day for people with many followers¹ (those are more likely to be rank1
IDs of others).

There are two possible extremes: a very hierarchic trust structure and an
egalitarian trust structure. Reality likely follows a power-law structure.

- In a hierarchic trust structure, we can assume that the rank1 and rank2 IDs
(trustee subscriptions) are all people with many followers, so we use 22
updates per day (per ¹).
- In an egalitarian trust structure, we can assume 5 updates per day (per ¹).

For the high frequency subscriptions we can assume 4 updates per hour for 16
hours per day, so 64 updates per day.² For the random subscriptions we can
assume 5 updates per day (per ¹).

¹:
blog.hubspot.com/blog/tabid/6307/bid/4594/Is-22-Tweets-Per-Day-the-Optimum.aspx
← found on the first page of Google results; not robust, but good enough for
this use case.

The total number of fetches per day is then bounded by
((N × F) + (M × F)) × trustee update frequency
+ (M × F) × high update frequency
+ (M × F) × random update frequency.

For a very hierarchic WoT (primaries are very active) this gives the upper 
bound:

= (150 × 10 × 22) + (10 × 10 × 22) + (10 × 10 × 64) + (10 × 10 × 5)
= (1500 × 22) + (100 × 22) + (100 × 64) + (100 × 5)
= 33000 + 2200 + 6400 + 500 # primary triggered + random rank2 + active rank2 + 
random rank3+
= 42100 fetches per day
~ 30 fetches per minute.

For an egalitarian trust structure (primaries have average activity) this gives 
the upper bound:

= (150 × 10 × 5) + (10 × 10 × 5) + (10 × 10 × 64) + (10 × 10 × 5)
= (1500 × 5) + (100 × 5) + (100 × 64) + (100 × 5)
= 7500 + 500 + 6400 + 500 # primary triggered + random rank2 + active rank2 + 
random rank3+
= 14900 fetches per day
~ 10 fetches per minute.
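
A small sanity check in Python of the two calculations above, using the
constants and per-day update frequencies assumed in this section:

    N, M, F = 150, 10, 10

    def fetches_per_day(trustee_freq, high_freq=64, random_freq=5):
        # ((N×F) + (M×F)) × trustee + (M×F) × high + (M×F) × random frequency
        return ((N * F + M * F) * trustee_freq
                + M * F * high_freq + M * F * random_freq)

    for name, freq in (("hierarchic", 22), ("egalitarian", 5)):
        total = fetches_per_day(freq)
        print(name, total, round(total / (24 * 60)))
        # hierarchic 42100 29   (~30 fetches per minute)
        # egalitarian 14900 10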

This gives a plausible upper bound of the network load per day from this
scheme, even for a very centralized WoT. The upper bound for a very
hierarchic trust structure is dominated by the primary subscriptions. The
upper bound for an egalitarian trust structure is dominated by the primary
subscriptions and the high frequency subscriptions.

The secondary (random rank2) subscriptions and the random (rank3+)
subscriptions together make up about 7% of the network load. They are needed
to guarantee that the WoT always eventually converges to a globally
consistent view.

The cost here is higher than the previous estimate, because that estimate
assumed 1 update per day instead of the 22 or 5 updates per day used here.

One fetch for an ID should transfer about 1KiB data. For a hierarchic WoT (one 
fetch per two seconds) this results in a maximum bandwidth consumption on a 
given node of 1KiB/s × hops. This is about 5KiB/s for the average of 5 hops — 
50% of our minimum bandwidth. For an egalitarian WoT this results in a maximum 
bandwidth consumption on a given node of 0.3KiB/s × hops. This is about 
1.5KiB/s for the average of 5 hops — 15% of our minimum bandwidth. The real 
bandwidth requirement should be lower, because IDs are cached well.

The average total number of subscriptions to active IDs should be bounded by
about 190 (N + 3M = 180, plus the additional queue of frequently updating
rank3+ IDs suggested in the review).


²: The cost of active IDs might be overestimated here, because WoT has an
upper bound of one update per hour. In that case the cost of this algorithm
would be reduced by about 30% for the egalitarian structure and by about 10%
for the hierarchic structure.
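
A quick check of this footnote under the stated assumptions: capping the
high frequency subscriptions at one update per hour (24 per day instead of
64) removes 10 × 10 × 40 = 4000 fetches per day from both totals above:

    for total in (42100, 14900):
        capped = total - 10 * 10 * (64 - 24)  # drop the excess high-freq fetches
        print(total, "->", capped, f"({1 - capped / total:.0%} less)")
        # 42100 -> 38100 (10% less); 14900 -> 10900 (27% less)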


Best wishes,
Arne
-- 
To be unpolitical
means to be political
without noticing it
