Hi, This is a copy from the bugtracker to put it up for discussion here on the mailing list. I will post it in two parts, corresponding to two comments. First part:
Structured writeup of the basic algorithm with a calculation of the upper bound of the load. (from https://bugs.freenetproject.org/view.php?id=3816#c12182 ) Goal: Deterministic network load by WoT bounded to a low, constant number of subscriptions and fetches. == Variables == - N the number of identities the OwnID gave positive trust. Can be assumed to be bounded to 150 active IDs (as by Dunbar’s number).⁰ - M a small constant for additional subscriptions, i.e. 10. - F a small constant for additional fetches per update, i.e. 10. ⁰: https://en.wikipedia.org/wiki/Dunbar's_number == Process == Subscribe to all rank 1 IDs (which have direct trust from your OwnID). These are the primary subscriptions. There are N primary subscriptions. Also subscribe to M rank2 IDs which were most recently updated. These have the highest probability of being updated again. This list must be checked whenever a rank2 ID is fetched successfully. Finally subscribe to M rank2 IDs chosen at random (secondary subscriptions) and M IDs of rank 3 or higher chosen at random (random subscriptions). When a secondary or random subscription yields an update, replace it with another ID of the same type (rank2 or rank3+) chosen at random. Also replace one secondary and one random subscription every hour. This ensures that WoT will always eventually see every update. If one of of the subscriptions yields an update, process all version hints. Queue these as fetches in a separate queue for rank2 and rank3+. At every update of a subscription (primary, secondary or random), choose F fetches from the respective fetch queue at random and process them. This bounds the network load to ((N × F) + (3M × F)) × update frequency. (from the review of bertm: It might be useful to add a queue of frequently updating rank3+ IDs, too, to offset the bias of the random queue towards inactive IDs. This could increase the cost by about 30%, which would still keep it in reasonable bounds) (also from the review: it’s important to deduplicate fetches and subscriptions: If we already have a subscription, there’s no use in starting a fetch, since the update will already have been seen) == Calculating the upper bound of the cost == To estimate an upper bound for the fetch frequency, we can use the twitter frequency, which is about 5 tweets per day on average and 10 to 50 for people with many followers¹ (those are more likely to be rank1 IDs of others). There are two possible extremes: Very hierarchic trust structure and egalitarian trust structure. Reality is likely a power-law structure. - In a hierarchic trust structure, we can assume that rank1 or rank2 IDs (trustee subscriptions) are all people with many followers, so we use 22 updates per day (as by ¹). - In an egalitarian trust structure we can assume 5 updates per day (as by ¹). For high frequency subscriptions we can assume 4 updates per hour for 16 hours per day, so 64 updates per day.⁰ For random subscriptions we can assume 5 updates per day (as by ¹). ¹: blog.hubspot.com/blog/tabid/6307/bid/4594/Is-22-Tweets-Per-Day-the-Optimum.aspx ← on the first google page, not robust, but should be good enough for this usecase. ((N × F) + (M × F)) × trustee update frequency + M × F × high update frequency + M × F × random update frequency. For a very hierarchic WoT (primaries are very active) this gives the upper bound: = (150 × 10 × 22) + (10 × 10 × 22) + (10 × 10 × 64) + (10 × 10 × 5) = (1500 × 22) + (100 × 22) + (100 × 64) + (100 × 5) = 33000 + 2200 + 6400 + 500 # primary triggered + random rank2 + active rank2 + random rank3+ = 42100 fetches per day ~ 30 fetches per minute. For an egalitarian trust structure (primaries have average activity) this gives the upper bound: = (150 × 10 × 5) + (10 × 10 × 5) + (10 × 10 × 64) + (10 × 10 × 5) = (1500 × 5) + (100 × 5) + (100 × 64) + (100 × 5) = 7500 + 500 + 6400 + 500 # primary triggered + random rank2 + active rank2 + random rank3+ = 14900 fetches per day ~ 10 fetches per minute. This gives a plausible upper bound of the network load per day from this scheme, assuming a very centralized WoT. The upper bound for a very hierarchic trust structure is dominated by the primary subscriptions. The upper bound for an egalitarian trust structure is dominated by the primary subscriptions and the high frequency subscriptions. The rank2 subscriptions and the random subscriptions together make up 7% of the network load. They are needed to guarantee that the WoT always eventually converges to a globally consistent view. The cost here is higher than the previous estimate, because the previous estimate assumed 1 update per day instead of the 22 updates per day or 5 updates per day used here. One fetch for an ID should transfer about 1KiB data. For a hierarchic WoT (one fetch per two seconds) this results in a maximum bandwidth consumption on a given node of 1KiB/s × hops. This is about 5KiB/s for the average of 5 hops — 50% of our minimum bandwidth. For an egalitarian WoT this results in a maximum bandwidth consumption on a given node of 0.3KiB/s × hops. This is about 1.5KiB/s for the average of 5 hops — 15% of our minimum bandwidth. The real bandwidth requirement should be lower, because IDs are cached well. The average total number of subscriptions to active IDs should be bounded to 190. ⁰: The cost of active IDs might be overestimated here, because WoT has an upper bound of one update per hour. In this case the cost of this algorithm would be reduced by about 30% for the egalitarian structure and by about 10% for the hierarchic structure. Best wishes, Arne -- Unpolitisch sein heißt politisch sein ohne es zu merken
signature.asc
Description: PGP signature
_______________________________________________ Devl mailing list [email protected] https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
