On Tue, May 26, 2009 at 4:02 PM, xor <xor at gmx.li> wrote:
> On Thursday 07 May 2009 11:23:51 Evan Daniel wrote:
>>
>> >
>> > Why exactly? Your post is nice but I do not see how it answers my
>> > question. The general problem my post is about: New identities are
>> > obtained by taking them from trust lists of known identities. An attacker
>> > therefore could put 1000000 identities in his trust list to fill up your
>> > database and slow down WoT. Therefore, a decision has to be made when to
>> > NOT import new identities from someone's trust list. In the current
>> > implementation, it is when he has a negative score.
>> >
> [...]
>>
>> I have not examined the WoT code. However, the Advogato metric has
>> two attributes that I don't think the current WoT method has: no
>> negative trust behavior (if there is a trust rating Bob can assign to
>> Carol such that Alice will trust Carol less than if Bob had not
>> assigned a rating, that's a negative trust behavior), and a
>> mathematical proof as to the upper limit on the quantity of spammer
>> nodes that get trusted.
>>
>> The Advogato metric is *specifically* designed to handle the case of
>> the attacker creating millions of accounts. In that case, his success
>> is bounded (linear with modest constant) by the number of confused
>> nodes -- that is, legitimate nodes that have (incorrectly) marked his
>> accounts as legitimate. If you look at the flow computation, it
>> follows that for nodes for which the computed trust value is zero, you
>> don't have to bother downloading their trust lists, so the number of
>> such lists you download is similarly well controlled.
>
> I have read your messages again and all your new messages and you are so
> convinced about advogato that I'd like to ask you more questions about how it
> would work, I don't want you to feel like everyone is ignoring you :)
> (- I am more of a programmer right now than a designer of algorithms, I
> concentrate on spending most available time on *implementing* WoT/FT because
> nobody else is doing it and it needs to get done... so I have not talked much
> in this discussion)

Well... to be fair, I'm not actually completely certain it will work. I do, however, think that it has a lot of potential. I don't know any way to get the answer short of running the experiment, and I'm very optimistic about the results. I firmly expect them to be good, but not perfect.

Your questions are certainly welcome :)

>
> Consider the following case, using advogato and not the current FMS/WoT
> alchemy:
>
> 1. Identity X is an occasional and trustworthy poster. X has received many
> positive trust values from hundreds of identities because it has posted
> hundreds of messages over the months, so it has a high score and capacity to
> give trust values, and all newbies will know about the identity and its high
> score because it is well-integrated into the trust graph.

Careful: Advogato doesn't assign trust "scores" in the same sense that FMS and WoT do. Because X is trusted by many identities, many identities can reach it, and therefore accept it. That is a purely binary consideration -- it does not matter directly that it is reachable by many paths.

Because many identities link to X, X is only a short distance away from many identities. When A calculates his trust graph, X is likely to be nearby. However, even if X is poorly connected, this will still be true for some identities; the connectivity only changes how likely it is.

Capacity of a node is determined (in the base algorithm; there are tweaks worth considering) only by its distance from the root, nothing else. Whether that capacity actually limits anything depends on a variety of factors. If there aren't enough downstream nodes, then the capacity isn't needed. If the upstream nodes spend their capacity elsewhere, there might not be enough flow available to fill it -- here is the other place that X being well connected matters.

>
> 2. Now a spammer gets a single identity Y onto the trust list of X by solving
> a captcha, his score is very low because he has only solved a captcha but the
> score is there.
> Therefore, any newbie will see Y because X is well-integrated
> into the WoT

Correct.

>
> 3. X is gone for quite some time due to inactivity, during that time Y creates
> 500 spam identities on his trust list and starts to spam all boards. X will
> not remove Y from his trust list because he is *away* for weeks.

Several points. First, one of the optimizations worth considering is tightly limiting the capacity of any identity that only has captcha level trust. This means that newbies have to solve captchas from identities that have received manual trust, which is easy enough to determine. It also means that though our spammer lists 500 fake ids, other people will only accept a very small number of them -- possibly as low as zero, if the captcha-trust-only nodes are limited to capacity 1. So most of those ids are worthless, and spam is contained.

This is one of the weaknesses of the simplest implementation (no limits on captcha-only ids, that is, using the algorithm exactly as described in the paper). If one of the confused nodes (X) is close to the root, then while the spam problem is still linearly bounded, the coefficient is large. (The precise bound is sum(C_i - 1) over all confused nodes, where C_i is the capacity of confused node i.)

>
> 4. Newbies will see the 500 spam identities and their spam because everyone
> trusts X, and X trusts Y. Newbies will NOT know how to block anything because
> they are newbies.

Most people understand the spam button in their email clients. I suspect our newbies will as well. They'll mark several spam messages, one per fake id. If Y only has captcha level trust, there won't even be very many of them.

If Y has manual trust, it's more complicated. Toad and I both had some thoughts on this problem in other emails. To summarize them:

- After marking several of Y's child identities as spammers, the plugin could point out the source of their trust and suggest marking Y as a spammer.
This may be nontrivial in the case of a spammer building a complex web downstream of Y instead of spamming directly with Y's children. It's also nontrivial if Y appears to be a valid poster himself. This is a reason to distinguish message trust and trust list trust. (In this context, the algorithm effectively runs solely on trust list trust. Message trust would create a link that permitted flow, but did not allow that flow to proceed past the identity in question.)

- Ultimatums help with this, but have their own downsides. With or without ultimatums, the question of what to do about X's trust list is tricky. Among other things, it may be worth ignoring trust lists from inactive identities, or lowering the threshold on ignoring them.

- Marking someone a spammer could be a published thing, and when others calculate flow it prevents a flow path from reaching that identity, even through intermediates. So if our newbie A trusts B trusts X, who is out of town and trusting Y, then the first thing that happens is B marks Y as a spammer. Then, when A is calculating trust, flow goes from A to B to X, but then does not reach Y. This limits the flow to Y by reducing the upstream capacity available. However, it probably doesn't have a very profound effect until there is a cut set of such B's that all mark Y (or Y's children) as spammers. (I have reservations about this; see my earlier mail for more details.)

>
> 5. Now the *core* task of the WoT is in question: How can we as the community
> make the spam-identities introduced by Y disappear with advogato trust
> metrics, without negative trust??
>
> - As you've said, we cannot take away the trust which Y receives from X
> because that is THE attribute of non-negative-trust-metrics.
> - Further, we cannot cause EVERYONE who has trusted X to remove the trust
> value because X is in way too many trust lists of idle people, etc.
> - So what can we do with advogato, if we are the community and want to mark Y
> as the root of evil?

That is fundamentally a hard problem.

- Advogato is not perfect. I am certain there will be some amount of spam getting through; hopefully it will be a small amount.

- With Advogato, the amount of spam possible is well defined. With FMS and WoT it is not. Neither of them has an upper bound on the amount of spam.

- Being too good at solving the spam problem means we are too good at mob censorship. Both are problems. In practice, the goal should be to strike an appropriate balance between the two, not simply to eliminate spam.

- I believe that Advogato is capable of limiting spam to levels where the system is usable, even in the case of reasonably determined spammers. If the most they can aspire to is being a nuisance, I don't think the spammers will be as interested. If spamming takes work and doesn't do all that much, they'll give up. The actual amount of spam seen in practice should be well below the worst possible case -- if and only if the worst case isn't catastrophic.

Evan Daniel
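
A rough sketch of the bookkeeping discussed in this mail, for anyone who wants to play with the numbers. It only illustrates two points made above: capacity depends on distance from the root, and the worst-case spam is bounded by sum(C_i - 1) over the confused nodes. It does not run the actual flow computation. The per-level capacities, the function names, and the example graph (A trusts B trusts X trusts Y) are assumptions made up for illustration, not values from the paper or from any WoT/FMS code.

```python
from collections import deque

# Hypothetical capacity per distance level from the root. These values are
# illustrative assumptions only, not taken from the paper or any deployment.
LEVEL_CAPACITY = [800, 200, 50, 12, 4, 2, 1]


def distances(graph, root):
    """Breadth-first distance from root over directed trust edges."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in graph.get(node, ()):
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


def capacities(graph, root, captcha_only=frozenset()):
    """Assign each reachable node a capacity based only on its distance.

    Nodes in captcha_only are clamped to capacity 1 (the optional tweak
    described in the mail, not part of the base algorithm).
    """
    caps = {}
    for node, d in distances(graph, root).items():
        cap = LEVEL_CAPACITY[min(d, len(LEVEL_CAPACITY) - 1)]
        caps[node] = 1 if node in captcha_only else cap
    return caps


def pass_on(caps, node):
    """Flow a node can hand downstream: one unit is spent accepting itself."""
    return caps[node] - 1


def spam_bound(caps, confused):
    """Worst-case number of accepted spam nodes: sum(C_i - 1)."""
    return sum(pass_on(caps, node) for node in confused if node in caps)


# Newbie A trusts B, B trusts X, X (away) trusts Y, Y lists 500 fake ids.
graph = {
    "A": ["B"],
    "B": ["X"],
    "X": ["Y"],
    "Y": ["spam%d" % i for i in range(500)],
}

plain = capacities(graph, "A")
limited = capacities(graph, "A", captcha_only={"Y"})

print("bound from confused node X:", spam_bound(plain, {"X"}))
print("Y can pass on (base algorithm):", pass_on(plain, "Y"))
print("Y can pass on (captcha-only capped at 1):", pass_on(limited, "Y"))
```

With the capacity-1 cap on captcha-only identities, Y has nothing left to hand to its 500 children, which is the "possibly as low as zero" case. Moving X closer to the root in the example graph raises C_X and with it the bound, which is the large-coefficient weakness of the unmodified algorithm.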
