On Tue, May 26, 2009 at 4:02 PM, xor <xor at gmx.li> wrote:
> On Thursday 07 May 2009 11:23:51 Evan Daniel wrote:
>>
>> >
>> > Why exactly? Your post is nice but I do not see how it answers my
>> > question. The general problem my post is about: New identities are
>> > obtained by taking them from trust lists of known identities. An attacker
>> > therefore could put 1000000 identities in his trust list to fill up your
>> > database and slow down WoT. Therefore, a decision has to be made when to
>> > NOT import new identities from someone's trust list. In the current
>> > implementation, it is when he has a negative score.
>> >
> [...]
>>
>> I have not examined the WoT code.  However, the Advogato metric has
>> two attributes that I don't think the current WoT method has: no
>> negative trust behavior (if there is a trust rating Bob can assign to
>> Carol such that Alice will trust Carol less than if Bob had not
>> assigned a rating, that's a negative trust behavior), and a
>> mathematical proof as to the upper limit on the quantity of spammer
>> nodes that get trusted.
>>
>> The Advogato metric is *specifically* designed to handle the case of
>> the attacker creating millions of accounts.  In that case, his success
>> is bounded (linear with modest constant) by the number of confused
>> nodes -- that is, legitimate nodes that have (incorrectly) marked his
>> accounts as legitimate.  If you look at the flow computation, it
>> follows that for nodes for which the computed trust value is zero, you
>> don't have to bother downloading their trust lists, so the number of
>> such lists you download is similarly well controlled.
>
> I have read your messages again and all your new messages and you are so
> convinced about advogato that I'd like to ask you more questions about how it
> would work, I don't want you to feel like everyone is ignoring you :)
> (- I am more of a programmer right now than a designer of algorithms, I
> concentrate on spending most available time on *implementing* WoT/FT because
> nobody else is doing it and it needs to get done... so I have not talked much
> in this discussion)

Well...  to be fair, I'm not actually completely certain it will work.
I do, however, think that it has a lot of potential.  I don't know
any way to get the answer short of running the experiment, and I'm
very optimistic about the results.  I firmly expect them to be good,
but not perfect.

Your questions are certainly welcome :)

>
> Consider the following case, using advogato and not the current FMS/WoT
> alchemy:
>
> 1. Identity X is an occasional and trustworthy poster. X has received many
> positive trust values from hundreds of identities because it has posted
> hundreds of messages over the months, so it has a high score and capacity to
> give trust values, and all newbies will know about the identity and its high
> score because it is well-integrated into the trust graph.

Careful: Advogato doesn't assign trust "scores" in the same sense that
FMS and WoT do.

Because X is trusted by many identities, many identities can reach it,
and therefore accept it.  That is a purely binary consideration -- it
does not matter directly that it is reachable by many paths.  Because
many identities link to X, X is only a short distance away from many
identities.  When A calculates his trust graph, X is likely to be
nearby.  However, even if X is poorly connected, this will be true for
some identities; the connectivity changes how likely it is.  Capacity
of a node is determined (in the base algorithm; there are tweaks worth
considering) only by distance, nothing else.  Whether that capacity
actually limits anything or not depends on a variety of factors.  If
there aren't enough downstream nodes, then it isn't needed.  If the
upstream nodes spend their capacity elsewhere, there might not be
enough available to fill it -- here is the other place that X being
well connected matters.
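
To make the "capacity depends only on distance" rule concrete, here is
a minimal Python sketch.  The capacity table and the names
(CAPACITY_BY_DISTANCE, capacities) are made up for illustration, and
reusing the last table entry for very distant nodes is my assumption;
this is not the WoT code or the reference Advogato implementation.

```python
from collections import deque

# Hypothetical per-distance capacities; the real values are a tuning choice.
CAPACITY_BY_DISTANCE = [800, 200, 50, 12, 4, 2, 1]


def capacities(trust, root):
    """Map every identity reachable from root to its capacity.

    trust maps an identity to the identities on its trust list.
    Capacity is looked up purely by breadth-first distance from root.
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in trust.get(node, ()):
            if child not in dist:  # first visit gives the shortest distance
                dist[child] = dist[node] + 1
                queue.append(child)
    last = len(CAPACITY_BY_DISTANCE) - 1
    # Assumption: nodes beyond the table reuse its last entry.
    return {n: CAPACITY_BY_DISTANCE[min(d, last)] for n, d in dist.items()}
```

In a graph where A trusts B and C, and both trust X, X gets the
distance-2 capacity; the second path to X does not raise it.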

>
> 2. Now a spammer gets a single identity Y onto the trust list of X by solving
> a captcha, his score is very low because he has only solved a captcha but the
> score is there. Therefore, any newbie will see Y because X is well-integrated
> into the WoT

Correct.

>
> 3. X is gone for quite some time due to inactivity, during that time Y creates
> 500 spam identities on his trust list and starts to spam all boards. X will
> not remove Y from his trust list because he is *away* for weeks.

Several points.  First, one of the optimizations worth considering is
tightly limiting the capacity of any identity that only has captcha
level trust.  This means that newbies have to solve captchas from
identities that have received manual trust, which is easy enough to
determine.  It also means that though our spammer lists 500 fake ids,
other people will only accept a very small number of them -- possibly
as low as zero, if nodes with only captcha-level trust are limited to
capacity 1.  So most of those ids are worthless, and spam is
contained.

This is one of the weaknesses of the simplest implementation (no
limits on captcha-only ids; that is, using the algorithm exactly as
described in the paper).  If one of the confused nodes (X) is close to
the root, then while the spam problem is still linearly bounded, the
coefficient is large.  (The precise bound is sum(C_i - 1) for all
confused nodes, where C_i is the capacity of the confused node.)
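
The bound above is simple arithmetic, so a small Python sketch may
help.  The function name and the capacity numbers are hypothetical,
chosen only to show how the coefficient grows when a confused node
sits near the root.

```python
def spam_bound(confused_capacities):
    """Upper limit on accepted spam ids: sum(C_i - 1) over confused nodes."""
    return sum(c - 1 for c in confused_capacities)


# One confused node close to the root (hypothetical capacity 200).
near_root = spam_bound([200])

# The same mistake made by a node limited to capacity 1.
capacity_one = spam_bound([1])
```

With these made-up numbers, the confused node near the root allows up
to 199 spam ids, while a confused node with capacity 1 allows none.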

>
> 4. Newbies will see the 500 spam identities and their spam because everyone
> trusts X, and X trusts Y. Newbies will NOT know how to block anything because
> they are newbies.

Most people understand the spam button in their email clients.  I
suspect our newbies will as well.  They'll mark several spam messages,
one per fake id.  If Y only has captcha level trust, there won't even
be very many of them.  If Y has manual trust, it's more complicated.
Toad and I both had some thoughts on this problem in other emails.  To
summarize them:

- After marking several of Y's child identities as spammers, the
plugin could point out the source of their trust and suggest marking Y
as a spammer.  This may be nontrivial in the case of a spammer
building a complex web downstream of Y instead of spamming directly
with Y's children.  It's also nontrivial if Y appears to be a valid
poster himself.  This is a reason to distinguish message trust and
trust list trust.  (In this context, the algorithm effectively runs
solely on trust list trust.  Message trust would create a link that
permitted flow, but did not allow that flow to proceed past the
identity in question.)

- Ultimatums help with this, but have their own downsides.  With or
without ultimatums, the question of what to do about X's trust list is
tricky.  Among other things, it may be worth ignoring trust lists from
inactive identities, or lowering the threshold on ignoring them.

- Marking someone a spammer could be a published thing, and when
others calculate flow it prevents a flow path from reaching that
identity, even through intermediates.  So if our newbie A trusts B
trusts X, who is out of town and trusting Y, then the first thing that
happens is B marks Y as a spammer.  Then, when A is calculating trust,
flow goes from A to B to X, but then does not reach Y.  This limits
the flow to Y by reducing the upstream capacity available.  However,
it probably doesn't have a very profound effect until there is a cut
set of such B's that all mark Y (or Y's children) as spammers.  (I have
reservations about this; see my earlier mail for more details.)
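
The published-spammer-mark idea in the last point can be sketched in
Python.  This is a deliberate simplification: it checks plain
reachability instead of running the flow computation, and the names
(reachable, trust, marks) are hypothetical.  A path is usable only if
no identity on it has marked a later identity on it as a spammer.

```python
def reachable(trust, marks, root, target):
    """True if some trust path from root reaches target unblocked.

    trust maps an identity to its trust list; marks maps an identity
    to the set of identities it has published as spammers.
    """
    def walk(node, blocked, seen):
        if node == target:
            return True
        # Marks published by this node apply to everything downstream of it.
        blocked = blocked | marks.get(node, set())
        for child in trust.get(node, ()):
            if child in blocked or child in seen:
                continue
            if walk(child, blocked, seen | {child}):
                return True
        return False

    return walk(root, frozenset(), {root})
```

With A trusting B, B trusting X, X trusting Y, and B marking Y, A still
reaches X but not Y.  If A also trusts C, and C trusts X without
marking Y, then Y is reachable again through C; that is the cut set
caveat above.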

>
> 5. Now the *core* task of the WoT is in question: How can we as the community
> make the spam-identities introduced by Y disappear with advogato trust
> metrics, without negative trust??
>
> - As you've said, we cannot take away the trust which Y receives from X
> because that is THE attribute of non-negative-trust-metrics.
> - Further, we cannot cause EVERYONE who has trusted X to remove the trust
> value because X is in way too many trust lists of idle people, etc.
> - So what can we do with advogato, if we are the community and want to mark Y
> as the root of evil?

That is fundamentally a hard problem.
- Advogato is not perfect.  I am certain there will be some amount of
spam getting through; hopefully it will be a small amount.
- With Advogato, the amount of spam possible is well defined.  With
FMS and WoT it is not.  Neither of them has an upper bound on the
amount of spam.
- Being too good at solving the spam problem means we are too good at
mob censorship.  Both are problems.  In practice, the goal should be
to strike an appropriate balance between the two, not simply to
eliminate spam.
- I believe that Advogato is capable of limiting spam to levels where
the system is usable, even in the case of reasonably determined
spammers.  If the most they can aspire to is being a nuisance, I don't
think the spammers will be as interested.  If spamming takes work and
doesn't do all that much, they'll give up.  The actual amount of spam
seen in practice should be well below the worst possible case -- if
and only if the worst case isn't catastrophic.

Evan Daniel
