I have a series of proposals on how to improve the network's usage of DNFs.

First, though: back in April of this year I wrote a long, rambling email on the
evils of the DNF Cache.  I'd like to take this opportunity to apologize for its
incoherence, but I still believe the DNF Cache, as currently implemented, was a
poor design choice.  Given my distaste for the cache, I've given it a great deal
of thought and have come up with a number of possible remedies.

When I saw this thread forming, I figured now was as good a time as any to throw
in my two cents, once again.  Hopefully some of the ideas I present here will
have some merit.

I should point out that my understanding of how the routing algorithm functions
is almost entirely conceptual, so my proposals may be completely off base and/or
impossible; if not, please consider them in general terms rather than specific
ones.

In these proposals I'm using the following situation and understanding of how
the network behaves:
Node D requests Key Q from Node E
DATA: the data referenced by Q
R(D->E): D's routing reference for E
R(E->D): E's routing reference for D

What knowledge is gained from the various responses:
E returns DATA:
    D now knows: E has Q; D has Q
    E now knows: E has Q; D has Q
E returns DNF:
    D now knows: E does not have Q; D does not have Q
    E now knows: E does not have Q; D does not have Q; something thought D
SHOULD have Q
E returns RNF:
    D now knows: E does not have Q, but at some later point E may have Q; D does
not have Q; D should (possibly) try again somewhere else
    E now knows: E does not have Q; D does not CURRENTLY have Q, but D may yet
retrieve Q; something thought D SHOULD have Q


I'm going to start with proposal C, as Ian already used A & B.

Proposal C:
All of the "normal" responses from E should affect routing to some degree.

For example:

| E's response | R(D->E) | R(E->D) |
|--------------|---------|---------|
| DATA         |  +1.0   |  +1.0   |
| DNF          |  -1.0   |  +0.1   |
| RNF          |  +0.0   |  +0.2   |

Note that I'm referring to the MAGNITUDE of changes to the routing table,
relative to the magnitude of a successful request.

This would necessitate removing the DNF Cache, as otherwise the false negatives
that it generates could inappropriately impact specialization knowledge.
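As a sketch, such response-weighted updates might look something like the
following, assuming a simple scalar "reference score" per node.  Fred's actual
routing internals are more involved than this; every name here
(ResponseWeightedRouting, adjust, referenceFor) is illustrative, not real API:

```java
import java.util.HashMap;
import java.util.Map;

public class ResponseWeightedRouting {
    enum Response { DATA, DNF, RNF }

    // Magnitude of the change to R(D->E), the requester's reference for the
    // responder, relative to a successful request (+1.0).
    static double requesterWeight(Response r) {
        switch (r) {
            case DATA: return +1.0;
            case DNF:  return -1.0;
            default:   return  0.0; // RNF: no change for the requester
        }
    }

    // Magnitude of the change to R(E->D), the responder's reference for the
    // requester.
    static double responderWeight(Response r) {
        switch (r) {
            case DATA: return +1.0;
            case DNF:  return +0.1;
            default:   return +0.2; // RNF
        }
    }

    // Stand-in for a real routing table: node reference -> accumulated score.
    private final Map<String, Double> specialization = new HashMap<>();

    // Apply a weighted adjustment to this node's reference for 'nodeRef'.
    void adjust(String nodeRef, double weight) {
        specialization.merge(nodeRef, weight, Double::sum);
    }

    double referenceFor(String nodeRef) {
        return specialization.getOrDefault(nodeRef, 0.0);
    }
}
```

The point of the sketch is only that all three responses feed the same update
path, with different weights, instead of DATA being the sole learning signal.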

What I can see as disadvantages:
1) If it would require a major rework of the routing algorithm, it's probably
not worth it, at least right now.

Advantages that I can see:
1) Since these adjustments happen internally to a node, this has no minimum
build requirements.
2) Doesn't put any more load on the network than would already be present.
3) Allows the network to "learn" faster.



Proposal D:
Invert the DNF Cache by sending the DATA for failed requests when/if it becomes
available.

Replace the DNF Cache code with code that does the following:

    An entry in the cache contains the key requested, and which node made the
request.
        For the originating node, the cache entry should contain the key, and
the "next best" node with regard to that key.
    The cache should be searchable on the key, and should allow for multiple
entries to the same key but with different requesting nodes.
    The number of entries in the cache is some configurable number, possibly
something like 1000, but a more dynamic number based on the average number of
requests going through the node might be a better idea.
    Entries are removed from a full list by evicting the entry that worst
matches the node's specialization.
    Entries are added whenever a DNF is returned by the node.

When new data arrives at a node, after the node does all its other processing of
the data, it searches the cache for all matching entries.
For each matching entry, the node sends the data to the referenced requesting
node and then removes the cache entry.

I'm not sure how to handle the sending of the data; my thought was possibly to
perform an Insert at 0 HTL, though I'm not sure all the needed info is
available to the node.  Also, whether the send ultimately succeeds or fails
doesn't matter; it only matters that it was tried.
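To make the shape of this concrete, here is a minimal sketch of such an
inverted cache.  All the names (FailedRequestCache, onDnf, onDataArrived) are
mine, the specialization "distance" is a stand-in for however the node actually
measures key closeness, and the send callback stands in for something like an
insert at 0 HTL:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class FailedRequestCache {
    static final class Entry {
        final String key;        // the key that was requested
        final String requester;  // the node that made the request
        final double distance;   // distance from this node's specialization
        Entry(String key, String requester, double distance) {
            this.key = key; this.requester = requester; this.distance = distance;
        }
    }

    private final int capacity;  // e.g. ~1000, or based on the request rate
    private final List<Entry> entries = new ArrayList<>();

    FailedRequestCache(int capacity) { this.capacity = capacity; }

    // Called whenever this node returns a DNF.  Multiple entries for the
    // same key, with different requesters, are allowed.
    void onDnf(String key, String requester, double specializationDistance) {
        if (entries.size() >= capacity) {
            // Evict the entry that worst matches the node's specialization.
            entries.remove(Collections.max(entries,
                    Comparator.comparingDouble((Entry e) -> e.distance)));
        }
        entries.add(new Entry(key, requester, specializationDistance));
    }

    // Called after the node's normal processing of newly arrived data:
    // forward the data to every recorded requester, then drop the entries.
    // Whether each send succeeds or fails is deliberately ignored.
    void onDataArrived(String key, byte[] data, BiConsumer<String, byte[]> send) {
        Iterator<Entry> it = entries.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.key.equals(key)) {
                send.accept(e.requester, data);
                it.remove();
            }
        }
    }

    int size() { return entries.size(); }
}
```

Note the fire-and-forget send callback: matching the proposal, the cache only
records that a forward was attempted, not whether it arrived.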

What I can see as disadvantages:
1) The cache might be CPU-expensive.
2) If sending can't be accomplished with inserts or some other pre-existing
method, the minimum build would need to be raised.

Advantages that I can see:
1) If sending can be accomplished with inserts or some other pre-existing
method, it has no minimum build requirements.
2) The only extra load on the network is "beneficial", i.e. the data is being
moved closer towards where it's wanted.
3) Flooding apps cease to be as burdensome on the network.  E.g., a key which
Frost has requested, but which hasn't actually been inserted yet, will probably
end up having its insert "boosted" a fair way towards any requesting nodes as
soon as it does get inserted, so the requests that happen subsequent to the
insert won't have to penetrate as far to retrieve the data.




Also, please note that these proposals could be applied together, which might
make things even better.

Well, I hope that even if these aren't completely workable, they at least
suggest some other ways of thinking about this issue.

-------------------------------------------------------------------------------
PS:  Why I'm against the DNF Cache as it is:
It really boils down to this: when a node returns DNF because of a cache entry,
it's lying.
The blocking cache implementation is based on the premise that if a node has
not been able to successfully retrieve a key, then it will fail again within
some time period (30 minutes, by default).  The problem with this premise is
that, due to the constant changing of the routing tables, that's not
necessarily true.  A node can only say definitively that a piece of data IS in
the network if that data is in its datastore; a node can never say definitively
that a piece of data is NOT in the network.  The closest it can come is to say
that it couldn't find it, which is really not the same thing at all.
If the Cache must remain, at least consider changing the response.  Returning
DNF says: "I couldn't find it."  Returning a QueryReject of RNF instead, in
essence, says: "I'm not likely to find it; look somewhere else."  The only
problem I see with this is that, since I'm pretty sure there are many, many
collisions against the cache, we'd see a whole lot of RNFs coming back.

There's also the potential usage of the DNF Cache for censorship.
Hypothetical:
Imagine a series of very large nodes, which have been sitting on the network for
long enough to fill their datastores, and get real cozy with the rest of the
network.  There would need to be enough of these nodes that they would likely
become participants in many, if not most, or even all, request chains.
Now imagine if these nodes had some way to selectively return a DNF on a given
kind of data, say mp3s.  Since the data would be spread throughout the keyspace,
they wouldn't tend to lose their status as "good" nodes (and for data they
weren't blocking this would be very true), and thus they would be able to
continue their nefarious ways...

Practical:
Hardware: a bunch of BIG nodes (1T DS; Quad-P4s; lots of memory; on a T3
connection; and dedicated to fred)
Tweak the DNF Cache code so that some flag can be set that means not to delete
that entry.  This could be as simple as 1000 HTL.
Tweak the code so that the cache is checked before the local datastore.
Create an interface so that flagged keys can be fed to the cache.
Run a spider (say, I dunno, Cruft's spider?) which creates a list of keys which
match the given criteria, and then feeds those keys to the blocking interface.
Likewise they could create a frost reader that does the same with frost's files.

Alternately, the bad guys could simply pay a monkey $4 an hour to use freenet
and make a list of appropriate keys, but this would be susceptible to conversion
to the light side.

The only real question is how many nodes would have to be run.  I would guess
that 10-15 would probably get over half the requests, which would be pretty
damaging.

Admittedly, this censorship issue is more fundamentally a problem with how
nodes handle DNFs in general; the DNF Cache just provides a good chunk of the
work ready-made for the censors.



_______________________________________________
devl mailing list
devl at freenetproject.org
http://hawk.freenetproject.org/cgi-bin/mailman/listinfo/devl
