[Freenet developers - if you'd like us to move this discussion off-list, just
say the word]
I've been thinking about the implementation of a music-sharing client. Here
are some thoughts.
----------------------------------------------------------------------------
Songs are encrypted with a random RC4 key, and the ciphertext is then hashed.
The hash is used as the key under which the file is inserted into Freenet (a
CHK). The hash and the RC4 key are concatenated to form the "read key", which
you need in order to retrieve the file and decrypt it. The CHK alone is not
enough to decrypt the file, so the nodes storing and handling the file
*cannot* know what they're handling.
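
A minimal sketch of the insert and retrieve steps described above. The hash
function (SHA-1), the 16-byte key length, and the names make_insert, read_song
and fetch are my assumptions for illustration; none of them are fixed by the
design. RC4 is written out by hand because the Python standard library does
not include it.

```python
import hashlib
import os


def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; the same call encrypts and decrypts."""
    # Key-scheduling algorithm (KSA)
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


KEY_LEN = 16  # assumed RC4 key size in bytes
CHK_LEN = 20  # SHA-1 digest size in bytes (SHA-1 itself is an assumption)


def make_insert(song: bytes):
    """Return (chk, ciphertext, read_key) for one insertion of a song."""
    rc4_key = os.urandom(KEY_LEN)          # fresh random key per insertion
    ciphertext = rc4(rc4_key, song)
    chk = hashlib.sha1(ciphertext).digest()  # hash of the *encrypted* file
    read_key = chk + rc4_key               # what a downloader needs
    return chk, ciphertext, read_key


def read_song(read_key: bytes, fetch) -> bytes:
    """Split a read key, fetch the ciphertext by CHK, verify and decrypt it.

    fetch is a hypothetical callable standing in for a network request.
    """
    chk, rc4_key = read_key[:CHK_LEN], read_key[CHK_LEN:]
    ciphertext = fetch(chk)
    if hashlib.sha1(ciphertext).digest() != chk:
        raise ValueError("data does not match its CHK")
    return rc4(rc4_key, ciphertext)
```

Because the key is random, inserting the same song twice gives two different
CHKs, which is the loss of redundancy removal discussed next.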
Note that encrypting the file before hashing stops the redundancy-removing
property of CHKs from working; multiple identical files can exist under
different CHKs. This wastes space, but avoids this possible attack:

    You get a copy of the file you wish to censor and hash it to get its
    CHK. Now you request a document with a *very similar* CHK. This will
    probably be stored on the same node as the file you're looking for,
    but because you don't request the file itself, you don't spread it
    around the network. You make a note of the node which is reported to
    be providing the file you requested. (This may not be the node which
    is actually providing it.) Repeat with several CHKs similar to the
    CHK of the document you're trying to censor. Because Freenet adjusts
    its topology to make direct connections between nodes which share a
    lot of traffic, you will sooner or later be able to work out which
    node is really providing the documents you're requesting. Then you
    attack that node by out-of-band means (ping -f or court order).

Encrypting the file before hashing it makes it possible to store multiple
copies, so if one is censored you can easily insert another (and it will be
stored on a different node). It also makes it impossible to guess which CHK
the file is stored under, but that's not much of an advantage because you
have to reveal the CHK at some point so the users can download the file. :)
PROBLEM 1: MULTIPLE ENCODINGS, ONE NAME
The first problem is that we need to be able to retrieve files by giving
*only* the song title and artist's name. The suggestion of requiring a
version number / encoding number to be provided doesn't work - the user would
have to guess the version number. If there are few enough version numbers per
song to make them guessable, they can all be squatted. If there are enough to
make squatting them all impractical, guessing is also impractical. If the
user has to get the version number by out-of-band means, he might as well
just get the CHK of the file and we can forget about names altogether.
So to prevent key squatting and allow searching by name, it needs to be
possible to store any number of files and use a single string to retrieve
them. The users can then decide what's a valid encoding and what's noise (or
misnamed).
This means we need a directory for each artist+song string, stored under the
string's hash (KHK). The directory contains the CHKs of the actual files
(stored separately). A fixed array of CHKs is not enough - a malicious user
could quickly squat all 256 keys for a given artist+song string, for example.
What we need is a dynamic list of CHKs accessed via a single KHK.
It must be possible to add entries to a directory when new encodings of a song
are stored. It must be possible to get the list of entries. Optionally, it
should be possible for the node holding the directory to perform housekeeping
tasks such as removing redundant entries.
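
A rough sketch of such a directory, assuming SHA-1 for the KHK and a simple
lower-casing of the artist+song string (both are my assumptions; the names
khk and DirectoryStore are hypothetical). Entries are stored as read keys
rather than bare CHKs, since a downloader needs the RC4 key as well.

```python
import hashlib


def khk(artist: str, title: str) -> bytes:
    """Hash of the artist+song string; the normalisation is an assumption."""
    name = f"{artist.strip().lower()}/{title.strip().lower()}"
    return hashlib.sha1(name.encode("utf-8")).digest()


class DirectoryStore:
    """The directories held by one node, indexed by KHK."""

    def __init__(self):
        self._dirs = {}  # KHK -> list of read keys, in insertion order

    def add_entry(self, key: bytes, read_key: bytes) -> None:
        """Append a read key; the list has no fixed size, so it can't be squatted."""
        entries = self._dirs.setdefault(key, [])
        if read_key not in entries:  # ignore exact duplicates
            entries.append(read_key)

    def list_entries(self, key: bytes) -> list:
        """Return a copy of every entry stored under this KHK."""
        return list(self._dirs.get(key, []))
```

The point of the list is that there is no slot to squat: a malicious user can
add noise, but cannot stop a genuine encoding from being added after it.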
PROBLEM 2: DIRECTORIES ARE A SINGLE POINT OF FAILURE
The directory is a single point of failure for the string it represents. If
the node holding the directory is malicious, it can prevent access to all
encodings of the song. This is somewhat mitigated by the fact that the node
doesn't know which song it is preventing access to, because it only knows the
hash of the artist+title string. Nevertheless, a malicious node could prevent
access to some (random) song no matter how many times it was stored, causing
great annoyance to the users. Also, directories will be vulnerable to the
close-key attack outlined above, since anyone who knows the artist and title
can compute the directory's KHK. I don't see how this can be avoided - there
must be a single directory for each song, so there is a single point of
failure. This is the biggest problem with the scheme that I can see.
PROBLEM 3: WHEAT AND CHAFF
Let's assume that those who oppose the free movement of information aren't
stupid. Realising that Freenet can't be shut down by a court order, sooner or
later they will use technological means to try and close it down or make it
unusable. Obvious attacks include running malicious nodes, running malicious
clients, discovering nodes and attacking them by out-of-band means, and (to
prevent music sharing) submitting dummy encodings of songs which either squat
keys or waste users' bandwidth.
The first three attacks have to be dealt with by Freenet's design. The
problem of dummy encodings has to be dealt with at the user level - only the
users can separate the wheat from the chaff.
Freenet is designed so that files which are requested a lot spread around the
network; files which are never requested eventually disappear. To exploit this
mechanism, we need to be able to check the quality of an encoding without
downloading it. We need a way for users who have previously downloaded the
encoding to tell us whether it's worth downloading.
My solution is Slashdot-style moderation. This style of moderation is fairly
robust - at least, it resists vote-stuffing, because you can only moderate
when you are given moderation points, which happens randomly and
infrequently. For Freenet it would work like this:

Moderating a song:

    Each time you downloaded a song (by getting a CHK from a directory),
    the node storing the song would, with a small probability, hand you
    a moderation token (a random number).

    Your client would remember that you had been given a moderation token
    for that file. Next time you connected to Freenet (to give you time
    to listen to the song), it would ask you to moderate the file: either
    +1 for a good encoding, -1 for a bad one, or 0 for don't know / don't
    care. You would also get a text box to enter a short comment on the
    song.

    Your score, your comment and the file's CHK would be encrypted with
    the file's RC4 key (you got that from the directory, remember?) and
    sent back to the node which supplied the file (addressed using the
    file's CHK, so you don't need a direct connection to the node), with
    a plaintext version of the moderation token attached to allow the
    node to check that it really asked for your opinion.

    The text comment makes known-plaintext attacks harder (the score on
    its own is easy to guess) and also gives the users a warm fuzzy
    feeling of community. :)

    The node which supplied the file doesn't know the RC4 key, so it
    can't find out how you moderated the file and it can't change your
    decision. This prevents malicious nodes from, for example, reversing
    all moderation done to a file, or applying moderation decisions for
    one file to a different file. The worst the node can do is discard
    your decision, leaving the file unmoderated (in which case it won't
    get requested often, and another node's copy of the song will be
    downloaded instead).

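
The moderation exchange above could be sketched like this. Everything named
here (seal_vote, FileNode, the 5% token probability, the message layout) is
hypothetical. One deliberate departure from the description: reusing a single
RC4 key for several messages repeats the keystream, so this sketch derives a
per-message key from the file's RC4 key and a random nonce. That nonce is my
assumption, not part of the design. Note also that RC4 by itself gives no
integrity check, so a real version would want one.

```python
import hashlib
import os
import random


def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def seal_vote(rc4_key: bytes, chk: bytes, score: int, comment: str) -> bytes:
    """Client side: encrypt CHK + score + comment so the node can't read it."""
    if score not in (-1, 0, 1):
        raise ValueError("score must be -1, 0 or +1")
    nonce = os.urandom(8)  # assumption: per-message nonce, not in the design
    msg_key = hashlib.sha1(rc4_key + nonce).digest()
    plaintext = chk + bytes([score + 1]) + comment.encode("utf-8")
    return nonce + rc4(msg_key, plaintext)


class FileNode:
    """Node side: hands out tokens and stores votes it cannot read."""

    def __init__(self, token_probability=0.05, rng=random.random):
        self.token_probability = token_probability  # assumed value
        self._rng = rng
        self._outstanding = {}  # CHK -> set of tokens not yet used
        self.votes = {}         # CHK -> list of opaque encrypted votes

    def serve(self, chk: bytes):
        """Called on each download; returns a moderation token or None."""
        if self._rng() < self.token_probability:
            token = os.urandom(16)
            self._outstanding.setdefault(chk, set()).add(token)
            return token
        return None

    def submit(self, chk: bytes, token: bytes, blob: bytes) -> bool:
        """Accept a vote only if this node issued the token; tokens are single-use."""
        tokens = self._outstanding.get(chk, set())
        if token not in tokens:
            return False
        tokens.remove(token)
        self.votes.setdefault(chk, []).append(blob)
        return True
```

The node only ever sees the plaintext token and an opaque blob, so the most
it can do with a vote is drop it.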

Looking up a moderated song:

    When you look up a song's directory, you get the CHKs for a number
    of encodings. Instead of requesting one of them straight away, you
    send a message to the node holding one of the files, asking for the
    moderation results for that file (this message is addressed using the
    file's CHK). The node returns a stream of encrypted moderation
    comments which it can't read. But you can, since you got the RC4 key
    from the directory. You can verify that the comments apply to that
    file because they contain the file's CHK. Your client totals up the
    file's score, shows you the users' comments, and asks you if you want
    to download the file. If you don't want to, your client gets another
    CHK from the directory and you repeat the process until you find a
    good encoding (or decide from the comments that the song sucks, and
    give up).

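
A sketch of the client-side tally, under an assumed message layout: each
moderation comment is an 8-byte nonce followed by the RC4 encryption of
CHK + one score byte (score + 1) + UTF-8 comment, keyed with
SHA-1(RC4 key + nonce). The layout, the nonce and the names open_vote and
summarise are all hypothetical; the design only says that the comments are
encrypted with the file's RC4 key and contain the file's CHK.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def open_vote(rc4_key: bytes, chk: bytes, blob: bytes):
    """Return (score, comment), or None if the blob is not about this CHK."""
    nonce, body = blob[:8], blob[8:]
    plaintext = rc4(hashlib.sha1(rc4_key + nonce).digest(), body)
    # A comment moved over from another file decrypts to the wrong CHK.
    if len(plaintext) < len(chk) + 1 or plaintext[:len(chk)] != chk:
        return None
    score = plaintext[len(chk)] - 1
    if score not in (-1, 0, 1):
        return None
    comment = plaintext[len(chk) + 1:].decode("utf-8", errors="replace")
    return score, comment


def summarise(rc4_key: bytes, chk: bytes, blobs):
    """Total the valid scores and collect their comments, skipping junk."""
    total, comments = 0, []
    for blob in blobs:
        vote = open_vote(rc4_key, chk, blob)
        if vote is None:
            continue
        total += vote[0]
        comments.append(vote[1])
    return total, comments
```

The client would show the total and the comments, then either download the
file or move on to the next CHK in the directory.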
PROBLEM 4: WE'RE NOT IN FREENET ANY MORE
If this is supposed to be a quick hack to keep Napster fans happy, it won't
work. My design requires the following extensions to nodes:

    They must be able to route messages to the node storing a given CHK.
    This possibly opens up the network to DoS attacks (?). This is
    required for moderation.

    They must be able to route messages to the node storing a given KHK as
    well, to allow entries to be added to directories. Again, DoS.

    They must understand the message "add this read key to the directory
    with this KHK".

    They should also perform some directory management tasks during idle
    moments:

        * Check that a new entry really exists by requesting the file
          it points to.

        * Retrieve two files, decrypt them and compare them. If they
          are the same, remove one of the directory entries.

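
The two housekeeping tasks could be sketched as a single pass over a
directory. Instead of comparing files pairwise, this version compares SHA-1
digests of the decrypted files, which is a substitution of mine for the
"retrieve two files and compare them" step. tidy_directory, fetch and decrypt
are hypothetical names; fetch stands in for a network request and decrypt for
RC4, and the 20-byte CHK length assumes SHA-1.

```python
import hashlib


def tidy_directory(read_keys, fetch, decrypt, chk_len=20):
    """Return the read keys worth keeping, in their original order.

    fetch(chk) returns the stored ciphertext, or None if nothing is found.
    decrypt(rc4_key, ciphertext) returns the plaintext.
    """
    kept = []
    seen_digests = set()
    for read_key in read_keys:
        chk, rc4_key = read_key[:chk_len], read_key[chk_len:]
        ciphertext = fetch(chk)
        # Task 1: drop entries whose file is missing or doesn't match its CHK.
        if ciphertext is None or hashlib.sha1(ciphertext).digest() != chk:
            continue
        # Task 2: drop entries that decrypt to a file we have already kept.
        digest = hashlib.sha1(decrypt(rc4_key, ciphertext)).digest()
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        kept.append(read_key)
    return kept
```

Note that this only works because the directory holds read keys, so the node
holding a directory can decrypt every file it lists.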
-----------------------------------------------------------------------------
Any thoughts?
Michael