I hope you guys don't mind my posting on this group, but I've been following this project for about a month now.
From what I understand, the nodes are to keep the most frequently
requested documents in a local cache. So I was wondering: why is the
content duplicated so many times, consuming valuable storage? Why not
work with a token database instead? Basically, every time there is a
new non-space token (word), simply store the new token and generate a
unique binary key for it. That way, instead of moving text messages
across the network, all you send is a series of binary keys that can
be used to reconstruct the original message. It's a simple kind of
compression mechanism, optimizing storage space as well as network
throughput.

Another advantage would be that, as the content on a node increases,
the node's dictionary would grow as well, making the addition of new
documents a very small transaction, since most of a new document's
tokens are already there. On the client side you can cache the tokens
as well, minimizing the amount of data the client needs to retrieve in
order to fetch documents. (A rough sketch of this token-dictionary
encoding is appended at the end of this message.)

Benny Millares
design at waxcom.com

david at aminal.com wrote:
>
> On Fri, Jul 28, 2000 at 03:47:32PM +0300, Itamar Shtull-Trauring wrote:
> > > By file splitting, we meant that there would be mandatory chunk sizes
> > > for files, such as 16k, 32k, 64k, 128k, and 256k or perhaps higher. Files
> > > would be padded so that they fit a given chunk size. Your proposal might
> > > have some routing problems too, I think.
> >
> > (I stole this from the Freehaven project.)
> >
> > Instead of splitting files into chunks, why not use Rabin's IDA?
> > (http://www.acm.org/pubs/citations/journals/jacm/1989-36-2/p335-rabin/)
> >
> > Basically it lets you split a file of length L into n parts, where only m
> > parts are needed to reconstruct the file. m < n, and the size of the parts
> > is L/m. The benefits are higher reliability, since if some chunks are
> > missing you can still reconstruct the file, and a harder time reconstructing
> > the partial contents of a file as compared to getting bytes 0-K of a file,
> > which can give you useful info.
> >
> > The problem is higher bandwidth and storage usage, but that can be balanced
> > against the benefits by changing the n/m ratio (and m == n is identical
> > to regular file splitting in terms of storage).
> >
> > As far as implementation goes this is a client-side issue only, so it
> > doesn't really require any work on the server.
>
> Ever since the discussion of file splitting and RAID levels, I've been
> casually looking around for algorithms. This looks interesting, but
> unfortunately I'm not a member of the ACM, so I can't download the paper.
> Perhaps some kind soul would upload it to Freenet, or make it available in
> some other manner.
>
> David Schutt
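
The token-dictionary idea at the top of this message could look roughly
like the sketch below. This is only an illustration of the encode/decode
bookkeeping, nothing from the Freenet codebase: the TokenCodec class and
its method names are invented for the example, and it ignores the hard
part, which is keeping the dictionary consistent between nodes.

# A minimal sketch of a per-node token dictionary, assuming a single
# node; synchronizing the dictionary across the network is not handled.
class TokenCodec:
    def __init__(self):
        self.token_to_key = {}   # token -> integer key
        self.key_to_token = []   # key -> token (the key is the list index)

    def encode(self, text):
        """Turn a message into a list of integer keys, allocating a new
        key the first time a token is seen."""
        keys = []
        for token in text.split():
            if token not in self.token_to_key:
                self.token_to_key[token] = len(self.key_to_token)
                self.key_to_token.append(token)
            keys.append(self.token_to_key[token])
        return keys

    def decode(self, keys):
        """Rebuild the message from keys (whitespace is normalized)."""
        return " ".join(self.key_to_token[k] for k in keys)

codec = TokenCodec()
keys = codec.encode("the quick brown fox jumps over the lazy fox")
assert codec.decode(keys) == "the quick brown fox jumps over the lazy fox"
assert len(set(keys)) < len(keys)   # repeated tokens reuse the same key

Note that split()/join drops the original whitespace, so a real codec
would also need tokens for formatting, and every node and client would
have to agree on (or exchange) the key-to-token mapping for the keys to
be meaningful across the network.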
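
On the Rabin IDA scheme in the quoted message, the sketch below shows the
shape of the idea: encode a file into n parts of roughly L/m symbols each,
and rebuild it from any m of them. It is only an illustration, not Rabin's
exact construction: it uses a Vandermonde matrix over the prime field
GF(257) for simplicity, whereas practical implementations work in GF(2^8)
so that every symbol fits in a byte, and all names here are made up for
the example.

P = 257  # a prime just above 255, so every byte is a field element

def encode(data, n, m):
    """Split `data` into n parts of about len(data)/m symbols each."""
    padded = data + bytes(-len(data) % m)          # pad to a multiple of m
    blocks = [padded[i:i + m] for i in range(0, len(padded), m)]
    # part i is the evaluation of each block (seen as polynomial
    # coefficients) at the point x = i + 1
    parts = []
    for i in range(n):
        x = i + 1
        parts.append([sum(b[j] * pow(x, j, P) for j in range(m)) % P
                      for b in blocks])
    return parts, len(data)

def _invert(mat):
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    k = len(mat)
    aug = [row[:] + [int(i == j) for j in range(k)]
           for i, row in enumerate(mat)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)         # Fermat inverse
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def decode(parts, indices, m, length):
    """Rebuild the original bytes from any m parts (given their indices)."""
    vand = [[pow(i + 1, j, P) for j in range(m)] for i in indices]
    inv = _invert(vand)
    out = bytearray()
    for pos in range(len(parts[0])):
        column = [parts[k][pos] for k in range(m)]
        for row in inv:
            out.append(sum(r * c for r, c in zip(row, column)) % P)
    return bytes(out[:length])

parts, size = encode(b"hello freenet", n=5, m=3)
# any 3 of the 5 parts (here parts 4, 1 and 2) are enough to rebuild the file
restored = decode([parts[4], parts[1], parts[2]], [4, 1, 2], m=3, length=size)
assert restored == b"hello freenet"

With n = 5 and m = 3, each part is about a third of the original size, so
total storage is n/m = 5/3 of the file, and any three parts suffice;
setting m = n gives plain file splitting with no overhead, as noted in the
quoted message.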
