I hope you guys don't mind my posting on this group, but I've been following this project for about a month now.
From what I understand, the nodes are to keep the most frequently
requested documents in a local cache. So I was wondering: why is the
content duplicated so many times, consuming valuable storage? Why not
work with a token database instead? Basically, every time there is a
new non-space token (word), simply store the new token and generate a
unique binary key for it. That way, instead of moving text messages
across the network, all you send is a series of binary keys that can
be used to reconstruct the original message. It's a simple kind of
compression mechanism, optimizing storage space as well as network
throughput.

Another advantage would be that, as the content on a node increases,
the node's dictionary would grow as well, making the addition of new
documents a very small transaction, since most of a new document's
tokens are already there. On the client side you can cache the tokens
as well, minimizing the amount of data the client needs to retrieve in
order to fetch documents. (A rough sketch of this token-dictionary
encoding is appended at the end of this message.)

Benny Millares
design at waxcom.com

david at aminal.com wrote:
>
> On Fri, Jul 28, 2000 at 03:47:32PM +0300, Itamar Shtull-Trauring wrote:
> > > By file splitting, we meant that there would be mandatory chunk sizes
> > > for files, such as 16k, 32k, 64k, 128k, and 256k or perhaps higher. Files
> > > would be padded so that they fit a given chunk size. Your proposal might
> > > have some routing problems too, I think.
> >
> > (I stole this from the Freehaven project.)
> >
> > Instead of splitting files into chunks, why not use Rabin's IDA?
> > (http://www.acm.org/pubs/citations/journals/jacm/1989-36-2/p335-rabin/)
> >
> > Basically it lets you split a file of length L into n parts, where only m
> > parts are needed to reconstruct the file. m < n, and the size of the parts
> > is L/m. The benefits are higher reliability, since if some chunks are
> > missing you can still reconstruct the file, and a harder time reconstructing
> > the partial contents of a file as compared to getting bytes 0-K of a file,
> > which can give you useful info.
> >
> > The problem is higher bandwidth and storage usage, but that can be balanced
> > against the benefits by changing the n/m ratio (and m == n is identical
> > to regular file splitting in terms of storage).
> >
> > As far as implementation goes this is a client-side issue only, so it
> > doesn't really require any work on the server.
>
> Ever since the discussion of file splitting and RAID levels, I've been
> casually looking around for algorithms. This looks interesting, but
> unfortunately I'm not a member of the ACM, so I can't download the paper.
> Perhaps some kind soul would upload it to Freenet, or make it available in
> some other manner.
>
> David Schutt
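
The token-dictionary idea at the top of this message could look roughly
like the sketch below. This is only an illustration of the encode/decode
bookkeeping, nothing from the Freenet codebase: the TokenCodec class and
its method names are invented for the example, and it ignores the hard
part, which is keeping the dictionary consistent between nodes.

# A minimal sketch of a per-node token dictionary, assuming a single
# node; synchronizing the dictionary across the network is not handled.
class TokenCodec:
    def __init__(self):
        self.token_to_key = {}   # token -> integer key
        self.key_to_token = []   # key -> token (the key is the list index)

    def encode(self, text):
        """Turn a message into a list of integer keys, allocating a new
        key the first time a token is seen."""
        keys = []
        for token in text.split():
            if token not in self.token_to_key:
                self.token_to_key[token] = len(self.key_to_token)
                self.key_to_token.append(token)
            keys.append(self.token_to_key[token])
        return keys

    def decode(self, keys):
        """Rebuild the message from keys (whitespace is normalized)."""
        return " ".join(self.key_to_token[k] for k in keys)

codec = TokenCodec()
keys = codec.encode("the quick brown fox jumps over the lazy fox")
assert codec.decode(keys) == "the quick brown fox jumps over the lazy fox"
assert len(set(keys)) < len(keys)   # repeated tokens reuse the same key

Note that split()/join drops the original whitespace, so a real codec
would also need tokens for formatting, and every node and client would
have to agree on (or exchange) the key-to-token mapping for the keys to
be meaningful across the network.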
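
On the Rabin IDA scheme in the quoted message, the sketch below shows the
shape of the idea: encode a file into n parts of roughly L/m symbols each,
and rebuild it from any m of them. It is only an illustration, not Rabin's
exact construction: it uses a Vandermonde matrix over the prime field
GF(257) for simplicity, whereas practical implementations work in GF(2^8)
so that every symbol fits in a byte, and all names here are made up for
the example.

P = 257  # a prime just above 255, so every byte is a field element

def encode(data, n, m):
    """Split `data` into n parts of about len(data)/m symbols each."""
    padded = data + bytes(-len(data) % m)          # pad to a multiple of m
    blocks = [padded[i:i + m] for i in range(0, len(padded), m)]
    # part i is the evaluation of each block (seen as polynomial
    # coefficients) at the point x = i + 1
    parts = []
    for i in range(n):
        x = i + 1
        parts.append([sum(b[j] * pow(x, j, P) for j in range(m)) % P
                      for b in blocks])
    return parts, len(data)

def _invert(mat):
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    k = len(mat)
    aug = [row[:] + [int(i == j) for j in range(k)]
           for i, row in enumerate(mat)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)         # Fermat inverse
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def decode(parts, indices, m, length):
    """Rebuild the original bytes from any m parts (given their indices)."""
    vand = [[pow(i + 1, j, P) for j in range(m)] for i in indices]
    inv = _invert(vand)
    out = bytearray()
    for pos in range(len(parts[0])):
        column = [parts[k][pos] for k in range(m)]
        for row in inv:
            out.append(sum(r * c for r, c in zip(row, column)) % P)
    return bytes(out[:length])

parts, size = encode(b"hello freenet", n=5, m=3)
# any 3 of the 5 parts (here parts 4, 1 and 2) are enough to rebuild the file
restored = decode([parts[4], parts[1], parts[2]], [4, 1, 2], m=3, length=size)
assert restored == b"hello freenet"

With n = 5 and m = 3, each part is about a third of the original size, so
total storage is n/m = 5/3 of the file, and any three parts suffice;
setting m = n gives plain file splitting with no overhead, as noted in the
quoted message.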
