On Tue, Nov 06, 2001 at 12:01:43AM +0000, toad wrote:
> On Mon, Nov 05, 2001 at 07:01:36PM -0500, Scott Young wrote:
> > On Monday 05 November 2001 05:28 pm, you wrote:
> > > On Mon, 05 Nov 2001, Scott Young wrote:
> > > > On Monday 05 November 2001 03:42 pm, you wrote:
> > >
> > > <SNIP>
> > >
> > > > Full blocks might not always be the best way to insert data.  Let's say
> > > > you have file B which is an updated version of A.  B might have its data
> > > > shifted slightly, so its blocks might not align, creating unnecessary
> > > > redundancy if inserted into freenet even with the same block size.  Upon
> > > > inserting B, what if FProxy could use the checksums on the segments of A,
> > > > then calculate the checksums at all offsets in B, and create the
> > > > splitfile index of B using the blocks of A that match B, and just insert
> > > > blocks that fill in the changes and the missing data?  This would
> > > > increase the reliability of the overlapping blocks and reduce the total
> > > > file size on freenet.  The only problem is that it would make some
> > > > blocks smaller than others, which would make FEC more complicated.
> > > > Maybe an FEC algorithm that works well with variable-size blocks could
> > > > be used, perhaps with some support for overlapping blocks.  Or, for
> > > > simplicity with FEC, the partial blocks could be padded and/or combined
> > > > with other partial blocks and specially indexed in the splitfiles'
> > > > indexes.
> > >
> > > <SNIP>
> > >
> > > I think this idea is just wack.  It's just not going to happen.
> > >
> > > Thelema
> > 
> > What's wack about it?  I came up with a solution to the partial-block
> > problem that this has, and otherwise it uses splitfiles in a very efficient
> > way.  What if someone inserts a 600 MB ISO of Linux Mandrake 8.0, then
> > wants to insert version 8.1?  If 80% of the segments of the first ISO match
> > up with the second one, then a node could store BOTH of the 600 MB ISOs in
> > less than 700 MB of storage (not including FEC, which even at 50% would
> > bring it down to a total of 1.3 GB instead of 1.8 GB).  If the 8.0 ISO was
> > well established in freenet, then the 8.1 ISO would benefit from this, and
> > requests for it would also improve the overlapping blocks of the 8.0 ISO.
> > If this is implemented in a manner that allows use of a list of splitfiles,
> > then a program in a Debian ISO could use the same blocks as the same
> > program in a Mandrake ISO, and the same blocks as the program inserted
> > separately.  The overlapping blocks, if unavailable, could be obtained by
> > using the redundant blocks of any of the sharing files to reconstruct those
> > files and then calculate the blocks; their redundancy may be shared.  The
> > biggest problem is computing the checksum at every offset.  This could be
> > alleviated by using rolling checksums, such as the one used in rsync.  The
> > rolling checksum could also be used to quickly validate the contents of the
> > downloaded file, so the client can tell if it needs to download some
> > redundant blocks.
> > 
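The block-reuse step Scott describes above can be sketched roughly like this (a toy model only, not Freenet code; the block size, checksum choice, and function names are all illustrative assumptions):

```python
import hashlib

BLOCK = 4  # toy block size; real splitfile blocks are vastly larger


def block_sums(old):
    """Checksum every aligned block of the old file A."""
    return {hashlib.sha1(old[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(old), BLOCK)}


def match_blocks(old, new):
    """Scan B at every byte offset; reuse A's blocks where checksums
    match, and collect the leftover bytes as literal data to insert."""
    sums = block_sums(old)
    plan, literal, i = [], bytearray(), 0
    while i + BLOCK <= len(new):
        digest = hashlib.sha1(new[i:i + BLOCK]).hexdigest()
        if digest in sums:
            if literal:
                plan.append(("insert", bytes(literal)))
                literal = bytearray()
            plan.append(("reuse", sums[digest]))  # offset of A's block
            i += BLOCK
        else:
            literal.append(new[i])
            i += 1
    literal += new[i:]
    if literal:
        plan.append(("insert", bytes(literal)))
    return plan
```

The splitfile index of B would then reference A's existing CHKs for the "reuse" entries and fresh inserts only for the "insert" entries.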
> > I got this idea when I was looking at the rsync algorithm and wondering
> > how that type of functionality could benefit Freenet.  The rsync algorithm
> > itself would only allow someone to, for example, synchronize their Mandrake
> > 8.0 image with the newer 8.1 image and only have to transfer a little more
> > than the changed blocks.  This functionality could also be used in freenet
> > (although not as efficiently, because typical rsync block sizes range from
> > 300 bytes to 8 kB, while freenet's now range from 256 kB to 1 MB).
> > Actually, if the ideas above are implemented and the user has previously
> > downloaded the 8.0 image, the overlapping blocks would already have been
> > downloaded.  If it was downloaded off the web, then it could still be done
> > in the classic rsync-like way (download the checksums, see what's the same,
> > and download the differences).
> > 
> > Also, next time try to point out why you think ideas won't work.  It makes
> > your argument stronger and allows the ideas to improve (especially if they
> > had some significant bad points in the first place).
> Because you'd have to compute the checksum of every possible range. Sure,
> rolling checksums help greatly, but you still end up with O(n^2) execution.
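For reference, rsync's weak checksum can slide one byte at a time in constant work per offset; whether that leaves the whole scan cheap enough in practice is a separate question. A minimal sketch of the rolling update (illustrative only; `MOD` and the function names are my own):

```python
MOD = 1 << 16  # both sums are kept modulo 2^16, as in rsync's weak sum


def weak_sum(block):
    """rsync-style weak checksum: a plain byte sum and a position-weighted
    sum, where the first byte carries the largest weight."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, blocklen):
    """Slide the window one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - blocklen * out_byte + a) % MOD
    return a, b
```

Each candidate offset still costs a table lookup, and weak-sum hits must be confirmed with a strong checksum, but recomputing the sum from scratch at every offset is avoided.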
Also, if these are ISO images, you could insert the splitfile with a 2 kB
granularity? Of course, that means 600*500 = 300,000 files :) More realistic
would be to insert the individual files as CHKs... since files will normally
be packed on a CD-ROM, you can use a custom insert client that inserts like
this:
CHK 1   per-CD header info, directory, etc
    2   FILE1.DEB
    3   short buffer to get to the next file
    4   FILE2.DEB
    5   another short buffer
    6   FILE3.DEB

Etc. The individual DEB packages would match up with anywhere else they have
been inserted (for example as part of apt-get-over-freenet), giving better
efficiency, and would also match up with later/earlier/other distributions'
images. And it would all work with current clients (which allow variable part
sizes), with non-redundant splitting. The buffers would all be less than 2k, so
unlikely to go missing. One problem remains: if the packages are large, they
may require redundant splitfile insertion themselves. So, can we have a
non-redundant splitfile containing segments which are redundant splitfiles? Is
this a good idea?
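The layout above could be planned by cutting the image at file boundaries, so each package becomes one part and the bytes between files become short buffer parts. A rough sketch, under the assumption that the insert client already knows each file's offset and length within the image (the function and field names are hypothetical):

```python
def plan_segments(iso_size, file_extents):
    """Split an ISO image at file boundaries.

    file_extents: sorted list of (offset, length, name) tuples locating
    each packed file within the image.  Returns (label, offset, length)
    parts covering the whole image: named parts for reusable package
    bytes, 'buffer' parts for header/directory data and inter-file gaps.
    """
    parts, pos = [], 0
    for offset, length, name in file_extents:
        if pos < offset:  # per-CD header, directory info, or a gap
            parts.append(("buffer", pos, offset - pos))
        parts.append((name, offset, length))  # matches other inserts
        pos = offset + length
    if pos < iso_size:
        parts.append(("buffer", pos, iso_size - pos))
    return parts
```

The named parts would hash to the same CHKs wherever else those packages are inserted, which is what gives the cross-image sharing; only the small buffer parts are unique to this image.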
> > 
> > 
> > -Scott Young
> > 

_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl
