On Monday 05 November 2001 08:31 pm, you wrote:

> you don't really need rolling checksums; you're going to have to have
> access to both files locally in order to do this.  The checksum that you
> need to match isn't a rolling checksum but the SHA-1 hash used for CHKs.
<snip>
> I agree rsync is an awesome program.  I don't see any use for the
> concepts behind it in a freenet context, as rsync is for when you don't
> have both copies of the file, and there's an rsync process running on
> each copy of the file.  In freenet, you can't have an rsync process
> running on the copy that's in freenet, so you have to have it locally,
> and then you can use more traditional techniques to find similarities.
<snip>
> Rolling checksums aren't useful because you need to match CHKs, and
> there's no way we're dumbing down CHKs to the level needed to do a
> rolling checksum.  The only way to take advantage of existing data in
> freenet would be to calculate the CHK (decently expensive) of every
> section of the file that's the target size long (offset 0, offset 1,
> offset 2, etc.).  This is fairly infeasible for 600 MB files.

The rolling checksums are what make the every-offset demand reasonable.
Rolling checksum files would probably be referenced by header information
in the splitfile index.  The CHK would be used to verify a rolling-checksum
match, since rolling checksums have a significant probability of
collisions.  This would also allow for rsync over freenet (except that SHA
checksums would be used instead of MD4, plus a few other differences).
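
To make that concrete, here is a minimal sketch in Python (the block size
and the names are made up for illustration; none of this is existing
freenet code) of the two-level check: a cheap rsync-style rolling checksum
that can be slid across every offset, with the SHA-1 used for CHKs
computed only when the weak checksum matches.

    import hashlib

    BLOCK_SIZE = 256 * 1024   # assumed splitfile block size, illustration only
    MOD = 1 << 16

    def weak_checksum(block):
        """rsync-style weak checksum: two 16-bit sums packed into 32 bits."""
        a = b = 0
        n = len(block)
        for i, x in enumerate(block):
            a = (a + x) % MOD
            b = (b + (n - i) * x) % MOD
        return (b << 16) | a

    def roll(a, b, old_byte, new_byte, n):
        """Update the two sums when the window slides forward one byte."""
        a = (a - old_byte + new_byte) % MOD
        b = (b - n * old_byte + a) % MOD
        return a, b

    def checksum_index(data, block_size=BLOCK_SIZE):
        """Weak checksum -> [(offset, SHA-1)] for each block of a file.

        Roughly what a "rolling checksum file" referenced from the
        splitfile header would hold; the SHA-1 (the CHK-level hash) is
        what resolves weak-checksum collisions.
        """
        index = {}
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            index.setdefault(weak_checksum(block), []).append(
                (off, hashlib.sha1(block).digest()))
        return index

Sliding the weak sum is cheap, so the expensive SHA-1/CHK work only
happens on candidate matches; that is what would make scanning a 600 MB
image at every offset tolerable.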

> > Also, if these are ISO images, you could insert the splitfile with a 2 kb
> > granularity? Of course, that means 600*500 = 300,000 files :) More
> > realistic would be to insert the individual files as CHKs... since files
> > will normally be packed on a CD-ROM, you can use a custom insert client
> > that inserts like this:
> > CHK 1       per-CD header info, directory, etc
> >     2       FILE1.DEB
> >     3       short buffer to get to the next file
> >     4       FILE2.DEB
> >     5       another short buffer
> >     6       FILE3.DEB
>
> 300,000 files is no good.  Even with redundancy, freenet would take
> forever to download all those.  The average request (excluding xfer
> time) takes 10 seconds to complete.  Even if you saturated all 50
> connections your node could handle, it'd still take 60,000 seconds to
> download all 300,000 pieces = about 16 hours.

2 kB does sound a little small for such a large file.  One problem I can
see with linking in this way is that the ISO might be inserted before the
DEB, and the DEB should still be able to reuse the useful blocks from the
ISO.  The header information in the standalone DEB files may also differ
from what the ISO wants, so grafting data in this way might not work at
all.

> > Etc. The individual DEB packages would match up with anywhere else they
> > have been inserted (for example as part of apt-get-over-freenet), giving
> > better efficiency, and would also match up with later/earlier/other
> > distributions' images. And it would all work with current clients (which
> > allow variable part sizes), with non-redundant splitting. The buffers
> > would all be less than 2k, so unlikely to go missing. One problem
> > remains: if the packages are large, they may require redundant splitfile
> > insertion themselves. So, can we have a non-redundant splitfile
> > containing segments which are redundant splitfiles? Is this a good idea?
>
> I don't agree that the small pieces are unlikely to go missing, but
> anyway...
> It's perfectly fine to have a non-redundant splitfile containing
> segments which are redundant splitfiles.  It'd probably be better to
> insert all the .deb files and then have a recipe file that contains all
> the "short buffers" and enough other info to piece together the original
> ISO given all the .debs.

All the .debs would have to be inserted first.  The benefit of my idea is
that it would somewhat implicitly use already-existing blocks, no matter
what is inserted first.  It also ensures that the segments would assemble
correctly, without having to write an extremely complex recipe file.
Another example of how my idea would work is with a large DivX file.
Let's say someone inserts a 500 MB movie.  Someone else downloads the
movie, wants to write a commentary on one segment of it, cuts that
segment out of the movie, and inserts it into freenet.  The header and
footer information of the movie segment would not match up perfectly, but
the shared data could align its blocks with those of the original file.
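
As a rough illustration of that alignment, reusing the same weak-checksum
scheme as the sketch earlier in this mail (again, the block size, index
layout and names are just assumptions), the new file could be scanned at
every byte offset against the checksum index of the original insert, with
SHA-1 standing in for the CHK-level confirmation:

    import hashlib

    MOD = 1 << 16

    def find_reusable_blocks(new_data, index, block_size):
        """Yield (offset_in_new_file, offset_in_original) for verified hits.

        `index` maps weak checksum -> list of (offset_in_original, sha1)
        for every block of the file that is already in freenet.
        """
        n = block_size
        if len(new_data) < n:
            return
        # weak checksum of the first window
        a = sum(new_data[:n]) % MOD
        b = sum((n - i) * x for i, x in enumerate(new_data[:n])) % MOD
        pos = 0
        while True:
            weak = (b << 16) | a
            if weak in index:
                digest = hashlib.sha1(new_data[pos:pos + n]).digest()
                for orig_off, orig_sha in index[weak]:
                    if orig_sha == digest:  # strong check, as the CHK would give
                        yield pos, orig_off
                        break
            if pos + n >= len(new_data):
                break
            # slide the window forward one byte (rsync-style rolling update)
            old, new = new_data[pos], new_data[pos + n]
            a = (a - old + new) % MOD
            b = (b - n * old + a) % MOD
            pos += 1

Each confirmed hit marks a block that would not need to be inserted again;
only the unmatched ranges (the new header, footer and commentary) would
become fresh blocks.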

>
> Thelema


-Scott Young

