On Tue, Nov 06, 2001 at 12:01:43AM +0000, toad wrote:
> On Mon, Nov 05, 2001 at 07:01:36PM -0500, Scott Young wrote:
> > On Monday 05 November 2001 05:28 pm, you wrote:
> > > On Mon, 05 Nov 2001, Scott Young wrote:
> > > > On Monday 05 November 2001 03:42 pm, you wrote:
> > >
> > > <SNIP>
> > >
> > > > Full blocks might not always be the best way to insert data. Say
> > > > file B is an updated version of file A. B's data might be shifted
> > > > slightly, so its blocks might not align with A's, creating
> > > > unnecessary redundancy when B is inserted into Freenet, even with
> > > > the same block size. Upon inserting B, what if FProxy could take
> > > > the checksums of A's segments, calculate the checksums at every
> > > > offset in B, build B's splitfile index out of the blocks of A that
> > > > match, and insert only the blocks that cover the changes and the
> > > > missing data? This would increase the reliability of the
> > > > overlapping blocks and reduce the total file size on Freenet. The
> > > > one problem is that it would make some blocks smaller than others,
> > > > which would make FEC more complicated. Maybe an FEC algorithm that
> > > > works well with variable-size blocks could be used, perhaps with
> > > > some support for overlapping blocks. Or, to keep FEC simple, the
> > > > partial blocks could be padded and/or combined with other partial
> > > > blocks and specially indexed in the splitfiles' indexes (toy
> > > > sketch of the padding option below).
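> > > >
> > > > To make the padding option concrete, here is a toy sketch (not
> > > > FProxy code; BLOCK_SIZE is just an assumed value): a partial block
> > > > is zero-padded up to the fixed FEC block size, and its real length
> > > > is recorded in the splitfile index so readers can strip the
> > > > padding again.
> > > >
> > > >     BLOCK_SIZE = 256 * 1024   # assumed fixed FEC block size (256 KB)
> > > >
> > > >     def pad_block(partial):
> > > >         """Zero-pad a partial block to BLOCK_SIZE.
> > > >         Returns (padded_block, real_length); the splitfile index
> > > >         would store real_length next to the block reference."""
> > > >         assert len(partial) <= BLOCK_SIZE
> > > >         padding = b'\x00' * (BLOCK_SIZE - len(partial))
> > > >         return partial + padding, len(partial)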
> > >
> > > <SNIP>
> > >
> > > I think this idea is just wack. It's just not going to happen.
> > >
> > > Thelema
> >
> > What's wack about it? I came up with a solution to the partial-block
> > problem it has, and otherwise it uses splitfiles in a very efficient
> > way. What if someone inserts a 600 MB ISO of Linux Mandrake 8.0, then
> > wants to insert version 8.1? If 80% of the segments of the first ISO
> > match the second one, then a node could store BOTH 600 MB ISOs with
> > about 720 MB of storage: 600 MB for 8.0 plus the 20% of 8.1 that is
> > new. Even with 50% FEC redundancy on each full image, that is roughly
> > 1.3 GB in total instead of 1.8 GB. If the 8.0 ISO was well-founded in
> > Freenet, then the 8.1 ISO would benefit from this, and requests for
> > it would also refresh the overlapping blocks of the 8.0 ISO. If this
> > is implemented in a way that allows a list of splitfiles, then a
> > program inside a Debian ISO could use the same blocks as in a
> > Mandrake ISO, and the same blocks as the program inserted separately.
> > If the overlapping blocks are unavailable, they could be recovered by
> > using the redundant blocks of any of the sharing files to reconstruct
> > those files and then recalculate the blocks; their redundancy may
> > effectively be shared. The biggest problem is computing the checksum
> > at every offset. This could be alleviated by using rolling checksums,
> > such as the one in rsync (rough sketch below). The rolling checksum
> > could also be used to quickly validate the contents of a downloaded
> > file, so the client can tell whether it needs to fetch some redundant
> > blocks.
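> >
> > As a rough sketch of the rolling checksum idea (simplified from the
> > rsync paper, not tied to any Freenet API), the weak sum can slide one
> > byte at a time in O(1) instead of rescanning the whole block:
> >
> >     M = 1 << 16   # modulus used by the rsync weak checksum
> >
> >     def weak_checksum(block):
> >         """Weak checksum of a block, computed from scratch."""
> >         a = sum(block) % M
> >         b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
> >         return a, b
> >
> >     def roll(a, b, out_byte, in_byte, block_len):
> >         """Slide the window one byte: drop out_byte, take in_byte."""
> >         a = (a - out_byte + in_byte) % M
> >         b = (b - block_len * out_byte + a) % M
> >         return a, b
> >
> > Only the offsets whose weak sum matches one of A's segments would
> > need a strong checksum (e.g. SHA-1) comparison.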
> >
> > I got this idea while looking at the rsync algorithm and wondering
> > how that type of functionality could benefit Freenet. The rsync
> > algorithm itself would only let someone, for example, synchronize
> > their Mandrake 8.0 image with the newer 8.1 image while transferring
> > little more than the changed blocks. That functionality could also be
> > used in Freenet (although not as efficiently, because typical rsync
> > block sizes range from 300 bytes to 8 KB while Freenet's now range
> > from 256 KB to 1 MB). Actually, if the ideas above are implemented
> > and the user has previously downloaded the 8.0 image, the overlapping
> > blocks would already be on disk. If the 8.0 image came off the web
> > instead, the client could still proceed in the classic rsync-like
> > way: download the checksums, see what matches, and download the
> > differences (see the matching sketch below).
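> >
> > A hypothetical sketch of that matching pass, reusing weak_checksum()
> > and roll() from above (fixed-size blocks assumed for simplicity):
> >
> >     import hashlib
> >
> >     def find_matches(a_blocks, data, block_len):
> >         """Return {offset in new data: block index in A} for every
> >         position where a block of A reappears in the new file."""
> >         table = {}   # weak sum -> [(strong sum, index in A), ...]
> >         for idx, blk in enumerate(a_blocks):
> >             strong = hashlib.sha1(blk).digest()
> >             table.setdefault(weak_checksum(blk), []).append((strong, idx))
> >
> >         matches = {}
> >         if len(data) < block_len:
> >             return matches
> >         a, b = weak_checksum(data[:block_len])
> >         off = 0
> >         while True:
> >             # Strong comparison only on a weak-checksum hit.
> >             for strong, idx in table.get((a, b), ()):
> >                 if hashlib.sha1(data[off:off + block_len]).digest() == strong:
> >                     matches[off] = idx
> >                     break
> >             if off + block_len >= len(data):
> >                 break
> >             a, b = roll(a, b, data[off], data[off + block_len], block_len)
> >             off += 1
> >         return matches
> >
> > The expensive strong-checksum comparisons happen only on weak-sum
> > hits, which is what makes scanning every offset affordable.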
> >
> > Also, next time try to point out why you think an idea won't work.
> > It makes your argument stronger and allows the idea to improve
> > (especially if it had some significant weak points in the first
> > place).
> Because you'd have to compute the checksum over every possible range.
> Sure, rolling checksums help greatly, but you still end up with O(n^2)
> execution: with variable-size blocks there are O(n) offsets times O(n)
> possible block lengths to try.
Also, if these are ISO images, you could insert the splitfile with 2 KB
granularity? Of course, that means 600*500 = 300,000 keys :) More
realistic would be to insert the individual files as CHKs... since
files will normally be packed on a CD-ROM, you can use a custom insert
client (sketched below) that inserts like this:
CHK 1: per-CD header info, directory, etc.
CHK 2: FILE1.DEB
CHK 3: a short buffer to get to the next file
CHK 4: FILE2.DEB
CHK 5: another short buffer
CHK 6: FILE3.DEB
Etc. The individual DEB packages would match up with anywhere else they
have been inserted (for example as part of apt-get-over-freenet),
giving better efficiency, and would also match up with
later/earlier/other distributions' images. And it would all work with
current clients (which allow variable part sizes), using non-redundant
splitting. The buffers would all be less than 2 KB, so they are
unlikely to go missing. One problem remains: if the packages are large,
they may require redundant splitfile insertion themselves. So, can we
have a non-redundant splitfile whose segments are themselves redundant
splitfiles? Is that a good idea?
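
A speculative sketch of how such an insert client might lay out the
segments (the ISO9660 parsing is waved away; file_extents is assumed to
come from reading the image's directory):

    def layout_segments(iso_size, file_extents):
        """file_extents: sorted (offset, length, name) tuples, one per
        file's data area on the image. Returns (offset, length, label)
        segments covering the whole image; each file segment would be
        inserted as its own CHK, each gap as a short buffer segment."""
        segments = []
        pos = 0
        for off, length, name in file_extents:
            if off > pos:
                # Header info, directory data, inter-file padding.
                segments.append((pos, off - pos, 'buffer'))
            segments.append((off, length, name))
            pos = off + length
        if pos < iso_size:
            segments.append((pos, iso_size - pos, 'buffer'))
        return segments

Since file data on a CD is sector-aligned, the gaps between files stay
small, which is why the buffer segments should be cheap to insert and
unlikely to be lost.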
> >
> >
> > -Scott Young
> >
_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl