On Saturday 15 November 2003 02:52 pm, Zlatin Balevsky wrote:
> >>This idea may be rather unpopular, but too bad... :))
> >>
> >>--------
> >>Rateless codes are like FEC except that they support an infinite number
> >>of checkblocks, and the file becomes a stream.  A simple implementation
> >>is just to wrap the checkblocks from FEC in a repeating sequence and
> >>insert them in a channel like frost boards, etc.  More sophisticated
> >>implementations don't exist AFAIK; the theory is at
> >>
> >>http://www.scs.cs.nyu.edu/~petar/oncodes.pdf
> >>http://www.scs.cs.nyu.edu/~petar/msdtalk.pdf
> >>
> >>and if you have postscript,
> >>http://www.scs.cs.nyu.edu/~petar/msd.ps
> >>
> >>------------
> >>
> >>I believe this will provide better performance for large files than the
> >>current FEC algorithm, as it ensures that as long as someone is
> >>inserting the file stream it will be retrievable and reconstructable.
> >>Of course mass adoption may have disastrous effects on the network (but
> >>then again the same has been said about the polling mechanism frost
> >>uses).  It has different uses than standard FEC, most notably a large
> >>rare file that is not going to be very popular.  When the file-stream
> >>stops being requested the inserter will effectively become a ddos-er,
> >>but freenet can handle that ;)  It may be integrated with the frost
> >>feedback mechanism (i.e. turn the source off when x successful
> >>downloads complete)
> >
> > Why is it better to insert an infinite number of different blocks than
> > to insert the original N blocks? If they cease to be reachable, reinsert
> > them with skipDS, or pull them through with skipDS, or pull them from
> > another node with skipDS. I don't see the advantage.
>
> 1. Reachability can be measured only from those nodes to which the
> inserter has FCP access; with properly functioning routing the blocks
> should be reachable from the inserting node most of the time even with
> skipDS.
> 2. Pulling them after some have ceased to be reachable will only
> refresh them on the path from the inserting node(s) to wherever the
> blocks were stored.
>
> This mechanism is intended for large files that are not likely to be
> popular, and ensures that even a node which is topologically very
> distant can reconstruct the entire file.  As long as the network is not
> totally disjoint, any node will be able to get the entire file, exactly
> because the number of chunks is infinite and they're likely to get
> routed evenly across the network.
>
> If a finite number of chunks is inserted, and then re-inserted with the
> same CHKs, then if the requesting node cannot reach some of the
> necessary chunks due to a crappy routing table it will never be able to
> reach them unless the topology of the nodes around the inserter changes
> (but with good routing we expect the same key to go to a similar place
> no matter how dynamic the network is).
>
> Having an infinite number of chunks ends up being a broadcast of sorts;
> however freenet is designed to handle exactly this kind of ddos.  As
> long as the requesting node has a chance greater than 0 of getting a
> single chunk from the file stream, it's guaranteed to get the entire
> file as long as it is being "seeded".  (infinity * anything over 0 =
> infinity)
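
For concreteness, the scheme being proposed is roughly the following. This 
is only an illustrative sketch in Python, not the online codes from the 
paper (those use a carefully chosen degree distribution and a peeling 
decoder); it just shows the shape of an endless check-block stream in which 
any sufficiently large set of distinct blocks lets a receiver reconstruct 
the file. Every name and number in it is made up.

import random

BLOCK_SIZE = 32 * 1024  # bytes per source block, arbitrary for the sketch

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Pad the file and split it into equal-sized source blocks."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR a list of equal-length blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def check_block(source_blocks: list[bytes], seq: int) -> tuple[list[int], bytes]:
    """Check block number `seq`: a seeded pseudo-random subset of the source
    blocks, XORed.  The seed is the sequence number, so the inserter can keep
    regenerating new blocks forever without storing them."""
    rng = random.Random(seq)
    degree = rng.randint(1, len(source_blocks))   # toy degree distribution
    indices = rng.sample(range(len(source_blocks)), degree)
    return indices, xor_blocks([source_blocks[i] for i in indices])

def stream(source_blocks: list[bytes]):
    """The 'infinite' stream: keep emitting new check blocks until told to
    stop, e.g. after x successful downloads are reported via frost."""
    seq = 0
    while True:
        yield seq, check_block(source_blocks, seq)
        seq += 1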

Perhaps I can dissuade you. Consider the following: what if the entire 
network were built on this concept? (For it to work, the entire network 
really must be built on this concept, otherwise the content that is not 
constantly being reinserted would fall off.)

Say there is a maximum range for requests and inserts, like MaxHTL. Then any 
file should be retrievable, even if the inserter and the requester are 
separated by 2 x MaxHTL nodes. This means the largest workable network size 
is much larger. But at what cost?

Let's assume that the network throughput is bounded by the total upstream 
capacity. This is true of almost any network. If it uses 1 unit of bandwidth 
to move one chunk one hop, then the retrieval cost is less than MaxHTL units. 
But what is the insert cost? Well, we don't need to be inserting continuously 
as fast as we can, but we do need to be going as fast as we want others to be 
able to retrieve our file.

Because there is a finite limit on network space, but not on storage time, 
other data will fall out of the network at exactly the same rate that we are 
putting ours in. So, given that the blocks that are dropped are random, at 
any given time only a roughly fixed number of our blocks will be available, 
regardless of what rate we choose to reinsert them at.

So, under a normal model with no redundancy, the maximum network length must 
be MaxHTL, or equivalently:

log(TotalNodes/nodesWithData) / log(newConnectionsPerNode) == MaxHTL

If you are trying to overcome the size limitation by adding redundancy, then:

TotalNodes == nodesWithData * (newConnectionsPerNode^MaxHTL)

As you can see, it takes HUGE redundancy to reduce the HTL even a little bit. 
In fact you have to multiply the redundancy by newConnectionsPerNode for each 
hop you want to remove, up to the limit of 1/2 of the original MaxHTL at a 
redundancy of 1.
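
To put some (entirely made-up) numbers on that formula: with 100,000 nodes 
and 10 new connections per node, MaxHTL works out to 5, and each hop you 
want to shave off multiplies the number of nodes that must hold the data 
by 10. A quick sketch:

import math

TOTAL_NODES = 100_000            # assumed network size
NEW_CONNECTIONS_PER_NODE = 10    # assumed branching factor per hop
MAX_HTL = 5  # = log(TOTAL_NODES) / log(NEW_CONNECTIONS_PER_NODE) for these numbers

def nodes_with_data_needed(hops_removed: int) -> int:
    """Copies needed so the data is reachable within MAX_HTL - hops_removed hops."""
    reachable = NEW_CONNECTIONS_PER_NODE ** (MAX_HTL - hops_removed)
    return math.ceil(TOTAL_NODES / reachable)

for hops in range(MAX_HTL + 1):
    print(f"remove {hops} hop(s): data must sit on ~{nodes_with_data_needed(hops)} nodes")
# remove 0 hop(s): data must sit on ~1 nodes
# remove 1 hop(s): data must sit on ~10 nodes
# remove 2 hop(s): data must sit on ~100 nodes
# ... i.e. the required redundancy grows by a factor of
# newConnectionsPerNode for every hop removed.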

So obviously increasing redundancy does not provide a good way to increase 
scalability. If it is done universally it does not improve reliability 
either. Consider the following: if the network has a fixed capacity of 1000 
units, and people want to keep 2000 units of content on it, only half of it 
will be retrievable. If you up the redundancy on your file to 2x, then there 
is a much better chance of retrieving it. But if this is the default, then 
people are effectively trying to store 4000 units in that 1000-unit space, 
so overall their reliability remains the same. What about your scheme? Well, 
you make each file take an infinitely large number of units, but only the 
most recent 1000 are stored. This means that in principle all data can be 
retrieved, provided that you are willing to wait long enough. But how long 
is too long?
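
The same point in toy-model form (assumed numbers, nothing Freenet-specific): 
if everyone raises redundancy, the shared store does not get any bigger, so 
the odds of retrieving any one file stay roughly where they were.

CAPACITY = 1000  # units the network can hold

def surviving_fraction(content_units: int, redundancy: float) -> float:
    """Fraction of stored blocks still present, assuming random drops."""
    stored = content_units * redundancy
    return min(1.0, CAPACITY / stored)

print(surviving_fraction(2000, 1.0))  # 0.5  -> half of the stored blocks survive
print(surviving_fraction(2000, 2.0))  # 0.25 -> only a quarter survive, but each
                                      #         original block now has two copies,
                                      #         so the net reliability is roughly
                                      #         unchanged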

Essentially we have already determined that downloading from MaxHTL nodes 
away is the longest we are willing to wait. Suppose we increment MaxHTL by 
one. That would mean the data could be reached from two hops further away. 
This would take 1*dataSize additional traffic while uploading and 1*dataSize 
additional traffic while downloading, on the network as a whole. For your 
proposal, in order for the data to be retrievable from two hops further away, 
you would need to have uploaded newConnectionsPerNode^2 segments. This 
translates to (newConnectionsPerNode^2 - 1)*dataSize*HTL additional traffic 
for an upload and no additional traffic for a download.
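
Plugging assumed numbers into that comparison (MaxHTL = 10, 10 new 
connections per node, dataSize = 1):

DATA_SIZE = 1                    # in "dataSize" units
MAX_HTL = 10                     # assumed hops-to-live
NEW_CONNECTIONS_PER_NODE = 10    # assumed branching factor

# Option A: raise MaxHTL by one -> two hops more reach, ~2 * dataSize extra
# traffic (one extra hop on the insert path, one on the request path).
increase_htl_cost = 2 * DATA_SIZE

# Option B: the endless-stream proposal -> to cover two extra hops you upload
# about newConnectionsPerNode^2 segments, each travelling ~MaxHTL hops.
stream_cost = (NEW_CONNECTIONS_PER_NODE ** 2 - 1) * DATA_SIZE * MAX_HTL

print(increase_htl_cost)  # 2
print(stream_cost)        # 990 -- hundreds of times more traffic for the same
                          #        two hops of extra reach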

So your solution does not provide for less bandwidth usage either. What it 
does do is allow for a fallback: if the data is not available right now, you 
may still be able to get it if you are willing to wait a long time. However, 
I know a better way to achieve this. Once premix routing is done, it would 
be possible for the original inserter to list, in the manifest of the file, 
a number of proxies that it keeps connections open to. This way, if the data 
was not available on the network, you could request it from the provider.

This is similar to your solution in that it requires the provider to be there 
to get the data, but the difference is that the data would only be uploaded 
as needed, and even then as little as possible (because the requesting node 
would then reinsert those blocks as healing blocks). This way it would not 
disturb the other content on the network by continuously overwriting it with 
new data.
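
Roughly, the flow would be something like this. None of these functions 
exist today; they are purely hypothetical stand-ins for node behaviour, just 
to illustrate the order of operations: try the network first, fall back to a 
proxy listed in the manifest, then reinsert the recovered block as a healing 
block so the network picks it up again.

def fetch_block(chk: str, manifest_proxies: list[str]) -> bytes | None:
    data = fetch_from_network(chk)               # normal Freenet request (stub)
    if data is None:
        for proxy in manifest_proxies:           # proxies the inserter keeps open
            data = fetch_via_proxy(proxy, chk)   # only uploaded when asked for
            if data is not None:
                insert_healing_block(chk, data)  # heal only what was missing
                break
    return data

# Stubs standing in for real node behaviour, just so the sketch runs:
def fetch_from_network(chk): return None
def fetch_via_proxy(proxy, chk): return b"block data"
def insert_healing_block(chk, data): pass

print(fetch_block("CHK@example", ["proxy-1", "proxy-2"]))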

> >>As soon as I hear back from the author of the paper I will start
> >>implementing a client which does that.  Forward all hatemail to
> >>/dev/null :-pp
> >
> > Hehe, we haven't got to that stage yet, first we have to argue about
> > it, then it degenerates into hatemail. :)
>
> and what happens after it's implemented and totally screws everybody's
> freesite browsing experience?  death threats? ;))))
