On Saturday 15 November 2003 02:52 pm, Zlatin Balevsky wrote:

> >> This idea may be rather unpopular, but too bad... :))
> >>
> >> --------
> >> Rateless codes are like FEC except that they support an infinite
> >> number of check blocks, and the file becomes a stream. A simple
> >> implementation is just to wrap the check blocks from FEC in a
> >> repeating sequence and insert them in a channel like Frost boards,
> >> etc. More sophisticated implementations don't exist AFAIK; the
> >> theory is at
> >>
> >> http://www.scs.cs.nyu.edu/~petar/oncodes.pdf
> >> http://www.scs.cs.nyu.edu/~petar/msdtalk.pdf
> >>
> >> and if you have PostScript,
> >> http://www.scs.cs.nyu.edu/~petar/msd.ps
> >>
> >> ------------
> >>
> >> I believe this will provide better performance for large files than
> >> the current FEC algorithm, as it ensures that as long as someone is
> >> inserting the file stream, it will be retrievable and
> >> reconstructable. Of course, mass adoption may have disastrous
> >> effects on the network (but then again, the same has been said for
> >> the polling mechanism Frost uses). It has different uses than the
> >> standard FEC, most notably a large, rare file that is not going to
> >> be very popular. When the file stream stops being requested, the
> >> inserter will effectively become a DDoS-er, but Freenet can handle
> >> that ;) It may be integrated with the Frost feedback mechanism
> >> (i.e. turn the source off when X successful downloads complete).
> >
> > Why is it better to insert an infinite number of different blocks
> > than to insert the original N blocks? If they cease to be reachable,
> > reinsert them with skipDS, or pull them through with skipDS, or pull
> > them from another node with skipDS. I don't see the advantage.
>
> 1. Reachability can be measured only from those nodes to which the
> inserter has FCP access; with properly functioning routing, the blocks
> should be reachable from the inserting node most of the time even with
> skipDS.
>
> 2.
> Pulling them after some have ceased to be reachable will only refresh
> them on the path from the inserting node(s) to wherever the blocks
> were stored.
>
> This mechanism is intended for large files that are not likely to be
> popular, and it ensures that even a node which is topologically very
> distant can reconstruct the entire file. As long as the network is not
> totally disjoint, any node will be able to get the entire file,
> precisely because the number of chunks is infinite and they are likely
> to get routed evenly across the network.
>
> If a finite number of chunks is inserted, and then re-inserted with
> the same CHKs, then if the requesting node cannot reach some of the
> necessary chunks due to a bad routing table, it won't ever be able to
> reach them unless the topology of the nodes around the inserter
> changes (but with good routing we expect the same key to go to a
> similar place no matter how dynamic the network is).
>
> Having an infinite number of chunks ends up being a broadcast of
> sorts; however, Freenet is designed to handle exactly this kind of
> DDoS. As long as the requesting node has a chance greater than 0 of
> getting a single chunk from the file stream, it's guaranteed to get
> the entire file as long as it is being "seeded". (infinity * anything
> over 0 = infinity)
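For concreteness, the scheme under discussion — an endless stream of check blocks, each built from the source blocks — can be sketched roughly as below. This is a toy LT-code-style random-XOR construction, not the exact Online Codes design from the papers above (those use a carefully tuned degree distribution); it only illustrates the "infinitely many distinct check blocks" property.

```python
import random

def rateless_stream(source_blocks, seed=0):
    """Yield an endless stream of check blocks, each the XOR of a
    random subset of the source blocks (toy LT-code-style encoder)."""
    rng = random.Random(seed)
    n = len(source_blocks)
    size = len(source_blocks[0])
    while True:
        degree = rng.randint(1, n)
        indices = sorted(rng.sample(range(n), degree))
        block = bytearray(size)
        for i in indices:
            for j in range(size):
                block[j] ^= source_blocks[i][j]
        yield indices, bytes(block)

# The inserter would keep drawing check blocks from this stream and
# inserting them; a downloader that collects enough distinct blocks can
# solve for the sources (e.g. by Gaussian elimination over GF(2)).
source = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stream = rateless_stream(source)
checks = [next(stream) for _ in range(6)]
```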
Perhaps I can dissuade you. Consider the following: what if the entire
network were built on this concept? (For it to work, the entire network
really must be built on this concept; otherwise the content that is not
constantly being reinserted would fall off.) Say there is a maximum
range for requests and inserts, like MaxHTL. Then any file should be
retrievable, even if the inserter and the requester are separated by
2*MaxHTL nodes. This means the largest workable network size is much
larger. But at what cost?

Let's assume that network throughput is bounded by the total upstream
capacity. This is true of almost any network. If it takes 1 unit of
bandwidth to move one chunk one hop, then the retrieval cost is less
than MaxHTL units. But what is the insert cost? We don't need to be
inserting continuously as fast as we can, but we do need to go as fast
as we want others to be able to retrieve our file. Because there is a
finite limit on network space, but not on storage time, other data will
fall out of the network at exactly the same rate that we are putting
ours in. Given that the blocks dropped are random, at any given time
only a roughly fixed number will be available, regardless of what rate
we select for reinserting them.

So under a normal model with no redundancy, the maximum network length
must be MaxHTL, or:

  log(TotalNodes/nodesWithData) / log(newConnectionsPerNode) == MaxHTL

If you are trying to overcome the size limitation by adding redundancy,
then:

  TotalNodes == nodesWithData * (newConnectionsPerNode^MaxHTL)

As you can see, it takes a HUGE redundancy to reduce the HTL even a
little bit. In fact, you have to multiply the redundancy by
newConnectionsPerNode for each hop you want to remove, up to the limit
of 1/2 of the original MaxHTL at a redundancy of 1. So obviously,
increasing redundancy does not provide a good way to increase
scalability. If it is done universally, it does not improve reliability
either.
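The relationship above is easy to check numerically, using the post's own variable names. The concrete numbers (100,000 nodes, a branching factor of 3) are hypothetical, chosen only to show that each hop removed costs a full factor of newConnectionsPerNode in redundancy:

```python
import math

def required_htl(total_nodes, nodes_with_data, new_connections_per_node):
    """Hops needed for a request to reach some node holding the data,
    from TotalNodes == nodesWithData * newConnectionsPerNode**MaxHTL."""
    return math.log(total_nodes / nodes_with_data) / math.log(new_connections_per_node)

# Tripling the redundancy (1 copy -> 3 copies) with branching factor 3
# shaves exactly one hop off the required HTL:
h1 = required_htl(100_000, 1, 3)
h3 = required_htl(100_000, 3, 3)
saved = h1 - h3  # 1.0 hop saved for 3x the storage
```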
Consider the following: if the network has a fixed capacity of 1000
units, and people want to keep 2000 units of content on it, only half
of it will be retrievable. However, if you up the redundancy on your
file to 2x, there is a much better chance of retrieving it. But if that
is the default, then people are effectively trying to store 4000 units
in that 1000-unit space, so overall their reliability remains the same.

What about your scheme? You make each file take an infinitely large
number of units, but only the most recent 1000 are stored. This means
that in principle all data can be retrieved, provided you are willing
to wait long enough. But how long is too long? Essentially, we have
already determined that downloading from MaxHTL nodes away is the
longest we are willing to wait.

Suppose we increment MaxHTL by one. That would mean the data could be
reached from two hops further away. This would take 1*dataSize
additional traffic while uploading and 1*dataSize additional traffic
while downloading, on the network as a whole. For your proposal, in
order for the data to be retrievable from two hops further, you would
need to have uploaded newConnectionsPerNode^2 segments. This translates
to (newConnectionsPerNode^2 - 1)*dataSize*HTL additional traffic for an
upload, and no additional traffic for a download. So your solution does
not provide for less bandwidth usage either.

What it does do is allow for a fallback: if the data is not available
right now, you may still be able to get it if you are willing to wait a
long time. However, I know a better way to achieve this. Once premix
routing is done, it would be possible for the original inserter to list,
in the manifest of the file, a number of proxies that it keeps
connections open to. That way, if the data was not available on the
network, you could request it from the provider.
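The traffic comparison above (raising MaxHTL by one versus uploading newConnectionsPerNode^2 segments) can be sketched in a few lines. The sample numbers (1-unit file, branching factor 3, HTL 10) are hypothetical, purely to make the asymmetry concrete:

```python
def extra_traffic_htl_bump(data_size):
    """Raise MaxHTL by one: the data travels one extra hop on upload
    and one extra hop on download (the post's 1 unit/chunk/hop model)."""
    return 1 * data_size + 1 * data_size

def extra_traffic_rateless(data_size, htl, new_connections_per_node):
    """Rateless scheme reaching two hops further: upload c**2 segments
    in total, i.e. (c**2 - 1) extra copies, each travelling HTL hops."""
    c = new_connections_per_node
    return (c ** 2 - 1) * data_size * htl

bump = extra_traffic_htl_bump(1)             # 2 extra units of traffic
rateless = extra_traffic_rateless(1, 10, 3)  # 80 extra units, all upload
```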
This premix-proxy idea is similar to your solution in that it requires
the provider to be there to get the data, but the difference is that
the data would only be uploaded as needed, and even then as little as
possible (because the requesting node would then reinsert those blocks
as healing blocks). This would not disturb the other content on the
network by continuously overwriting it with new data.

> >> As soon as I hear back from the author of the paper I will start
> >> implementing a client which does that. Forward all hatemail to
> >> /dev/null :-pp
> >
> > Hehe, we haven't got to that stage yet; first we have to argue about
> > it, then it degenerates into hatemail. :)
>
> And what happens after it's implemented and totally screws everybody's
> freesite browsing experience? Death threats? ;))))

_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl