Hi Bryan,
First, thanks a lot for your comments.
I am answering them inline.
Bryan Turner wrote:
Arnaud,
Thanks for the new paper, there is a lot of nice data here. The
presented graphs and analysis indeed suggest that the BitTorrent
protocol's choices achieve near-optimal data distribution in its
paradigm (single file, block-based content distribution). I agree
with your suggestions for fairness criteria, and my own suggestions
are similar (my suggestion is for a reputation-based system to model
leechers and free-riders).
The first point I want to make clear is that we place ourselves in the
context of P2P file delivery in the current Internet.
This context has one important implication: there is good network
connectivity and everybody can talk to everybody (two NATed peers cannot
communicate with each other directly, but this is not a network issue, it
is an end-user issue).
You will see in the following why this is important.
That should be clear in the paper; if it is not, tell me so that I can
improve the argumentation.
However, I disagree with several conclusions you make along the
way. A brief overview:
- Byte-for-byte fairness is not appropriate
I still claim that it is not appropriate in the context of P2P file transfer (see below)
- Network Coding is not justified
in our context. Network coding can be a good solution in the context of
ad-hoc networks, or in any other network where connectivity
is a problem, i.e., when local rarest first does not work optimally.
Note, however, that network coding requires significant computing
capabilities that a sensor network or an ad hoc network based on basic
terminals (e.g., mobile phones) cannot afford for now.
There are other scenarios where network coding can help.
But I still claim that in the current Internet, and on the representative
set of real torrents we monitored, network coding would not improve
performance significantly enough to be justified.
Fairness:
I am unconvinced by the arguments against byte-for-byte fairness.
Your evidence to support this is that tit-for-tat does not take into
account the excess capacity of some torrents (ie: more seeders than
leechers). Yet this situation is equivalent under byte-for-byte and
choking rules. Leechers are upholding the byte-for-byte policy
(seeders cannot, as you mention) and thus only leechers are
restricting their upload capacities to free-riders, and rightly so.
Seeds (who hold the excess capacity) are distributing this capacity
fairly to all peers. Thus, choking and byte-for-byte are equivalent
in this regard - free riders receive a share of the bandwidth equal to
the fair distribution given by seeders in both cases.
I am not sure I understand your point.
The byte-for-byte (BFB) algorithm and the choke algorithm (CA) are far from
being equivalent.
None of the studies on BFB that I am aware of mentions the case of seeds.
They simply say that peers must not receive more than they give.
This is the definition of BFB. You can introduce a threshold, but that does
not change the main idea, and there is no proposed solution
to define a dynamic threshold.
It is not correct to say that peers must not receive more than they
send. As I wrote in the report, leechers have an asymmetric capacity most
of the time, and seeds offer capacity for free. With BFB you cannot
benefit from this excess capacity. The only solution is to
define a dynamic threshold, but this is an optimization problem that is
not solved, and one that I believe is very hard to solve in practice, for
no benefit compared to CA.
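To make the difference concrete, here is a minimal sketch (Python, purely
illustrative; the names, the per-peer counters and the 2 MiB threshold are
all made up, no real client works exactly like this) of what a BFB rule
with a static threshold looks like on the leecher side:

    # Hypothetical byte-for-byte (BFB) check on the leecher side, with a
    # static threshold T.  Illustrative only.
    T = 2 * 2**20   # allow at most 2 MiB of deficit towards any single peer

    def bfb_allows_download(uploaded, downloaded, peer):
        """Accept more data from `peer` only while our deficit stays below T.

        uploaded / downloaded are dicts of byte counters kept per remote peer."""
        deficit = downloaded.get(peer, 0) - uploaded.get(peer, 0)
        return deficit < T

    # A seed uploads but never downloads, so under strict BFB a leecher
    # stops accepting its capacity (which is offered for free) as soon as
    # the deficit reaches T, no matter how much spare upload capacity the
    # seed has.

The decision only looks at the local give/take balance, which is exactly
why the excess capacity of seeds and of fast leechers cannot be absorbed.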
Remember that the main (and single) argument in all the studies that
discuss BFB solutions is that free riders can get good
service from a torrent. In all the studies I am aware of, this problem
comes from the old choke algorithm in seed state, which
favors fast downloaders. With the new choke algorithm in seed state, all
the observed unfairness disappears (in the sense of the fairness
I define in the paper).
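Roughly, and simplifying a lot, the idea of the new seed-state algorithm is
to order the interested peers by the time they were last unchoked instead of
by their download rate. The sketch below only illustrates that idea; the
Peer class and the single random extra slot are my simplification, not the
actual mainline code:

    import random
    import time

    class Peer:
        def __init__(self, name):
            self.name = name
            self.last_unchoked = 0.0   # when the seed last unchoked this peer

    def seed_unchoke(interested_peers, slots=4):
        """Pick which interested peers the seed serves for the next round.

        Peers are ordered by the time they were last unchoked (most recently
        unchoked first) instead of by their download rate, so the seed
        rotates its capacity over all peers rather than favoring fast
        downloaders."""
        ordered = sorted(interested_peers,
                         key=lambda p: p.last_unchoked, reverse=True)
        chosen = ordered[:slots - 1]
        remaining = [p for p in interested_peers if p not in chosen]
        if remaining:
            chosen.append(random.choice(remaining))   # rotate a new peer in
        for p in chosen:
            p.last_unchoked = time.time()
        return chosen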
In the more important case, when excess capacity is not available,
free riders should be eliminated or diminished. Because leecher
participation cannot be proven, the only other method of measurement
is via equal exchange of services. Byte-for-byte is one such
measurement (reputation systems are another). In this scenario, free
riders will only receive a small credit from the leechers in the swarm
before having to resort to hand-outs from the seeds. With the seeds
giving fair capacity to the swarm, the free riders will receive very
little of the swarm capacity, while the participating leechers will
receive fair resources.
According to our measurements, CA gives the same result. It guarantees that
a peer cannot receive more from a torrent than a peer that contributes more
than it does.
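Stated as a check one could run over measured per-peer totals (a
hypothetical sketch, the data layout is made up for the example):

    def is_fair(stats):
        """stats: list of (uploaded_bytes, downloaded_bytes), one per leecher.

        Returns True if no peer received more from the torrent than a peer
        that contributed more than it did.  The criterion is deliberately
        one-sided: receiving less than what you give is allowed."""
        for u1, d1 in stats:
            for u2, d2 in stats:
                if u2 > u1 and d2 < d1:   # contributes more but receives less
                    return False
        return True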
Here again, with BFB we would not use all the capacity of the torrent,
because it is based on a fairness criterion that is not correct for a P2P
architecture.
Free riders always receive more resources under choking than under
exchange measurement models.
And that is fine, as long as they do not receive more than someone who
contributes.
I understand that you argue in favor of global efficiency. It is true
that if you do not give anything to free riders, you will have
more capacity for the ones that contribute. But it is not clear that
they can make use of it, and it is not clear that giving a small amount of
service to free riders impacts the overall torrent.
I showed a long time ago, in the context of multicast, that it is
possible to define a policy that gives more to multicast users without
significantly decreasing the performance of unicast ones.
Even though the context is different, I believe the issue is equivalent.
Network Coding:
On the topic of Network Coding [1], I did not see any evidence in
your paper refuting the network coding literature. Please correct me
if I missed it, but it appears your only rebuttal is in the poor
modeling of BitTorrent, not in the application of network coding
techniques. From this I don't see how you conclude that network
coding is not justified? I agree that most of the simulation
literature does not model BitTorrent correctly, but this doesn't
negate the positive results of the network coding trials, only their
comparisons to BitTorrent.
That is not my point. We claim that we cannot gain a significant
performance improvement using network coding in the context we are
studying.
As I said at the beginning of this email, network coding is undoubtedly
useful in a number of situations, but not in our context.
However, the context we are studying is very important, and there are
many people who believe that they can improve a P2P client
for file sharing over the Internet using network coding techniques. This
is not true, and it is due to a misunderstanding of the
dynamics of BitTorrent that I hope the paper helps to understand better.
In section 4-A, pp 6-7, you suggest that source or network coding
may not propose interesting pieces to peers during the startup phase.
I counter that this is exactly what network coding solves - every
packet has an innovation rate based on the network flows from the
local node to the source(s). Assuming a densely connected graph with
fair sharing from the seeder, it is unlikely that any packet is not
innovative during the startup phase. Unless the local node's
bandwidth is many multiples of the seeder, a problem which is not
specific to network coding.
You did not get the point. Consider a seed with a finite upload capacity of
C bytes/s and a file of size S bytes to send.
Then the seed needs, in the best case, S/C seconds to send the entire
content. Whatever coding technique you use, you cannot do it faster.
We identified that when the performance of a torrent is not optimal,
the torrent is in a transient phase. That means that the seed has not yet
sent a copy of each piece. Whether you use network coding or any other
coding you can imagine, it will not change the fact that information is
missing in the torrent, and it is this missing information that causes the
decrease in performance.
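Just to put an order of magnitude on that bound (the numbers below are
illustrative only, they are not taken from our measurements):

    S = 700e6    # file size in bytes (700 MB, illustrative)
    C = 100e3    # seed upload capacity in bytes/s (100 kB/s, illustrative)
    lower_bound = S / C
    print("the seed needs at least %.0f s (~%.1f h) to inject one full copy"
          % (lower_bound, lower_bound / 3600))
    # -> 7000 s, close to 2 hours, whatever coding technique is used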
Network Coding is superior to Rarest First and other
BitTorrent-like protocols for several reasons:
1) The innovation rate is the min-cut of all flows from the local node
to the source(s), which is likely to be much larger than BitTorrent's
default of 4 active connections.
2) BitTorrent transfers pieces in blocks-at-a-time, but only
advertises in piece-at-a-time. This injects a significant delay
between downloading of an innovative byte and sharing of that byte (on
average, one half the download time of an entire piece). Network
coding imposes no such delay.
3) There is no first-blocks or last-blocks problem with network
coding, they are entirely non-issues. Large portions of your paper
are devoted to these issues which simply don't occur under network coding.
4) The meta-network utilization is higher for network coding. That
is, since BitTorrent cannot align to the underlying physical
connections of the network elements, it is impossible to maximally
utilize them. Network coding (using an innovation rate tracking
algorithm) can rapidly align to the structure of the physical network,
eliminating cross-core retransmissions. This saves a significant
amount of ISP bandwidth and reduces the overall burden of supporting
filesharing services on the internet.
Everything you say is true in theory, but the real difference between
network coding and rarest first depends on the context.
All I can tell you is that our measurements show that, in the context I am
discussing, there are no significant differences between network coding
and Rarest First.
Is the full implementation of Network Coding superior enough to
BitTorrent that it offers compelling reasons for migration to it as a
solution? Your paper argues that BitTorrent is "good enough", and I
tend to agree.
I do not believe it makes sense to say that one solution is superior to
another one in all cases.
In our specific context, the rarest first and choke algorithms are
enough, though not superior with respect to performance; but they are much
simpler.
In other contexts, network coding will be superior.
The point of the paper is to show what kind of performance we can
achieve using the rarest first and choke algorithms, and to dispel
the legends that say there are always last-pieces problems or
fairness issues. These legends, as we show with the notion of the transient
phase,
are due to a misunderstanding of the very complex dynamics of P2P
protocols.
Regards,
Arnaud.
_______________________________________________
p2p-hackers mailing list
p2p-hackers@zgp.org
http://zgp.org/mailman/listinfo/p2p-hackers
_______________________________________________
Here is a web page listing P2P Conferences:
http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences