Hi Bryan,
First, thanks a lot for your comments.
I am answering them inline.
Bryan Turner wrote:
Arnaud,
Thanks for the new paper, there is a lot of nice data here. The
presented graphs and analysis indeed suggest that the BitTorrent
protocol's choices achieve near-optimal data distribution in its
paradigm (single file, block-based content distribution). I agree
with your suggestions for fairness criteria, and my own suggestions
are similar (my suggestion is for a reputation-based system to model
leechers and free-riders).
The first point I want to make clear is that we place ourselves in the
context of P2P file delivery in the current Internet.
This context has one important implication: there is good network
connectivity and everybody can talk to everybody (two NATed peers cannot
communicate with each other directly, but this is not a network issue, it
is an end-user issue).
You will see in the following why this is important.
That should be clear in the paper; if it is not, tell me so that I can
improve the argumentation.
However, I disagree with several conclusions you make along the
way. A brief overview:
- Byte-for-byte fairness is not appropriate
I still claim that it is not appropriate in the context of P2P file transfer (see below)
- Network Coding is not justified
in our context. Network coding can be a good solution in the context of
ad-hoc networks, or in any other network where connectivity
is a problem, i.e., when local rarest first does not work optimally.
Note, however, that network coding requires significant computing
capabilities that a sensor network or an ad hoc network based on basic
terminals (e.g., mobile phones) cannot afford for now.
There are other scenarios where network coding can help.
But I still claim that in the current Internet, and on the representative
set of real torrents we monitored, network coding would not improve
performance significantly enough to be justified.
Fairness:
I am unconvinced by the arguments against byte-for-byte fairness.
Your evidence to support this is that tit-for-tat does not take into
account the excess capacity of some torrents (ie: more seeders than
leechers). Yet this situation is equivalent under byte-for-byte and
choking rules. Leechers are upholding the byte-for-byte policy
(seeders cannot, as you mention) and thus only leechers are
restricting their upload capacities to free-riders, and rightly so.
Seeds (who hold the excess capacity) are distributing this capacity
fairly to all peers. Thus, choking and byte-for-byte are equivalent
in this regard - free riders receive a share of the bandwidth equal to
the fair distribution given by seeders in both cases.
I am not sure I understand your point.
The byte-for-byte (BFB) algorithm and the choke algorithm (CA) are far from
being equivalent.
None of the studies on BFB that I am aware of mentions the case of seeds.
They simply say that peers must not receive more than they give.
This is the definition of BFB. You can introduce a threshold, but that does
not change the main idea, and there is no proposed solution
to define a dynamic threshold.
It is not correct to say that peers must not receive more than they
send. As I wrote in the report, leechers have an asymmetric capacity most
of the time, and seeds offer capacity for free. With BFB you cannot
benefit from this excess capacity. The only solution is to
define a dynamic threshold, but this is an optimization problem that is
not solved, and one that I believe is very hard to solve in practice, for
no benefit compared to CA.
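To make the difference concrete, here is a minimal sketch (Python, purely
illustrative; the names, the per-peer counters and the 2 MiB threshold are
all made up, no real client works exactly like this) of what a BFB rule
with a static threshold looks like on the leecher side:

    # Hypothetical byte-for-byte (BFB) check on the leecher side, with a
    # static threshold T.  Illustrative only.
    T = 2 * 2**20   # allow at most 2 MiB of deficit towards any single peer

    def bfb_allows_download(uploaded, downloaded, peer):
        """Accept more data from `peer` only while our deficit stays below T.

        uploaded / downloaded are dicts of byte counters kept per remote peer."""
        deficit = downloaded.get(peer, 0) - uploaded.get(peer, 0)
        return deficit < T

    # A seed uploads but never downloads, so under strict BFB a leecher
    # stops accepting its capacity (which is offered for free) as soon as
    # the deficit reaches T, no matter how much spare upload capacity the
    # seed has.

The decision only looks at the local give/take balance, which is exactly
why the excess capacity of seeds and of fast leechers cannot be absorbed.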
Remember that the main (and single) argument in all the studies that
discuss BFB solutions is that free riders can get good
service from a torrent. In all the studies I am aware of, this problem
comes from the old choke algorithm in seed state, which
favors fast downloaders. With the new choke algorithm in seed state, all
the observed unfairness disappears (in the sense of the fairness
I define in the paper).
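Roughly, and simplifying a lot, the idea of the new seed-state algorithm is
to order the interested peers by the time they were last unchoked instead of
by their download rate. The sketch below only illustrates that idea; the
Peer class and the single random extra slot are my simplification, not the
actual mainline code:

    import random
    import time

    class Peer:
        def __init__(self, name):
            self.name = name
            self.last_unchoked = 0.0   # when the seed last unchoked this peer

    def seed_unchoke(interested_peers, slots=4):
        """Pick which interested peers the seed serves for the next round.

        Peers are ordered by the time they were last unchoked (most recently
        unchoked first) instead of by their download rate, so the seed
        rotates its capacity over all peers rather than favoring fast
        downloaders."""
        ordered = sorted(interested_peers,
                         key=lambda p: p.last_unchoked, reverse=True)
        chosen = ordered[:slots - 1]
        remaining = [p for p in interested_peers if p not in chosen]
        if remaining:
            chosen.append(random.choice(remaining))   # rotate a new peer in
        for p in chosen:
            p.last_unchoked = time.time()
        return chosen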
In the more important case, when excess capacity is not available,
free riders should be eliminated or diminished. Because leecher
participation cannot be proven, the only other method of measurement
is via equal exchange of services. Byte-for-byte is one such
measurement (reputation systems are another). In this scenario, free
riders will only receive a small credit from the leechers in the swarm
before having to resort to hand-outs from the seeds. With the seeds
giving fair capacity to the swarm, the free riders will receive very
little of the swarm capacity, while the participating leechers will
receive fair resources.
According to our measurements, CA gives the same result. It guarantees that
a peer cannot receive more from a torrent than a peer that contributes more
than it does.
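Stated as a check one could run over measured per-peer totals (a
hypothetical sketch, the data layout is made up for the example):

    def is_fair(stats):
        """stats: list of (uploaded_bytes, downloaded_bytes), one per leecher.

        Returns True if no peer received more from the torrent than a peer
        that contributed more than it did.  The criterion is deliberately
        one-sided: receiving less than what you give is allowed."""
        for u1, d1 in stats:
            for u2, d2 in stats:
                if u2 > u1 and d2 < d1:   # contributes more but receives less
                    return False
        return True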
Here again, with BFB we would not use all the capacity of the torrent,
because it is based on a fairness criterion that is not correct for a P2P
architecture.
Free riders always receive more resources under choking than under
exchange measurement models.
And that is fine, as long as they do not receive more than someone who
contributes.
I understand that you argue in favor of global efficiency. It is true
that if you do not give anything to free riders, you will have
more capacity for the ones that contribute. But it is not clear that
they can make use of it, and it is not clear that giving a small amount of
service to free riders impacts the overall torrent.
I showed a long time ago, in the context of multicast, that it is
possible to define a policy that gives more to multicast users without
significantly decreasing the performance of unicast ones.
Even though the context is different, I believe the issue is equivalent.
Network Coding:
On the topic of Network Coding [1], I did not see any evidence in
your paper refuting the network coding literature. Please correct me
if I missed it, but it appears your only rebuttal is in the poor
modeling of BitTorrent, not in the application of network coding
techniques. From this I don't see how you conclude that network
coding is not justified? I agree that most of the simulation
literature does not model BitTorrent correctly, but this doesn't
negate the positive results of the network coding trials, only their
comparisons to BitTorrent.
That is not my point. We claim that we cannot gain a significant
performance improvement using network coding in the context we are
studying.
As I said at the beginning of this email, network coding is undoubtedly
useful in a number of situations, but not in our context.
However, the context we are studying is very important, and there are
many people who believe that they can improve a P2P client
for file sharing over the Internet using network coding techniques. This
is not true, and it is due to a misunderstanding of the
dynamics of BitTorrent that I hope the paper helps to understand better.
In section 4-A, pp 6-7, you suggest that source or network coding
may not propose interesting pieces to peers during the startup phase.
I counter that this is exactly what network coding solves - every
packet has an innovation rate based on the network flows from the
local node to the source(s). Assuming a densely connected graph with
fair sharing from the seeder, it is unlikely that any packet is not
innovative during the startup phase. Unless the local node's
bandwidth is many multiples of the seeder, a problem which is not
specific to network coding.
You did not get the point. Consider a seed with a finite upload capacity of
C bytes/s and a file of size S bytes to send.
Then the seed needs, in the best case, S/C seconds to send the entire
content. Whatever coding technique you use, you cannot do it faster.
We identified that when the performance of a torrent is not optimal,
the torrent is in a transient phase. That means that the seed has not yet
sent a copy of each piece. Whether you use network coding or any other
coding you can imagine, it will not change the fact that information is
missing in the torrent, and it is this missing information that causes the
decrease in performance.
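Just to put an order of magnitude on that bound (the numbers below are
illustrative only, they are not taken from our measurements):

    S = 700e6    # file size in bytes (700 MB, illustrative)
    C = 100e3    # seed upload capacity in bytes/s (100 kB/s, illustrative)
    lower_bound = S / C
    print("the seed needs at least %.0f s (~%.1f h) to inject one full copy"
          % (lower_bound, lower_bound / 3600))
    # -> 7000 s, close to 2 hours, whatever coding technique is used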
Network Coding is superior to Rarest First and other
BitTorrent-like protocols for several reasons:
1) The innovation rate is the min-cut of all flows from the local node
to the source(s), which is likely to be much larger than BitTorrent's
default of 4 active connections.
2) BitTorrent transfers pieces in blocks-at-a-time, but only
advertises in piece-at-a-time. This injects a significant delay
between downloading of an innovative byte and sharing of that byte (on
average, one half the download time of an entire piece). Network
coding imposes no such delay.
3) There is no first-blocks or last-blocks problem with network
coding, they are entirely non-issues. Large portions of your paper
are devoted to these issues which simply don't occur under network coding.
4) The meta-network utilization is higher for network coding. That
is, since BitTorrent cannot align to the underlying physical
connections of the network elements, it is impossible to maximally
utilize them. Network coding (using an innovation rate tracking
algorithm) can rapidly align to the structure of the physical network,
eliminating cross-core retransmissions. This saves a significant
amount of ISP bandwidth and reduces the overall burden of supporting
filesharing services on the internet.
Everything you say is true in theory, but the real difference between
network coding and rarest first depends on the context.
All I can tell you is that our measurements show that, in the context I am
discussing, there are no significant differences between network coding
and Rarest First.
Is the full implementation of Network Coding superior enough to
BitTorrent that it offers compelling reasons for migration to it as a
solution? Your paper argues that BitTorrent is "good enough", and I
tend to agree.
I do not believe it makes sense to say that one solution is superior to
another one in all cases.
In our specific context, the rarest first and choke algorithms are
enough, though not superior with respect to performance; but they are much
simpler.
In other contexts, network coding will be superior.
The point of the paper is to show what kind of performance we can
achieve using the rarest first and choke algorithms, and to dispel
the legends that say there are always last-pieces problems or
fairness issues. These legends, as we show with the notion of the transient
phase,
are due to a misunderstanding of the very complex dynamics of P2P
protocols.
Regards,
Arnaud.
_______________________________________________
p2p-hackers mailing list
p2p-hackers@zgp.org
http://zgp.org/mailman/listinfo/p2p-hackers
_______________________________________________
Here is a web page listing P2P Conferences:
http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences