On 1 Sep 2007, [EMAIL PROTECTED] wrote:
> Here are a few comments.

> Quoting Bill Pringlemeir <[EMAIL PROTECTED]>:
>> There are several deterministic versions of fair.

>> - First come, first serve.
>> - Minimize time from request to completion.
>> - Maximize content diversity.
>> - Maximize throughput.

>> All of these can be on a per request or per peer basis.  There are
>> also bounds on resources such as the number of sockets/file
>> descriptors that will be open at one time.

> Actually, maximize throughput is not fair at all.  It's just a local
> strategy.

Maybe I should clarify what I mean by throughput: using all of the
available bandwidth from the serving host.  This is always globally
efficient.  How can it not be?  If every host does this, then the
maximum amount of requested data is being served globally.  Maybe I
don't understand something?

The only difference I see is that throughput might refer to the
bandwidth*delay product, so an inefficient TCP/IP implementation might
have the bandwidth but be left waiting for ACKs.
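
For example (made-up numbers),

    1 Mbit/s * 100 ms RTT = 100 kbit, or roughly 12.5 KB in flight

so a send window smaller than that leaves the link idle between ACKs
even though the raw bandwidth is there.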

> Maximization of uploading bandwidth is already the strategy of GTKG.
> Whether you use 4 slots or 30 slots, the whole uploading bandwidth
> will be used (assuming remote peers can withstand it).

But there is a problem, as discussed.  Slow peers will tend to stick
in the queue longer, so "assuming remote peers can withstand it" may
often not hold.  With the current algorithm, I think that about 50-75%
of the time full bandwidth is being used on my system (using "status"
from the shell).  With the following config,

$ printf "print max_simultaneous_uploads\nprint ul_usage_min_percentage\n
           print bw_ul_usage_enabled\n" | gtkg-shell
Value: 10
Value: 70
Value: TRUE

$ echo uploads | gtkg-shell | wc
     56     330    5851

$ grep POS parq | wc
  13856   27712  130873

This setup helps keep bandwidth maximized, but it does start to use
more resources than most people would like.  Considering that each
upload can hold open a socket and a file, this consumes a considerable
amount of system resources.

>> A better single served file algorithm would require accurate
>> estimates of bandwidth between peers as well as the file size.
>> This is actually the real time for the download.

> This is almost impossible to estimate, and furthermore, you cannot
> guess just how much of a file a remote peer will ask for.

Did you look at the URLs I cited previously?  Van Jacobson was one of
the first to propose capacity measurement, with pathchar.

Some definitions,

    Path          - series of links from sender to receiver.
    Capacity      - minimum transfer rate over all links.
    Bandwidth     - minimum available bandwidth over all links.
    Narrow link   - the link with the minimum capacity.
    Tight link    - the link with the minimum bandwidth.
    Cross traffic - existing traffic from other applications.

    Capacity = Bandwidth + Cross traffic    (on the tight link)
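
    For instance (made-up numbers), a tight link with 10 Mbit/s capacity
    and 4 Mbit/s of cross traffic has 6 Mbit/s of available bandwidth.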

Anyway, the only way to measure bandwidth/capacity is to use timers
and have accurate measures of the packet size.  Capacity measurements
typically send two back-to-back packets.  The packet size divided by
the difference between the receive times gives the capacity.
Obviously buffering, etc. will skew the measure, so implementations
typically pick a value near the minimum and apply various statistical
tricks.
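
As a rough sketch of that packet-pair calculation (my own toy code and
made-up numbers, not taken from any of the tools mentioned here),

  /*
   * Packet-pair sketch: capacity estimate from two back-to-back probe
   * packets of equal size.  t1/t2 are receive times in seconds.
   */
  #include <stdio.h>

  static double
  packet_pair_capacity(double size_bytes, double t1, double t2)
  {
      double gap = t2 - t1;           /* dispersion at the receiver */

      if (gap <= 0.0)
          return 0.0;                 /* bogus sample, discard it */

      return size_bytes * 8.0 / gap;  /* bits per second */
  }

  int
  main(void)
  {
      /* 1500 byte packets arriving 1.2 ms apart => ~10 Mbit/s. */
      printf("%.0f bit/s\n", packet_pair_capacity(1500.0, 0.0, 0.0012));
      return 0;
  }

In practice you would collect many such samples and apply the filtering
mentioned above, since cross traffic squeezed between the pair stretches
the gap and queueing ahead of the pair can compress it.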

For available bandwidth, packets are sent in a stream at precise
intervals.  If they are received with similar intervals, then the
stream rate is under the available bandwidth.  If the received
intervals increase, you can estimate the available bandwidth; the
increase is due to buffering through the network.  Some methods only
do the latter.  The pathload mechanism seems to be the "state of the
art".  It does a binary search using both methods and takes about 1
minute to get a measurement.  It also doesn't saturate the channel and
could piggyback on regular GNET/HTTP traffic.
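
A minimal sketch of the receiver-side test, as I understand it (my own
simplification with made-up numbers, not the pathload sources),

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdio.h>

  /*
   * Receiver-side check for one probing stream.  "gaps" holds the
   * measured inter-arrival gaps in seconds.  If the gaps show an
   * increasing trend, queues are building up, so the stream rate is
   * above the available bandwidth.
   */
  static bool
  stream_rate_too_high(const double gaps[], size_t n)
  {
      size_t i, increasing = 0;

      for (i = 1; i < n; i++) {
          if (gaps[i] > gaps[i - 1])
              increasing++;
      }

      /* Crude majority test; the real tools use better statistics. */
      return n > 1 && increasing > (n - 1) / 2;
  }

  int
  main(void)
  {
      /* Made-up gaps (seconds) that keep growing: rate is too high. */
      const double gaps[] = { 0.0010, 0.0012, 0.0015, 0.0019, 0.0024 };

      printf("%s\n", stream_rate_too_high(gaps, 5) ? "too high" : "ok");
      return 0;
  }

The sender then binary searches on the stream rate: "too high" moves the
upper bound down, "not too high" moves the lower bound up, and the
interval shrinks each round, which is where the roughly one minute
measurement time comes from.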

  "http://www-static.cc.gatech.edu/fac/Constantinos.Dovrolis/pathload.html";
  Paper in pdf, 
"magnet:?dn=ip_time_measurement-pathload.pdf&xs=http://gtkg69.dyndns.org:23335/uri-res/N2R?urn:bitprint:6MVKGBS62JRON7D6PAW6KRNT62ZRAIF2.FZATOSYD7IUVJLGKTNZ3VMHHZAGEC2UDBYFO4UI&xl=521132";

There are methods that claim not to need receiver-side functionality,
but they usually rely on an ICMP reply being sent in response to a bad
packet [and typically measure capacity].  They would be useful for a
sysadmin setting up network rules, but they are not useful for an
application or its users.

Here are those lists again,

  "http://www.caida.org/tools/taxonomy/performance.xml";
  "http://www.cs.ucla.edu/NRL/CapProbe/";

Knowing available bandwidth would be a great asset for many
Gnutella-related activities.  The GNET overlay could be structured to
be more efficient.  Downloads and uploads could select better-matched
links.

Consider three peers downloading files of the same size, each with
available bandwidth greater than the uploader's.  The best strategy is
to serve them one at a time.  The total time they have to wait is
(1+2+3)*(size/bw) [and we only use one fd and socket at a time].  If
we split the upload between them, they have to wait (3*3)*(size/bw).
The only benefit of splitting is that other, unknown downloaders see
multiple sources sooner.  However, with 3 or more requesters it is
still statistically better to run a single transfer at a time (as the
other two might use another source).  Also, with my naive
understanding, it seems better to only ever give a single remote peer
one slot, no matter what (TTH aside).  They get a complete file
faster.
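
To spell that out with made-up numbers, say a 10 MB file and 100 KB/s
of upload bandwidth, so size/bw = 100 s,

    serial: waits of 100 s + 200 s + 300 s = 600 s = (1+2+3)*(size/bw)
    split : each gets bw/3, all three wait 300 s = 900 s = (3*3)*(size/bw)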

The bandwidth values set by the user are problematic.  Typically the
user will enter a capacity value.  It is difficult to know the exact
available bandwidth, as it will vary over time.  Even capacity can
vary over time, for instance on a wireless link, or on DSL when the
telephone lines get wet.  A bandwidth detection scheme can dynamically
adapt to changing conditions.

I had looked at the TCP timestamp option.  However, this only appears
to be accessible to the stack.  In order to implement pathload-style
measurements, the send and receive sides need to add timestamp
information [and be aware of packet lengths].  These are typically
32 bit values, resulting in roughly an 8 byte overhead per packet.  I
think this could be inserted before compression and encryption, but
the calculation would be easier at the transport level.  8 bytes per
MTU is very little overhead for a critical piece of information.
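
For instance, the per-packet information could look something like this
(a hypothetical header, not an existing GTKG structure),

  /* Hypothetical 8 byte probe header, one per packet, inserted before
   * compression and encryption. */
  #include <stdint.h>

  struct probe_stamp {
      uint32_t send_time;     /* sender clock, e.g. in milliseconds */
      uint32_t payload_len;   /* length of the packet it describes */
  };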

It may be possible to use TCP with recvmsg() and CMSG_NXTHDR() to get
a timestamp value.  This would be nice, as it wouldn't require both
sides to be aware of the bandwidth measurements.
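
Something along these lines might work, assuming SO_TIMESTAMP is first
enabled on the socket with setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, ...);
this is untested, and the exact semantics on a TCP stream will vary by
system, in line with the "may be possible" above,

  #include <sys/types.h>
  #include <sys/socket.h>
  #include <sys/uio.h>
  #include <sys/time.h>
  #include <string.h>

  /*
   * Read from "fd" and pull the kernel receive timestamp out of the
   * ancillary data, if present.
   */
  static ssize_t
  recv_with_stamp(int fd, void *buf, size_t len, struct timeval *stamp)
  {
      union {
          char buf[CMSG_SPACE(sizeof(struct timeval))];
          struct cmsghdr align;
      } ctl;
      struct iovec iov;
      struct msghdr msg;
      struct cmsghdr *cmsg;
      ssize_t n;

      iov.iov_base = buf;
      iov.iov_len = len;

      memset(&msg, 0, sizeof msg);
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = ctl.buf;
      msg.msg_controllen = sizeof ctl.buf;

      n = recvmsg(fd, &msg, 0);
      if (n < 0)
          return n;

      /* Walk the ancillary data for the kernel receive timestamp. */
      for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
           cmsg = CMSG_NXTHDR(&msg, cmsg)) {
          if (cmsg->cmsg_level == SOL_SOCKET &&
              cmsg->cmsg_type == SCM_TIMESTAMP) {
              memcpy(stamp, CMSG_DATA(cmsg), sizeof *stamp);
              break;
          }
      }
      return n;
  }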

> Finally, note that upload strategy could be influenced by the
> estimated popularity of a file.  Today, this is hard to achieve, as
> the download mesh can become disconnected.  But one could use
> strategies like "maximize dissemination" for rare files, accepting
> all the requests for such files or giving more priority to them.

> My intuition is that optimum local strategies are not necessarily
> optimum global strategies, and vice versa.

I would prefer something better than intuition, although, lacking
anything else, it is better than nothing.  My intuition is that the
current GNET composition favors popular files.  This is good to a
degree: popular content needs more sources to provide the bandwidth.
However, the current system favors popular sources and makes it hard
for rare content to become more popular.  I wouldn't propose an all or
nothing mechanism.  I think of this as a distributed control system;
increasing diversity should just be one factor.

$ echo uploads | gtkg-shell | cut -c 40- | cut -d\" -f 2 | sort | uniq -c \
   | sort -n | cut -c -8 | uniq -c
     24       1
      4       2
      2       3
      1       5

 ** taken at a different time than the output above.

So one file has five different downloaders.  Sending to one or more
high-bandwidth, PFSP-capable hosts would be a better use of network
bandwidth (and resources).  I think this benefit would carry over
globally.

I would like to know of any issue that would make global strategies
differ from local ones.  Surely a local strategy that maximizes
bandwidth is also globally optimal (given no change in content
served).  Similarly, a local strategy to "Minimize time from request
to completion" is also globally beneficial (unless you assume that
people stop sharing as soon as the file is complete, but that has
other implications).

Regards,
Bill Pringlemeir.

