Re: [announce] yum: parallel downloading

2012-05-22 Thread James Antill
On Tue, 2012-05-22 at 11:24 +0930, Glen Turner wrote:
> On 21/05/2012, at 5:12 PM, Zdenek Pavlas wrote:
> > 
> > The three-connection limit is used when the above is not available
> > (e.g. a baseurl setup with just one mirror).  I don't mind lowering
> > it to just two, as that should work well enough in most cases.
> 
> Yes please. Two is better than three from the server's point of view.

 Do _any_ web browsers have a connection limit lower than 4? Given that,
and that we obey the user's max connections limit and the one in the
metalink ... it seems pretty alarmist to say 3 is too many to a single
server.

> Yum doesn't put in the HTTP header (say, appended to User-Agent) why
> it decided to use our mirror, so we have no count of the sites or
> users which manually set baseurl or give our mirror's name to
> anaconda. Mailing list postings often recommend this when people
> complain of slow or expensive downloads.

 You can open an RFE in bugzilla, and we could look at it if it's really
useful information ... one problem, though, is that we combine all
baseurls+mirrors and move between them at the urlgrabber layer, so we
couldn't easily change something on just the yum side.

-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel

Re: [announce] yum: parallel downloading

2012-05-22 Thread Zdenek Pavlas
> I'd be happy if yum/urlgrabber/libcurl finally used http keepalives.

It does, indeed. The parallel downloader tries to use keepalives, too
(we cache and reuse the last idle process instead of closing it).

> Last time I looked (and it has been a while), it didn't, so you
> always paid the TCP slow startup penalty for each package.

/me just checked with tcpflow that we really do.
Please, contact me off-list if you can reproduce it. Thanks!

Re: [announce] yum: parallel downloading

2012-05-22 Thread Matt Domsch
On Mon, May 21, 2012 at 05:33:51AM -0500, Zdenek Pavlas wrote:
> > The number of concurrent users is now lower because, well, each of
> > them now completes a "yum update" in one third of the time.
> 
> I think Glen's concerns were that the consumed resources
> (flow caches, TCP hash entries, sockets) may scale faster
> than the aggregated downloading speed.
> 
> I am aware of this, and in most cases the downloader in urlgrabber
> will make just 1 concurrent connection to a mirror, because:
> 
> 1) The Nth concurrent connection to the same host is assumed
>to be N times slower than 1st one, so we'll very likely
>not select the same mirror again.
> 
> 2) maxconnections=1 in every metalink I've seen so far.
>This is a hard limit, we block until a download finishes
>and reuse one connection when the limit is maxed out.

That's MirrorManager's doing.  I don't have a box for mirror admins to
tell MM otherwise - that they'd be happy with >1 connection.  I
suppose that could be added, default to 1.

> The reason for NOT banning >1 connections to the same host altogether
> is that (as John Reiser wrote) 2nd connection does help quite a lot
> when downloading many small files and just one mirror is available.
> I agree that using strictly 1 connection and HTTP pipelining would 
> be even better, but we can't do that with libcurl.

I'd be happy if yum/urlgrabber/libcurl finally used http keepalives.
Last time I looked (and it has been a while), it didn't, so you always
paid the TCP slow startup penalty for each package.

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO

Re: [announce] yum: parallel downloading

2012-05-21 Thread Glen Turner

On 21/05/2012, at 5:12 PM, Zdenek Pavlas wrote:
> 
> The three-connection limit is used when the above is not available
> (e.g. a baseurl setup with just one mirror).  I don't mind lowering
> it to just two, as that should work well enough in most cases.

Yes please. Two is better than three from the server's point of view.

Yum doesn't say in an HTTP header (appended to User-Agent, say) why it decided
to use our mirror, so we have no count of the sites or users which manually set
baseurl or give our mirror's name to anaconda. Mailing list postings often
recommend this when people complain of slow or expensive downloads.

-- 
 Glen Turner 


Re: [announce] yum: parallel downloading

2012-05-21 Thread David Malcolm
On Wed, 2012-05-16 at 12:07 -0400, Zdenek Pavlas wrote:
> Hi,
> 
> New yum and urlgrabber packages have just hit Rawhide.  These releases
> include some new features, including parallel downloading of packages and
> metadata, and new mirror selection code.  As we plan to include these
> features in RHEL7, I welcome any feedback or bug reports!
This may be a crazy idea, but has anyone looked at making yum's
downloader use bittorrent?

For better or worse, yum is constantly hitting the repo metadata files,
so presumably it could benefit from making that be a peer-to-peer
operation, and grabbing/sharing the packages via bittorrent could help
reduce the load on mirrors when we do a big update.

(obviously there would be concerns about firewalls, asymmetric upload
speeds, impact on our metrics, bittorrent being regarded as evil by
ISPs, etc)

Dave


Re: [announce] yum: parallel downloading

2012-05-21 Thread Zdenek Pavlas
> The number of concurrent users is now lower because, well, each of
> them now completes a "yum update" in one third of the time.

I think Glen's concerns were that the consumed resources
(flow caches, TCP hash entries, sockets) may scale faster
than the aggregated downloading speed.

I am aware of this, and in most cases the downloader in urlgrabber
will make just 1 concurrent connection to a mirror, because:

1) The Nth concurrent connection to the same host is assumed
   to be N times slower than the 1st one, so we'll very likely
   not select the same mirror again.

2) maxconnections=1 in every metalink I've seen so far.
   This is a hard limit, we block until a download finishes
   and reuse one connection when the limit is maxed out.

The reason for NOT banning >1 connections to the same host altogether
is that (as John Reiser wrote) 2nd connection does help quite a lot
when downloading many small files and just one mirror is available.
I agree that using strictly 1 connection and HTTP pipelining would 
be even better, but we can't do that with libcurl.
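The hard per-mirror limit described in (2) can be pictured as one semaphore per host: a caller blocks until a slot frees up. A minimal sketch (the class and method names are mine, not urlgrabber's actual API):

```python
import threading
from urllib.parse import urlparse

class HostLimiter:
    """At most `maxconnections` simultaneous downloads per host;
    additional callers block until a running download finishes."""
    def __init__(self, maxconnections=1):
        self._sems = {}
        self._lock = threading.Lock()
        self._max = maxconnections

    def _sem(self, url):
        # One semaphore per host, created lazily.
        host = urlparse(url).netloc
        with self._lock:
            return self._sems.setdefault(host, threading.Semaphore(self._max))

    def download(self, url, fetch):
        sem = self._sem(url)
        with sem:               # blocks when the host is at its limit
            return fetch(url)
```

The real downloader also has to reuse the freed connection rather than open a new one, which this sketch leaves out.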

--
Zdenek

Re: [announce] yum: parallel downloading

2012-05-21 Thread Zdenek Pavlas
Hi Glen,

> Why is the default three connections rather than one? Is a tripling
> of the number of connections to a mirror on a Fedora release day
> desirable?

$ grep maxconnections /var/cache/yum/*/metalink.xml
/var/cache/yum/fedora/metalink.xml:  
/var/cache/yum/updates/metalink.xml:  

Yum understands this.
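For reference, the limit appears as a maxconnections attribute on the url elements of the metalink, roughly like this (a sketch; the mirror URL and other attribute values are illustrative):

```xml
<url protocol="http" type="http" location="US" preference="100"
     maxconnections="1">http://mirror.example.org/fedora/releases/...</url>
```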

> Consider that a large mirror site already sees concurrent connections
> in the multiple 10,000s.

The three-connection limit is used when the above is not available
(e.g. a baseurl setup with just one mirror).  I don't mind lowering
it to just two, as that should work well enough in most cases.

Re: [announce] yum: parallel downloading

2012-05-20 Thread Reindl Harald


Am 20.05.2012 07:02, schrieb Rob K:
> On Sat, May 19, 2012 at 1:34 AM, José Matos  wrote:
> 
>> The total number of connections should be the same, as far as I
>> understand only the number of connections from a single host will be three.
>> Since it should be safe to assume that the downloads are independent
>> events then there should not be any significant difference for busy
>> servers. :-)
> 
> 3 times x is 3x. Tripling the stampede on a busy server for a dubious
> increase in personal speed is kind of rude, from the mirror manager's
> point of view. The 'ftp accelerators' that divvy up a download into a
> hundred ftp range requests are pathological enough to deal with.

I generally do not understand how, in the context of "parallel
downloading", anyone comes to the idea of making more than one
connection to the SAME creeping mirror.

When I first read "yum: parallel downloading" I was happy and thought
"yeah, the problem with creeping updates will be solved", and when I
saw how this is to be implemented I could not believe my eyes.

> Before this work proceeds, can we please see some hard numbers, and
> get some input from mirror managers?

If they read this idea, I fear they stopped breathing :-(

PLEASE: stop this idea as long as it does not spread the downloads over
different mirrors, because the current direction only damages the
infrastructure with little improvement for the user.






Re: [announce] yum: parallel downloading

2012-05-20 Thread Roberto Ragusa
On 05/20/2012 06:10 AM, Glen Turner wrote:
> On 19/05/12 01:04, José Matos wrote:
> 
>> The total number of connections should be the same, as far as I
>> understand only the number of connections from a single host will be three.
> 
> The risk is the rise in the maximum number of concurrent connections. A
> server happily supplying 50,000 concurrent connections should not be
> assumed to remain happy at 150,000 concurrent connections.

Why do you think that there will be 150,000 concurrent connections?
The difference could be that instead of
- 50,000 concurrent users, each downloading one file
you have
- 16,667 concurrent users, each downloading 3 files
The number of concurrent users is now lower because, well, each of them
now completes a "yum update" in one third of the time.

Reality could be different for several reasons (are users bandwidth-limited?
Is the server bandwidth-limited?), but the concept is fine and it has
been perfectly expressed by José.

>> Since it should be safe to assume that the downloads are independent
>> events then there should not be any significant difference for busy
>> servers. :-)
> 
> I am afraid that I have missed your point here. I am somewhat blinded by
> the use of the word "independent". I have a statistical background and
> that word carries a meaning similar to "unrelated".

50,000 connections from different users are independent.
50,000 connections from 16,667 users doing 3 connections are almost
as independent as before.
Statistically, consider a random variable which is 0 (not downloading)
or 1 (downloading).
Compare:
  sum of N independent variables
to:
  three times the sum of N/3 independent variables
If N>>3, not only is the average the same, but the higher-order statistics are
only slightly higher. It is reasonable to say that the probability distribution
is practically the same.

An example:

- is_downloading_one_file probability p=0.01
- number of users N=1,000,000

--> concurrent downloads: average=10,000 (sigma=~100)

vs

- is_downloading_three_files probability p=(0.01/3)
- number of users N=1,000,000

--> concurrent downloads: average=10,000 (sigma=~170)

-- 
   Roberto Ragusa      mail at robertoragusa.it

Re: [announce] yum: parallel downloading

2012-05-19 Thread Rob K
On Sat, May 19, 2012 at 1:34 AM, José Matos  wrote:

> The total number of connections should be the same, as far as I
> understand only the number of connections from a single host will be three.
> Since it should be safe to assume that the downloads are independent
> events then there should not be any significant difference for busy
> servers. :-)

3 times x is 3x. Tripling the stampede on a busy server for a dubious
increase in personal speed is kind of rude, from the mirror manager's
point of view. The 'ftp accelerators' that divvy up a download into a
hundred ftp range requests are pathological enough to deal with.

Before this work proceeds, can we please see some hard numbers, and
get some input from mirror managers? Remember that the project relies
in great part on the generosity of the institutions that provide
mirrors.

-- 
Rob K
http://ningaui.net
I swear, if I collected all seven dragonballs,
I'd bring back Jon Postel. - Raph

Re: [announce] yum: parallel downloading

2012-05-19 Thread Glen Turner
On 19/05/12 01:04, José Matos wrote:

> The total number of connections should be the same, as far as I
> understand only the number of connections from a single host will be three.

The risk is the rise in the maximum number of concurrent connections. A
server happily supplying 50,000 concurrent connections should not be
assumed to remain happy at 150,000 concurrent connections.

> Since it should be safe to assume that the downloads are independent
> events then there should not be any significant difference for busy
> servers. :-)

I am afraid that I have missed your point here. I am somewhat blinded by
the use of the word "independent". I have a statistical background and
that word carries a meaning similar to "unrelated".

Perhaps you could state your argument with more explanation.

Thank you, Glen

-- 
Glen Turner   www.gdt.id.au/~gdt

Re: [announce] yum: parallel downloading

2012-05-19 Thread Frank Murphy

On 16/05/12 17:36, Antonio Trande wrote:

>> Both packages are compatible with older versions.
>
> Can we use them in Fedora 17 too ?

Currently using the rawhide packages in F17.

--
Regards,
Frank
"Jack of all, fubars"

Re: [announce] yum: parallel downloading

2012-05-18 Thread Reindl Harald


Am 18.05.2012 20:31, schrieb John Reiser:
> On 05/18/2012 08:59 AM, Reindl Harald wrote:
>> why make the connections to the SAME mirror at all?
>> it would make much more sense to download packages in
>> parallel, each one from a different mirror
> 
> I find that two simultaneous threads to the same one mirror
> gives shortest time to completion for an entire list of downloads,
> particularly when one thread downloads from smallest to largest,
> while the other thread downloads from largest to smallest.
> The latency for setup+takedown of a connection for each package
> represents lost bytes that could have been transferred.  The other
> thread fills that gap much of the time.  When both threads actually
> are sending, then the network algorithms (and/or server policies
> regarding allocation of resources to the same endpoint) work,
> maintaining near-maximal total transfer rate at very low cost.

this does not help much if the mirror does not offer more
than 1 MB/sec while my connection can do 12 MB/sec.

I saw this last week with a KDE update of a co-worker:
the download was creeping along, one Ctrl-C picked
a mirror which was not really faster, and the second
Ctrl-C stopped yum :-(

In such cases you have > 20 packages with > 1 MB each and
could, by using a different mirror for each one, really
use the 12 MB/sec download rate of the client.

With more than one connection you only abuse
overloaded servers more and make things worse.

In 7 years of Fedora I saw a dist-upgrade run at
10 MB/sec with yum only once (in times when each Ctrl-C switched
to the next mirror instead of stopping the download).




Re: [announce] yum: parallel downloading

2012-05-18 Thread John Reiser
On 05/18/2012 08:59 AM, Reindl Harald wrote:
> why make the connections to the SAME mirror at all?
> it would make much more sense to download packages in
> parallel, each one from a different mirror

I find that two simultaneous threads to the same one mirror
gives shortest time to completion for an entire list of downloads,
particularly when one thread downloads from smallest to largest,
while the other thread downloads from largest to smallest.
The latency for setup+takedown of a connection for each package
represents lost bytes that could have been transferred.  The other
thread fills that gap much of the time.  When both threads actually
are sending, then the network algorithms (and/or server policies
regarding allocation of resources to the same endpoint) work,
maintaining near-maximal total transfer rate at very low cost.

My connection is Comcast cable modem: advertised as 12 to 15Mbit/s,
often peaking at 25Mbit/s, never more than 30Mbit/s.
[Network buffer bloat _helps_ me when I download larger packages.]
The median package is around 120KByte: setup+takedown is _longer_
than transfer time to me.  As transfer speed increases, then
the setup+takedown becomes an even larger fraction of total time.
Using http://, it can be very advantageous to perform multiple
consecutive GET commands on the same connection (counting bytes to
separate the returned concatenation), bypassing the takedown+setup
for the next file.

Package size of 1MByte (around 1 second of sending time to me)
occurs at position 2303 of 2674 packages in Fedora 17 [smallest
to largest, and not counting @Languages.]  86% of packages are
1MByte or smaller.  Package size of 5MByte is position 2568
of 2674 packages.
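The two-thread ordering described above can be sketched as a size-sorted queue consumed from both ends (a toy model; `fetch` stands in for the real HTTP transfer):

```python
from collections import deque
import threading

def two_ended_downloads(sizes, fetch):
    # Packages sorted by size; one worker takes from the smallest end,
    # the other from the largest, as in the scheme described above.
    queue = deque(sorted(sizes))          # smallest ... largest
    lock = threading.Lock()

    def worker(take_small):
        while True:
            with lock:
                if not queue:
                    return
                size = queue.popleft() if take_small else queue.pop()
            fetch(size)                   # stand-in for the real transfer

    threads = [threading.Thread(target=worker, args=(flag,))
               for flag in (True, False)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

While the small-file worker spends its time on connection setup and takedown, the large-file worker keeps the pipe full, which is the effect being described.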


Re: [announce] yum: parallel downloading

2012-05-18 Thread Reindl Harald


Am 18.05.2012 17:34, schrieb José Matos:
> On 2012-05-18 15:23, Glen Turner wrote:
>> Hi Zdenek,
>>
>> Why is the default three connections rather than one? Is a tripling of
>> the number of connections to a mirror on a Fedora release day desirable?
>> Consider that a large mirror site already sees concurrent connections in
>> the multiple 10,000s.
>>
>> Cheers, Glen
> 
> The total number of connections should be the same, as far as I
> understand only the number of connections from a single host will be three.
> Since it should be safe to assume that the downloads are independent
> events then there should not be any significant difference for busy
> servers. :-)

why make the connections to the SAME mirror at all?
It would make much more sense to download packages in
parallel, each one from a different mirror.

Currently the largest problem is that most mirrors do
not match, for example, my 100 Mbit downstream connection,
and "yum-plugin-fastest-mirror" makes completely wrong
decisions.




Re: [announce] yum: parallel downloading

2012-05-18 Thread Remi Collet

Le 17/05/2012 18:25, Frank Murphy a écrit :

Traceback:
https://bugzilla.redhat.com/show_bug.cgi?id=822632


Another traceback with the latest python-urlgrabber-3.9.1-13.fc18.noarch:

  File "/usr/lib/python2.7/site-packages/urlgrabber/grabber.py", line 2308, in update
    speed = (k1 * speed + k2 * dl_size / dl_time) / (k1 + k2)
ZeroDivisionError: float division by zero

Despite this minor issue, this seems a promising feature :)
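The failing line is a weighted average of the previous speed estimate and the latest sample; a minimal sketch with the missing dl_time guard (the k1/k2 values here are illustrative, not urlgrabber's):

```python
def update_speed(speed, dl_size, dl_time, k1=0.3, k2=0.7):
    # Weighted average of the old estimate and the latest sample, as in
    # the traceback above; guard the dl_time == 0 case that triggered
    # the ZeroDivisionError.
    if dl_time <= 0:
        return speed            # no usable sample, keep the old estimate
    return (k1 * speed + k2 * dl_size / dl_time) / (k1 + k2)
```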


Remi

P.S. : bug updated

Re: [announce] yum: parallel downloading

2012-05-18 Thread José Matos
On 2012-05-18 15:23, Glen Turner wrote:
> Hi Zdenek,
>
> Why is the default three connections rather than one? Is a tripling of
> the number of connections to a mirror on a Fedora release day desirable?
> Consider that a large mirror site already sees concurrent connections in
> the multiple 10,000s.
>
> Cheers, Glen

The total number of connections should be the same, as far as I
understand only the number of connections from a single host will be three.
Since it should be safe to assume that the downloads are independent
events then there should not be any significant difference for busy
servers. :-)

-- 
José Matos


Re: [announce] yum: parallel downloading

2012-05-18 Thread Glen Turner
On 17/05/12 01:37, Zdenek Pavlas wrote:
> - mirror limits are honored, too.
> 
> Making many connections to the same mirror usually does not help much, it just
> consumes more resources.  That's why Yum also uses mirror limits from
> metalink.xml.  If no such limit is available, at most 3 simultaneous
> connections are made to any single mirror.

Hi Zdenek,

Why is the default three connections rather than one? Is a tripling of
the number of connections to a mirror on a Fedora release day desirable?
Consider that a large mirror site already sees concurrent connections in
the multiple 10,000s.

Cheers, Glen

Re: [announce] yum: parallel downloading

2012-05-17 Thread elison.ni...@gmail.com
On Wed, May 16, 2012 at 9:37 PM, Zdenek Pavlas  wrote:
> Hi,
>
> New yum and urlgrabber packages have just hit Rawhide.  These releases
> include some new features, including parallel downloading of packages and
> metadata, and new mirror selection code.  As we plan to include these
> features in RHEL7, I welcome any feedback or bug reports!
>
> python-urlgrabber-3.9.1-12.fc18 supports a new API to urlgrab() files in
> parallel, and yum-3.4.3-26.fc18 can use this.  Both packages are compatible
> with older versions.
>
> Feature list:
>
> - parallel downloading of packages and metadata
>
> If possible, multiple files are downloaded in parallel.  (see below for the
> limitations that apply)
>
> - configurable 'max_connections' limit in yum.conf
>
> This is the maximum number of simultaneous connections Yum makes.  Purpose of
> this is to limit local resources (number of processes forked).  The default is
> to use urlgrabber's default value of 5.
>
> - mirror limits are honored, too.
>
> Making many connections to the same mirror usually does not help much, it just
> consumes more resources.  That's why Yum also uses mirror limits from
> metalink.xml.  If no such limit is available, at most 3 simultaneous
> connections are made to any single mirror.
>
> - new mirror selection algorithm
>
> The real downloading speed is calculated after each download, and the mirror's
> statistics get updated.  These are in turn used when selecting mirrors for
> further downloads.  This should be more accurate than measuring latencies in
> fastestmirror plugin, but slow mirrors now have to be tried from time to time,
> and the statistics need some time to build up.
>
> - ctrl-c handling
>
> This is a long-standing problem in Yum.  Due to various shortcomings in rpm 
> and
> curl it's impossible to react immediately to SIGINT.  But now the downloader
> runs in a different process, so we can exit even if curl is still stuck.
> The "skip to next mirror" feature is gone (we don't want to restart all
> currently running downloads).
>
> Known limitations:
>
> - metalink.xml and repomd.xml downloads are not parallelized yet.
>
>
> --
> Zdeněk Pavlas

Waiting to get this in Fedora 17. Great work!

Thanks,
Elison

Re: [announce] yum: parallel downloading

2012-05-17 Thread Frank Murphy

On 17/05/12 09:33, Zdenek Pavlas wrote:

>> So disable fastestmirror plugin before testing this,
>> would be the way to go?
>
> The fastestmirror plugin does some initial mirror sorting.
> We mostly ignore this, so disabling fastestmirror makes sense
> but is not strictly necessary.
>
> --
> Zdeněk Pavlas


Traceback:
https://bugzilla.redhat.com/show_bug.cgi?id=822632

--
Regards,
Frank
"Jack of all, fubars"

Re: [announce] yum: parallel downloading

2012-05-17 Thread Zdenek Pavlas
> So disable fastestmirror plugin before testing this,
> would be the way to go?

The fastestmirror plugin does some initial mirror sorting.  
We mostly ignore this, so disabling fastestmirror makes sense
but is not strictly necessary.

--
Zdeněk Pavlas

Re: [announce] yum: parallel downloading

2012-05-17 Thread Zdenek Pavlas
> > Both packages are compatible with older versions.
> 
> Can we use them in Fedora 17 too ?

Yes, I've used it in F14 for some time.

--
Zdeněk Pavlas

Re: [announce] yum: parallel downloading

2012-05-16 Thread Frank Murphy

On 16/05/12 17:07, Zdenek Pavlas wrote:

> - new mirror selection algorithm
>
> This should be more accurate than measuring latencies in
> fastestmirror plugin, but slow mirrors now have to be tried from time to time,
> and the statistics need some time to build up.



So disable fastestmirror plugin before testing this,
would be the way to go?

--
Regards,
Frank
"Jack of all, fubars"

Re: [announce] yum: parallel downloading

2012-05-16 Thread Antonio Trande
> Both packages are compatible with older versions.

Can we use them in Fedora 17 too ?

2012/5/16 Zdenek Pavlas 

> Hi,
>
> New yum and urlgrabber packages have just hit Rawhide.  These releases
> include some new features, including parallel downloading of packages and
> metadata, and new mirror selection code.  As we plan to include these
> features in RHEL7, I welcome any feedback or bug reports!
>
> python-urlgrabber-3.9.1-12.fc18 supports a new API to urlgrab() files in
> parallel, and yum-3.4.3-26.fc18 can use this.  Both packages are compatible
> with older versions.
>
> Feature list:
>
> - parallel downloading of packages and metadata
>
> If possible, multiple files are downloaded in parallel.  (see below for the
> limitations that apply)
>
> - configurable 'max_connections' limit in yum.conf
>
> This is the maximum number of simultaneous connections Yum makes.  Purpose
> of
> this is to limit local resources (number of processes forked).  The
> default is
> to use urlgrabber's default value of 5.
>
> - mirror limits are honored, too.
>
> Making many connections to the same mirror usually does not help much, it
> just
> consumes more resources.  That's why Yum also uses mirror limits from
> metalink.xml.  If no such limit is available, at most 3 simultaneous
> connections are made to any single mirror.
>
> - new mirror selection algorithm
>
> The real downloading speed is calculated after each download, and the
> mirror's
> statistics get updated.  These are in turn used when selecting mirrors for
> further downloads.  This should be more accurate than measuring latencies
> in
> fastestmirror plugin, but slow mirrors now have to be tried from time to
> time,
> and the statistics need some time to build up.
>
> - ctrl-c handling
>
> This is a long-standing problem in Yum.  Due to various shortcomings in
> rpm and
> curl it's impossible to react immediately to SIGINT.  But now the
> downloader
> runs in a different process, so we can exit even if curl is still stuck.
> The "skip to next mirror" feature is gone (we don't want to restart all
> currently running downloads).
>
> Known limitations:
>
> - metalink.xml and repomd.xml downloads are not parallelized yet.
>
>
> --
> Zdeněk Pavlas




-- 
Antonio Trande
"Fedora Ambassador"

mail: sagit...@fedoraproject.org
Homepage: http://www.fedora-os.org
Sip Address: sip:sagitter AT ekiga.net
Jabber: sagitter AT jabber.org
GPG Key: 19E6DF27

[announce] yum: parallel downloading

2012-05-16 Thread Zdenek Pavlas
Hi,

New yum and urlgrabber packages have just hit Rawhide.  These releases
include some new features, including parallel downloading of packages and
metadata, and new mirror selection code.  As we plan to include these
features in RHEL7, I welcome any feedback or bug reports!

python-urlgrabber-3.9.1-12.fc18 supports a new API to urlgrab() files in
parallel, and yum-3.4.3-26.fc18 can use this.  Both packages are compatible
with older versions.

Feature list:

- parallel downloading of packages and metadata

If possible, multiple files are downloaded in parallel.  (see below for the
limitations that apply)

- configurable 'max_connections' limit in yum.conf

This is the maximum number of simultaneous connections Yum makes.  The purpose of
this is to limit local resources (the number of processes forked).  The default is
to use urlgrabber's default value of 5.
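In yum.conf the option would look like this (a sketch; the value shown is just the default mentioned above):

```ini
[main]
max_connections=5
```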

- mirror limits are honored, too.

Making many connections to the same mirror usually does not help much, it just
consumes more resources.  That's why Yum also uses mirror limits from
metalink.xml.  If no such limit is available, at most 3 simultaneous
connections are made to any single mirror.

- new mirror selection algorithm

The real downloading speed is calculated after each download, and the mirror's
statistics get updated.  These are in turn used when selecting mirrors for
further downloads.  This should be more accurate than measuring latencies in
fastestmirror plugin, but slow mirrors now have to be tried from time to time,
and the statistics need some time to build up.

- ctrl-c handling

This is a long-standing problem in Yum.  Due to various shortcomings in rpm and
curl it's impossible to react immediately to SIGINT.  But now the downloader 
runs in a different process, so we can exit even if curl is still stuck.
The "skip to next mirror" feature is gone (we don't want to restart all
currently running downloads).
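The separate-process scheme can be sketched with the standard library; the child command below is a stand-in for the real curl-based download loop, not yum's actual helper:

```python
import subprocess
import sys

def run_downloader(urls):
    # Run the download loop in a child process; the parent can then
    # react to Ctrl-C immediately, even if the child is stuck in a
    # transfer that cannot be interrupted.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys\nfor u in sys.argv[1:]: print('fetched', u)",
         *urls],
        stdout=subprocess.PIPE, text=True)
    try:
        out, _ = child.communicate()
    except KeyboardInterrupt:
        child.terminate()   # don't wait for the stuck transfer to finish
        child.wait()
        raise
    return out
```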

Known limitations:

- metalink.xml and repomd.xml downloads are not parallelized yet.


--
Zdeněk Pavlas