On 02/26/13 11:34, Philip Brown wrote:
On 01/23/13 09:46 AM, Philip Brown wrote:
On 01/18/13 05:54 PM, Bart Smaalders wrote:
There's a slight difference, but nothing substantial. Ops Center must
be doing something silly.

Is the Intel kit slow under ops center as well, or just the SPARC?



Unfortunately, I had some issues getting our x86 machines to use
opscenter this week, even though they were working previously.


I've finally been able to get back to investigating the problems we've
been having.
Turns out, it's not SSL... because opscenter doesn't use SSL for the
actual package transfers.
The HUGE slowness problem we saw seems to be a combination of:
1. something weird opscenter does
2. something weird IPS does
3. a misconfiguration that leaked in from somewhere.

The good news is, I can now positively identify ALL of the above. So I'm
posting a summary of findings to the list.

Background: opscenter is a distributed control system, with a "master"
controller, and assorted proxies, to distribute load. Solaris 11
installation is supposed to be handed off to a "proxy controller".

It turns out that the proxy controller, for purposes of IPS installs, is
a literal apache proxy, with a confusing multi-level httpd
configuration. It's supposed to be a caching proxy, so I think it serves
out the packages from cache after the initial load.
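For reference, a caching reverse proxy of that general shape can be sketched roughly like this (a minimal illustration using Apache 2.2 mod_proxy and mod_disk_cache directives; the hostname, port, and paths are hypothetical, and the real Ops Center configuration is multi-level and considerably more involved):

```apache
# Forward IPS repository requests to the master controller
# (hostname and port are illustrative placeholders).
ProxyPass        /pkg/ http://master-controller:8002/pkg/
ProxyPassReverse /pkg/ http://master-controller:8002/pkg/

# Cache the responses on disk so repeat installs are served locally.
CacheEnable disk /pkg/
CacheRoot   /var/cache/httpd/pkg
```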

I explored the opscenter http configs, and found this shocking comment:

# The pkg client opens 20 parallel connections to the server when performing
# network operations.

This turned out to be the key. The MaxClients knob had gotten set too
low, and the proxy was starved for working connections.
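For anyone hitting the same wall: the knob in question lives in the prefork MPM section of httpd.conf. A sketch, with illustrative values rather than the actual Ops Center defaults, follows; since each pkg client holds 20 connections, MaxClients has to comfortably exceed 20 times the number of simultaneous clients, or requests queue behind the starved worker pool.

```apache
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    # Each pkg client opens 20 parallel connections, so size this for
    # 20 * (expected concurrent clients), plus headroom.
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
```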

Okay, this fixes my immediate problem, but what does that say about IPS?
Seems like there are multiple problems there.

First of all: It shouldn't degrade into glacial speed when it can't
open 20 full connections!!!

If a web server can't spare at least 20 connections at any point, there's definitely a configuration issue.

Secondly... why is it being so obnoxious about so many connections? I
decided to put it to the test.

20 is hardly "so many connections"; have you seen how many connections a torrent client opens?

I created 20x 100 MB files and downloaded them, first with "wget file1
file2 file3 ..." and then with
"wget file1 & wget file2 & ..."
When dumping the data to /dev/null, I was surprised to find that 20 in
parallel was actually faster: about 30 seconds vs. 34 seconds, usually.
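The comparison can be sketched as a small local harness. This version uses small local files and cat(1) instead of 100 MB files and wget, so the sketch is self-contained and quick to run; to reproduce the original network test, swap each cat for a wget against your own server's URLs (no real URL is assumed here).

```shell
#!/bin/bash
# Create 20 small test files in a scratch directory.
dir=$(mktemp -d)
for i in $(seq 1 20); do
    dd if=/dev/zero of="$dir/file$i" bs=1024 count=256 2>/dev/null
done

# Sequential: fetch one file after another, dumping to /dev/null.
time for i in $(seq 1 20); do
    cat "$dir/file$i" > /dev/null
done

# Parallel: 20 readers at once, like the pkg client's 20 connections.
time {
    for i in $(seq 1 20); do
        cat "$dir/file$i" > /dev/null &
    done
    wait
}

# Record how many files the harness actually produced, then clean up.
n_files=$(ls "$dir" | wc -l | tr -d ' ')
rm -rf "$dir"
```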

Yes, which is part of why IPS uses parallel connections. We'd rather use HTTP pipelining, but most HTTP servers implement that poorly, and some proxies are completely busted when it comes to it.

However: IPS doesn't dump downloaded packages to /dev/null. So time for
some more realistic tests!

When I set my tests to save the files to a ZFS filesystem (with
atime=off), I found that the transfer times were much more variable and,
generally speaking, there was no significant difference between the two
methods. They all took around 1 minute, or about 1:30 on slower (disk)
hardware. Results tended to be within 1 second of each other.

So, I would suggest that IPS be fixed to use fewer connections. It's
currently being obnoxious to any HTTP front end, for no significant
benefit.
At the very minimum, it needs to handle connection starvation better.

There's no "fix" here; the value of 20 was chosen after careful evaluation to determine the optimal number of parallel connections. In a properly configured environment, it currently provides the best performance.

The current transport system has advantages and disadvantages, and alternatives will be investigated at some point in the future, but for now proper configuration is important for maximal performance.

HTTP in general is not an optimal transport for large amounts of file data.

-Shawn
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
