On 02/28/10 06:10 AM, Shawn Walker wrote:
On 02/27/10 10:29 PM, Anon Y Mous wrote:
OK, install OpenSolaris on a server and create a zone. Then, in your
first zone, just running this very basic IPS command:
pkg install SUNWman
takes forever because it downloads each man page file individually.
Downloading a large number of small files via FTP or whatever will
always be slower than downloading one really big file over a TCP/IP
network, and this is the Achilles' heel of IPS.
The fact that a majority of man pages are delivered in a single
package is a package bug:
http://defect.opensolaris.org/bz/show_bug.cgi?id=1964
With that said, yes, the individual file retrieval system currently
used by pkg(5) does have tradeoffs in transfer performance depending
on the scenario. In the initial install case, a single pre-generated
bundle would be better for transfers, although not nearly as much as
some might believe.
However, a per-file based retrieval system has advantages over one
that relies on pre-generated bundles of package content.
Specifically, it enables additional functionality that would otherwise
be costly (resource-wise) or not as practical, such as:
* multi-variant (and facet) packages
-- Allows package creators to determine package boundaries using
delivered functionality (content) instead of being forced to use
architecture or zone variations.
-- Enhances the user experience by simplifying the management of
packages on a system.
-- Makes it possible for a user to change between or add additional
sets of functionality to their system efficiently. For example, a user
could change from x86 to SPARC, downloading only the files that differ.
* efficient update operations
-- Because pkg(5) only retrieves the files that have changed between
package versions, the client downloads exactly what it needs to
perform an update operation. Other systems have chosen to implement
this by pre-generating deltas between package versions, but that also
means that their users have to rely on a pre-generated delta being
available for every possible pair of versions they might need to
update from and to.
* efficient repair operations
-- Because pkg(5) only has to retrieve the individual files it needs
to perform an operation, this greatly reduces the time needed to
restore missing package content on a system.
* greatly reduced publication resource costs
-- Because pkg(5) stores package content as individual files (by
content hash), files that are identical between packages are shared on
the server. In addition, files that don't change between package
versions are also shared. This can greatly reduce the resource-cost
of publication storage and publication time.
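As a rough illustration of the update, repair, and storage points
above, here is a minimal sketch of a content-addressed file store. It
is not pkg(5)'s actual code: the ContentStore class, the
files_to_fetch() helper, the file paths, and the use of SHA-256 are
all assumptions made up for this example.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest identifying this exact file content.

    SHA-256 is an assumption for this sketch, not a statement about
    which hash pkg(5) uses.
    """
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical repository that stores each distinct file once."""

    def __init__(self):
        self.blobs = {}  # content hash -> file bytes

    def publish(self, files):
        """Store a package version; return its manifest (path -> hash)."""
        manifest = {}
        for path, data in files.items():
            digest = content_hash(data)
            # Identical content is shared across packages and versions.
            self.blobs.setdefault(digest, data)
            manifest[path] = digest
        return manifest


def files_to_fetch(installed, target):
    """Paths whose content is missing or differs from the target manifest."""
    return sorted(p for p, h in target.items() if installed.get(p) != h)


store = ContentStore()
v1 = store.publish({
    "usr/bin/tool": b"binary v1",
    "usr/share/man/man1/tool.1": b"manual page",
})
v2 = store.publish({
    "usr/bin/tool": b"binary v2",
    "usr/share/man/man1/tool.1": b"manual page",
})

update = files_to_fetch(v1, v2)   # update: only the changed file
repair = files_to_fetch({}, v2)   # repair with nothing left on disk
```

In this sketch the unchanged man page is stored once on the server
even though two versions reference it, and an update from v1 to v2
fetches only usr/bin/tool, with no pre-generated delta involved.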
apt-get on Nexenta is really kicking OpenSolaris IPS's butt right now
in terms of performance and this is a real "apples to apples"
comparison because Nexenta and Oracle OpenSolaris both use the
same SunOS / unix / genunix kernel, the same ZFS file system, etc. so
the problem is obviously with the way IPS is implemented and not with
the Solaris kernel itself (which has blazing fast performance as seen
on Nexenta and Milax).
At the moment, I'm not aware of any posteriors that feel harmed.
With that said, apt and pkg(5) do not have completely equivalent
functionality, so it's difficult to perform an "apples-to-apples"
comparison. As an in-development project that has been around for far
fewer years (decades?) than apt-get, I believe pkg(5) is doing pretty
well. Performance work remains ongoing.
Where are all the dtrace performance improvement people hiding?
Perhaps they can use dtrace to explain why apt-get on Nexenta is so
much faster, since Nexenta also has dtrace? Might be a
good topic for a sun.com blog or paper write-up.
When you look for problems, you'll find them. But I'd also say, if
you don't look for *improvements*, you won't find them either.
Quite frankly, when you say things like "is so much faster" without
exactly quantifying what you're talking about, it tends to look a
little vague and hand-wavy.
If you look back at the bug entries for pkg(5) or the mailing list
discussions, you'll find that a significant amount of time during the
last release cycle has been spent on improving performance. In
addition, every time a major change was made, performance has always
been part of the discussion.
I personally have spent easily a few weeks' worth of time profiling
parts of the package system with dtrace, and many more using other
methods of profiling. And I know other team members have spent a lot
of time doing performance analysis as well.
For example, the performance of pkg info and pkg list has greatly
improved:
* pkg info on 2009.06: 3 seconds -> 2010.x: 0.3 seconds
* pkg list on 2009.06: 5.4 seconds -> 2010.x: 0.57 seconds
* pkg list -as on 2009.06: 33 seconds -> 2010.x: 3.56 seconds
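To put those timings in relative terms, the short sketch below
computes the speedup factor for each command from the numbers listed
above. The timings dictionary and the speedup() helper are names made
up for this example; the figures themselves are the ones quoted in the
list. Each command comes out roughly 9 to 10 times faster.

```python
# (seconds on 2009.06, seconds on 2010.x), taken from the list above.
timings = {
    "pkg info": (3.0, 0.3),
    "pkg list": (5.4, 0.57),
    "pkg list -as": (33.0, 3.56),
}


def speedup(before: float, after: float) -> float:
    """How many times faster the newer release is."""
    return before / after


speedups = {
    cmd: round(speedup(before, after), 1)
    for cmd, (before, after) in timings.items()
}

for cmd, factor in speedups.items():
    print(f"{cmd}: {factor}x faster")
```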
Likewise, Johansen replaced pkg(5)'s existing transport mechanism,
which relied on core python libraries, with one based on libcurl. This
has given pkg(5) substantially better performance during transfer
operations.
In addition, a new observation-based performance metric system is in
place now that will attempt to automatically select the best "mirror"
for retrieving package content if a user has configured a publisher
appropriately.
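The general idea of observation-based mirror selection can be sketched
as follows. This is only an illustration under stated assumptions, not
the algorithm pkg(5) actually uses: the MirrorStats class, the
"highest observed throughput wins" rule, and the example.org mirror
URLs are all hypothetical.

```python
from collections import defaultdict


class MirrorStats:
    """Hypothetical tracker of observed transfer performance per mirror."""

    def __init__(self):
        self._bytes = defaultdict(int)
        self._seconds = defaultdict(float)

    def record(self, mirror, nbytes, seconds):
        """Record one completed transfer from a mirror."""
        self._bytes[mirror] += nbytes
        self._seconds[mirror] += seconds

    def throughput(self, mirror):
        """Average observed bytes per second; 0.0 if never observed."""
        secs = self._seconds[mirror]
        return self._bytes[mirror] / secs if secs > 0 else 0.0

    def best(self, mirrors):
        """Pick the mirror with the highest observed throughput."""
        return max(mirrors, key=self.throughput)


# Made-up mirrors and measurements, purely for illustration.
mirrors = ["http://mirror-a.example.org", "http://mirror-b.example.org"]
stats = MirrorStats()
stats.record(mirrors[0], 1_000_000, 4.0)   # 250,000 bytes/s
stats.record(mirrors[1], 1_000_000, 1.0)   # 1,000,000 bytes/s
chosen = stats.best(mirrors)
```

A real implementation would also need to account for errors, stale
measurements, and mirrors that have not been tried yet; this sketch
ignores all of that.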
If you also compare the performance and feel of the packagemanager in
2009.06 to that of the upcoming release, I think you'll find that it
is significantly faster, more responsive, and more enjoyable to use.
John, Padraig, Michal, and the rest of the packagemanager team have
done wonders with it.
So, in summary, the performance of the pkg(5) system is continually
tracked, and as resources and time have permitted, improvements were
made.
And we're not done yet...
Shawn,
Thank you for taking the time to explain this.
I now feel a bit sorry for having initiated this long thread.
I do not think pkg is a bad tool. It has some rough edges and some
missing features, certainly.
My purpose was more to help than to complain.
Bruno
PS: In any case, I bought myself 4 GB of RAM for my laptop today, so
hopefully my pains are over (and, yes, the change feels quite incredible).
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org