On 02/28/10 06:10 AM, Shawn Walker wrote:
On 02/27/10 10:29 PM, Anon Y Mous wrote:
Ok, install OpenSolaris on a server, create a zone, and then in your first zone just run this very basic IPS command:

    pkg install SUNWman

takes forever because it downloads each man page file individually. Downloading a large number of small files via FTP or whatever will always be slower than downloading one really big file over a TCP/IP network, and this is the Achilles' heel of IPS.

The fact that a majority of man pages are delivered in a single package is a package bug:

http://defect.opensolaris.org/bz/show_bug.cgi?id=1964

With that said, yes, the individual file retrieval system currently used by pkg(5) does have tradeoffs in transfer performance depending on the scenario. In the initial install case, a single pre-generated bundle would be better for transfers, although not nearly as much as some might believe.
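To make that tradeoff concrete, here is a back-of-the-envelope model. It is purely illustrative and not taken from pkg(5): each request is assumed to pay a fixed overhead of roughly one round trip, while bytes move at the link bandwidth. Fetched one at a time, every file pays that overhead; if the client keeps several requests in flight (an assumption about the client, not a description of pkg(5)'s transport), the overhead is amortized. Every parameter below (file count, total size, RTT, bandwidth, concurrency) is made up.

```python
import math
from dataclasses import dataclass


@dataclass
class Link:
    rtt_s: float          # hypothetical per-request overhead (one round trip), seconds
    bandwidth_bps: float  # hypothetical link bandwidth, bytes per second


def bundle_time(total_bytes: int, link: Link) -> float:
    """One request for a single pre-generated bundle: one round trip plus transfer."""
    return link.rtt_s + total_bytes / link.bandwidth_bps


def per_file_time(n_files: int, total_bytes: int, link: Link, in_flight: int = 1) -> float:
    """n_files separate requests. With `in_flight` concurrent requests the
    per-request overhead is paid ceil(n_files / in_flight) times (idealized)."""
    rounds = math.ceil(n_files / in_flight)
    return rounds * link.rtt_s + total_bytes / link.bandwidth_bps


if __name__ == "__main__":
    link = Link(rtt_s=0.05, bandwidth_bps=1_000_000)  # 50 ms RTT, ~1 MB/s (made up)
    n, size = 5000, 20_000_000                        # 5000 files, 20 MB total (made up)
    print(f"bundle:               {bundle_time(size, link):7.1f} s")
    print(f"per-file, serial:     {per_file_time(n, size, link):7.1f} s")
    print(f"per-file, 20 at once: {per_file_time(n, size, link, in_flight=20):7.1f} s")
```

Under these made-up numbers the serial per-file case is dominated by round trips, while overlapping requests brings it much closer to the bundle. Real results depend on the server, the network, and how the client schedules its requests.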

However, a per-file based retrieval system has advantages over one that relies on pre-generated bundles of package content. Specifically, it enables additional functionality that would otherwise be costly (resource-wise) or not as practical, such as:

* multi-variant (and facet) packages

-- Allows package creators to determine package boundaries using delivered functionality (content) instead of being forced to use architecture or zone variations.

-- Enhances the user experience by simplifying the management of packages on a system.

-- Makes it possible for a user to change between or add additional sets of functionality to their system efficiently. For example, change from x86 to SPARC, downloading only the files that have changed.

* efficient update operations

-- Because pkg(5) retrieves only the files that have changed between package versions, the client downloads exactly what it needs to perform an update. Other systems have chosen to implement this by pre-generating deltas between package versions, but that also means their users have to rely on a pre-generated delta being available for every possible pair of versions they need to update between.

* efficient repair operations

-- Because pkg(5) only has to retrieve the individual files it needs to perform an operation, this greatly reduces the time needed to restore missing package content on a system.

* greatly reduced publication resource costs

-- Because pkg(5) stores package content as individual files (by content hash), files that are identical between packages are shared on the server. In addition, files that don't change between package versions are also shared. This can greatly reduce the resource cost of publication storage and publication time.
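The sharing, update, and repair behavior in the list above all follow from addressing content by hash. The sketch below is a simplified illustration, not pkg(5)'s actual repository code or manifest format: a hypothetical `ContentStore` keeps each file once under its SHA-1 digest (the hash choice is an assumption for the example), a package version is modeled as a plain path-to-hash mapping, and the hypothetical `files_to_fetch` computes what a client would need to download.

```python
import hashlib
from typing import Dict, Set

Manifest = Dict[str, str]  # hypothetical model: installed path -> content hash


class ContentStore:
    """Server-side store keyed by content hash; identical files are kept once."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def publish(self, files: Dict[str, bytes]) -> Manifest:
        """Store each file under its SHA-1 digest and return a path -> hash manifest."""
        manifest: Manifest = {}
        for path, data in files.items():
            digest = hashlib.sha1(data).hexdigest()
            self._blobs.setdefault(digest, data)  # already stored: nothing new is kept
            manifest[path] = digest
        return manifest

    def fetch(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


def files_to_fetch(target: Manifest, local_hashes: Set[str]) -> Set[str]:
    """Hashes the client must download: those in the target manifest it does not
    already have. For an update, local_hashes comes from the installed version;
    for a repair, it is the set of files still present and intact on disk."""
    return set(target.values()) - local_hashes


if __name__ == "__main__":
    store = ContentStore()
    v1 = store.publish({"usr/bin/ls": b"ls v1", "usr/share/man/man1/ls.1": b"ls page"})
    v2 = store.publish({"usr/bin/ls": b"ls v2", "usr/share/man/man1/ls.1": b"ls page"})
    print(len(store))                            # 3: the unchanged man page is stored once
    print(files_to_fetch(v2, set(v1.values())))  # only the new ls binary's hash
```

A real client also has to deal with permissions, moved files, and verification of what is already on disk; the sketch ignores all of that to show only why per-file, hash-addressed content makes deduplication and minimal downloads fall out naturally.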

apt-get on Nexenta is really kicking OpenSolaris IPS's butt right now in terms of performance, and this is a real "apples to apples" comparison: Nexenta and Oracle OpenSolaris both use the same SunOS / unix / genunix kernel, the same ZFS file system, etc. So the problem is obviously with the way IPS is implemented and not with the Solaris kernel itself (which has blazing fast performance, as seen on Nexenta and Milax).

At the moment, I'm not aware of any posteriors that feel harmed.

With that said, apt and pkg(5) do not have completely equivalent functionality, so it's difficult to perform an "apples-to-apples" comparison. As an in-development project that has been around for far fewer years (decades?) than apt-get, I believe pkg(5) is doing pretty well. Performance work is ongoing.

Where are all the dtrace performance improvement people hiding? Perhaps they can use dtrace to explain why apt-get on Nexenta is so much faster, since Nexenta also has dtrace? Might be a good topic for a sun.com blog or paper write-up.

When you look for problems, you'll find them. But I'd also say, if you don't look for *improvements*, you won't find them either.

Quite frankly, when you say things like "is so much faster" without exactly quantifying what you're talking about, it tends to look a little vague and hand-wavy.

If you look back at the bug entries for pkg(5) or the mailing list discussions, you'll find that a significant amount of time during the last release cycle was spent on improving performance. In addition, every time a major change was made, performance was part of the discussion.

I personally have spent easily a few weeks' worth of time dtrace profiling parts of the package system, and many more using other profiling methods. And I know other team members have spent a lot of time doing performance analysis as well.

For example, the performance of pkg info and pkg list has greatly improved:

* pkg info on 2009.06: 3 seconds -> 2010.x: 0.3 seconds
* pkg list on 2009.06: 5.4 seconds -> 2010.x: 0.57 seconds
* pkg list -as on 2009.06: 33 seconds -> 2010.x: 3.56 seconds
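Expressed as ratios, the timings above work out to roughly an order-of-magnitude improvement for each command. The small helper below (the name `speedups` is just for illustration) computes old time divided by new time from the quoted figures:

```python
from typing import Dict, Tuple

# Timings quoted above, in seconds: (2009.06, 2010.x)
TIMINGS: Dict[str, Tuple[float, float]] = {
    "pkg info": (3.0, 0.3),
    "pkg list": (5.4, 0.57),
    "pkg list -as": (33.0, 3.56),
}


def speedups(timings: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return old_time / new_time for each command."""
    return {cmd: old / new for cmd, (old, new) in timings.items()}


if __name__ == "__main__":
    for cmd, factor in speedups(TIMINGS).items():
        print(f"{cmd:14} {factor:4.1f}x faster")
```

Rounded to one decimal place this gives 10.0x, 9.5x, and 9.3x respectively.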

Likewise, Johansen replaced pkg(5)'s existing transport mechanism, which relied on core Python libraries, with one based on libcurl, giving pkg(5) substantially better performance during transfer operations. In addition, a new observation-based performance metric system is now in place that will attempt to automatically select the best "mirror" for retrieving package content if a user has configured a publisher appropriately.
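The general idea of observation-based mirror selection can be sketched as follows. This is a hypothetical illustration of the technique, not pkg(5)'s actual algorithm: the class name `MirrorSelector`, the exponentially weighted moving average of observed throughput, the smoothing factor, and the occasional random re-probing of other mirrors are all assumptions made for the example, and the mirror URLs are placeholders.

```python
import random
from typing import Dict, List, Optional


class MirrorSelector:
    """Pick the mirror with the best observed throughput (hypothetical sketch)."""

    def __init__(self, mirrors: List[str], alpha: float = 0.3,
                 explore: float = 0.1, rng: Optional[random.Random] = None) -> None:
        self.mirrors = list(mirrors)
        self.alpha = alpha        # weight given to the newest observation
        self.explore = explore    # probability of trying a random mirror instead
        self.rng = rng or random.Random()
        self.speed: Dict[str, float] = {}  # mirror -> smoothed bytes/second

    def record(self, mirror: str, nbytes: int, seconds: float) -> None:
        """Fold one completed transfer into the mirror's moving average."""
        observed = nbytes / max(seconds, 1e-9)
        prev = self.speed.get(mirror)
        self.speed[mirror] = observed if prev is None else (
            self.alpha * observed + (1 - self.alpha) * prev)

    def record_failure(self, mirror: str) -> None:
        """Count a failed transfer as zero throughput so the mirror is demoted."""
        self.record(mirror, 0, 1.0)

    def choose(self) -> str:
        untried = [m for m in self.mirrors if m not in self.speed]
        if untried:
            return untried[0]                     # measure every mirror at least once
        if self.rng.random() < self.explore:
            return self.rng.choice(self.mirrors)  # occasionally re-probe the others
        return max(self.mirrors, key=lambda m: self.speed[m])


if __name__ == "__main__":
    sel = MirrorSelector(["http://mirror-a.example/", "http://mirror-b.example/"],
                         explore=0.0)
    sel.record("http://mirror-a.example/", 1_000_000, 4.0)  # ~250 KB/s observed
    sel.record("http://mirror-b.example/", 1_000_000, 1.0)  # ~1 MB/s observed
    print(sel.choose())                                     # the faster mirror-b
```

Smoothing keeps a single slow or failed transfer from permanently writing off a mirror, while the exploration rate lets the client notice when a previously slow mirror has improved.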

If you also compare the performance and feel of the packagemanager in 2009.06 to that of the upcoming release, I think you'll find that it is significantly faster, more responsive, and more enjoyable to use. John, Padraig, Michal, and the rest of the packagemanager team have done wonders with it.

So, in summary, the performance of the pkg(5) system is continually tracked, and improvements have been made as resources and time have permitted.

And we're not done yet...

Shawn,

Thank you for taking the time to explain this.
I now feel a bit sorry for having initiated this long thread.
I do not think pkg is a bad tool; it has some rough edges and some missing features, certainly.
My purpose was more to help than to complain.

Bruno

PS: Anyway, I bought myself 4 GB of RAM for my laptop today, so hopefully my pains are over (and yes, the change feels quite incredible).

_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org