There's not a whole lot of detail; I figure most of that will come with
implementation. Things like the new server methods and protocol tweaks are
TBD, and will make it into a future copy of this document.
Note: when I say "server", I'm not just talking about pkg.depotd, but the
back-end of the publication, even if that's in the same process as pkgsend,
pkgmerge, etc.
Performance
===========
- Publish a package at a time, rather than an action at a time. This will
require new server methods to let us upload an entire manifest, as well
as uploading a file (which will allow for publication by hash).
- Provide a mechanism by which the server can tell the client not to send
a file's content if the server already has the file.
- Allow the client to send compressed content.
- Allow the server to trust the client when they're in the same process.
- Allow a repository to have its file directory in a separate place from
the manifests and other metadata. This will allow for significant
speedups in pkgmerge, which, if repo variants are published to take
advantage of this layout, would need to publish only new manifests, and
not push any new file content.
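
As a rough illustration of how the first four items could fit together, here is a minimal sketch. Everything in it is hypothetical: the `SketchRepo` class, its method names, and the `trust_client` flag are stand-ins for server methods that are still TBD, and SHA-1 plus gzip are used only to keep the example concrete.

```python
import gzip
import hashlib


class SketchRepo:
    """Hypothetical in-memory stand-in for the server side of publication."""

    def __init__(self, trust_client=False):
        self.trust_client = trust_client  # e.g. client and server share a process
        self.files = {}      # content hash -> compressed bytes
        self.manifests = {}  # package FMRI -> manifest text

    def missing_hashes(self, hashes):
        """Tell the client which file hashes the server does not yet hold."""
        return [h for h in hashes if h not in self.files]

    def add_file(self, content_hash, compressed):
        """Store a file by hash; the client sends it already compressed."""
        if not self.trust_client:
            actual = hashlib.sha1(gzip.decompress(compressed)).hexdigest()
            if actual != content_hash:
                raise ValueError("content does not match hash")
        self.files[content_hash] = compressed

    def add_manifest(self, fmri, manifest):
        """Accept a whole manifest in one call rather than action by action."""
        self.manifests[fmri] = manifest


def publish(repo, fmri, manifest, payloads):
    """Publish one package; return the number of files actually uploaded."""
    by_hash = {hashlib.sha1(data).hexdigest(): data for data in payloads}
    needed = repo.missing_hashes(list(by_hash))
    for content_hash in needed:
        repo.add_file(content_hash, gzip.compress(by_hash[content_hash]))
    repo.add_manifest(fmri, manifest)
    return len(needed)
```

Publishing a second package that shares a file with the first uploads only the files the repository has not seen before.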
Tentative Publication
=====================
Problem Statement
-----------------
Our publication infrastructure, as defined by the pkg(5) publication tools
and the scripts and makefiles built up around them, is designed to publish
an entire consolidation, and not selected bits of one. This worked well
during the S11 development builds, as every build generally introduced
change in almost all packages, and each consolidation's incorporation
would very simply keep a "flat" version surface for its packages -- for
any given build, the branch version (at least) would be identical for all
packages in that consolidation.
Not all consolidations would have changes to deliver in every build, in
which case, they would simply deliver nothing. The OS-wide incorporation
"entire" could be constructed dynamically to encompass the versions of the
consolidation incorporations that were known to be correct for a given
build because the small number of packages and well-parameterized content
provided a constrained, easily solvable task.
This model is inelegant when the pace of development slows, as in the case
of the Solaris 11 SRUs, or when different portions of a consolidation are
not updated in concert (userland and JDS are good examples). Only a small
handful of packages in any consolidation need to be redelivered in a given
build, so publishing all of them is a waste of time, storage, and catalog
size (which leads to longer processing times on the client).
What we want is to be able to deliver a package if and only if it has
changed, and have the incorporations constraining those packages do so at
the most recently delivered version.
Solution
--------
A naive solution is to fail publication if the server determines that
nothing has changed, and to publish the new version if something has.
Unfortunately, this runs into two problems:
- if done prior to pkgmerge (to merge, say, two architecture variants),
then if only one variant changes, pkgmerge won't work;
- if done after pkgmerge, versions in automated dependencies may be
incorrect, if they refer to packages within the consolidation which
may or may not have a new version after publication is complete.
Ideally, we discard unchanged packages as late as possible, so that as
much of the publication process as possible can provide input as to
whether a package has changed. This means that if we can delay the
finalization of the automated dependency analysis, we'll have what we
need.
We also need to know the basis of the change -- that is, which package
versions we are comparing the current set against.
We can provide the reference simply: as the name and version of an
incorporation, which lists the names and versions of the packages used
for comparison, and a repo URI. For any package stem which has more
than one version of a package allowed by the incorporation constraint,
choose the latest.
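
A minimal sketch of that selection follows. It assumes simplified dotted numeric versions and treats an incorporation constraint as a prefix match; real pkg(5) versions carry more structure. The function names and the dictionary shapes are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '1.2.10' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraint):
    """True if `version` starts with every component of `constraint`."""
    return version[:len(constraint)] == constraint


def reference_versions(incorporation, repo_catalog):
    """Map each incorporated stem to the latest version the constraint allows.

    incorporation: {stem: constraint version string}
    repo_catalog:  {stem: [version strings available at the repo URI]}
    Stems with no allowed version in the repo are left out of the result.
    """
    reference = {}
    for stem, constraint in incorporation.items():
        wanted = parse_version(constraint)
        allowed = [v for v in map(parse_version, repo_catalog.get(stem, []))
                   if satisfies(v, wanted)]
        if allowed:
            reference[stem] = max(allowed)
    return reference
```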
For simple scenarios involving no incorporations, packages can simply be
compared against the versions in a repo at a given URI; specifically,
the highest version in the repo less than the version stated in the
package's pkg.fmri attribute.
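
For the no-incorporation case, the lookup might be as small as the sketch below. It again assumes simplified dotted numeric versions, and `reference_version` is a hypothetical helper, not an existing pkg(5) function.

```python
def parse_version(text):
    """Turn a dotted version such as '1.2.10' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def reference_version(new_version, repo_versions):
    """Return the highest repo version strictly below `new_version`.

    Returns None when the repo holds nothing older, in which case there
    is no reference and the package would count as new.
    """
    limit = parse_version(new_version)
    older = [v for v in repo_versions if parse_version(v) < limit]
    return max(older, key=parse_version, default=None)
```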
Determining the correct version of a package to use in a dependency
FMRI is more involved. Fundamentally, the part of the process that has
the necessary information is the server side; it can no longer all be
done with the client-side pkgdepend utility.
We also need to formalize the notion of publishing a set of packages as
a unit. The pkgdepend utility needs to know which packages satisfying
discovered dependencies can be recorded with known versions, and which
will need to be delayed. The server needs to know when the package set
is complete, so that it can record the now-known versions in depend
actions, or fail if not enough information has been provided.
When pkgmerge is not involved, pkgdepend can set the version of a
dependency to a token that is normally invalid ("@current" is one such
possibility, but I'd love to hear better alternatives). The server
will accept a package with such an action, but not publish it. Once it
has seen the dependency's target published and can determine whether the
new version supersedes the reference version, it can rewrite the action
to change the token to the actual version. Once all such actions have
been fixed up, it can determine whether the package with these
dependencies has changed from its reference version and publish it or
discard it.
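
The server-side fix-up described above can be sketched roughly as follows. This is only an illustration under stated assumptions: packages are reduced to a version, an opaque content string, and a map of dependency versions; "changed" means the content or resolved dependencies differ from the reference; and the `finalize` function and its data shapes are hypothetical.

```python
CURRENT = "@current"  # placeholder token suggested in the text


def finalize(pending, reference):
    """Resolve placeholders, then publish changed packages and drop the rest.

    pending:   {stem: {"version": str, "content": str, "depends": {stem: str}}}
    reference: same shape, describing the versions being compared against
    Returns {stem: package} for the packages that should be published.
    """
    effective = {}  # stem -> version that dependents should record
    published = {}
    remaining = dict(pending)
    while remaining:
        progressed = False
        for stem, pkg in list(remaining.items()):
            depends = {}
            for dep, version in pkg["depends"].items():
                if version != CURRENT:
                    depends[dep] = version
                elif dep in effective:
                    depends[dep] = effective[dep]
                elif dep not in pending and dep in reference:
                    depends[dep] = reference[dep]["version"]
                elif dep not in pending:
                    raise ValueError(f"{stem}: no version known for {dep}")
                else:
                    break  # dependency target not decided yet; retry later
            else:
                old = reference.get(stem)
                changed = (old is None or old["content"] != pkg["content"]
                           or old["depends"] != depends)
                if changed:
                    published[stem] = dict(pkg, depends=depends)
                effective[stem] = pkg["version"] if changed else old["version"]
                del remaining[stem]
                progressed = True
        if not progressed:
            raise ValueError(f"unresolvable placeholders: {sorted(remaining)}")
    return published
```

In this sketch a package whose only difference would be its own version number is discarded, and its dependents record the reference version instead.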
More care will be necessary when pkgmerge is involved, since we expect
that all variants of a package should be valid (though relaxing that
constraint is also an interesting idea). We would also like (though
it, too, is not an absolute requirement) to avoid gratuitous
differences in dependencies across variants, so if a dependency changes
in one variant (due to a changed package in the same publication set)
but not another, we want to change it in both, eliminating the variant
from that dependency. We can check each require dependency on the same
package stem: if its version matches the version from the same variant's
repo, it can be considered to have been "@current", and the versions in
all variants can be brought up to that level.
The consolidation incorporation, since it consists of little more than
depend actions on the current set of package versions, can simply be
published with "@current" versions, to be filled in by the server,
rather than having a more complex back-and-forth between the client and
server. The same can be done for micro-consolidations that use
optional dependencies in their component packages, such as the cluster
of vim-core, vim, and gvim.
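
Returning to the pkgmerge case, the cross-variant check on require dependencies might look something like the sketch below. This is one possible reading of the idea, not a settled design: it assumes simplified dotted numeric versions, looks at a single dependency stem at a time, and the `unify_dependency` helper and its arguments are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '1.2.10' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def unify_dependency(dep_versions, repo_versions):
    """Bring one dependency to a single version across variants if possible.

    dep_versions:  {variant: version recorded in that variant's dependency}
    repo_versions: {variant: version of the target in that variant's repo}
    If every variant's dependency matches its own repo, each is treated as
    having been "@current" and all are raised to the newest version seen.
    Otherwise the versions are returned unchanged.
    """
    was_current = all(dep_versions[variant] == repo_versions.get(variant)
                      for variant in dep_versions)
    if not was_current:
        return dict(dep_versions)
    newest = max(dep_versions.values(), key=parse_version)
    return {variant: newest for variant in dep_versions}
```

When the result holds the same version for every variant, the dependency no longer needs to be tagged with a variant at all.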