NOTE: [RT] stands for 'random thoughts' and it's a tradition of the
Cocoon community as a way to foster innovation and promote
brainstorming. Anything said in an RT can be out of line, blue sky or
even wacko. That's a feature, not a bug.
----------------------
I'm glad to see Ivy join the ASF and I'm glad to see both Ant and Maven
people joining here.
As Steve mentioned already, I'm very much interested in a "gump that
doesn't suck".
In order to do that, I need a project dependency graph... with two types
of nodes: projects and versions.
A project is the "concept of a project", a version is an immutable
"instance of a project" in time.
The dependency information links versions, not projects (this is where
Gump got it wrong!). This is something that pretty much all package
managers understand. For Gump, the 'project dependency' information can
be inferred from the 'version dependency' information... but having that
information allows Gump to act both as a nightly build system and as a
continuous integration system.
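A minimal sketch of that two-node-type graph (the data and names are
illustrative, not from any real tool): dependencies link immutable
versions, and the coarser project-to-project graph is inferred from
them, exactly as described above.

```python
from collections import defaultdict

# version-level dependencies: (project, version) -> list of (project, version)
# (pairs are made up for illustration)
version_deps = {
    ("cocoon", "2.1.8"): [("xerces-j", "2.6")],
    ("xerces-j", "2.6"): [],
}

def project_deps(version_deps):
    """Infer the project->project graph from the version->version graph."""
    graph = defaultdict(set)
    for (proj, _ver), deps in version_deps.items():
        for (dep_proj, _dep_ver) in deps:
            graph[proj].add(dep_proj)
    return graph

print(project_deps(version_deps))  # cocoon depends on xerces-j
```

Keeping both layers is what lets the same system act as a nightly build
(walk the project graph) and as continuous integration (walk the pinned
version graph).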
- o -
Another thing Gump got wrong (well, not wrong exactly; we just didn't
think it would be a problem) is that if metadata is not part of the
workflow, it will lag behind.
Maven solved this problem brilliantly and Ivy follows in its footsteps.
Other systems, like apt-get and ports, share the same concept.
Maven does dependency management and it does project automation. I
personally like Ant much better for project automation because Ant's
procedural style fits me better (take a good build.xml, copy it over and
tweak it for my needs). Maven works great for standard tasks, but I
rarely have those :-) and when I do, I have an Ant template I can use
(if Ant implemented target inheritance, you could have predefined
'standard' targets in Ant as well, but that's another story).
But Maven dependency management (and the m2 eclipse plugin that sets up
your build path + source for you!) is sooo compelling that it forced me
to switch over.
In order to keep the project metadata in the workflow, you need tools
that make primary use of that metadata, tools that give value back.
- o -
My day job is to deal with incredibly large quantities of metadata.
I wrote an essay on my blog about the problem of the "quality of
metadata" and possible ways to solve it (it's the second link that shows
up if you query "quality of metadata", btw)
http://www.betaversion.org/~stefano/linotype/news/95/
Applying the lessons learned in metadata management and interoperability
here, there are a few things to note:
1) there are ways to associate quality with metrics, but those metrics
are very hard to define objectively (and they are normally full of
exceptions)
2) metadata gets 'polished' over time, mistakes are found and
corrected, change and feedback are *fundamental* to the stability of a
system and the convergence of metadata to higher quality standards
3) decentralization doesn't decrease your quality, it just increases
your entropy
So, following the discussions here:
1) the people who care about metadata are not necessarily the same
people that produce the software. Tools like maven and ivy force them to
care by making it impossible to build the project otherwise. This is a
very ingenious approach, but it is not the only way that metadata can
be introduced or changed. A system that is designed around the concept
of allowing projects and metadata to be edited independently has an
advantage in terms of social scalability over one that doesn't.
2) if we allow projects and their metadata to be edited independently,
they need to have independent versions.
3) it is conceivable to enable 'trust metrics' that are more granular
than the repository level. So, for example, you could ask your package
manager to trust metadata signed with a key you trust, or with a key
that is part of a chain of trusted keys.
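The chain-of-trusted-keys idea in point 3 can be sketched as a small
reachability check (key names and the signature map are invented; a
real system would verify actual cryptographic signatures, which this
sketch deliberately skips):

```python
# keys you trust directly
trusted_keys = {"alice"}
# who has signed whose key: signer -> set of signed keys (made up)
signed = {"alice": {"bob"}, "bob": {"carol"}}

def is_trusted(key, trusted, signed, max_depth=5):
    """True if 'key' is trusted directly or reachable via a chain of
    signatures, up to max_depth hops."""
    frontier, seen = set(trusted), set(trusted)
    for _ in range(max_depth):
        if key in seen:
            return True
        frontier = {k for t in frontier for k in signed.get(t, set())} - seen
        if not frontier:
            break
        seen |= frontier
    return key in seen

print(is_trusted("carol", trusted_keys, signed))    # alice -> bob -> carol
print(is_trusted("mallory", trusted_keys, signed))  # no chain reaches it
```

The point is that the trust decision moves from "do I trust this
repository?" to "do I trust the signer of this particular piece of
metadata?".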
- o -
I've also been involved in the design of the Avalon and then Cocoon
blocks system, which is a precursor of OSGi.
There, we identified the need to distinguish between "instance" blocks
and "interface" blocks. (this concept is similar to 'virtual packages'
in apt-get)
So, basically, it is possible for a version of a package to depend on a
version of a package interface.
cocoon 2.1.8 -(needs)-> jaxp 1.3
Then it is possible for a version of a package to implement one or more
package interfaces.
xerces-j 2.6 -(implements)-> jaxp 1.3
xerces-j 2.4 -(implements)-> jaxp 1.3
JVM 1.5 [stub] -(implements)-> jaxp 1.3
This creates a polymorphic decoupling: package "cocoon 2.1.8" can query
the repository for all packages that implement "jaxp 1.3" and select
which one to use at runtime.
Another useful feature is to be able to express the need for version ranges:
cocoon 2.1.8 -(needs)-> jaxp 1.x
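The resolution step implied by the examples above can be sketched like
this (the "implements" table reuses the packages from the example; the
query function and the 'x'-wildcard convention are my invention):

```python
import re

# (implementor, implementor_version) implements (interface, interface_version)
implements = [
    (("xerces-j", "2.6"), ("jaxp", "1.3")),
    (("xerces-j", "2.4"), ("jaxp", "1.3")),
    (("jvm-stub", "1.5"), ("jaxp", "1.3")),
]

def candidates(interface, version_pattern):
    """All packages implementing 'interface' with a version matching the
    pattern; 'x' acts as a wildcard for one version component."""
    regex = re.compile(
        version_pattern.replace(".", r"\.").replace("x", r"\d+") + "$")
    return [impl for impl, (iface, ver) in implements
            if iface == interface and regex.match(ver)]

print(candidates("jaxp", "1.x"))
```

A client asking for "jaxp 1.x" gets all three implementors back and can
pick one at runtime; that is the polymorphic decoupling.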
- o -
Another important feature of a package manager is the lack of a central
point of failure. Here maven didn't really anticipate its own success
and went the wrong way.
Bittorrent shows how well transparent mirroring systems can scale. The
problem with bittorrent is that it's very slow to start and it works
best with very big files (because the time a peer takes to download is,
on average, the time it participates in the swarm).
It is possible to design a system that borrows bittorrent's chunking
idea, using HTTP range requests to get 'chunks' of the same file from
different repositories (which also makes it a lot harder to 'falsify' a
binary package, since you get its pieces from many different
repositories), but that doesn't need to follow a percolative
random-graph model of peer discovery.
Think of it as a bittorrent meets DNS sort of thing: you can query a
repository with a URI that identifies a package and it will return you
the metadata for that package, along with a list of URLs that contain
that package. Then your client can decide the best strategy to download
the package.
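A sketch of that "bittorrent meets DNS" flow (URIs, sizes, and mirror
URLs are all invented; no network access, just the planning step): the
repository answers a URI lookup with metadata plus mirror URLs, and the
client spreads HTTP range requests across the mirrors.

```python
# a tiny in-memory stand-in for the metadata repository
repository = {
    "pkg:xerces-j@2.6": {
        "size": 1_000_000,  # bytes (made up)
        "mirrors": ["http://a.example/x.jar",
                    "http://b.example/x.jar",
                    "http://c.example/x.jar"],
    },
}

def plan_download(uri, chunk_size=256_000):
    """Assign each byte range of the file to a mirror, round-robin.
    Returns (mirror_url, Range-header-value) pairs."""
    meta = repository[uri]
    plan = []
    for i, start in enumerate(range(0, meta["size"], chunk_size)):
        end = min(start + chunk_size, meta["size"]) - 1
        mirror = meta["mirrors"][i % len(meta["mirrors"])]
        plan.append((mirror, f"bytes={start}-{end}"))
    return plan

for mirror, byte_range in plan_download("pkg:xerces-j@2.6"):
    print(mirror, byte_range)
```

The client owns the download strategy; the repository only answers the
lookup, which is what keeps it from being a central point of failure
for the bytes themselves.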
In this view, the concept of "uploading" a file to a repository is
bogus: the metadata should tell me where to find it (for example, where
in the tar.gz binary distro the jar I want is found).
Here, I follow the 'ports' approach, where there is no need for
repackaging: just a repository of metadata and patches that can be
applied to the *original* distribution.
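A ports-style record for that might look like this (every field name,
URL, and path is hypothetical; the point is only that the record
describes the original distribution rather than a re-uploaded copy):

```python
record = {
    "project": "xerces-j",
    "version": "2.6",
    "distro": "http://archive.example/xerces-j-2.6.tar.gz",
    "path_in_distro": "xerces-j-2.6/lib/xercesImpl.jar",
    "patches": [],  # patch files to apply to the original distribution
}

def fetch_instructions(rec):
    """Describe how a client obtains the artifact without any re-upload."""
    steps = [f"download {rec['distro']}",
             f"extract {rec['path_in_distro']}"]
    steps += [f"apply {p}" for p in rec["patches"]]
    return steps

print(fetch_instructions(record))
```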
Thoughts?
--
Stefano.