On 24 Oct 06, at 3:18 PM, Brian Topping wrote:

With the snapshots repo down, there was some discussion on IRC. Joakim mentioned there was some discussion of a resolution that was "DNS-like, not actual DNS", and it got me thinking that DNS (possibly with RFC 2782 SRV extensions) might be a better way to resolve repositories. Apologies if this echoes discussion at ApacheCon.

This solves some problems:

1) Downtime at well-known repositories, such as we are seeing today, could be mitigated by falling back to the repository that originally released the code. If a central repository goes down, the source repository that provided the original artifact would act as a fallback. DNS is distributed, so there is no central point of failure for artifact resolution.


If someone checks out your project and you specify your own repositories, then your repositories will be used. The central repository provides convenience so that you don't have to specify any repositories. We know there need to be mirrors, but first things first. The central repository is now hosted by Contegix, so it is not likely to go down. What happened today could be prevented by holding a time slice of snapshots on the central repository, which would make things far more convenient.

I am totally open to anyone finding mirrors for the central repository. That is the first step; once we have them, there are a number of things we could do, like what Apache does itself in terms of locating a mirror closest to the user. So we might have a setup like ca.repo.maven.org, za.repo.maven.org ... and whoever else wants to donate some space. The search logic could be embedded into Maven on the client side, or the Contegix machine could do redirects. The first step is finding other machines. I think the best pattern for ease of use is just having Maven do the right thing and find the artifacts, and a central repository replicated in various regions of the world would be best IMO. Once the initial rsync is done, which can be expensive, the subsequent maintenance is manageable. We want to make the repository infrastructure robust, which is why Contegix is involved.
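The client-side mirror selection above could be as simple as mapping the user's country code to a regional host. A minimal sketch, purely hypothetical: the regional host names and which regions actually have mirrors are assumptions, not real infrastructure:

```python
# Sketch only: the country-to-mirror table and host names below are
# hypothetical examples of client-side mirror selection.
REGIONAL_MIRRORS = {"ca", "za"}  # regions that have donated space
DEFAULT_HOST = "repo.maven.org"

def select_mirror(country_code: str) -> str:
    """Return the repository host closest to the given country code,
    falling back to the main host when no regional mirror exists."""
    cc = country_code.lower()
    if cc in REGIONAL_MIRRORS:
        return f"{cc}.repo.maven.org"
    return DEFAULT_HOST
```

The same table could just as easily live server-side and drive HTTP redirects from the Contegix machine.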

2) Authenticity of artifacts is validated by control of DNS. The current method of getting an artifact into the central repository isn't scalable. If you know someone well enough, they put your code into the repository. If you don't know someone, your request gets put on a list of things to do. It's the way it has to work with a central repo. But if the repo could be found by DNS resolution, anyone could publish. It's up to the client to decide if a jar with <groupId>org.viruswriters</groupId> is safe to depend on, and it can be resolved without burdening central repository maintainers to decide whether to publish it since the crew at viruswriters.org could simply add their external repository to DNS. Done.
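The groupId-to-DNS mapping this relies on is just a string reversal. A sketch of what the lookup name might look like; note the "_maven._tcp" service prefix is an assumed convention for the RFC 2782 SRV query, not a registered service name:

```python
def group_id_to_srv_name(group_id: str) -> str:
    """Map a Maven groupId to a hypothetical DNS SRV owner name.

    e.g. org.viruswriters -> _maven._tcp.viruswriters.org
    A client would then query SRV records at that name to find the
    host and port of the publisher's repository.
    """
    # groupIds are reverse-DNS by convention, so flip the components
    domain = ".".join(reversed(group_id.split(".")))
    return f"_maven._tcp.{domain}"
```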

The manual process has to go away, we know that, and again you can overcome this limitation today by specifying your own repositories in your POMs. What should happen in the future is that once a project is validated with a PGP key, we can take artifacts from that project in an automated way from then on. We could even just take their POMs and build the artifacts from source in a secure environment. It's simply a matter of time, and Archiva is clipping along. If anyone wants to help make the automated submission of artifacts a reality, I've got a big list for you.
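Full PGP validation is the end goal, but even today repositories publish a .sha1 file next to each artifact, and an automated intake pipeline would start by checking that. A sketch of that one step (the verification against a signing key would sit on top of this):

```python
import hashlib

def verify_sha1(artifact_bytes: bytes, published_sha1: str) -> bool:
    """Compare a downloaded artifact against the hex digest from its
    published .sha1 companion file, as found in Maven repositories."""
    actual = hashlib.sha1(artifact_bytes).hexdigest()
    return actual == published_sha1.strip().lower()
```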


3) Use of artifacts could be logged. I would like to be able to use log analyzers to know who is using my artifacts and what part of the world they are coming from. I can't do this with a central repository. If I have a sufficiently fast line, I should be able to run my own repo and collect these logs.

Since the central repository moved over to Contegix, all artifact use has been logged, so we do have stats. Again, if you want to write something to analyse the logs per project, I'll give you access to them; the information is now being collected. I hope to eventually serve the repository itself with Jetty and create a special handler to collect the information as artifacts are being downloaded, so we have realtime stats.
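The log analysis itself is straightforward. A sketch, assuming the server writes something like the Apache common log format and that artifacts follow the standard Maven repository path layout:

```python
import re
from collections import Counter

# Matches successful GETs in Apache common log format, e.g.
# 1.2.3.4 - - [24/Oct/2006:15:18:00 -0500] "GET /path HTTP/1.1" 200
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) HTTP/[\d.]+" 200')

def count_downloads(log_lines):
    """Count successful downloads per artifact path, ignoring
    requests that aren't for jars or POMs."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group(2).endswith((".jar", ".pom")):
            counts[m.group(2)] += 1
    return counts
```

The client IP in group 1 is what you'd feed a GeoIP lookup to see which part of the world users come from.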


Central repositories are still important, but their role would change to being a fast cache of the distributed repos. Artifact suppliers would be mirrored because their artifacts were considered important enough that mirroring them would speed up builds for the masses, not because someone successfully campaigned to get their artifact in.

I think it's nice to wish that everyone is going to be able to provide a high QoS, but I just don't think that's going to happen. If you want to provide users of your builds with repositories that you host, again, you can do that. But what we need is a robust central infrastructure that works for everyone, and that includes:

- many reliable mirrors
- an easy way to submit artifacts
- a convenient way to validate a project
- an easy way to manage a project's credentials

But to start with, if anyone can find us mirrors, we can start the process. If anyone wants to do any of these pieces, feel free to step up. I can keep you busy :-)

Jason.


WDYT?

-b


