On 11/03/11 18:07, Kelly O'Hair wrote:
On Mar 11, 2011, at 2:11 AM, Steve Poole wrote:

Kelly - can you explain for us newbies why you have separate repositories?  I'm 
sure I can list any number of reasons but it would be good to get your view.   
It may sound like a dumb question but it does help in these sorts of discussions 
to know some of the history :-)


This is probably tainted, but I will try to provide, hopefully, an honest view, 
with some humor thrown in. ;^)

Prior to Mercurial, we used the Sun product Teamware and we had separate 
workspaces (what Teamware called
a repository) for: control, hotspot, j2se, deploy, install, etc. (deploy and 
install were Sun plugins & installers).
So that set a pattern. Teamware basically managed SCCS files, so as a workspace 
grew, it did not scale well,
and Teamware relied on NFS access to share these files (80,000+ files, when you 
count SCCS s.* and * files).
So this separation initially, in my view, was done for developer productivity.
I don't have any history on why the other workspaces existed as separate 
workspaces, but I just assume
it was for the same reasons as hotspot, nobody wanted to be part of the big j2se 
gorilla in the room,
and having your own workspace created more of a separate silo for that team to 
work in I suspect.
The control workspace was a small batch of makefiles that built all the 
workspaces, used by Release
Engineering mostly.

Note that Teamware allowed for partial workspaces, since it was only managing 
SCCS and individual
file edits, you could trim a Teamware workspace down to just the directories 
you were working in, and
still sync and push with subset workspaces. This flexibility was taken 
advantage of by the j2se team
to minimize the NFS traffic and improve productivity too. Mercurial doesn't 
allow for subset repositories.

The hotspot team found that their smaller 5,000 file workspace was easier to 
deal with, and in fact
the VM was a natural interface boundary, easy to isolate, controlled APIs, 
pre-built VMs could be
dropped into a JDK, testing/experiments were easy. Hotspot was also mostly C++ 
and native code.
Later, a "Hotspot Express" delivery model was possible so that the same sources 
could be delivered
to completely separate JDK releases.
The hotspot developers were happy, well, as happy as a hotspot developer can be 
I suppose ;^)
(The Serviceability Agent or SA was developed by the hotspot team and was/is 
very tightly integrated
with hotspot, so it became part of hotspot, not the j2se).

The j2se workspace was much larger, maybe 35,000 source files, it initially 
included all the sources from the
corba, jaxp, jaxws, and langtools repositories that exist now.
This j2se workspace was very hard to deal with and many of the sources were 
copy&paste from other projects
that weren't even managed by the JDK team, new deliveries created lost fix 
situations and an unreliable state.
The build process was complicated because part of the workspace had the javac 
sources, which had to be built
first, and then used to build the sources all over again.

So just prior to OpenJDK, or about then, we decided to try and split up the 
j2se workspace to better manage our build
and source importing issues. The corba, jaxp, and jaxws workspaces were created 
and those files were pulled
from the j2se workspace, as were the javac and "language tools" sources into a 
langtools workspace.
The j2se workspace was then renamed "jdk".

That gave us the workspaces: corba, jaxp, jaxws, langtools, jdk, hotspot, ...

These Teamware workspaces eventually became what you see today as the openjdk7 
Mercurial repositories,
but we had to push some files down into smaller closed repositories: 
src/closed, test/closed, and make/closed
for jdk, and src/closed, test/closed, and build/closed for hotspot. The fact 
that hotspot had managed sources
in a build directory was a thorn in our sides for a while and it was eventually 
removed along with build/closed.
Makefile logic is pretty much 100% open right now.

I'm not sure that the open sourcing influenced this, but note that corba, jaxp, 
jaxws, and langtools are pure
open source, and 100% Java (except for one .c file in corba initially). 
Managing pure open Java projects is a
joy if you ask me. ;^)

For langtools, the team wanted this separate repository and lobbied hard for it 
as a productivity aid and also to allow
them to use the NetBeans IDE on just their sources (NetBeans and some IDEs had 
a hard time swallowing the entire
j2se sources), but they also needed to try and ship a separate javac product 
somewhere, I forget the details.
Maybe some work with some outside developers, Jonathan Gibbons would remember.
I'm sure if you asked him, there is no way they would want back into a larger 
repository.

The corba sources haven't changed much since then, apart from makefile changes 
and the removal of all native code.
Originally, we wanted an ant script for a faster build and to allow for 
NetBeans/IDE use as it became pure
Java. That hasn't happened. We keep thinking that these sources should be 
updated with newer Corba
sources and use whatever build process the J2EE Corba team has. Not sure what 
the plans are here.

The jaxp and jaxws repositories got the source drop model and the sources 
originally managed were deleted
in favor of source drops from these teams, where they manage the master sources 
for these products that also
ship in other forms in other products. This is still a work in progress in 
terms of finding the best way to
manage this. We need the sources (can't just get class/jar drops) so that we 
can build classes with -target 7,
but changes really need to go through these teams so they can be managed 
properly.

Mercurial's changeset model and the need for "merge changesets" when two 
changesets were created from
the same parent changeset is another aspect to this. Many teams that changed 
from a file based management
system to Mercurial have encountered "merge mania", the NetBeans team ran into 
this.
It's an issue with too many developers trying to push changes into a single 
large repository.
You can't push a changeset into Mercurial unless you have done a pull and 
sync'd up with the latest changesets
in the repository. If there are frequent pushes going on, either from too much 
activity or too many developers,
someone may experience a:
    hg push    # fails because you need to do a pull ("too many heads" message)
    hg pull -u && hg merge && hg commit -m Merge    # Or hg fetch
    hg push    # fails because you took too long and someone else pushed a new one
    hg pull -u && hg merge && hg commit -m Merge    # Or hg fetch
    hg push    # fails because you took too long and someone else pushed a new one
    ...

Hadn't thought about that situation - makes perfect sense though :-)

This is minimized by reducing the "fan in", smaller repositories, fewer 
developers pushing into the same
repository, etc. Our team forests minimize this, and our separate repositories 
minimize this.
Now some people might say this is a flaw in Mercurial, and I disagree.
By having one "tip", and explicit merge changesets, the sources have a singular 
state, with one simple
changeset ID, you know the state of all 20,000 files in the jdk repository.
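
For readers who want to see the push, pull, merge cycle described above as a script, here is a minimal sketch of a bounded retry loop. It is not something from the OpenJDK tooling: the function name push_with_retry and the default of 5 attempts are made up for illustration, and it treats any non-zero exit status from hg push as "someone else got there first", which is a simplification (hg push can exit non-zero for other reasons too).

```shell
#!/bin/sh
# Hypothetical helper (not part of Mercurial or the OpenJDK forests):
# retry "hg push" a bounded number of times, merging in newly pulled
# changesets after each failed push.
# Note: POSIX sh has no "local", so these variables are global.
push_with_retry() {
    max_attempts=${1:-5}    # assumed default, chosen for illustration only
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if hg push; then
            echo "push succeeded on attempt $attempt"
            return 0
        fi
        # The push failed: pull the new changesets, merge them with the
        # local head, and commit the merge changeset before trying again.
        # If any of these steps fails (e.g. merge conflicts), stop.
        hg pull -u && hg merge && hg commit -m Merge || return 1
        attempt=$((attempt + 1))
    done
    echo "giving up after $max_attempts attempts" >&2
    return 1
}
```

Run from inside a repository clone, `push_with_retry 3` would try at most three pushes. The more developers push into the same repository, the more often a loop like this has to go around, which is the "fan in" problem.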

Mercurial handles very large repositories very well in my opinion, tremendously 
fast when using local
disk and not NFS file systems. So having Mercurial manage one repository of 
50,000 files is not an issue,
except needing the disk space.

Hope this helps and I wasn't too long winded.

This is really great Kelly. Thank you for taking the time to write it down. I do vaguely recall Teamware. Neil and I worked in Cupertino for a short time a long time ago (cue music and star destroyers) where
we were introduced to Teamware.  What can I say - I like Mercurial :-)

Primarily, then, the objective of having multiple repositories is to improve developer productivity and component stability.

I don't know how many cross-repo changes go on. It would seem that if they are minimal then the next logical step would be to remove the source for the repos you're not working on and just have the binaries instead. Do you see that as a worthwhile goal?



-kto




