On Mar 11, 2011, at 10:07 AM, Kelly O'Hair wrote:

> On Mar 11, 2011, at 2:11 AM, Steve Poole wrote:
> 
>> Kelly - can you explain for us newbies why you have separate repositories?  
>> I'm sure I can list any number of reasons but it would be good to get your 
>> view.   It may sound like a dumb question but it does help in these sort of 
>> discussions to know some of the history :-)
> 
> This is probably tainted, but I will try to provide an honest view, 
> with some humor thrown in. ;^)
> 
> Prior to Mercurial, we used the Sun product Teamware and we had separate 
> workspaces (what Teamware called
> a repository) for: control, hotspot, j2se, deploy, install, etc. (deploy and 
> install were Sun plugins & installers).
> So that set a pattern. Teamware basically managed SCCS files, so as a 
> workspace grew, it did not scale well,
> and Teamware relied on NFS access to share these files (80,000+ files, when 
> you count SCCS s.* and * files).
> So this separation initially, in my view, was done for developer productivity.
> I don't have any history on why the other workspaces existed as separate 
> workspaces, but I just assume
> it was for the same reasons as hotspot, nobody wanted to be part of the big j2se 
> gorilla in the room,
> and having your own workspace created more of a separate silo for that team 
> to work in I suspect.
> The control workspace was a small batch of makefiles that built all the 
> workspaces, used by Release
> Engineering mostly.
> 
> Note that Teamware allowed for partial workspaces, since it was only managing 
> SCCS and individual
> file edits, you could trim a Teamware workspace down to just the directories 
> you were working in, and
> still sync and push with subset workspaces. This flexibility was taken 
> advantage of by the j2se team
> to minimize the NFS traffic and improve productivity too. Mercurial doesn't 
> allow for subset repositories.
> 
> The hotspot team found that their smaller 5,000 file workspace was easier to 
> deal with, and in fact
> the VM was a natural interface boundary, easy to isolate, controlled APIs, 
> pre-built VMs could be
> dropped into a JDK, testing/experiments were easy. Hotspot was also mostly 
> C++ and native code.
> Later, a "Hotspot Express" delivery model was possible so that the same 
> sources could be delivered
> to completely separate JDK releases.
> The hotspot developers were happy, well, as happy as a hotspot developer can 
> be I suppose ;^)
> (The Serviceability Agent or SA was developed by the hotspot team and was/is 
> very tightly integrated
> with hotspot, so it became part of hotspot, not the j2se).
> 
> The j2se workspace was much larger, maybe 35,000 source files. It initially 
> included all the sources from the
> corba, jaxp, jaxws, and langtools repositories that exist now.
> This j2se workspace was very hard to deal with and many of the sources were 
> copy&paste from other projects
> that weren't even managed by the JDK team, new deliveries created lost fix 
> situations and an unreliable state.
> The build process was complicated because part of the workspace had the javac 
> sources, which had to be built
> first, then that was used to build the sources all over again.
> 
> So just prior to OpenJDK, or about then, we decided to try and split up the 
> j2se workspace to better manage our build
> and source importing issues. The corba, jaxp, and jaxws workspaces were 
> created and those files were pulled
> from the j2se workspace, as were the javac and "language tools" sources, into a 
> langtools workspace.
> The j2se workspace was then renamed "jdk".
> 
> That gave us the workspaces: corba, jaxp, jaxws, langtools, jdk, hotspot, ...
> 
> These Teamware workspaces eventually became what you see today as the 
> openjdk7 Mercurial repositories,
> but we had to push some files down into smaller closed repositories: 
> src/closed, test/closed, and make/closed
> for jdk, and src/closed, test/closed, and build/closed for hotspot. The fact 
> that hotspot had managed sources
> in a build directory was a thorn in our sides for a while and it was 
> eventually removed along with build/closed.
> Makefile logic is pretty much 100% open right now.
> 
> I'm not sure that the open sourcing influenced this, but note that corba, 
> jaxp, jaxws, and langtools are pure
> open source, and 100% Java (except for one .c file in corba initially). 
> Managing pure open Java projects is a
> joy if you ask me. ;^)
> 
> For langtools, the team wanted this separate repository and lobbied hard for 
> it as a productivity aid and also to allow
> them to use the NetBeans IDE on just their sources (NetBeans and some IDEs 
> had a hard time swallowing the entire
> j2se sources), but they also needed to try and ship a separate javac product 
> somewhere, I forget the details.
> Maybe some work with some outside developers, Jonathan Gibbons would remember.
> I'm sure if you asked him, there is no way they would want back into a larger 
> repository.
> 
> The corba sources haven't changed much since then: there were makefile changes, and all 
> native code has been removed.
> Originally, we wanted an ant script for a faster build and to allow for 
> NetBeans/IDE use as it became pure
> Java. That hasn't happened. We keep thinking that these sources should be 
> updated with newer Corba
> sources and use whatever build process the J2EE Corba team has. Not sure what 
> the plans are here.
> 
> The jaxp and jaxws repositories got the source drop model and the sources 
> originally managed were deleted
> in favor of source drops from these teams, where they manage the master 
> sources for these products that also
> ship in other forms in other products. This is still a work in progress in 
> terms of finding the best way to
> manage this. We need the sources (can't just get class/jar drops) so that we 
> can build classes with -target 7,
> but changes really need to go through these teams so they can be managed 
> properly.
> 
> Mercurial's changeset model and the need for "merge changesets" when two 
> changesets were created from
> the same parent changeset is another aspect to this. Many teams that changed 
> from a file based management
> system to Mercurial have encountered "merge mania", the NetBeans team ran 
> into this.
> It's an issue with too many developers trying to push changes into a single 
> large repository.
> You can't push a changeset into Mercurial unless you have done a pull and 
> sync'd up with the latest changesets
> in the repository. If there are frequent pushes going on, either from too 
> much activity or too many developers,
> someone may experience a:
>   hg push    # fails because you need to do a pull ("too many heads" message)
>   hg pull -u && hg merge && hg commit -m Merge    #  Or hg fetch
>   hg push    # fails because you took too long and someone else pushed a new one
>   hg pull -u && hg merge && hg commit -m Merge    #  Or hg fetch
>   hg push    # fails because you took too long and someone else pushed a new one
>   ...
> This is minimized by reducing the "fan in", smaller repositories, fewer 
> developers pushing into the same
> repository, etc. Our team forests minimize this, and our separate 
> repositories minimize this.
> Now some people might say this is a flaw in Mercurial, and I disagree.
> By having one "tip", and explicit merge changesets, the sources have a 
> singular state: with one simple
> changeset ID, you know the state of all 20,000 files in the jdk repository.
> 
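The push/pull/merge cycle described above could be wrapped in a small retry loop. What follows is only a sketch and not something the JDK teams are known to use: the function name `push_with_retry` and its attempt limit are hypothetical, and it assumes that any failed `hg push` means someone else pushed first, and that every merge is free of conflicts.

```shell
# Hypothetical helper: repeat the push / pull / merge cycle a bounded
# number of times instead of retrying by hand.
# Assumption: a failed "hg push" means the remote has changesets we lack.
# In practice a push can fail for other reasons, which this sketch ignores.
push_with_retry() {
    max_attempts="${1:-5}"   # illustrative default, not a recommendation
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if hg push; then
            echo "pushed on attempt $attempt"
            return 0
        fi
        # Push refused: bring in the new changesets, merge them, and record
        # the merge changeset. Stop if any step fails (e.g. merge conflicts).
        hg pull -u && hg merge && hg commit -m Merge || return 1
        attempt=$((attempt + 1))
    done
    echo "gave up after $max_attempts attempts" >&2
    return 1
}
```

Calling `push_with_retry 3` from inside a repository would try the cycle at most three times. The loop only automates the retries; the race itself is still reduced by lowering the fan-in, as described above.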
> Mercurial handles very large repositories very well in my opinion, 
> tremendously fast when using local
> disk and not NFS file systems. So having Mercurial manage one repository of 
> 50,000 files is not an issue,
> except needing the disk space.
> 
> Hope this helps and I wasn't too long winded.

This should go up on a wiki somewhere. I'd love to point folks interested in 
the macosx-port project to it, and the exact state will change over time as 
well.

Cheers,
Mike Swingler
Java Engineering
Apple Inc.
