On Mar 11, 2011, at 10:07 AM, Kelly O'Hair wrote:

> On Mar 11, 2011, at 2:11 AM, Steve Poole wrote:
>
>> Kelly - can you explain for us newbies why you have separate repositories? I'm sure I can list any number of reasons, but it would be good to get your view. It may sound like a dumb question, but it does help in these sorts of discussions to know some of the history :-)
>
> This is probably tainted, but I will try to provide a hopefully honest view, with some humor thrown in. ;^)
>
> Prior to Mercurial, we used the Sun product Teamware, and we had separate workspaces (what Teamware called a repository) for control, hotspot, j2se, deploy, install, etc. (deploy and install were the Sun plugins and installers). So that set a pattern. Teamware basically managed SCCS files, so as a workspace grew it did not scale well, and Teamware relied on NFS access to share these files (80,000+ files when you count both the SCCS s.* files and the * files). So this separation was initially, in my view, done for developer productivity. I don't have any history on why the other workspaces existed as separate workspaces, but I just assume it was for the same reasons as hotspot: nobody wanted to be part of the big j2se gorilla in the room, and I suspect having your own workspace created more of a separate silo for that team to work in. The control workspace was a small batch of makefiles that built all the workspaces, used mostly by Release Engineering.
>
> Note that Teamware allowed for partial workspaces. Since it was only managing SCCS files and individual file edits, you could trim a Teamware workspace down to just the directories you were working in and still sync and push with subset workspaces. The j2se team took advantage of this flexibility to minimize NFS traffic and improve productivity too. Mercurial doesn't allow for subset repositories.
>
> The hotspot team found that their smaller 5,000-file workspace was easier to deal with, and in fact the VM was a natural interface boundary: easy to isolate, with controlled APIs; pre-built VMs could be dropped into a JDK, and testing and experiments were easy. Hotspot was also mostly C++ and native code. Later, a "Hotspot Express" delivery model became possible, so that the same sources could be delivered to completely separate JDK releases. The hotspot developers were happy, well, as happy as a hotspot developer can be, I suppose ;^) (The Serviceability Agent, or SA, was developed by the hotspot team and was/is very tightly integrated with hotspot, so it became part of hotspot, not the j2se.)
>
> The j2se workspace was much larger, maybe 35,000 source files; it initially included all the sources from the corba, jaxp, jaxws, and langtools repositories that exist now. This j2se workspace was very hard to deal with, and many of the sources were copy-and-paste from other projects that weren't even managed by the JDK team, so new deliveries created lost-fix situations and an unreliable state. The build process was complicated because part of the workspace held the javac sources, which had to be built first and then used to build the sources all over again.
>
> So just prior to OpenJDK, or about then, we decided to try to split up the j2se workspace to better manage our build and source-importing issues. The corba, jaxp, and jaxws workspaces were created and those files were pulled from the j2se workspace, as were the javac and "language tools" sources, into a langtools workspace. The j2se workspace was then renamed "jdk".
>
> That gave us the workspaces: corba, jaxp, jaxws, langtools, jdk, hotspot, ...
>
> These Teamware workspaces eventually became what you see today as the openjdk7 Mercurial repositories, but we had to push some files down into smaller closed repositories: src/closed, test/closed, and make/closed for jdk, and src/closed, test/closed, and build/closed for hotspot. The fact that hotspot had managed sources in a build directory was a thorn in our side for a while, and it was eventually removed along with build/closed. The Makefile logic is pretty much 100% open right now.
>
> I'm not sure that the open sourcing influenced this, but note that corba, jaxp, jaxws, and langtools are pure open source and 100% Java (except for one .c file in corba initially). Managing pure open Java projects is a joy, if you ask me. ;^)
>
> For langtools, the team wanted this separate repository and lobbied hard for it, both as a productivity aid and to let them use the NetBeans IDE on just their sources (NetBeans and some other IDEs had a hard time swallowing the entire j2se source tree). They also needed to try to ship a separate javac product somewhere; I forget the details. Maybe it was some work with outside developers; Jonathan Gibbons would remember. I'm sure if you asked him, there is no way they would want to go back into a larger repository.
>
> The corba sources haven't changed much since then, apart from makefile changes and the removal of all native code. Originally, we wanted an ant script for a faster build and to allow NetBeans/IDE use once it became pure Java. That hasn't happened. We keep thinking that these sources should be updated with newer Corba sources and use whatever build process the J2EE Corba team has. Not sure what the plans are here.
>
> The jaxp and jaxws repositories moved to the source drop model: the sources originally managed there were deleted in favor of source drops from those teams, who manage the master sources for these products, which also ship in other forms in other products. This is still a work in progress in terms of finding the best way to manage it. We need the sources (we can't just take class/jar drops) so that we can build the classes with -target 7, but changes really need to go through those teams so they can be managed properly.
>
> Mercurial's changeset model, and the need for "merge changesets" when two changesets were created from the same parent changeset, is another aspect of this. Many teams that changed from a file-based management system to Mercurial have encountered "merge mania"; the NetBeans team ran into this. It's an issue of too many developers trying to push changes into a single large repository. You can't push a changeset into Mercurial unless you have done a pull and synced up with the latest changesets in the repository. If there are frequent pushes going on, either from too much activity or too many developers, someone may experience this:
>
>    hg push    # fails with a "new remote heads" abort because you need to pull first
>    hg pull -u && hg merge && hg commit -m Merge    # or hg fetch
>    hg push    # fails because you took too long and someone else pushed a new one
>    hg pull -u && hg merge && hg commit -m Merge    # or hg fetch
>    hg push    # fails because you took too long and someone else pushed a new one
>    ...
>
> This is minimized by reducing the "fan in": smaller repositories, fewer developers pushing into the same repository, etc. Our team forests minimize this, and our separate repositories minimize this. Now, some people might say this is a flaw in Mercurial, and I disagree. By having one "tip" and explicit merge changesets, the sources have a singular state: with one simple changeset ID, you know the state of all 20,000 files in the jdk repository.
>
> Mercurial handles very large repositories very well, in my opinion, and is tremendously fast when using local disk rather than NFS file systems. So having Mercurial manage one repository of 50,000 files is not an issue, except for the disk space it needs.
>
> Hope this helps, and that I wasn't too long-winded.
This should go up on a wiki somewhere. I'd love to point folks interested in the macosx-port project to it, and the exact state will change over time as well.

Cheers,
Mike Swingler
Java Engineering
Apple Inc.