Dmitry Samersoff (dmitry.samers...@oracle.com) wrote:
> Kelly,
>
> > The serialize checkins issue can be minimized some by using
> > distributed SCMs (Mercurial, Git, etc)
>
> We have chosen a model:
>
> build->test->integrate
>
> but we may consider different approach:
>
> integrate->build->test->[backout if necessary]

In that model, you can never rely on the repository having any degree of
stability. It may not even build at a given moment.

> Developer (A) integrate his changeset to an integration workspace
> Bot takes snapshot and start building/testing
> Developer (B) integrate his changeset to an integration workspace
> Bot takes snapshot and start building/testing
>
> if Job A failed, bot lock integration ws, restore it to pre-A state,
> apply B-patch. unlock ws.

Don't forget the trusting souls that pulled from the integration repo
after A inflicted the breakage: they each waste time cleaning up a copy
of A's mess.

-John

> On 2012-01-29 23:52, Kelly O'Hair wrote:
> >
> > On Jan 29, 2012, at 10:23 AM, Georges Saab wrote:
> >
> >>>
> >>> I'm missing something. How can everybody using the exact same system
> >>> scale to 100's of developers?
> >>
> >> System = distributed build and test of OpenJDK
> >
> > Ah ha... I'm down in the trenches dealing with dozens of different
> > OS's arch's variation machines.
> > You are speaking to a higher level, I need to crawl out of the basement.
> >
> >>
> >> Developers send in jobs
> >> Jobs are distribute across a pool of (HW/OS) resources
> >> The resources may be divided into pools dedicated to different tasks
> >> (RE/checkin/perf/stress)
> >> The pools are populated initially according to predictions of load and
> >> then increased/rebalanced according to data on actual usage
> >> No assumptions made about what exists on the machine other than HW/OS
> >> The build and test tasks are self sufficient, i.e. bootstrap themselves
> >> The bootstrapping is done in the same way for different build and test
> >> tasks
> >
> > Understood. We have talked about this before. I have also been on the
> > search for the Holy Grail. ;^)
> > This is why I keep working on JPRT.
> >
> >>
> >> The only scaling aspect that seems at all challenging is that the
> >> current checkin system is designed to serialize checkins in a way that
> >> apparently does not scale -- here there are some decisions to be made
> >> and tradeoffs but this is nothing new in the world of Open community
> >> development (or any large team development for that matter)
> >
> > The serialize checkins issue can be minimized some by using distributed
> > SCMs (Mercurial, Git, etc)
> > and using separate forests (fewer developers per source repository means
> > fewer merge/sync issues)
> > and having an integrator merge into a master. This has proven to work in
> > many situations but it
> > also creates delivery to master delays, especially if the integration
> > process is too heavyweight.
> >
> > The JDK projects has been doing this for a long time, I'm sure many
> > people have opinions as to how
> > successful it is or isn't.
> >
> > It is my opinion that merges/syncs are some of the most dangerous things
> > you can do to a source base,
> > and anything we can do to avoid them is usually goodness, I don't think
> > you should scale this without some
> > very great care.
> >
> >>
> >>>
> >>> And that one system will naturally change over time too, so unless
> >>> you are able to prevent all change
> >>> to a system (impossible with security updates etc) every use of that
> >>> 'same system' will be different.
> >>
> >> Yes, but it is possible to control this update and have a staging
> >> environment so you know that a HW/OS update will not break the
> >> existing successful build when rolled out to the build/test farm.
> >
> > Possible but not always easy. The auto updating of everything has
> > increased significantly over the years,
> > making it harder to control completely.
> >
> > I've been doing this build&test stuff long enough to never expect
> > anything to be 100% reliable.
> > Hardware fails, software updates regress functionality, networks become
> > unreliable, humans trip over
> > power cords, virus scanners break things, etc. It just happens, and
> > often, it's not very predictable or reproducible.
> > You can do lots of things to minimize issues, but at some point you just
> > have to accept a few risks because
> > the alternative just isn't feasible or just can't happen with the
> > resources we have.
> >
> > -kto
> >
> >
>
>
> --
> Dmitry Samersoff
> Java Hotspot development team, SPB04
> * There will come soft rains ...
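
For readers following the thread, the integrate->build->test->[backout if
necessary] flow quoted above, and the objection about developers who pull
in the meantime, can be illustrated with a toy simulation. This is only a
sketch under stated assumptions: the names Changeset, IntegrationWorkspace
and simulate are hypothetical and do not correspond to JPRT or to any
Mercurial command, a changeset is reduced to a name plus a flag saying
whether the bot's job passes, and "apply B-patch" is assumed to always
apply cleanly on top of the pre-A state, which a real source base would
not guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    name: str
    builds: bool  # hypothetical: outcome of the bot's build/test job


@dataclass
class IntegrationWorkspace:
    applied: list = field(default_factory=list)
    locked: bool = False

    def integrate(self, cs):
        """Developer pushes a changeset; nothing has been tested yet."""
        if self.locked:
            raise RuntimeError("integration workspace is locked")
        self.applied.append(cs)

    def backout(self, bad):
        """Lock, restore the pre-`bad` state, re-apply later changesets, unlock."""
        self.locked = True
        try:
            idx = self.applied.index(bad)
            later = self.applied[idx + 1:]
            # Assumption: later patches apply cleanly without `bad`.
            self.applied = self.applied[:idx] + later
        finally:
            self.locked = False


def simulate(events):
    """Replay (kind, arg) events; return the final state and affected developers."""
    ws = IntegrationWorkspace()
    pulled = {}    # developer name -> copy of the workspace they pulled
    affected = []  # developers left holding a backed-out changeset
    for kind, arg in events:
        if kind == "integrate":
            ws.integrate(arg)
        elif kind == "pull":
            pulled[arg] = list(ws.applied)
        elif kind == "job_done" and not arg.builds:
            ws.backout(arg)
            affected += [dev for dev, snap in pulled.items() if arg in snap]
    return ws.applied, affected


a = Changeset("A", builds=False)
b = Changeset("B", builds=True)
final, affected = simulate([
    ("integrate", a),
    ("pull", "carol"),   # carol pulls after A, before A's job finishes
    ("integrate", b),
    ("job_done", a),     # A's job fails, so the bot backs A out
    ("job_done", b),
])
print([cs.name for cs in final])  # ['B']
print(affected)                   # ['carol']
```

In this toy run the workspace ends up containing only B, but carol still
holds a copy of A and has to clean it up herself. The sketch also glosses
over the fact that B's job ran against a snapshot that contained A, so its
result says little about B on its own.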