Yup, it was manually installed on each machine ;-)

--jason


On Oct 9, 2008, at 6:43 PM, Jason Warner wrote:

My apologies. I didn't phrase my question properly. Most of the software necessary was pulled down via svn, but I saw no such behaviour for AHP. After looking at it some more, I imagine the software was just manually installed on the machine. It was kind of a silly question to begin with, I suppose.

On Thu, Oct 9, 2008 at 4:16 AM, Jason Dillon <[EMAIL PROTECTED]> wrote:
On Oct 8, 2008, at 11:05 PM, Jason Warner wrote:
Here's a quick question.  Where does AHP come from?

http://www.anthillpro.com

(ever heard of google :-P)

--jason



On Mon, Oct 6, 2008 at 1:18 PM, Jason Dillon <[EMAIL PROTECTED]> wrote:
Sure np, took me a while to get around to writing it too ;-)

--jason


On Oct 6, 2008, at 10:24 PM, Jason Warner wrote:

Just got around to reading this. Thanks for the brain dump, Jason. No questions as of yet, but I'm sure I'll need a few more reads before I understand it all.

On Thu, Oct 2, 2008 at 2:34 PM, Jason Dillon <[EMAIL PROTECTED]> wrote:
On Oct 1, 2008, at 11:20 PM, Jason Warner wrote:

Is the GBuild stuff in svn the same as the anthill-based code or is that something different? GBuild seems to have scripts for running tck and that leads me to think they're the same thing, but I see no mention of anthill in the code.

The Anthill stuff is completely different from the GBuild stuff. I started out trying to get the TCK automated using GBuild, but decided that the system lacked too many of the features I needed, and went ahead with Anthill as it did pretty much everything, though it had some stability problems.

One of the main reasons why I chose Anthill (AHP, Anthill Pro that is) was its build agent and code repository systems. This allowed me to ensure that each build used exactly the desired artifacts. Another was the configurable workflow, which allowed me to create a custom chain of events to handle running builds on remote agents and control what data gets sent to them, what it will collect and what logic to execute once all distributed work has been completed for a particular build. And the kicker which helped facilitate bringing it all together was its concept of a build life.

At the time I could find *no other* build tool which could meet all of these needs, and so I went with AHP instead of spending months building/testing features in GBuild.

While AHP supports configuring a lot of stuff via its web interface, I found that it was very cumbersome, so I opted to write some glue, which was stored in svn here:

   https://svn.apache.org/viewvc/geronimo/sandbox/build-support/?pathrev=632245

It's been a while, so I had to refresh my memory on how this stuff actually worked. First let me explain about the code repository (what it calls Codestation) and why it was critical to the TCK testing IMO. When we use Maven normally, it pulls data from a set of external repositories, picks up more repositories from the stuff it downloads, and quickly we lose control of where stuff comes from. After it pulls down all that stuff, it churns through a build and spits out the stuff we care about, normally stuffing it (via mvn install) into the local repository.

AHP supports, by default, tasks to publish artifacts (really just a set of files controlled by an Ant-like include/exclude path) from a build agent into Codestation, as well as tasks to resolve artifacts (i.e. download them from Codestation to the local working directory on the build agent's system). Each top-level build in AHP gets assigned a new (empty) build life. Artifacts are always published to/resolved from a build life, either that of the current build, or of a dependency build.

So what I did was set up builds for Geronimo Server (the normal server/trunk stuff), which did the normal mvn install thingy, but I always gave it a custom -Dmaven.repo.local which resolved to something inside the working directory for the running build. The build was still online, so it pulled down a bunch of stuff into an empty local repository (so it was a clean build wrt the repository, as well as the source code, which was always fetched for each new build). Once the build had finished, I used the artifact publisher task to push *all* of the stuff in the local repository into Codestation, labeled as something like "Maven repository artifacts" for the current build life.
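
For what it's worth, the per-build repository trick boils down to pointing Maven at a repository directory inside the build's working directory. A minimal Groovy sketch of that kind of invocation, assuming harness-provided workingDir and mavenHome values (both placeholders, not the real glue):

    // Sketch only: run Maven with a local repository scoped to this build's
    // working directory so nothing leaks in from other builds.
    def repo = new File(workingDir, 'maven-repo')                    // workingDir/mavenHome are placeholders
    def proc = ["${mavenHome}/bin/mvn", 'clean', 'install',
                "-Dmaven.repo.local=${repo.absolutePath}"].execute()
    proc.consumeProcessOutput(System.out, System.err)
    proc.waitFor()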

Then I set up another build for Apache Geronimo CTS Server (the porting/branches/* stuff). This build was dependent upon the "Maven repository artifacts" of the Geronimo Server build, and I configured those artifacts to get installed on the build agent's system in the same directory that I configured the CTS Server build to use for its local maven repository. So again the repo started out empty, then got populated with all of the outputs from the normal G build, and then the cts-server build was started. The build of the components and assemblies is normally fairly quick and, aside from some stuff in the private tck repo, won't download much more stuff, because it already had most of its dependencies installed via the Codestation dependency resolution. Once the build finished, I published the cts-server assembly artifacts back to Codestation under something like "CTS Server Assemblies".

Up until this point it's normal builds, but now we have built the G server, then built the CTS server (using the *exact* artifacts from the G server build, even though each might have happened on a different build agent). And now we need to go and run a bunch of tests, using the *exact* CTS server assemblies, produce some output, collect it, and once all of the tests are done render some nice reports, etc.

AHP supports setting up builds which contain "parallel" tasks; each of those tasks is then performed by a build agent. They have fancy build agent selection stuff, but for my needs I had basically 2 groups, one group for running the server builds, and another for running the tests. I only set aside like 2 agents for builds and the rest for tests. Oh, I forgot to mention that I had 2 16-way, 16GB AMD beasts, both running CentOS 5, each with about 10-12 Xen virtual machines running internally to run build agents. Each system also had a RAID-0 array set up over 4 disks to help reduce disk io wait, which, as I found out, was the limiting factor when trying to run a ton of builds that all checkout and download artifacts and such.

I helped the AHP team add a new feature, a parallel iterator task: you define *one* task that internally fires off n parallel tasks, each of which sets the iteration number, leaving it up to the build logic to pick what to do based on that index. The alternative was an unwieldy set of like 200 tasks in their UI, which simply didn't work at all. You might have noticed an "iterations.xml" file in the tck-testsuite directory; this was used to take an iteration number and turn it into the set of tests to actually run. The <iteration> bits are order sensitive in that file.
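
To make the index-to-tests idea concrete, here is a hedged Groovy sketch of how build logic could pick an iteration out of iterations.xml. Only the order-sensitive <iteration> elements come from the description above; the file path and the iterationNumber variable are illustrative, not the real schema or harness property names:

    // Sketch: map the iteration number handed in by the parallel iterator task
    // onto the n-th <iteration> element of iterations.xml (document order).
    def doc = new XmlSlurper().parse(new File('iterations.xml'))     // path is illustrative
    def picked = doc.iteration[iterationNumber - 1]                  // iterationNumber: 1-based, from AHP
    println "Iteration ${iterationNumber} runs: ${picked.text()}"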

Soooo, after we have a CTS Server for a particular G Server build, we can now go and do "runtests" for a specific set of tests (defined by an iteration)... this differed from the other builds above a little, but still pulled down artifacts: the CTS Server assemblies (only the assemblies and the required bits to run the geronimo-maven-plugin, which was used to geronimo:install, as well as used by the tck itself to fire up the server and so on). The key thing here, with regard to the maven configuration (besides using that custom Codestation-populated repository), was that the builds were run *offline*.

After runtests completed, the results were then soaked up (the stuff that javatest pukes out with icky details, as well as the full log files and other stuff I can't recall) and then pushed back into Codestation.

Once all of the iterations were finished, another task fired off which generated a report. It did this by downloading from Codestation all of the runtests outputs (each was zipped I think), unzipping them one by one, running some custom goo I wrote (based on some of the concepts from the original GBuild-based TCK automation), and generating a nice Javadoc-like report that includes all of the gory details.
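
Purely to illustrate the unzip-and-aggregate step (the real code lives in the Geronimo_CTS controller package; the paths, archive naming, and the generateReport helper below are made up):

    // Sketch: unpack each per-iteration result archive, then hand the whole
    // tree to a report generator.
    def ant = new AntBuilder()
    new File('results').eachFileMatch(~/runtests-.*\.zip/) { zip ->
        ant.unzip(src: zip, dest: new File('results/unpacked', zip.name - '.zip'))
    }
    generateReport(new File('results/unpacked'), new File('report'))  // hypothetical helper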

I can't remember how long I spent working on this... too long (not the reports I mean, the whole system). But in the end I recall something like running an entire TCK testsuite for a single server configuration (like jetty) in about 4-6 hours... I sent mail to the list with the results, so if you are curious what the real number is, instead of my guess, you can look for it there. But anyway it was damn quick running on just those 2 machines. And I *knew* that each of the distributed tests was actually testing a known build that I could trace back to its artifacts and then back to its SVN revision, without worrying about mvn downloading something new when midnight rolled over, or that a new G server or CTS server build that might be in progress had compromised the testing by polluting the local repository.

 * * *

So, about the sandbox/build-support stuff...

First there is the 'harness' project, which is rather small, but contains the basic stuff, like a version of Ant and Maven which all of these builds would use, some other internal glue, and a fix for an evil Maven problem causing erroneous build failures due to some internal thread state corruption or gremlins, not sure which. I kinda used this project to help manage the software needed by normal builds, which is why Ant and Maven were in there... i.e. so I didn't have to go install it on each agent each time it changed, just let the AHP system deal with it for me.

This was set up as a normal AHP project, built using AHP's internal Ant builder (though that builder was still configured to use the local version of Ant it pulled from SVN, to ensure it always works).

Each other build was set up to depend on the output artifacts from the build harness build, using the latest in a range, like say "3.*" for the latest 3.x build (which looks like it was 3.7). This let me work on new stuff w/o breaking the current builds as I hacked things up.

So, in addition to all of the stuff I mentioned above wrt the G and CTS builds, each also had this step which resolved the build harness artifacts to that working directory, and the Maven builds were always run via the version of Maven included in the harness. But AHP didn't actually run that version of Maven directly; it used its internal Ant task to execute the version of Ant from the harness *and* use the harness.xml buildfile.

The harness.xml stuff is some more goo which I wrote to help manage AHP configurations. With AHP (at that time, not sure if it has changed) you had to do most everything via the web UI, which sucked, and it was hard to refactor sets of projects and so on. So I came up with a standard set of tasks to execute for a project, put all of the custom muck I needed into what I called a _library_, and then had AHP, via harness.xml, invoke it with some configuration about what project it was and other build details.

The actual harness.xml is not very big; it simply makes sure that */bin/* is executable (Codestation couldn't preserve execute bits) and uses the Codestation command-line client (invoking the Java class directly though) to ask the repository to resolve artifacts from the "Build Library" to the local repository. I had this artifact resolution separate from the normal dependency (or harness) artifact resolution so that it was easier for me to fix problems with the library while a huge set of TCK iterations were still queued up to run. Basically, if I noticed a problem due to a code or configuration issue in an early build, I could fix it and use the existing builds to verify the fix, instead of wasting an hour (sometimes more, depending on networking problems accessing remote repos while building the servers) to rebuild and start over.
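
In Groovy terms (the same language as the glue), those two chores look roughly like the following; harnessDir and the resolveBuildLibrary helper are placeholders standing in for the real Codestation client invocation, not its actual API:

    // Sketch of the two harness.xml chores: restore execute bits under */bin/*,
    // then resolve the "Build Library" artifacts from Codestation.
    def ant = new AntBuilder()
    ant.chmod(perm: 'ugo+rx') {
        fileset(dir: harnessDir, includes: '**/bin/**')              // harnessDir is a placeholder
    }
    resolveBuildLibrary('Build Library', new File(harnessDir, 'library'))  // hypothetical helper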

This brings us to the 'libraries' project. In general the idea of a _library_ was just a named/versioned collection of files which could be used by a project. The main (er, only) library defined in this SVN is system/. This is the groovy glue which made everything work. This is where the entry-point class is located (the guy who gets invoked from harness.xml like so):

   <target name="harness" depends="init">
       <groovy>
           <classpath>
               <pathelement location="${library.basedir}/groovy"/>
           </classpath>

           gbuild.system.BuildHarness.bootstrap(this)
       </groovy>
   </target>

I won't go into too much detail on this stuff now; take a look at it and ask questions. But basically there is stuff in gbuild.system.* which is harness support muck, and stuff in gbuild.config.* which contains configuration. I was kinda mid-refactoring of some things, starting to add new features, not sure where I left off actually. But the key bits are in gbuild.config.project.*. This contains a package for each project, with the package name being the same as the AHP project (with " " -> "_"). And then in each of those packages there is at least a Controller.groovy class (or other classes if special muck was needed, like for the report generation in Geronimo_CTS, etc).

The controller defines a set of actions, implemented as Groovy closures bound to properties of the Controller class. One of the properties passed in from the AHP configuration (configured via the Web UI, passed to the harness.xml build, and then on to the Groovy harness) was the name of the _action_ to execute. Most of that stuff should be fairly straightforward.
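
As a rough illustration of that shape, here is a hypothetical Controller.groovy (the package name, action names and bodies are invented, not copied from the real projects):

    // Hypothetical controller for an AHP project named "Some Project"; each
    // action is a closure bound to a property and selected by name at build time.
    package gbuild.config.projects.Some_Project

    class Controller
    {
        def build = {
            // resolve dependencies, run Maven, publish artifacts, etc.
            println 'running the build action'
        }

        def runtests = {
            // fire up the server and run one TCK iteration
            println 'running the runtests action'
        }
    }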

So after a build is started (maybe from a Web UI click, or SVN change detection, or a TCK runtests iteration) the following happens (in simplified terms):

 * Agent starts build
 * Agent cleans its working directory
 * Agent downloads the build harness
 * Agent downloads any dependencies
 * Agent invoke Ant on harness.xml passing in some details
 * Harness.xml downloads the system/1 library
 * Harness.xml runs gbuild.system.BuildHarness
 * BuildHarness tries to construct a Controller instance for the project
 * BuildHarness tries to find the Controller action to execute
 * BuildHarness executes the Controller action (sketched below)
 * Agent publishes output artifacts
 * Agent completes build
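
A minimal sketch of what those last three BuildHarness steps amount to (projectName and actionName are placeholders for values passed down from the AHP configuration):

    // Sketch of the dispatch: load the per-project Controller class, look up the
    // requested action (a closure bound to a property), then run it.
    def packageName = projectName.replace(' ', '_')
    def controller = Class.forName("gbuild.config.projects.${packageName}.Controller".toString()).newInstance()
    def action = controller."${actionName}"
    action()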

A few extra notes on libraries: the JavaEE TCK requires a bunch of stuff we get from Sun to execute. This stuff isn't small, but it is for the most part read-only. So I set up a location on each build agent where these files were installed. I created AHP projects to manage them and treated them like a special "library", one which tried really hard not to go fetch its content unless the local content was out of date. This helped speed up the entire build process... cause that delete/download of all that muck really slows down 20 agents running in parallel on 2 big machines with striped arrays. For legal reasons this stuff was not kept in svn.apache.org's main repository, and for logistical reasons it wasn't kept in the private tck repo on svn.apache.org either. Because there were so many files, and because the httpd configuration on svn.apache.org kicks out requests that it thinks are *bunk* to help save the resources for the community, I had set up a private SSL-secured svn repository on the old gbuild.org machines to hold the full muck required, then set up some goo in the harness to resolve them. This goo is all in gbuild.system.library.* See the gbuild.config.projects.Geronimo_CTS.Controller for more of how it was actually used.
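
Just to illustrate the "only fetch when stale" idea (the marker file, installDir, requiredVersion and fetchLibrary below are all made up; the real logic is in gbuild.system.library.*):

    // Illustrative only: skip the expensive delete/download when the locally
    // installed library already matches the version this build asks for.
    def marker = new File(installDir, '.library-version')
    if (!marker.exists() || marker.text.trim() != requiredVersion) {
        fetchLibrary(requiredVersion, installDir)   // hypothetical fetch from the private svn repo
        marker.text = requiredVersion
    }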

 * * *

Okay, that is about all the brain-dump for TCK muck I have in me for tonight. Reply with questions if you have any.

Cheers,

--jason





--
~Jason Warner



