Yup, it was manually installed on each machine ;-)
--jason
On Oct 9, 2008, at 6:43 PM, Jason Warner wrote:
My apologies. I didn't phrase my question properly. Most of the
software necessary was pulled down via svn, but I saw no such
behaviour for AHP. After looking at it some more, I imagine the
software was just manually installed on the machine. It was kind of
a silly question to begin with, I suppose.
On Thu, Oct 9, 2008 at 4:16 AM, Jason Dillon
<[EMAIL PROTECTED]> wrote:
On Oct 8, 2008, at 11:05 PM, Jason Warner wrote:
Here's a quick question. Where does AHP come from?
http://www.anthillpro.com
(ever heard of google :-P)
--jason
On Mon, Oct 6, 2008 at 1:18 PM, Jason Dillon
<[EMAIL PROTECTED]> wrote:
Sure np, took me a while to get around to writing it too ;-)
--jason
On Oct 6, 2008, at 10:24 PM, Jason Warner wrote:
Just got around to reading this. Thanks for the brain dump,
Jason. No questions as of yet, but I'm sure I'll need a few more
reads before I understand it all.
On Thu, Oct 2, 2008 at 2:34 PM, Jason Dillon
<[EMAIL PROTECTED]> wrote:
On Oct 1, 2008, at 11:20 PM, Jason Warner wrote:
Is the GBuild stuff in svn the same as the anthill-based code or
is that something different? GBuild seems to have scripts for
running tck and that leads me to think they're the same thing, but
I see no mention of anthill in the code.
The Anthill stuff is completely different from the GBuild stuff.
I started out trying to get the TCK automated using GBuild, but
decided that the system lacked too many features to perform as I
desired, and went ahead with Anthill as it did pretty much
everything I needed, though it had some stability problems.
One of the main reasons why I chose Anthill (AHP, Anthill Pro
that is) was its build agent and code repository systems. This
allowed me to ensure that each build used exactly the desired
artifacts. Another was the configurable workflow, which allowed
me to create a custom chain of events to handle running builds on
remote agents and control what data gets sent to them, what it
will collect, and what logic to execute once all distributed work
has been completed for a particular build. And the kicker, which
helped facilitate bringing it all together, was its concept of a
build life.
At the time I could find *no other* build tool which could meet
all of these needs, and so I went with AHP instead of spending
months building/testing features in GBuild.
While AHP supports configuring a lot of stuff via its web-
interface, I found that it was very cumbersome, so I opted to
write some glue, which was stored in svn here:
https://svn.apache.org/viewvc/geronimo/sandbox/build-support/?pathrev=632245
It's been a while, so I have to refresh my memory on how this
stuff actually worked. First let me explain about the code
repository (what it calls Codestation) and why it was critical to
the TCK testing IMO. When we use Maven normally, it pulls data
from a set of external repositories, picks up more repositories
from the stuff it downloads, and quickly we lose control over
where stuff comes from. After it pulls down all that stuff, it
churns through a build and spits out the stuff we care about,
normally stuffing it (via mvn install) into the local repository.
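Roughly, in shell terms, the per-build repository trick looks like
this (the build id and all paths here are made up for illustration,
not the actual AHP setup):

```shell
# Sketch: give each build its own empty Maven repository so every
# artifact it uses is fetched fresh and fully accounted for.
BUILD_ID="build-1234"                     # hypothetical build id
WORK_DIR="/tmp/agent-work/$BUILD_ID"      # agent working directory
LOCAL_REPO="$WORK_DIR/maven-repo"
mkdir -p "$LOCAL_REPO"   # starts empty: clean build wrt the repo
# maven.repo.local is Maven's property for relocating the local repo:
echo "mvn -Dmaven.repo.local=$LOCAL_REPO clean install"
```

Since the repository lives inside the working directory, wiping the
working directory between builds wipes the repository too.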
AHP supports by default tasks to publish artifacts (really just a
set of files controlled by an Ant-like include/exclude path) from
a build agent into Codestation, as well as tasks to resolve
artifacts (ie. download them from Codestation to the local working
directory on the build agent's system). Each top-level build in
AHP gets assigned a new (empty) build life. Artifacts are always
published to/resolved from a build life, either that of the
current build, or of a dependency build.
So what I did was I setup builds for Geronimo Server (the normal
server/trunk stuff), which did the normal mvn install thingy, but
I always gave it a custom -Dmaven.repo.local which resolved
to something inside the working directory for the running build.
The build was still online, so it pulled down a bunch of stuff
into an empty local repository (so it was a clean build wrt the
repository, as well as the source code, which was always fetched
for each new build). Once the build had finished, I used the
artifact publisher task to push *all* of the stuff in the local
repository into Codestation, labeled as something like "Maven
repository artifacts" for the current build life.
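A rough shell analogue of that publish step, with a plain directory
standing in for Codestation (the real thing was an AHP artifact
publisher task, not a copy; paths and names are invented):

```shell
# Publish *everything* in the build's local repo under a label
# tied to the current build life.
LOCAL_REPO="/tmp/agent-work/build-1234/maven-repo"
CODESTATION="/tmp/codestation/build-1234/maven-repository-artifacts"
mkdir -p "$LOCAL_REPO/org/example" "$CODESTATION"
touch "$LOCAL_REPO/org/example/demo-1.0.jar"  # pretend build output
cp -R "$LOCAL_REPO/." "$CODESTATION/"         # publish all of it
```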
Then I setup another build for Apache Geronimo CTS Server (the
porting/branches/* stuff). This build was dependent upon the
"Maven repository artifacts" of the Geronimo Server build, and I
configured those artifacts to get installed on the build agent's
system in the same directory that I configured the CTS Server
build to use for its local maven repository. So again the repo
started out empty, then got populated with all of the outputs from
the normal G build, and then the cts-server build was started.
The build of the components and assemblies is normally fairly
quick, and aside from some stuff in the private tck repo it won't
download much more stuff, because it already had most of its
dependencies installed via the Codestation dependency
resolution. Once the build finished, I published the cts-server
assembly artifacts back to Codestation under something like "CTS
Server Assemblies".
Up until this point it's normal builds, but now we have built the G
server, then built the CTS server (using the *exact* artifacts
from the G server build, even though each might have happened on a
different build agent). And now we need to go and run a bunch of
tests, using the *exact* CTS server assemblies, produce some
output, collect it, and once all of the tests are done render some
nice reports, etc.
AHP supports setting up builds which contain "parallel" tasks,
each of those tasks is then performed by a build agent, they have
fancy build agent selection stuff, but for my needs I had
basically 2 groups, one group for running the server builds, and
then another for running the tests. I only set aside like 2
agents for builds and the rest for tests. Oh, I forgot to mention
that I had 2 16x 16g AMD beasts all running CentOS 5, each with
about 10-12 Xen virtual machines running internally to run build
agents. Each system also had a RAID-0 array setup over 4 disks to
help reduce disk I/O wait, which, as I found out, was the limiting
factor when trying to run a ton of builds that all checkout and
download artifacts and such.
I helped the AHP team add a new feature, a parallel iterator
task: you define *one* task that internally fires off n parallel
tasks, each of which sets the iteration number and leaves it up
to the build logic to pick what to do based on that index. The
alternative was an unwieldy set of like 200 tasks in their UI,
which simply didn't work at all. You might have noticed an
"iterations.xml" file in the tck-testsuite directory; this was
used to take an iteration number and turn it into what tests we
actually run. The <iteration> bits are order-sensitive in that
file.
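The idea of the order-sensitive lookup can be sketched like this
(the real iterations.xml is XML, and these group names are made up):

```shell
# Iteration N runs the N-th entry of an order-sensitive list.
ITERATIONS="group-a group-b group-c"   # hypothetical test groups
N=2                     # index handed down by the iterator task
TESTS=$(echo "$ITERATIONS" | tr ' ' '\n' | sed -n "${N}p")
echo "iteration $N runs: $TESTS"
```

Because the lookup is positional, reordering the entries changes
which agent runs which tests, hence the order sensitivity.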
Soooo, after we have a CTS Server for a particular G Server build,
we can now go and do "runtests" for a specific set of tests
(defined by an iteration)... this differed from the other builds
above a little, but still pulled down artifacts: the CTS Server
assemblies (only the assemblies and the required bits to run the
geronimo-maven-plugin, which was used to geronimo:install, as well
as used by the tck itself to fire up the server and so on). The
key thing here, with regards to the Maven configuration (besides
using that custom Codestation-populated repository), was that the
builds were run *offline*.
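The offline bit amounts to one extra flag on the Maven invocation;
a sketch with an invented path:

```shell
# Repository pre-populated from Codestation; --offline guarantees
# nothing new can sneak in mid-run.
LOCAL_REPO="/tmp/agent-work/runtests-7/maven-repo"  # illustrative
mkdir -p "$LOCAL_REPO"
CMD="mvn --offline -Dmaven.repo.local=$LOCAL_REPO verify"
echo "$CMD"
```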
After runtests completed, the results were then soaked up (the
stuff that javatest pukes out with icky details, as well as the
full log files and other stuff I can't recall) and then pushed
back into Codestation.
Once all of the iterations were finished, another task fires off
which generates a report. It does this by downloading from
Codestation all of the runtests outputs (each was zipped I think),
unzipping them one by one, running some custom goo I wrote (based
on some of the concepts from the original GBuild-based TCK
automation), and generating a nice Javadoc-like report that
includes all of the gory details.
I can't remember how long I spent working on this... too long (not
the reports I mean, the whole system). But in the end I recall
something like running an entire TCK testsuite for a single server
configuration (like jetty) in about 4-6 hours... I sent mail to
the list with the results, so if you are curious what the real
number is, instead of my guess, you can look for it there. But
anyway it was damn quick running on just those 2 machines. And I
*knew* exactly that each of the distributed tests was actually
testing a known build that I could trace back to its artifacts and
then back to its SVN revision, without worrying about mvn
downloading something new when midnight rolled over or that a new
G server or CTS server build that might be in progress hasn't
compromised the testing by polluting the local repository.
* * *
So, about the sandbox/build-support stuff...
First there is the 'harness' project, which is rather small, but
contains the basic stuff, like a version of ant and maven which
all of these builds would use, some other internal glue, a fix
for an evil Maven problem causing erroneous build failures due to
some internal thread state corruption or gremlins, not sure
which. I kinda used this project to help manage the software
needed by normal builds, which is why Ant and Maven were in
there... ie. so I didn't have to go install it on each agent each
time it changed, just let the AHP system deal with it for me.
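The effect was roughly this, sketched in shell (the directory
layout is hypothetical; the real resolution was done by AHP):

```shell
# Put the harness-bundled Maven/Ant first on PATH so builds use
# the versions resolved from the harness artifacts, not whatever
# happens to be installed on the agent.
HARNESS="/tmp/agent-work/build-1234/harness"   # invented layout
mkdir -p "$HARNESS/maven/bin" "$HARNESS/ant/bin"
PATH="$HARNESS/maven/bin:$HARNESS/ant/bin:$PATH"
echo "$PATH" | cut -d: -f1
```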
This was setup as a normal AHP project, built using its internal
Ant builder (though that builder was still configured to use the
local version pulled from SVN, to ensure it always worked).
Each other build was setup to depend on the output artifacts from
the build harness build, using the latest in a range, like say
using "3.*" for the latest 3.x build (which looks like that was
3.7). This let me work on new stuff w/o breaking the current
builds as I hacked things up.
So, in addition to all of the stuff I mentioned above wrt the G
and CTS builds, each also had this step which resolved the build
harness artifacts to that working directory, and the Maven builds
were always run via the version of Maven included from the
harness. But, AHP didn't actually run that version of Maven
directly, it used its internal Ant task to execute the version of
Ant from the harness *and* use the harness.xml buildfile.
The harness.xml stuff is some more goo which I wrote to help
manage AHP configurations. With AHP (at that time, not sure if it
has changed) you had to do most everything via the web UI, which
sucked, and it was hard to refactor sets of projects and so on.
So I came up with a standard set of tasks to execute for a
project, then put all of the custom muck I needed into what I
called a _library_, and then had AHP, via harness.xml, invoke it
with some configuration about what project it was and other build
details.
The actual harness.xml is not very big; it simply makes sure that
*/bin/* is executable (Codestation couldn't preserve execute
bits), then uses the Codestation command-line client (invoking the
Java class directly though) to ask the repository to resolve
artifacts from the "Build Library" to the local repository. I had
this artifact resolution separate from the normal dependency (or
harness) artifact resolution so that it was easier for me to fix
problems with the library while a huge set of TCK iterations were
still queued up to run. Basically, if I noticed a problem due to
a code or configuration issue in an early build, I could fix it,
and use the existing builds to verify the fix, instead of wasting
an hour (sometimes more depending on networking problems accessing
remote repos while building the servers) to rebuild and start over.
This brings us to the 'libraries' project. In general the idea of
a _library_ was just a named/versioned collection of files which
could be used by a project. The main (er, only) library
defined in this SVN is system/. This is the groovy glue which
made everything work. This is where the entry-point class is
located (the guy who gets invoked via harness.xml via:
<target name="harness" depends="init">
  <groovy>
    <classpath>
      <pathelement location="${library.basedir}/groovy"/>
    </classpath>
    gbuild.system.BuildHarness.bootstrap(this)
  </groovy>
</target>
I won't go into too much detail on this stuff now, take a look at
it and ask questions. But, basically there is stuff in
gbuild.system.* which is harness support muck, and stuff in
gbuild.config.* which contains configuration. I was kinda mid-
refactoring of some things, starting to add new features, not sure
where I left off actually. But the key bits are in
gbuild.config.project.*. This contains a package for each project,
with the package name being the same as the AHP project (with
" " -> "_"). And then in each of those packages is at least a
Controller.groovy class (or other classes if special muck was
needed, like for the report generation in Geronimo_CTS, etc).
The controller defines a set of actions, implemented as Groovy
closures bound to properties of the Controller class. One of the
properties passed in from the AHP configuration (configured via
the Web UI, passed to the harness.xml build, and then on to the
Groovy harness) was the name of the _action_ to execute. Most of
that stuff should be fairly straightforward.
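A shell-shaped sketch of the action dispatch (the real thing is
Groovy closures bound to Controller properties; these action names
and functions are invented):

```shell
# Each "action" is a function; the harness picks one by the name
# passed in from the AHP project configuration.
action_build()    { echo "running build"; }
action_runtests() { echo "running tests"; }

ACTION="runtests"            # would come from the AHP web UI config
FUNC="action_$ACTION"
RESULT=$("$FUNC")
echo "$RESULT"
```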
So after a build is started (maybe from a Web UI click, or SVN
change detection, or a TCK runtests iteration) the following
happens (in simplified terms):
* Agent starts build
* Agent cleans its working directory
* Agent downloads the build harness
* Agent downloads any dependencies
* Agent invokes Ant on harness.xml, passing in some details
* Harness.xml downloads the system/1 library
* Harness.xml runs gbuild.system.BuildHarness
* BuildHarness tries to construct a Controller instance for the
project
* BuildHarness tries to find Controller action to execute
* BuildHarness executes the Controller action
* Agent publishes output artifacts
* Agent completes build
A few extra notes on libraries: the JavaEE TCK requires a bunch of
stuff we get from Sun to execute. This stuff isn't small, but is
for the most part read-only. So I setup a location on each build
agent where these files were installed. I created AHP projects
to manage them and treated them like a special "library", one
which tried really hard not to go fetch its content unless the
local content was out of date. This helped speed up the entire
build process... cause that delete/download of all that muck
really slows down 20 agents running in parallel on 2 big machines
with striped arrays. For legal reasons this stuff was not kept in
svn.apache.org's main repository, and for logistical reasons it
wasn't kept in the private tck repo on svn.apache.org either.
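The "fetch only if out of date" check can be sketched like so (the
.version marker file is an invented mechanism, not how AHP actually
tracked freshness):

```shell
# Skip the expensive fetch when the locally installed library
# already matches the wanted version.
LIB_DIR="/tmp/tck-libs/javaee-tck"   # hypothetical install location
WANTED="build-42"                    # hypothetical version label
mkdir -p "$LIB_DIR"
if [ "$(cat "$LIB_DIR/.version" 2>/dev/null)" = "$WANTED" ]; then
  echo "library up to date, skipping fetch"
else
  echo "fetching library $WANTED"  # real fetch hit the private svn
  echo "$WANTED" > "$LIB_DIR/.version"
fi
```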
Because there were so many files, and because the httpd
configuration on svn.apache.org kicks out requests that it thinks
are *bunk* to help save resources for the community, I setup a
private SSL-secured svn repository on the old gbuild.org machines
to hold the full muck required, then setup some goo in the harness
to resolve them. This goo is all in gbuild.system.library.*. See
gbuild.config.projects.Geronimo_CTS.Controller for more of how it
was actually used.
* * *
Okay, that is about all the brain-dump for TCK muck I have in me
for tonight. Reply with questions if you have any.
Cheers,
--jason
--
~Jason Warner