Re: [RT] Gump 3.0 - Database Model

2005-05-27 Thread Leo Simons
On 26-05-2005 17:58, Adam R. B. Jack [EMAIL PROTECTED] wrote:
 Could we get back to this thread above (using  http://tinyurl.com/4qt9a to
 get to the attachment) and see where we want to take it?

Yes we can!

It probably still needs a lot of work.

One thing that's wrong with it at the moment is that for example
project_dependencies don't have ids, and they should (because there's
additional information about a dependency, like it being optional, or it
being part of the root cause trail for a failure).

 I see that Gump3
 has a schema that does not include some of the additions mentioned in the
 thread.

If you're referring to branches/Gump3/gumpdb and its contents, that is, I
believe, more evolved than that PDF (last change December 28 vs December 08).

 Also, I'm trying to flesh out DynaGumper (the Gump3 DB plugin) and I'd like
 to populate the run/build information. I think I need project_version ids,
 but I can't figure out how to calculate them. Do I simply use
 http://www.apache.org/projects/{projectname}#20050526 or #HEAD or #gump or
 something?

Take a look at the example data in the sql file.

IIRC a project_version is a project that is part of a specific gump
run. Those two need to be combined into an id. So if you have

project name=blah...

And a public gump run started on vmgump.apache.org at 2005-05-29 at
21:43, your id becomes something like

 vmgump.apache.org:public:200505292143:blah

I.e. the current

  http://vmgump.apache.org/gump/public/xml-security/index.html

Is related to

  vmgump.apache.org:public:200505271902:xml-security

Right now, and will be related to

  vmgump.apache.org:public:200505281902:xml-security
                                 ^^ new run

Tomorrow. At least that's how Stefano set it up, using semi-URIs. I
would've probably prefixed everything with urn:gump: :-)
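
A minimal sketch of how such an id could be assembled, assuming the four
colon-separated parts shown above (host, workspace, run start time as
YYYYMMDDHHMM, project name). The function name project_version_id is
hypothetical and is not taken from the gumpdb code.

```python
from datetime import datetime


def project_version_id(host: str, workspace: str, started: datetime, project: str) -> str:
    """Join host, workspace, run start time and project name with colons."""
    stamp = started.strftime("%Y%m%d%H%M")  # minute resolution, as in the examples
    return ":".join([host, workspace, stamp, project])


# Assumed inputs, mirroring the xml-security example from the message.
run_start = datetime(2005, 5, 27, 19, 2)
print(project_version_id("vmgump.apache.org", "public", run_start, "xml-security"))
```

A urn:gump: prefix, as suggested above, would only change the string that is
joined in front of the host.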

 Further, ought project dependencies (in project_dependencies) be
 between project versions not projects?

Project A is linked against the Project B compiled on a specific host as
part of a specific run. So yes, a project_version depends on another
project_version. I know the SQL gets this right.

Of course, there's also a declaration like this

 xml-security -depends- xml-xalan

But that declaration tends to mutate over time.
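
One way to picture a dependency between two project_versions that carries its
own id and an "optional" flag is sketched below. This uses an in-memory SQLite
database purely for illustration; the table and column names are assumptions
and are not the actual gumpdb schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_versions (
    id TEXT PRIMARY KEY              -- e.g. host:workspace:timestamp:project
);
CREATE TABLE project_version_dependencies (
    id INTEGER PRIMARY KEY,          -- the dependency itself gets an id
    dependant TEXT NOT NULL REFERENCES project_versions(id),
    dependee TEXT NOT NULL REFERENCES project_versions(id),
    is_optional INTEGER NOT NULL DEFAULT 0
);
""")

run = "vmgump.apache.org:public:200505271902"
conn.executemany(
    "INSERT INTO project_versions (id) VALUES (?)",
    [(f"{run}:xml-security",), (f"{run}:xml-xalan",)],
)
# xml-security, as built in this run, depends on xml-xalan from the same run.
conn.execute(
    "INSERT INTO project_version_dependencies (dependant, dependee, is_optional)"
    " VALUES (?, ?, ?)",
    (f"{run}:xml-security", f"{run}:xml-xalan", 0),
)
rows = conn.execute(
    "SELECT id, dependant, dependee, is_optional FROM project_version_dependencies"
).fetchall()
print(rows)
```

Because each dependency row has its own id, further tables (for example a
root-cause trail for a failure) could refer to it.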

 Finally, is anybody able to take on the DynaGump Cocoon webapp? I think we'd
 all benefit from seeing inside the database as we populate it.

Probably not atm. Stefano's rather busy building shiny tools AFAICT; we
don't have that many cocooners around I think :-)

Gump data visualisation is a hard problem. For now, you could also consider
writing some trivial commandline scripts that dump out some data to the
console :-)
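
A trivial dump script of that kind might look like the sketch below. It uses
SQLite from the Python standard library as a stand-in for the real database,
and the builds table with its two columns is invented for the example.

```python
import sqlite3


def dump_rows(conn, query):
    """Run a query and return header plus rows as tab-separated lines."""
    cursor = conn.execute(query)
    lines = ["\t".join(col[0] for col in cursor.description)]
    for row in cursor:
        lines.append("\t".join(str(value) for value in row))
    return lines


# Hypothetical table and data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (project_version TEXT, result TEXT)")
conn.execute(
    "INSERT INTO builds VALUES (?, ?)",
    ("vmgump.apache.org:public:200505271902:xml-security", "success"),
)

for line in dump_rows(conn, "SELECT project_version, result FROM builds"):
    print(line)
```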

Cheers!

LSD



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [RT] Gump 3.0 - Database Model

2005-05-26 Thread Adam R. B. Jack
Could we get back to this thread above (using  http://tinyurl.com/4qt9a to
get to the attachment) and see where we want to take it? I see that Gump3
has a schema that does not include some of the additions mentioned in the
thread.

Also, I'm trying to flesh out DynaGumper (the Gump3 DB plugin) and I'd like
to populate the run/build information. I think I need project_version ids,
but I can't figure out how to calculate them. Do I simply use
http://www.apache.org/projects/{projectname}#20050526 or #HEAD or #gump or
something? Further, ought project dependencies (in project_dependencies) be
between project versions not projects?

Finally, is anybody able to take on the DynaGump Cocoon webapp? I think we'd
all benefit from seeing inside the database as we populate it.

regards,

Adam
- Original Message - 
From: Stefano Mazzocchi [EMAIL PROTECTED]
To: Gump general@gump.apache.org
Sent: Wednesday, December 08, 2004 6:32 PM
Subject: [RT] Gump 3.0 - Database Model


 Since I received no pushback on my proposal, let's move on discussing
 the database model.

 I think the first step is to identify the entities that we want to
 model, their relationships and their respective cardinality.

 Here is what Leo and I came up with so far (attached as PDF).

 Comments/criticism/questions appreciated.

 -- 
 Stefano.












Re: [RT] Gump 3.0 - Database Model

2004-12-16 Thread Wade . Stebbings
Stefano:

  see my responses below.

wade


Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/15/2004 09:40:28 PM:

 [...]
 
 What I was thinking is that this (and other of your suggestions) adds a 
 meta-metadata layer and I'm not sure if I want to add this complexity 
 at this point (given that the model is complex enough already).
 
 I agree that this meta-metadata layer will be very useful (for 
 annotation, grouping and further user interaction around the collected 
 data) but this is something we can add incrementally later on.

Yes.  This is a very easy thing to add-on later, over the top so to
speak, as none of the inner workings depend on it.  It is purely a way
to organize projects for presentation purposes.  Meta-meta?  Sure, why
not call it that.



 [...]
 
 Ok, this is again another meta-metadata layer but this is something that
 I'm not sure I like. It smells of overdesign and at this point I want to
 keep features that are just critical for having the system working. The
 simplest thing that can possibly work.

Understood.  It is probably something more useful within my environment,
which is based on several different build systems that feed this system.


 [...]
 
 Keep in mind that we DO NOT WANT gump to build anything that anybody
 would start to use for their own stuff. It is critical, socially and
 politically and for the security ecosystem that gump's artifact
 repository is not used for anything other than distributed gumping
 and fallback scenarios.
 
 Consider it a cache, a repository of precomputed calculations rather
 than anything else.
 
 This is true for executables: for javadocs and docs, this is a different
 story but we should not attack too many problems at the same time.

I see.  Our requirement was more broad for the Artifact Repository, and
thus it is overloaded to serve the build system itself (more gump like)
as well as internal (to the company) users for certain artifacts.  This
notion of an Artifact Repository is not very well fleshed out at the
moment, here, it is mostly design ideas at the present time.  We have
some pieces in place, mostly in a crude way.


 [...]
  
  In fact, at present in my schema, for a single build table entry,
  there can be:
  
   - any number of notes
   - any number of artifacts
   - any number of results
 
 This is interesting. How can you have different numbers of results if 
 you have only one output signal for a given build?

Ah, that all depends on how 'result' is defined.  As a Build Results
system, in my case, it serves more than just to feed the build system
proper.  Thus, as an example:

 1. Building (e.g., compiling) -- one result
 2. Packaging -- 2nd result
 3. First level automated testing (e.g., unit) -- 3rd result
 4. QA testing -- 4th result
...
 N. Overall

There is usually a fixed set of result types on a per project
basis, some projects might not bother with QA testing for an
example, some might fold packaging into the build proper, etc.
This is all very dynamic, of course, because a new type could be
added one day and then live on, another type could be phased out.
The presentation is setup to handle all this.

The one output signal to which you refer is probably #1, else it
is #N.  In my case, N is calculated and the calculation is again
a per-project parameter.  This might seem like unnecessary
overdesign to Gump, but there are reasons why this is needed
here--actually, plenty.
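
To make the idea of a calculated overall result concrete, here is a minimal
sketch. The message does not say how the overall value is computed, so both
rules below, the project names and the "pass"/"fail" strings are assumptions
chosen only for illustration.

```python
from typing import Callable, Dict

Results = Dict[str, str]  # result type -> "pass" or "fail"


def all_must_pass(results: Results) -> str:
    """Overall passes only if every recorded result passed."""
    return "pass" if all(v == "pass" for v in results.values()) else "fail"


def build_only(results: Results) -> str:
    """Overall follows the compile step and ignores everything else."""
    return results.get("build", "fail")


# The calculation is a per-project parameter (hypothetical project names).
OVERALL_RULE: Dict[str, Callable[[Results], str]] = {
    "project-a": all_must_pass,
    "project-b": build_only,
}


def overall(project: str, results: Results) -> str:
    return OVERALL_RULE[project](results)


results = {"build": "pass", "package": "pass", "unit-test": "fail"}
print(overall("project-a", results), overall("project-b", results))
```

The same set of results gives a different overall value depending on which
rule the project is configured with.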

The main points being:

 - although the build system produces artifacts, and in doing
   so there is status about that activity, thus one type of
   build result,

 - there are things we learn about those artifacts after they
   are produced, thus more results.

Bit of background on me, hopefully not to bore anyone.  I am in a
business environment, now, but came from years of cross-development
builds, embedded systems, etc.  To me, a build (proper) produces a
stream of bytes which most people call artifacts.  That stream of
bytes is further qualified as time goes on, usually in a series of
steps, and each step I have defined here as a result.  Many complex
systems have added new twists where build tooling itself is
produced during the build process, which I try to decompose into
separate builds, where subsequent builds then become consumers of
another build's artifacts--baselined to some level of goodness, one
would hope.  Not that I'm saying anything really new here, except
about my perspective on things.

wade

Re: [RT] Gump 3.0 - Database Model

2004-12-16 Thread Wade . Stebbings
Leo:


Leo Simons [EMAIL PROTECTED] wrote on 12/16/2004 02:46:14 AM:

 [...]
 
 yep, I seem to agree. Let's first implement the proposed setup and 
 optimize for understandability and cleanliness. Gump has a lot of 
 features already. Let's first focus on making the important ones easier 
 to use, then on making it easy to add the ones we want.

I totally agree.  Take the incremental approach in your implementation,
design a bit beyond your current needs (but not too much).  Seems to
remind me of an Einstein quote...  ;)


 I can't really see through Wade's setup right now (I'd like to see 
 more, it sounds very interesting :-D), but what I do have is a hunch that it
 addresses quite a few use cases (like redistribution of stuff) which we 
 really don't want to worry about right now.

One significant difference, a differing requirement, between (my) Build
Results system and Gump 3.0 would be that Build Results really consumes
the output of several build systems, some nightly, some continuous
integration, etc., and we're planning on adding-in CruiseControl as well.
It is the common point where all this information comes together.  There
was just too much legacy stuff (here) to attack all at once, so instead,
this approach seemed the more practical.  And now we are able to leverage
off of it in different ways (with new build loops, like CruiseControl,
where there's already a publisher interface).

wade

Re: [RT] Gump 3.0 - Database Model

2004-12-16 Thread Leo Simons
Stefano Mazzocchi wrote:
[EMAIL PROTECTED] wrote:
Yes, it is a many-to-many relation between the Project and Group tables.
Thus, I can define one group which is all mainline builds (we have
several release streams managed by separate branches), regardless of
platforms built on.  Another group would be all Windows/2003 builds.  It
is merely a way of seeing a limited set of project names, though when
presented on the web page, I do also display some project attributes for
each project displayed, like the label & link of the current build as 
well as the last good build.
Ok, I see.
What I was thinking is that this (and other of your suggestions) adds a 
meta-metadata layer and I'm not sure if I want to add this complexity 
at this point (given that the model is complex enough already).

I agree that this meta-metadata layer will be very useful (for 
annotation, grouping and further user interaction around the collected 
data) but this is something we can add incrementally later on.
yep, I seem to agree. Let's first implement the proposed setup and 
optimize for understandability and cleanliness. Gump has a lot of 
features already. Let's first focus on making the important ones easier 
to use, then on making it easy to add the ones we want.

I can't really see through Wade's setup right now (I'd like to see 
more, it sounds very interesting :-D), but what I do have is a hunch that it 
addresses quite a few use cases (like redistribution of stuff) which we 
really don't want to worry about right now.

cheers,
- Leo


Re: [RT] Gump 3.0 - Database Model

2004-12-15 Thread Stefano Mazzocchi
[EMAIL PROTECTED] wrote:
Stefano,
Some afterthoughts.  Hopefully to help clarify.  The scope of a Project
in our system (currently) is that of a build (a series of builds) for a given
instance of (1) product-release on a given (2) target.  This of course
means that a single configuration for a given instance of #1 would then
fan out to several Projects (as we have used this word).

I am not completely happy with this arrangement, since our Project
does not distinguish between:
 (a) separate configurations, or
 (b) the same configurations built on different targets.
And somehow I think this distinction should be more clearly represented
in the data model.
I think if (1) were to be defined as the Project and the (2)'s under it
would be SubProject (to use some names), and keep the arbitrary
grouping mechanism, though now at the SubProject level, then I think
we've gained something w/o any other feature loss.
I'm sorry, I totally lost you here :-/
--
Stefano.


Re: [RT] Gump 3.0 - Database Model

2004-12-15 Thread Stefano Mazzocchi
[EMAIL PROTECTED] wrote:
Stefano:
See my responses below.
Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/10/2004 02:21:48 PM:
[EMAIL PROTECTED] wrote:
[...]
In my Build Results system, I have a schema that also includes a few
additional things:
- arbitrary groupings of projects, which helps in organizing various
   forms of the presentation of the data
Can you elaborate more on this?

Yes, it is a many-to-many relation between the Project and Group tables.
Thus, I can define one group which is all mainline builds (we have
several release streams managed by separate branches), regardless of
platforms built on.  Another group would be all Windows/2003 builds.  It
is merely a way of seeing a limited set of project names, though when
presented on the web page, I do also display some project attributes for
each project displayed, like the label & link of the current build as 
well as the last good build.
Ok, I see.
What I was thinking is that this (and other of your suggestions) adds a 
meta-metadata layer and I'm not sure if I want to add this complexity 
at this point (given that the model is complex enough already).

I agree that this meta-metadata layer will be very useful (for 
annotation, grouping and further user interaction around the collected 
data) but this is something we can add incrementally later on.

- the general notion of attributes associated with each:
   - build (instance)
   - project
   - group
   - the whole system
attributes as in annotations or as in related data?
I'm not sure what the difference is between annotations (like added
notes, do you mean?) and related data.  But what these tables do
is, basically to allow one to add new fields to the associated table
without a schema change.  They are name/value pairs, with the added
key (foreign key) of the id of the table to which they relate.  Thus,
for the Project table:
  Project Attributes:
    - proj_id (foreign key)
    - name (string, key)
    - value (blob)
  Project:
    - proj_id (key)
    - ... etc ...
In the case of the system wide attributes table, there is no id
field.  That table I use for stuff like debug on/off/level, motd,
and so far little else.
Ok, this is again another meta-metadata layer but this is something that 
I'm not sure I like. It smells of overdesign and at this point I want to 
keep features that are just critical for having the system working. The 
simplest thing that can possibly work.

And since my system is focused on creating interaction between people
about given built baselines, I have the notion of a notes history associated
with any given build, in a similar spirit as the comment history of a given
bug in bugzilla.
I like the concept of allowing bugzilla-style communication to happen 
without requiring people to subscribe to various mail lists, like a 
common ground for communication to happen.

But I don't want this to be too global, because I want gump-related 
discussions to happen on the mail list.
You could tie-in email notification when this table is updated.  We
don't do that, but it's not a bad idea.  Bugzilla of course does this.
Good suggestion. Again, this applies to the meta-metadata layer but it 
strikes me as a very useful feature to have right away. What do others 
think?

Like the notes table, I have separate tables for (references to) artifacts,
yes, the artifact table is missing, that's a good point.

I use the notion of an external Artifact Repository and refer into
that with this table.  The artifacts themselves are not stored in
the database nor on the database server.  Just wanted to be clear
about that.
The notion of an Artifact Repository: ah, well, I have my idea of
what I want, and then there's the reality that we don't have much
more (at present) than a web-based storage mechanism, organized
hierarchically within the file system.  Thus version information
is exposed in the file-path name space, and 3rd party artifacts
are managed in yet another system.  My notion of an Artifact
Repository would be a place to store any 3rd party artifact that
any build could depend on.  Build themselves would be producers,
but could also be consumers.  One of the main points of this is:
that I separate, architecturally, the Artifact Repository, as a
separate service, from the build system itself.
Keep in mind that we DO NOT WANT gump to build anything that anybody 
would start to use for their own stuff. It is critical, socially and 
politically and for the security ecosystem that gump's artifact 
repository is not used for anything other than distributed gumping 
and fallback scenarios.

Consider it a cache, a repository of precomputed calculations rather 
than anything else.

This is true for executables: for javadocs and docs, this is a different 
story but we should not attack too many problems at the same time.

and another for results, to support any arbitrary number of artifacts/results
to a given build-instance. 
Good point.
[...]
So, things missing are:
 1) bugzilla like 

Re: [RT] Gump 3.0 - Database Model

2004-12-13 Thread Niclas Hedhman
On Monday 13 December 2004 09:09, Stefano Mazzocchi wrote:

 Eric, I really don't care what ID we choose, as long as it does identify
 something univocally also in a global and distributed environment.

RDF ?
Isn't RDF a perfect fit for this kind of problem?

Niclas
-- 
   +--//---+
  / http://www.dpml.net   /
 / http://niclas.hedhman.org / 
+--//---+





Re: [RT] Gump 3.0 - Database Model

2004-12-13 Thread Wade . Stebbings
Stefano:

See my responses below.


Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/10/2004 02:21:48 PM:
 [EMAIL PROTECTED] wrote:
 [...]
  In my Build Results system, I have a schema that also includes a few
  additional things:
  
   - arbitrary groupings of projects, which helps in organizing various
  forms of the presentation of the data
 
 Can you elaborate more on this?

Yes, it is a many-to-many relation between the Project and Group tables.
Thus, I can define one group which is all mainline builds (we have
several release streams managed by separate branches), regardless of
platforms built on.  Another group would be all Windows/2003 builds.  It
is merely a way of seeing a limited set of project names, though when
presented on the web page, I do also display some project attributes for
each project displayed, like the label & link of the current build as 
well as the last good build.
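
The many-to-many relation described above can be sketched with a link table.
The example below uses in-memory SQLite; the table names, column names and
sample projects are assumptions for illustration and are not taken from the
Build Results schema itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (proj_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE grp (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project_group (        -- one row per (project, group) membership
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    group_id INTEGER NOT NULL REFERENCES grp(group_id),
    PRIMARY KEY (proj_id, group_id)
);
""")
conn.executemany("INSERT INTO project VALUES (?, ?)", [
    (1, "mainline-win2003"), (2, "mainline-linux"), (3, "branch-win2003"),
])
conn.executemany("INSERT INTO grp VALUES (?, ?)", [
    (1, "mainline"), (2, "windows-2003"),
])
# mainline-win2003 belongs to both groups.
conn.executemany("INSERT INTO project_group VALUES (?, ?)", [
    (1, 1), (2, 1), (1, 2), (3, 2),
])


def projects_in(conn, group_name):
    """Return the names of all projects in a group, sorted by name."""
    rows = conn.execute(
        "SELECT p.name FROM project p"
        " JOIN project_group pg ON pg.proj_id = p.proj_id"
        " JOIN grp g ON g.group_id = pg.group_id"
        " WHERE g.name = ? ORDER BY p.name",
        (group_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(projects_in(conn, "mainline"))
print(projects_in(conn, "windows-2003"))
```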


   - the general notion of attributes associated with each:
  - build (instance)
  - project
  - group
  - the whole system
 
 attributes as in annotations or as in related data?

I'm not sure what the difference is between annotations (like added
notes, do you mean?) and related data.  But what these tables do
is, basically to allow one to add new fields to the associated table
without a schema change.  They are name/value pairs, with the added
key (foreign key) of the id of the table to which they relate.  Thus,
for the Project table:

  Project Attributes:
    - proj_id (foreign key)
    - name (string, key)
    - value (blob)

  Project:
    - proj_id (key)
    - ... etc ...

In the case of the system wide attributes table, there is no id
field.  That table I use for stuff like debug on/off/level, motd,
and so far little else.
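
A sketch of the name/value attribute table for projects, following the field
list above (proj_id, name, value). SQLite stands in for the real database
here; the composite primary key, the sample attribute and the helper
get_attribute are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (
    proj_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE project_attributes (
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    name TEXT NOT NULL,
    value BLOB,
    PRIMARY KEY (proj_id, name)     -- assumed: one value per name per project
);
""")
conn.execute("INSERT INTO project (proj_id, name) VALUES (1, 'example-project')")
# A new "field" is added as a row, with no schema change.
conn.execute(
    "INSERT INTO project_attributes VALUES (?, ?, ?)",
    (1, "owner", b"build-team"),
)


def get_attribute(conn, proj_id, name):
    """Return the attribute value, or None if the project has no such attribute."""
    row = conn.execute(
        "SELECT value FROM project_attributes WHERE proj_id = ? AND name = ?",
        (proj_id, name),
    ).fetchone()
    return row[0] if row else None


print(get_attribute(conn, 1, "owner"))
```

The trade-off is the usual one for this pattern: flexibility without schema
changes, at the cost of the database no longer knowing which attributes exist
or what type they have.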


  And since my system is focused on creating interaction between people
  about given built baselines, I have the notion of a notes history associated
  with any given build, in a similar spirit as the comment history of a given
  bug in bugzilla.
 
 I like the concept of allowing bugzilla-style communication to happen 
 without requiring people to subscribe to various mail lists, like a 
 common ground for communication to happen.
 
 But I don't want this to be too global, because I want gump-related 
 discussions to happen on the mail list.

You could tie-in email notification when this table is updated.  We
don't do that, but it's not a bad idea.  Bugzilla of course does this.


  Like the notes table, I have separate tables for (references to) artifacts,
 
 yes, the artifact table is missing, that's a good point.

I use the notion of an external Artifact Repository and refer into
that with this table.  The artifacts themselves are not stored in
the database nor on the database server.  Just wanted to be clear
about that.

The notion of an Artifact Repository: ah, well, I have my idea of
what I want, and then there's the reality that we don't have much
more (at present) than a web-based storage mechanism, organized
hierarchically within the file system.  Thus version information
is exposed in the file-path name space, and 3rd party artifacts
are managed in yet another system.  My notion of an Artifact
Repository would be a place to store any 3rd party artifact that
any build could depend on.  Builds themselves would be producers,
but could also be consumers.  One of the main points of this is:
that I separate, architecturally, the Artifact Repository, as a
separate service, from the build system itself.


  and another for results, to support any arbitrary number of artifacts/results
  to a given build-instance. 
 
 Good point.
 
 [...]
 
 So, things missing are:
 
   1) bugzilla like comments (on build results only? or what else?)
   2) artifact table / artifact type table
 
 Anything else you guys see missing?

Note: The results per build table (to support an arbitrary number
of results per build) was a separate table from the artifact table.

In fact, at present in my schema, for a single build table entry,
there can be:

 - any number of notes
 - any number of artifacts
 - any number of results

I separate artifacts (products of a build) from results (meta data
or things we know about or learn about the build products).  A 
result entry has one of four possible states in my schema: 1. unset,
2. pass, 3. warn, 4. fail (to which I map the obvious color in the
web presentation ;) -- extrapolating/generalizing that my sampling
of the world's traffic/semaphore lights extends to the rest of the
world; 7 countries on 4 continents - a good but small sample).  And
unset = white.
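
The four result states could be represented as below. Only "unset = white" is
stated explicitly in the message; the green/yellow/red mapping is an
assumption based on the traffic-light remark, and the names ResultState,
COLORS and color_for are hypothetical.

```python
from enum import IntEnum


class ResultState(IntEnum):
    UNSET = 1
    PASS = 2
    WARN = 3
    FAIL = 4


# Assumed traffic-light colors, except UNSET which the message gives as white.
COLORS = {
    ResultState.UNSET: "white",
    ResultState.PASS: "green",
    ResultState.WARN: "yellow",
    ResultState.FAIL: "red",
}


def color_for(state_number: int) -> str:
    """Map a stored state number (1..4) to its presentation color."""
    return COLORS[ResultState(state_number)]


print(color_for(1))
```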

Thanks for including me in the discussion.  I look forward to more.

wade 

Re: [RT] Gump 3.0 - Database Model

2004-12-13 Thread Wade . Stebbings
Stefano,

Some afterthoughts.  Hopefully to help clarify.  The scope of a Project
in our system (currently) is that of a build (a series of builds) for a given
instance of (1) product-release on a given (2) target.  This of course
means that a single configuration for a given instance of #1 would then
fan out to several Projects (as we have used this word).

I am not completely happy with this arrangement, since our Project
does not distinguish between:

 (a) separate configurations, or
 (b) the same configurations built on different targets.

And somehow I think this distinction should be more clearly represented
in the data model.

I think if (1) were to be defined as the Project and the (2)'s under it
would be SubProject (to use some names), and keep the arbitrary
grouping mechanism, though now at the SubProject level, then I think
we've gained something w/o any other feature loss.

wade


[EMAIL PROTECTED] wrote on 12/13/2004 09:07:32 AM:

 Stefano:
 
 See my responses below.
 
 
 [...]

RE: [RT] Gump 3.0 - Database Model

2004-12-12 Thread Eric Pugh
Just catching up on my email after being gone for a week.  One thing that
strikes me about the project id's is that this seems to continue the same
discussion we have had in the past about maven generated project id's versus
the gump project id's...

Do the project id's have to have meaning?  While it's nice to look at a
project id and pick out some data, like the version and the timestamp or
what not, eventually gump will run into another project where the id's mean
something different and are generated differently.  I don't mind a project
id like 787234 that I then look up and find out is what ever specific
meaning it has.  Like version, or host, or whatnot.   I think that when we
establish project naming conventions we'll run into conflicts with how other
projects name themselves

 I would welcome project IDs of the form

   http://www.apache.org/projects/cocoon

 and then

   http://www.apache.org/projects/cocoon#v1.0

 for a particular released version, or

   http://www.apache.org/projects/cocoon#20041210






Re: [RT] Gump 3.0 - Database Model

2004-12-12 Thread Stefano Mazzocchi
Eric Pugh wrote:
Just catching up on my email after being gone for a week.  One thing that
strikes me about the project id's is that this seems to continue the same
discussion we have had in the past about maven generated project id's versus
the gump project id's...
Do the project id's have to have meaning?  While it's nice to look at a
project id and pick out some data, like the version and the timestamp or
what not, eventually gump will run into another project where the id's mean
something different and are generated differently.  I don't mind a project
id like 787234 that I then look up and find out is what ever specific
meaning it has.  Like version, or host, or whatnot.   I think that when we
establish project naming conventions we'll run into conflicts with how other
projects name themselves

I would welcome project IDs of the form
 http://www.apache.org/projects/cocoon
and then
 http://www.apache.org/projects/cocoon#v1.0
for a particular released version, or
 http://www.apache.org/projects/cocoon#20041210
Eric, I really don't care what ID we choose, as long as it does identify 
something univocally also in a global and distributed environment.
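
For URI-style ids of the form quoted above, the project and the version part
can be separated with the standard library. This is a small sketch; the helper
name split_project_id is hypothetical, and treating the fragment as "the
version" is an assumption based on the examples in the quoted proposal.

```python
from urllib.parse import urldefrag


def split_project_id(uri: str):
    """Split a project id into (project URI, version fragment or None)."""
    base, fragment = urldefrag(uri)
    return base, (fragment or None)


print(split_project_id("http://www.apache.org/projects/cocoon#v1.0"))
print(split_project_id("http://www.apache.org/projects/cocoon"))
```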

--
Stefano.


Re: [RT] Gump 3.0

2004-12-10 Thread Stefan Bodewig
On Wed, 08 Dec 2004, Stefan Bodewig [EMAIL PROTECTED] wrote:
 On Mon, 06 Dec 2004, Stefano Mazzocchi [EMAIL PROTECTED] wrote:
 
 So, here is my first suggestion: split gump in three stages.
 
   1) metadata aggregation
   2) build
   3) build data use
 
 Sounds good.

One additional thing.

I'd love to have part 2 separated into at least three steps that can
get invoked individually:

2a) SCM update
2b) syncing updated working copy with workspace
2c) building

With traditional Gump it has been possible to modify classes in the
workspace and rebuild using Gump.  This has been very useful in
resolving Gump problems in the past.  Right now I don't see an easy
way to do this.

For example, I fixed the commons-jelly-tags-ant build by patching
the jelly-util taglib.  I verified it would fix the Gump build by
applying my patch locally and only building commons-jelly-tags-util
and after that commons-jelly-tags-ant.

Using current Gump my local patch would have been blown away by CVS
updates or syncs - unless I applied it in what is supposed to be a
clean checkout and disconnected from the network.

Also, just building commons-jelly-tags-util and commons-jelly-tags-ant
without rebuilding Ant and all that seems to be impossible right now
(I may be wrong, though).
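
The request above, running only chosen steps for only chosen projects, can be
sketched as follows. The step functions are placeholders that just return a
log line; none of the names come from Gump, and a real implementation would
call the SCM, the sync code and the builder instead.

```python
from typing import Callable, Dict, Iterable, List


def scm_update(project: str) -> str:      # 2a) placeholder
    return f"update {project}"


def sync_workspace(project: str) -> str:  # 2b) placeholder
    return f"sync {project}"


def build(project: str) -> str:           # 2c) placeholder
    return f"build {project}"


# Insertion order is the order in which steps run.
STEPS: Dict[str, Callable[[str], str]] = {
    "update": scm_update,
    "sync": sync_workspace,
    "build": build,
}


def run(projects: Iterable[str], steps: Iterable[str]) -> List[str]:
    """Run only the requested steps, in canonical order, for each project."""
    wanted = set(steps)
    unknown = wanted - set(STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    log = []
    for project in projects:
        for name, step in STEPS.items():
            if name in wanted:
                log.append(step(project))
    return log


# Build two projects without updating or syncing, so a local patch survives.
print(run(["commons-jelly-tags-util", "commons-jelly-tags-ant"], ["build"]))
```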

Stefan




Re: [RT] Gump 3.0

2004-12-10 Thread Adam R. B. Jack

 2a) SCM update
 2b) syncing updated working copy with workspace
 2c) building

We do actually have 2a and 2c already, in bin/build.py and bin/update.py,
they just never got the usage/fixing they might need.

regards

Adam





Re: [RT] Gump 3.0 - Database Model

2004-12-10 Thread Stefano Mazzocchi
[EMAIL PROTECTED] wrote:
This is cool.  FWIW, here's some bits from my experience, implementing
something similar in a MySQL database.
Awesome!
In my Build Results system, I have a schema that also includes a few
additional things:
 - arbitrary groupings of projects, which helps in organizing various
forms of the presentation of the data
Can you elaborate more on this?
 - the general notion of attributes associated with each:
- build (instance)
- project
- group
- the whole system
attributes as in annotations or as in related data?
And since my system is focused on creating interaction between people
about given built baselines, I have the notion of a notes history associated
with any given build, in a similar spirit as the comment history of a given
bug in bugzilla.
I like the concept of allowing bugzilla-style communication to happen 
without requiring people to subscribe to various mail lists, like a 
common ground for communication to happen.

But I don't want this to be too global, because I want gump-related 
discussions to happen on the mail list.

Like the notes table, I have separate tables for (references to) artifacts,
yes, the artifact table is missing, that's a good point.
and another for results, to support any arbitrary number of artifacts/results
to a given build-instance.  
Good point.
This could be hidden in your diagram inside the
builds entity/table, but wasn't explicit.
No, you're right, we need to add that.
I've built a lot of generality into my schema, since I need to support many
inputs into this database, from various (new and old) build systems.  Thus
things like the result table are kept very general within the database. One
area that is not very well thought out (in my case) is how results and/or
build instances depend on each other, a core requirement for Gump, as
it seems.

Hope this helps.  Comments?
So, things missing are:
 1) bugzilla like comments (on build results only? or what else?)
 2) artifact table / artifact type table
Anything else you guys see missing?
--
Stefano.


Re: [RT] Gump 3.0

2004-12-10 Thread Adam R. B. Jack
Ok, here is my thinking on how we proceed towards Gump 3.0, i.e.:

1) Metadata Gathering
2) Processing (Build/Sync/Update)
3) Results/Presentation/History Query/Analysis

 
For *now* ...

1) Phase One (Metadata Gathering) is simply the way to get XML documentation
into a local file system for Gump to process. Eventually this could be
crawlers (etc.) that parse GOMs and POMs, but (for now) the CVS update &
HTTP gets are tolerable. [If anybody has an itch to tackle this first, speak
up, but I think it is a reasonable/significant amount of work and (IMHO) can
wait a little while longer.]

2) Phase Two (Building) is what we currently have as core, but that outputs
to an historical database (plus some files for those w/o huge databases). It
will not do RDF/RSS/Atom/Notification/XHTML Presentation (or XDOCS). It will
not do Stats (neither XHTML presentation nor internal to DBM) nor will it do
XRef (XHTML).

3) Phase Three (Analysis/Communication) is a whole new world; rewriting
the 'will not do' list from above from the results database. This could be
Python code, or Cocoon, or ...

I'd like to focus my time on (2) and request that others help with (3).

Question: We currently run JDK1.5 and Kaffe off TRUNK not LIVE. Ought we
change this? Alternatively, ought we perform this Gump work in a separate
branch? I think I can add to the current w/o too much instability, then
remove stuff when needed. I'm game to listen to others' opinions/concerns
though.

[FWIW: Personally, I'd love to get back to NAnt building except that Mono
is still my roadblock. I think Gump 3.0 ought be far less resource bound, and
it ought help us simplify running/operating Gump. As such, I hope it leads
to more users and hence more hands to help with NAnt, etc.]

regards,

Adam




Re: [RT] Gump 3.0

2004-12-10 Thread Stefano Mazzocchi
Adam R. B. Jack wrote:

 Ok, here is my thinking on how we proceed towards Gump 3.0, i.e.:
 1) Metadata Gathering
 2) Processing (Build/Sync/Update)
 3) Results/Presentation/History Query/Analysis

 For *now* ...
 1) Phase One (Metadata Gathering) is simply the way to get XML documentation
 into a local file system for Gump to process. Eventually this could be
 crawlers (etc.) that parse GOMs and POMs, but (for now) the CVS update and
 HTTP gets are tolerable. [If anybody has an itch to tackle this first, speak
 up, but I think it is a reasonable/significant amount of work and (IMHO) can
 wait a little while longer.]

+1

 2) Phase Two (Building) is what we currently have as core, but that outputs
 to an historical database (plus some files for those w/o huge databases). It
 will not do RDF/RSS/Atom/Notification/XHTML Presentation (or XDOCS). It will
 not do Stats (neither XHTML presentation nor internal to DBM) nor will it do
 XRef (XHTML).

+1

 3) Phase Three (Analysis/Communication) is a whole new world; rewriting
 the 'will not do' list from above from the results database. This could be
 Python code, or Cocoon, or ...
 I'd like to focus my time on (2) and request that others help with (3).

I'm game. I can take ownership of #3.

 Question: We currently run JDK1.5 and Kaffe off TRUNK not LIVE. Ought we
 change this?

Yeah, it makes sense.

 Alternatively, ought we perform this Gump work in a separate
 branch? I think I can add to the current w/o too much instability, then
 remove stuff when needed. I'm game to listen to others' opinions/concerns
 though.

Currently, Dynagump is the code name for #3 and does not depend on any 
code from Gump (only on a common database schema).

I think we keep it the way it is for now, we can move stuff back and 
forth later on, thanks to SVN.

 [FWIW: Personally, I'd love to get back to NAnt building except that Mono
 is still my roadblock. I think Gump 3.0 ought be far less resource bound, and
 it ought help us simplify running/operating Gump. As such, I hope it leads
 to more users and hence more hands to help with NAnt, etc.]

I personally would love to see Mono stuff being gumped as well, but it's
a low priority for me ATM.

--
Stefano.


Re: [RT] Gump 3.0

2004-12-10 Thread Stefan Bodewig
On Fri, 10 Dec 2004, Adam R. B. Jack [EMAIL PROTECTED] wrote:

 [FWIW: Personally, I'd love to get back to NAnt building except
 that Mono is still my roadblock.

I still don't quite understand why it works far better on my oldish
RedHat box either.  Hmm, have we tried Mono 1.0.4 or even 1.0.5
(released today 8-) yet?

Anyway.  Once I merge my last commit to the live branch we will build
apr-util against apr and everything should be there to support
configure/make based projects (we may need env variable support).  My
next priority will be documenting the stuff so that others like Graham can
get their feet wet - and then head towards NAnt and Mono.

This is what I expect to be able to do, I'll probably never dive into
Python (lack of time - and admittedly it hasn't been fun yet, either)
deep enough in order to scratch more than the surface.

Stefan



Re: [RT] Gump 3.0 - Database Model

2004-12-09 Thread Adam R. B. Jack

 Since I received no pushback on my proposal, let's move on discussing
 the database model.

I see this model is good enough for certain aspects of the proposed 3.0, but
not for all. We can't store the metadata in it in order to perform builds
from it; there is clearly insufficient information for that. That said, I am
more than happy to start on a 3.0 break-up by splitting the outputs from the
presentation of those outputs via this model.

That said, I still need more information on the contents of ids (and such),
to verify the model is correct. Here are some initial reactions:

One thing I noticed you mentioned was a desire for this database model to
allow Gump to be distributed. I like that goal. We can't assume one host can
do all builds (although Brutus is doing a fine fine job) so perhaps we could
allow different hosts to build and contribute data for individual aspects.
Maybe this is a goal to work towards, not focus on now, but I believe that
project ids that include a host are not correct (they ought be independent
of the host). [Q: Are we comfortable with allowing remote hosts to connect
to a central MySQL database, or do we need an intermediary representation
and a more secure protocol for such?]

Do we need environment, i.e., Kaffe or JDK 1.5 or whatever? Ought we have
hosts/workspaces as mainly informational, with environment (what ought be
the only differentiator for two builds of the same stuff, at the exact same
time) as the key to builds?
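
One possible reading of that question, as a minimal Python sketch: the build
key is made of project, environment and start time, while host and workspace
are carried along as plain information. All class, field and function names
here are hypothetical and are not taken from the Gump schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BuildKey:
    """Identity of a build: what was built, in which environment, and when."""
    project: str        # host-independent project id (assumption)
    environment: str    # e.g. "jdk-1.5" or "kaffe"
    started: datetime


@dataclass
class BuildRecord:
    """One reported build; host and workspace are informational only."""
    key: BuildKey
    host: str
    workspace: str
    succeeded: bool


def index_builds(records):
    """Group build records by key, so the reporting host does not affect identity."""
    index = {}
    for record in records:
        index.setdefault(record.key, []).append(record)
    return index
```

With this shape, two hosts reporting the same project in the same environment
at the same time land under one key, while a Kaffe build and a JDK 1.5 build
of the same project stay distinct.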

Do we need to allow build output to be optionally outside of the database,
for those of us w/o terabytes to spare?

I like dependency within the database, but do we need more information
(such as optional, etc.) on that?

Also, one key piece of information in the current object model (which is
used to document from) is cause. We didn't build this thing 'cos X failed
to build. That, along with annotations (we build this, but w/o X 'cos it was
an optional failed dependency), seem important. Personally I like all the
information on this page being available.

http://brutus.apache.org/gump/public/ant/ant/details.html
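
To make the cause/annotation idea concrete, here is a rough Python sketch,
assuming a dependency record carries an "optional" flag and a build outcome
carries a root cause plus free-text annotations. The names (Dependency,
Outcome, evaluate, the state strings) are hypothetical and are not the current
Gump object model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dependency:
    dependent: str            # project that needs something
    dependee: str             # project that is needed
    optional: bool = False


@dataclass
class Outcome:
    project: str
    state: str                # "success", "failed" or "prereq_failed"
    cause: Optional[str] = None               # root-cause project, if any
    annotations: List[str] = field(default_factory=list)


def evaluate(project, deps, outcomes, build_ok):
    """Decide the outcome of `project` from its dependencies' outcomes.

    `outcomes` maps already-evaluated project names to Outcome objects;
    `build_ok` says whether the project's own build would succeed.
    """
    annotations = []
    for dep in deps:
        if dep.dependent != project:
            continue
        upstream = outcomes[dep.dependee]
        if upstream.state == "success":
            continue
        if dep.optional:
            # Built anyway, but record what was left out.
            annotations.append(
                f"built without {dep.dependee} (optional dependency failed)")
        else:
            # Not built: propagate the root cause along the trail.
            return Outcome(project, "prereq_failed",
                           cause=upstream.cause or dep.dependee,
                           annotations=annotations)
    if build_ok:
        return Outcome(project, "success", annotations=annotations)
    return Outcome(project, "failed", cause=project, annotations=annotations)
```

Storing the cause on every outcome means a page like the one above could
answer "why was this not built?" with a single lookup rather than a walk of
the dependency graph.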

Maybe (as a transition) we generate simple pages from the existing object
model, but generate a results database (with history) and migrate more and
more to it over time.

Thanks, both, for putting this together.

regards

Adam





Re: [RT] Gump 3.0 - Database Model

2004-12-09 Thread Wade . Stebbings
This is cool.  FWIW, here's some bits from my experience, implementing
something similar in a MySQL database.


In my Build Results system, I have a schema that also includes a few
additional things:

 - arbitrary groupings of projects, which helps in organizing various
   forms of the presentation of the data

 - the general notion of attributes associated with each:
     - build (instance)
     - project
     - group
     - the whole system

And since my system is focused on creating interaction between people
about given built baselines, I have the notion of a notes history
associated with any given build, in a similar spirit as the comment
history of a given bug in bugzilla.

Like the notes table, I have separate tables for (references to)
artifacts, and another for results, to support any arbitrary number of
artifacts/results to a given build-instance.  This could be hidden in your
diagram inside the builds entity/table, but wasn't explicit.

I've built a lot of generality into my schema, since I need to support
many inputs into this database, from various (new and old) build systems.
Thus things like the result table are kept very general within the
database. One area that is not very well thought out (in my case) is how
results and/or build instances depend on each other, a core requirement
for Gump, as it seems.
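
For illustration only, the layout described above might look roughly like
this: one row per build instance, with notes, artifact references and results
each in their own table pointing back at the build. This is a sketch using
Python's sqlite3 module so that it runs standalone (the system described here
uses MySQL), and every table and column name is an assumption.

```python
import sqlite3

# Hypothetical schema: any number of notes/artifacts/results per build.
SCHEMA = """
CREATE TABLE builds (
    id      INTEGER PRIMARY KEY,
    project TEXT NOT NULL,
    started TEXT NOT NULL
);
CREATE TABLE notes (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    author   TEXT NOT NULL,
    body     TEXT NOT NULL
);
CREATE TABLE artifacts (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    uri      TEXT NOT NULL          -- a reference, not the artifact itself
);
CREATE TABLE results (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    name     TEXT NOT NULL,         -- kept general: any named result
    value    TEXT
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_build(conn, project, started):
    cur = conn.execute(
        "INSERT INTO builds (project, started) VALUES (?, ?)",
        (project, started))
    return cur.lastrowid


def add_note(conn, build_id, author, body):
    conn.execute(
        "INSERT INTO notes (build_id, author, body) VALUES (?, ?, ?)",
        (build_id, author, body))


def add_artifact(conn, build_id, uri):
    conn.execute(
        "INSERT INTO artifacts (build_id, uri) VALUES (?, ?)",
        (build_id, uri))


def add_result(conn, build_id, name, value):
    conn.execute(
        "INSERT INTO results (build_id, name, value) VALUES (?, ?, ?)",
        (build_id, name, value))


def count_for_build(conn, table, build_id):
    """Count rows attached to a build in one of the per-build tables."""
    if table not in ("notes", "artifacts", "results"):
        raise ValueError(f"unknown table: {table}")
    row = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE build_id = ?",
        (build_id,)).fetchone()
    return row[0]
```

Note that nothing here models how builds or results depend on each other,
which is the gap mentioned above.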

Hope this helps.  Comments?

wade


Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/08/2004 06:32:34 PM:

 Since I received no pushback on my proposal, let's move on discussing 
 the database model.
 
 I think the first step is to identify the entities that we want to 
 model, their relationships and their respective cardinality.
 
 Here is what Leo and I came up with so far (attached as PDF).
 
 Comments/criticism/questions appreciated.
 
 -- 
 Stefano.


Re: [RT] Gump 3.0

2004-12-08 Thread Stefan Bodewig
On Mon, 06 Dec 2004, Stefano Mazzocchi [EMAIL PROTECTED] wrote:

 So, here is my first suggestion: split gump in three stages.
 
   1) metadata aggregation
   2) build
   3) build data use

Sounds good.

 We should be maintaining the metadata representation only for the
 projects that don't have that data integrated in their build system
 (like pure ant projects or make/configure projects).

Even the latter may have them in some form, like RPM spec files; it may
be worth looking into them (some time later) as well.

Stefan




Re: [RT] Gump 3.0 - Database Model

2004-12-08 Thread Stefano Mazzocchi
Stefano Mazzocchi wrote:

 Since I received no pushback on my proposal, let's move on discussing
 the database model.

 I think the first step is to identify the entities that we want to
 model, their relationships and their respective cardinality.

 Here is what Leo and I came up with so far (attached as PDF).
 Comments/criticism/questions appreciated.

Hmmm, trying again.
--
Stefano.


Re: [RT] Gump 3.0 - Database Model

2004-12-08 Thread Brett Porter
Must be stripping attachments - maybe it can be put on the wiki or something?


On Wed, 08 Dec 2004 19:35:18 -0500, Stefano Mazzocchi
[EMAIL PROTECTED] wrote:
 Stefano Mazzocchi wrote:
 
 
  Since I received no pushback on my proposal, let's move on discussing
  the database model.
 
  I think the first step is to identify the entities that we want to
  model, their relationships and their respective cardinality.
 
  Here is what Leo and I came up with so far (attached as PDF).
 
  Comments/criticism/questions appreciated.
 
 Hmmm, trying again.
 
 --
 Stefano.
 
 
 





Re: [RT] Gump 3.0 - Database Model

2004-12-08 Thread Stefano Mazzocchi
Stefano Mazzocchi wrote:

 Stefano Mazzocchi wrote:

  Since I received no pushback on my proposal, let's move on discussing
  the database model.

  I think the first step is to identify the entities that we want to
  model, their relationships and their respective cardinality.

  Here is what Leo and I came up with so far (attached as PDF).
  Comments/criticism/questions appreciated.

 Hmmm, trying again.

Damn, it seems that my attachments get filtered out. All right, find it
over here:

  http://tinyurl.com/4qt9a
--
Stefano.


Re: [RT] Gump 3.0

2004-12-06 Thread Leo Simons
Stefano Mazzocchi wrote:

 Comments?

Not really. Most of it sounds obvious by now, actually :-D
More images related to this architecture are at:
http://svn.apache.org/repos/asf/gump/trunk/src/xdocs/gump.pdf
though I'm afraid some of the comments in the gump.ppt alongside there 
didn't make it into the PDF.

I'll also point out that your RT (probably on purpose) leaves out a 
*lot* of talk about (lifting) social limitations. The fun bit about the 
thinking there is that it tends to span all those stages and database. 
That really needs to be written down as well at some point so some of 
the design decisions make more sense :-D

Finally I'll point out (just to keep this e-mail short, really, there's 
a lot to say), one other thing to realize is that this 
DB-based-architecture will help us move away from the batch-based 
approach we have right now.

- LSD