Re: [RT] Gump 3.0 - Database Model
On 26-05-2005 17:58, Adam R. B. Jack [EMAIL PROTECTED] wrote:

> Could we get back to this thread above (using http://tinyurl.com/4qt9a to get to the attachment) and see where we want to take it?

Yes we can! It probably still needs a lot of work. One thing that's wrong with it at the moment is that, for example, project_dependencies don't have ids, and they should (because there's additional information about a dependency, like it being optional, or it being part of the root cause trail for a failure).

> I see that Gump3 has a schema that does not include some of the additions mentioned in the thread.

If you're referring to branches/Gump3/gumpdb and its contents, that is, I believe, more evolved than that PDF (last change December 28 vs. December 08).

> Also, I'm trying to flesh out DynaGumper (the Gump3 DB plugin) and I'd like to populate the run/build information. I think I need project_version ids, but I can't figure out how to calculate them. Do I simply use http://www.apache.org/projects/{projectname}#20050526 or #HEAD or #gump or something?

Take a look at the example data in the sql file. IIRC a project_version is a project that is part of a specific gump run. Those two need to be combined into an id. So if you have

  project name=blah...

and a public gump run started on vmgump.apache.org at 2005-05-29 at 21:43, your id becomes something like

  vmgump.apache.org:public:200505292143:blah

I.e. the current

  http://vmgump.apache.org/gump/public/xml-security/index.html

is related to

  vmgump.apache.org:public:200505271902:xml-security

right now, and will be related to

  vmgump.apache.org:public:200505281902:xml-security
                                 ^^ new run

tomorrow. At least that's how Stefano set it up, using semi-URIs. I would've probably prefixed everything with urn:gump: :-)

> Further, ought project dependencies (in project_dependencies) be between project versions not projects?

Project A is linked against the Project B compiled on a specific host as part of a specific run. So yes, a project_version depends on another project_version. I know the SQL gets this right. Of course, there's also a declaration like this

  xml-security -depends- xml-xalan

But that declaration tends to mutate over time.

> Finally, is anybody able to take on the DynaGump Cocoon webapp? I think we'd all benefit from seeing inside the database as we populate it.

Probably not atm. Stefano's rather busy building shiny tools AFAICT; we don't have that many cocooners around I think :-) Gump data visualisation is a hard problem. For now, you could also consider writing some trivial commandline scripts that dump out some data to the console :-)

Cheers!

LSD

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
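[Editor's sketch] The id scheme described above (host, workspace, run start time, project name joined with colons) could be computed roughly as follows. This is a minimal sketch and not code from gumpdb or DynaGumper; the function names and the choice of separator handling are assumptions made for illustration only.

```python
from datetime import datetime

# Timestamp layout inferred from the example ids in the message (YYYYMMDDHHMM).
STAMP_FORMAT = "%Y%m%d%H%M"


def project_version_id(host, workspace, run_started, project):
    """Combine a gump run (host, workspace, start time) with a project name."""
    stamp = run_started.strftime(STAMP_FORMAT)
    return ":".join([host, workspace, stamp, project])


def parse_project_version_id(pv_id):
    """Split an id back into its parts; assumes host and workspace contain no ':'."""
    host, workspace, stamp, project = pv_id.split(":", 3)
    return host, workspace, datetime.strptime(stamp, STAMP_FORMAT), project


example_id = project_version_id(
    "vmgump.apache.org", "public", datetime(2005, 5, 29, 21, 43), "blah"
)
print(example_id)
```

Note that a host written with a port number (host:port) would break the parsing half of this sketch, which is one argument for a stricter scheme such as the urn:gump: prefix mentioned above.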
Re: [RT] Gump 3.0 - Database Model
Could we get back to this thread above (using http://tinyurl.com/4qt9a to get to the attachment) and see where we want to take it?

I see that Gump3 has a schema that does not include some of the additions mentioned in the thread.

Also, I'm trying to flesh out DynaGumper (the Gump3 DB plugin) and I'd like to populate the run/build information. I think I need project_version ids, but I can't figure out how to calculate them. Do I simply use http://www.apache.org/projects/{projectname}#20050526 or #HEAD or #gump or something?

Further, ought project dependencies (in project_dependencies) be between project versions not projects?

Finally, is anybody able to take on the DynaGump Cocoon webapp? I think we'd all benefit from seeing inside the database as we populate it.

regards,

Adam

----- Original Message -----
From: Stefano Mazzocchi [EMAIL PROTECTED]
To: Gump general@gump.apache.org
Sent: Wednesday, December 08, 2004 6:32 PM
Subject: [RT] Gump 3.0 - Database Model

> Since I received no pushback on my proposal, let's move on discussing the database model.
>
> I think the first step is to identify the entities that we want to model, their relationships and their respective cardinality.
>
> Here is what Leo and I came up with so far (attached as PDF).
>
> Comments/criticism/questions appreciated.
>
> -- Stefano.
Re: [RT] Gump 3.0 - Database Model
Stefano: see my responses below.

wade

Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/15/2004 09:40:28 PM:

[...]

> What I was thinking is that this (and other of your suggestions) adds a meta-metadata layer and I'm not sure if I want to add this complexity at this point (given that the model is complex enough already). I agree that this meta-metadata layer will be very useful (for annotation, grouping and further user interaction around the collected data) but this is something we can add incrementally later on.

Yes. This is a very easy thing to add on later, over the top so to speak, as none of the inner workings depend on it. It is purely a way to organize projects for presentation purposes. Meta-meta? Sure, why not call it that.

[...]

> Ok, this is again another meta-metadata layer but this is something that I'm not sure I like. It smells of overdesign and at this point I want to keep features that are just critical for having the system working. The simplest thing that can possibly work.

Understood. It is probably something more useful within my environment, which is based on several different build systems that feed this system.

[...]

> Keep in mind that we DO NOT WANT gump to build anything that anybody would start to use for their own stuff. It is critical, socially and politically and for the security ecosystem, that gump's artifact repository is not used for anything other than distributed gumping and fallback scenarios. Consider it a cache, a repository of precomputed calculations rather than anything else. This is true for executables: for javadocs and docs, this is a different story but we should not attack too many problems at the same time.

I see. Our requirement was broader for the Artifact Repository, and thus it is overloaded to serve the build system itself (more gump like) as well as internal (to the company) users for certain artifacts. This notion of an Artifact Repository is not very well fleshed out at the moment; here, it is mostly design ideas at the present time. We have some pieces in place, mostly in a crude way.

[...]

>> In fact, at present in my schema, for a single build table entry, there can be:
>> - any number of notes
>> - any number of artifacts
>> - any number of results
>
> This is interesting. How can you have different numbers of results if you have only one output signal for a given build?

Ah, that all depends on how 'result' is defined. As a Build Results system, in my case, it serves more than just to feed the build system proper. Thus, as an example:

1. Building (e.g., compiling) -- one result
2. Packaging -- 2nd result
3. First level automated testing (e.g., unit) -- 3rd result
4. QA testing -- 4th result
...
N. Overall

There is usually a fixed set of result types on a per-project basis: some projects might not bother with QA testing, for example, some might fold packaging into the build proper, etc. This is all very dynamic, of course, because a new type could be added one day and then live on, and another type could be phased out. The presentation is set up to handle all this.

The one output signal to which you refer is probably #1, else it is #N. In my case, N is calculated and the calculation is again a per-project parameter.

This might seem like unnecessary overdesign to Gump, but there are reasons why this is needed here--actually, plenty. The main points being:

- although the build system produces artifacts, and in doing so there is status about that activity, thus one type of build result,
- there are things we learn about those artifacts after they are produced, thus more results.

Bit of background on me, hopefully not to bore anyone. I am in a business environment, now, but came from years of cross-development builds, embedded systems, etc. To me, a build (proper) produces a stream of bytes which most people call artifacts. That stream of bytes is further qualified as time goes on, usually in a series of steps, and each step I have defined here as a result.

Many complex systems have added new twists where build tooling itself is produced during the build process, which I try to decompose into separate builds, where subsequent builds then become consumers of another build's artifacts--baselined to some level of goodness, one would hope.

Not that I'm saying anything really new here, except about my perspective on things.

wade
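[Editor's sketch] The idea above of several results per build, with an "Overall" result calculated by a per-project rule, might look like this. The state names (unset/pass/warn/fail) are taken from wade's earlier message in this thread; the "worst state wins" rule, the step names and all identifiers are assumptions for illustration, since the actual per-project calculation is not described.

```python
from enum import IntEnum


class State(IntEnum):
    """Result states, ordered so that a larger value is a worse outcome."""
    UNSET = 0
    PASS = 1
    WARN = 2
    FAIL = 3


def overall_worst(results):
    """Hypothetical rule: overall = worst state among the steps that were set."""
    set_states = [state for state in results.values() if state is not State.UNSET]
    return max(set_states, default=State.UNSET)


# One build with one result per step; QA has not reported yet.
build_results = {
    "compile": State.PASS,
    "package": State.PASS,
    "unit-test": State.WARN,
    "qa": State.UNSET,
}

overall = overall_worst(build_results)
print(overall.name)
```

A project that does not run QA at all would simply never have a "qa" entry, which matches the point that the set of result types is fixed per project rather than globally.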
Re: [RT] Gump 3.0 - Database Model
Leo:

Leo Simons [EMAIL PROTECTED] wrote on 12/16/2004 02:46:14 AM:

[...]

> yep, I seem to agree. Let's first implement the proposed setup and optimize for understandability and cleanliness. Gump has a lot of features already. Let's first focus on making the important ones easier to use, then on making it easy to add the ones we want.

I totally agree. Take the incremental approach in your implementation, design a bit beyond your current needs (but not too much). Seems to remind me of an Einstein quote... ;)

> I can't really see through Wade's setup right now (I'd like to see more, it sounds very interesting :-D), but what I do have is a hunch that it addresses quite a few use cases (like redistribution of stuff) which we really don't want to worry about right now.

One significant difference, a differing requirement, between (my) Build Results system and Gump 3.0 would be that Build Results really consumes the output of several build systems, some nightly, some continuous integration, etc., and we're planning on adding in CruiseControl as well. It is the common point where all this information comes together.

There was just too much legacy stuff (here) to attack all at once, so instead, this approach seemed the more practical. And now we are able to leverage off of it in different ways (with new build loops, like CruiseControl, where there's already a publisher interface).

wade
Re: [RT] Gump 3.0 - Database Model
Stefano Mazzocchi wrote:

> [EMAIL PROTECTED] wrote:
>
>> Yes, it is a many-to-many relation between the Project and Group tables. Thus, I can define one group which is all mainline builds (we have several release streams managed by separate branches), regardless of the platforms built on. Another group would be all Windows/2003 builds. It is merely a way of seeing a limited set of project names, though when presented on the web page, I do also display some project attributes for each project displayed, like the label link of the current build as well as the last good build.
>
> Ok, I see.
>
> What I was thinking is that this (and other of your suggestions) adds a meta-metadata layer and I'm not sure if I want to add this complexity at this point (given that the model is complex enough already). I agree that this meta-metadata layer will be very useful (for annotation, grouping and further user interaction around the collected data) but this is something we can add incrementally later on.

yep, I seem to agree. Let's first implement the proposed setup and optimize for understandability and cleanliness. Gump has a lot of features already. Let's first focus on making the important ones easier to use, then on making it easy to add the ones we want.

I can't really see through Wade's setup right now (I'd like to see more, it sounds very interesting :-D), but what I do have is a hunch that it addresses quite a few use cases (like redistribution of stuff) which we really don't want to worry about right now.

cheers,

- Leo
Re: [RT] Gump 3.0 - Database Model
[EMAIL PROTECTED] wrote:

> Stefano,
>
> Some afterthoughts. Hopefully to help clarify.
>
> The scope of a Project in our system (currently) is that of a build (a series of builds) for a given instance of (1) product-release on a given (2) target. This of course means that a single configuration for a given instance of #1 would then fan out to several Projects (as we have used this word).
>
> I am not completely happy with this arrangement, since our Project does not distinguish between: (a) separate configurations, or (b) the same configurations built on different targets. And somehow I think this distinction should be more clearly represented in the data model.
>
> I think if (1) were to be defined as the Project and the (2)'s under it would be SubProject (to use some names), and keep the arbitrary grouping mechanism, though now at the SubProject level, then I think we've gained something w/o any other feature loss.

I'm sorry, I totally lost you here :-/

-- Stefano.
Re: [RT] Gump 3.0 - Database Model
[EMAIL PROTECTED] wrote:

> Stefano: See my responses below.
>
> Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/10/2004 02:21:48 PM:
>
>> [EMAIL PROTECTED] wrote:
>>
>> [...]
>>
>>> In my Build Results system, I have a schema that also includes a few additional things:
>>> - arbitrary groupings of projects, which helps in organizing various forms of the presentation of the data
>>
>> Can you elaborate more on this?
>
> Yes, it is a many-to-many relation between the Project and Group tables. Thus, I can define one group which is all mainline builds (we have several release streams managed by separate branches), regardless of the platforms built on. Another group would be all Windows/2003 builds. It is merely a way of seeing a limited set of project names, though when presented on the web page, I do also display some project attributes for each project displayed, like the label link of the current build as well as the last good build.

Ok, I see.

What I was thinking is that this (and other of your suggestions) adds a meta-metadata layer and I'm not sure if I want to add this complexity at this point (given that the model is complex enough already). I agree that this meta-metadata layer will be very useful (for annotation, grouping and further user interaction around the collected data) but this is something we can add incrementally later on.

>>> - the general notion of attributes associated with each:
>>>   - build (instance)
>>>   - project
>>>   - group
>>>   - the whole system
>>
>> attributes as in annotations or as in related data?
>
> I'm not sure what the difference is between annotations (like added notes, do you mean?) and related data. But what these tables do is, basically, allow one to add new fields to the associated table without a schema change. They are name/value pairs, with the added key (foreign key) of the id of the table to which they relate. Thus, for the Project table:
>
> Project Attributes:
>   - proj_id (foreign key)
>   - name (string, key)
>   - value (blob)
>
> Project:
>   - proj_id (key)
>   - ... etc ...
>
> In the case of the system wide attributes table, there is no id field. That table I use for stuff like debug on/off/level, motd, and so far little else.

Ok, this is again another meta-metadata layer but this is something that I'm not sure I like. It smells of overdesign and at this point I want to keep features that are just critical for having the system working. The simplest thing that can possibly work.

>>> And since my system is focused on creating interaction between people about given built baselines, I have the notion of a notes history associated with any given build, in a similar spirit as the comment history of a given bug in bugzilla.
>>
>> I like the concept of allowing bugzilla-style communication to happen without requiring people to subscribe to various mail lists, like a common ground for communication to happen. But I don't want this to be too global, because I want gump-related discussions to happen on the mail list.
>
> You could tie in email notification when this table is updated. We don't do that, but it's not a bad idea. Bugzilla of course does this.

Good suggestion. Again, this applies to the meta-metadata layer but it strikes me as a very useful feature to have right away.

What do others think?

>>> Like the notes table, I have separate tables for (references to) artifacts,
>>
>> yes, the artifact table is missing, that's a good point.
>
> I use the notion of an external Artifact Repository and refer into that with this table. The artifacts themselves are not stored in the database nor on the database server. Just wanted to be clear about that.
>
> The notion of an Artifact Repository: ah, well, I have my idea of what I want, and then there's the reality that we don't have much more (at present) than a web-based storage mechanism, organized hierarchically within the file system. Thus version information is exposed in the file-path name space, and 3rd party artifacts are managed in yet another system.
>
> My notion of an Artifact Repository would be a place to store any 3rd party artifact that any build could depend on. Builds themselves would be producers, but could also be consumers.
>
> One of the main points of this is that I separate, architecturally, the Artifact Repository, as a separate service, from the build system itself.

Keep in mind that we DO NOT WANT gump to build anything that anybody would start to use for their own stuff. It is critical, socially and politically and for the security ecosystem, that gump's artifact repository is not used for anything other than distributed gumping and fallback scenarios. Consider it a cache, a repository of precomputed calculations rather than anything else.

This is true for executables: for javadocs and docs, this is a different story but we should not attack too many problems at the same time.

>>> and another for results, to support any arbitrary number of artifacts/results to a given build-instance.
>>
>> Good point.
>>
>> [...]
>>
>> So, things missing are: 1) bugzilla like
Re: [RT] Gump 3.0 - Database Model
On Monday 13 December 2004 09:09, Stefano Mazzocchi wrote:

> Eric, I really don't care what ID we choose, as long as it does identify something univocally also in a global and distributed environment.

RDF? Isn't RDF a perfect fit for this kind of problem?

Niclas
--
+--//---+
/ http://www.dpml.net /
/ http://niclas.hedhman.org /
+--//---+
Re: [RT] Gump 3.0 - Database Model
Stefano: See my responses below.

Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/10/2004 02:21:48 PM:

> [EMAIL PROTECTED] wrote:
>
> [...]
>
>> In my Build Results system, I have a schema that also includes a few additional things:
>> - arbitrary groupings of projects, which helps in organizing various forms of the presentation of the data
>
> Can you elaborate more on this?

Yes, it is a many-to-many relation between the Project and Group tables. Thus, I can define one group which is all mainline builds (we have several release streams managed by separate branches), regardless of the platforms built on. Another group would be all Windows/2003 builds. It is merely a way of seeing a limited set of project names, though when presented on the web page, I do also display some project attributes for each project displayed, like the label link of the current build as well as the last good build.

>> - the general notion of attributes associated with each:
>>   - build (instance)
>>   - project
>>   - group
>>   - the whole system
>
> attributes as in annotations or as in related data?

I'm not sure what the difference is between annotations (like added notes, do you mean?) and related data. But what these tables do is, basically, allow one to add new fields to the associated table without a schema change. They are name/value pairs, with the added key (foreign key) of the id of the table to which they relate. Thus, for the Project table:

Project Attributes:
  - proj_id (foreign key)
  - name (string, key)
  - value (blob)

Project:
  - proj_id (key)
  - ... etc ...

In the case of the system wide attributes table, there is no id field. That table I use for stuff like debug on/off/level, motd, and so far little else.

>> And since my system is focused on creating interaction between people about given built baselines, I have the notion of a notes history associated with any given build, in a similar spirit as the comment history of a given bug in bugzilla.
>
> I like the concept of allowing bugzilla-style communication to happen without requiring people to subscribe to various mail lists, like a common ground for communication to happen. But I don't want this to be too global, because I want gump-related discussions to happen on the mail list.

You could tie in email notification when this table is updated. We don't do that, but it's not a bad idea. Bugzilla of course does this.

>> Like the notes table, I have separate tables for (references to) artifacts,
>
> yes, the artifact table is missing, that's a good point.

I use the notion of an external Artifact Repository and refer into that with this table. The artifacts themselves are not stored in the database nor on the database server. Just wanted to be clear about that.

The notion of an Artifact Repository: ah, well, I have my idea of what I want, and then there's the reality that we don't have much more (at present) than a web-based storage mechanism, organized hierarchically within the file system. Thus version information is exposed in the file-path name space, and 3rd party artifacts are managed in yet another system.

My notion of an Artifact Repository would be a place to store any 3rd party artifact that any build could depend on. Builds themselves would be producers, but could also be consumers.

One of the main points of this is that I separate, architecturally, the Artifact Repository, as a separate service, from the build system itself.

>> and another for results, to support any arbitrary number of artifacts/results to a given build-instance.
>
> Good point.
>
> [...]
>
> So, things missing are:
>
> 1) bugzilla like comments (on build results only? or what else?)
> 2) artifact table / artifact type table
>
> Anything else you guys see missing?

Note: The results per build table (to support an arbitrary number of results per build) was a separate table from the artifact table. In fact, at present in my schema, for a single build table entry, there can be:

- any number of notes
- any number of artifacts
- any number of results

I separate artifacts (products of a build) from results (meta data or things we know about or learn about the build products). A result entry has one of four possible states in my schema:

1. unset,
2. pass,
3. warn,
4. fail

(to which I map the obvious color in the web presentation ;) -- extrapolating/generalizing that my sampling of the world's traffic/semaphore lights extends to the rest of the world; 7 countries on 4 continents - a good but small sample). And unset = white.

Thanks for including me in the discussion. I look forward to more.

wade
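[Editor's sketch] The name/value attribute table and the per-build result table described above can be sketched as follows. This uses SQLite through Python's standard sqlite3 module purely so the example is self-contained; the original system is said to use MySQL. Table and column names other than proj_id, name and value, and the green/yellow/red colors (only "unset = white" is stated in the message), are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE project (
    proj_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Name/value pairs keyed by the owning row: new "fields" need no schema change.
CREATE TABLE project_attribute (
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    name    TEXT NOT NULL,
    value   BLOB,
    PRIMARY KEY (proj_id, name)
);
CREATE TABLE build (
    build_id INTEGER PRIMARY KEY,
    proj_id  INTEGER NOT NULL REFERENCES project(proj_id)
);
-- Any number of results per build, each in one of the four states.
CREATE TABLE result (
    build_id INTEGER NOT NULL REFERENCES build(build_id),
    kind     TEXT NOT NULL,
    state    TEXT NOT NULL DEFAULT 'unset'
             CHECK (state IN ('unset', 'pass', 'warn', 'fail'))
);
"""

# Assumed traffic-light mapping; only unset = white comes from the message.
STATE_COLOR = {"unset": "white", "pass": "green", "warn": "yellow", "fail": "red"}

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO project (proj_id, name) VALUES (1, 'example')")
conn.execute("INSERT INTO project_attribute VALUES (1, 'owner', 'someone')")
conn.execute("INSERT INTO build (build_id, proj_id) VALUES (1, 1)")
conn.executemany(
    "INSERT INTO result (build_id, kind, state) VALUES (?, ?, ?)",
    [(1, "compile", "pass"), (1, "unit-test", "warn")],
)

rows = conn.execute(
    "SELECT kind, state FROM result WHERE build_id = 1 ORDER BY kind"
).fetchall()
colors = {kind: STATE_COLOR[state] for kind, state in rows}
print(colors)
```

The trade-off Stefano raises still applies: attribute tables buy flexibility at the cost of an extra layer that the core model does not need in order to work.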
Re: [RT] Gump 3.0 - Database Model
Stefano,

Some afterthoughts. Hopefully to help clarify.

The scope of a Project in our system (currently) is that of a build (a series of builds) for a given instance of (1) product-release on a given (2) target. This of course means that a single configuration for a given instance of #1 would then fan out to several Projects (as we have used this word).

I am not completely happy with this arrangement, since our Project does not distinguish between: (a) separate configurations, or (b) the same configurations built on different targets. And somehow I think this distinction should be more clearly represented in the data model.

I think if (1) were to be defined as the Project and the (2)'s under it would be SubProject (to use some names), and keep the arbitrary grouping mechanism, though now at the SubProject level, then I think we've gained something w/o any other feature loss.

wade

[EMAIL PROTECTED] wrote on 12/13/2004 09:07:32 AM:

> Stefano: See my responses below.
>
> [...]
RE: [RT] Gump 3.0 - Database Model
Just catching up on my email after being gone for a week. One thing that strikes me about the project ids is that this seems to continue the same discussion we have had in the past about maven generated project ids versus the gump project ids...

Do the project ids have to have meaning? While it's nice to look at a project id and pick out some data, like the version and the timestamp or whatnot, eventually gump will run into another project where the ids mean something different and are generated differently. I don't mind a project id like 787234 that I then look up and find out whatever specific meaning it has. Like version, or host, or whatnot.

I think that when we establish project naming conventions we'll run into conflicts with how other projects name themselves.

I would welcome project IDs of the form

  http://www.apache.org/projects/cocoon

and then

  http://www.apache.org/projects/cocoon#v1.0

for a particular released version, or

  http://www.apache.org/projects/cocoon#20041210
Re: [RT] Gump 3.0 - Database Model
Eric Pugh wrote:

> Just catching up on my email after being gone for a week. One thing that strikes me about the project ids is that this seems to continue the same discussion we have had in the past about maven generated project ids versus the gump project ids...
>
> Do the project ids have to have meaning? While it's nice to look at a project id and pick out some data, like the version and the timestamp or whatnot, eventually gump will run into another project where the ids mean something different and are generated differently. I don't mind a project id like 787234 that I then look up and find out whatever specific meaning it has. Like version, or host, or whatnot.
>
> I think that when we establish project naming conventions we'll run into conflicts with how other projects name themselves.
>
> I would welcome project IDs of the form
>
>   http://www.apache.org/projects/cocoon
>
> and then
>
>   http://www.apache.org/projects/cocoon#v1.0
>
> for a particular released version, or
>
>   http://www.apache.org/projects/cocoon#20041210

Eric,

I really don't care what ID we choose, as long as it does identify something univocally also in a global and distributed environment.

-- Stefano.
Re: [RT] Gump 3.0
On Wed, 08 Dec 2004, Stefan Bodewig [EMAIL PROTECTED] wrote:

> On Mon, 06 Dec 2004, Stefano Mazzocchi [EMAIL PROTECTED] wrote:
>
>> So, here is my first suggestion: split gump in three stages.
>>
>> 1) metadata aggregation
>> 2) build
>> 3) build data use
>
> Sounds good.

One additional thing. I'd love to have part 2 separated into at least three steps that can get invoked individually:

2a) SCM update
2b) syncing updated working copy with workspace
2c) building

With traditional Gump it has been possible to modify classes in the workspace and rebuild using Gump. This has been very useful in resolving Gump problems in the past. Right now I don't see an easy way to do this.

For example, I fixed the commons-jelly-tags-ant build by patching the jelly-util taglib. I verified it would fix the Gump build by applying my patch locally and only building commons-jelly-tags-util and after that commons-jelly-tags-ant.

Using current Gump my local patch would have been blown away by CVS updates or syncs - unless I applied it in what is supposed to be a clean checkout and disconnected from the network. Also, just building commons-jelly-tags-util and commons-jelly-tags-ant without rebuilding Ant and all that seems to be impossible right now (I may be wrong, though).

Stefan
Re: [RT] Gump 3.0
> 2a) SCM update
> 2b) syncing updated working copy with workspace
> 2c) building

We do actually have 2a and 2c already, in bin/build.py and bin/update.py, they just never got the usage/fixing they might need.

regards

Adam
Re: [RT] Gump 3.0 - Database Model
[EMAIL PROTECTED] wrote:

> This is cool. FWIW, here's some bits from my experience, implementing something similar in a MySQL database.

Awesome!

> In my Build Results system, I have a schema that also includes a few additional things:
> - arbitrary groupings of projects, which helps in organizing various forms of the presentation of the data

Can you elaborate more on this?

> - the general notion of attributes associated with each:
>   - build (instance)
>   - project
>   - group
>   - the whole system

attributes as in annotations or as in related data?

> And since my system is focused on creating interaction between people about given built baselines, I have the notion of a notes history associated with any given build, in a similar spirit as the comment history of a given bug in bugzilla.

I like the concept of allowing bugzilla-style communication to happen without requiring people to subscribe to various mail lists, like a common ground for communication to happen. But I don't want this to be too global, because I want gump-related discussions to happen on the mail list.

> Like the notes table, I have separate tables for (references to) artifacts,

yes, the artifact table is missing, that's a good point.

> and another for results, to support any arbitrary number of artifacts/results to a given build-instance.

Good point.

> This could be hidden in your diagram inside the builds entity/table, but wasn't explicit.

No, you're right, we need to add that.

> I've built a lot of generality into my schema, since I need to support many inputs into this database, from various (new and old) build systems. Thus things like the result table are kept very general within the database. One area that is not very well thought out (in my case) is how results and/or build instances depend on each other, a core requirement for Gump, as it seems.
>
> Hope this helps. Comments?

So, things missing are:

1) bugzilla like comments (on build results only? or what else?)
2) artifact table / artifact type table

Anything else you guys see missing?

-- Stefano.
Re: [RT] Gump 3.0
Ok, here is my thinking on how we proceed towards Gump 3.0, i.e.:

1) Metadata Gathering
2) Processing (Build/Sync/Update)
3) Results/Presentation/History Query/Analysis

For *now* ...

1) Phase One (Metadata Gathering) is simply the way to get XML documentation into a local file system for Gump to process. Eventually this could be crawlers (etc.) that parse GOMs and POMs, but (for now) the CVS update HTTP gets are tolerable. [If anybody has an itch to tackle this first, speak up, but I think it is a reasonable/significant amount of work and (IMHO) can wait a little while longer.]

2) Phase Two (Building) is what we currently have as core, but that outputs to an historical database (plus some files for those w/o huge databases). It will not do RDF/RSS/Atom/Notification/XHTML Presentation (or XDOCS). It will not do Stats (neither XHTML presentation nor internal to DBM) nor will it do XRef (XHTML).

3) Phase Three (Analysis/Communication) is a whole new world; re-writing the 'will not do' list from above from the results database. This could be Python code, or Cocoon, or ...

I'd like to focus my time on (2) and request that others help with (3).

Question: We currently run JDK1.5 and Kaffe off TRUNK not LIVE. Ought we change this? Alternatively, ought we perform this Gump work in a separate branch? I think I can add to the current w/o too much instability, then remove stuff when needed. I'm game to listen to others' opinions/concerns though.

[FWIIW: Personally, I'd love to get back to NAnt building except that Mono is still my roadblock. I think Gump 3.0 ought be far less resource bound, and it ought help us simplify running/operating Gump. As such, I hope it leads to more users and hence more hands to help with NAnt, etc.]

regards,

Adam
Re: [RT] Gump 3.0
Adam R. B. Jack wrote:

> Ok, here is my thinking on how we proceed towards Gump 3.0, i.e.:
> 1) Metadata Gathering
> 2) Processing (Build/Sync/Update)
> 3) Results/Presentation/History Query/Analysis
>
> For *now* ...
>
> 1) Phase One (Metadata Gathering) is simply the way to get XML documentation into a local file system for Gump to process. Eventually this could be crawlers (etc.) that parse GOMs and POMs, but (for now) the CVS update HTTP gets are tolerable. [If anybody has an itch to tackle this first, speak up, but I think it is a reasonable/significant amount of work and (IMHO) can wait a little while longer.]

+1

> 2) Phase Two (Building) is what we currently have as core, but that outputs to an historical database (plus some files for those w/o huge databases). It will not do RDF/RSS/Atom/Notification/XHTML Presentation (or XDOCS). It will not do Stats (neither XHTML presentation nor internal to DBM) nor will it do XRef (XHTML).

+1

> 3) Phase Three (Analysis/Communication) is a whole new world; rewriting the 'will not do' list from above on top of the results database. This could be Python code, or Cocoon, or ... I'd like to focus my time on (2) and request that others help with (3).

I'm game. I can take ownership of #3.

> Question: We currently run JDK1.5 and Kaffe off TRUNK not LIVE. Ought we change this?

Yeah, it makes sense.

> Alternatively, ought we perform this Gump work in a separate branch? I think I can add to the current w/o too much instability, then remove stuff when needed. I'm game to listen to others' opinions/concerns though.

Currently, Dynagump is the code name for #3 and does not depend on any code from Gump (only on a common database schema). I think we keep it the way it is for now; we can move stuff back and forth later on, thanks to SVN.

> [FWIW: Personally, I'd love to get back to NAnt building except that Mono is still my roadblock. I think Gump 3.0 ought be far less resource bound, and it ought help us simplify running/operating Gump. As such, I hope it leads to more users and hence more hands to help with NAnt, etc.]

I personally would love to see Mono stuff being gumped as well, but it's a low priority for me ATM.

-- Stefano.
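The split agreed on in this message, where the build stage only writes to a results database and the presentation stage depends on nothing but the shared schema, could look roughly like the sketch below. It is an illustration only, not Gump or Dynagump code: the `builds` table, its columns, and the functions `open_results_db`, `record_build`, and `summarize` are hypothetical names, and SQLite stands in for whichever database is eventually used.

```python
import sqlite3

# Hypothetical shared schema: the only contract between stage 2 and stage 3.
SCHEMA = """
CREATE TABLE IF NOT EXISTS builds (
    project     TEXT NOT NULL,
    run_started TEXT NOT NULL,
    state       TEXT NOT NULL,   -- e.g. 'success' or 'failed'
    PRIMARY KEY (project, run_started)
)
"""


def open_results_db(path=":memory:"):
    """Open (and if needed create) the results database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_build(conn, project, run_started, state):
    """Stage 2 (build): store one result; no presentation logic lives here."""
    conn.execute(
        "INSERT INTO builds (project, run_started, state) VALUES (?, ?, ?)",
        (project, run_started, state),
    )
    conn.commit()


def summarize(conn):
    """Stage 3 (presentation): a read-only view computed from the database."""
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM builds GROUP BY state ORDER BY state"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = open_results_db()
    record_build(db, "ant", "200412101902", "success")
    record_build(db, "xml-xalan", "200412101902", "failed")
    print(summarize(db))
```

Because `summarize` touches only the table, it could just as well live in a separate code base (Python, Cocoon, or anything else that can query the database), which is the property the thread is after.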
Re: [RT] Gump 3.0
On Fri, 10 Dec 2004, Adam R. B. Jack [EMAIL PROTECTED] wrote:

> [FWIW: Personally, I'd love to get back to NAnt building except that Mono is still my roadblock.

I still don't quite understand why it works far better on my oldish RedHat box either. Hmm, have we tried Mono 1.0.4 or even 1.0.5 (released today 8-) yet?

Anyway. Once I merge my last commit to the live branch we will build apr-util against apr and everything should be there to support configure/make based projects (we may need env variable support). My next priority will be documenting the stuff so that others like Graham can get their feet wet - and then head towards NAnt and Mono.

This is what I expect to be able to do. I'll probably never dive deep enough into Python (lack of time - and admittedly it hasn't been fun yet, either) to scratch more than the surface.

Stefan
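As a rough idea of what "env variable support" for configure/make based projects might involve, here is a small sketch. It is not the Gump builder: the function names `plan_configure_make` and `run_plan` are hypothetical, and the paths and variables in the usage note are made-up examples. The planning step is kept separate from execution so it can be inspected without running anything.

```python
import os
import subprocess


def plan_configure_make(with_options=None, extra_env=None, base_env=None):
    """Return (commands, env) for a configure/make build; nothing is run here.

    with_options maps a dependency name to its install path and becomes
    --with-NAME=PATH arguments to ./configure. extra_env holds per-project
    environment variables layered on top of base_env (default: os.environ).
    """
    configure = ["./configure"]
    for name, path in sorted((with_options or {}).items()):
        configure.append(f"--with-{name}={path}")
    env = dict(os.environ if base_env is None else base_env)
    env.update(extra_env or {})
    return [configure, ["make"]], env


def run_plan(commands, env, cwd):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, env=env, check=True)
```

For example, `plan_configure_make({"apr": "/opt/apr"}, {"CFLAGS": "-O2"})` yields a `./configure --with-apr=/opt/apr` step followed by `make`, with `CFLAGS` added to the inherited environment.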
Re: [RT] Gump 3.0 - Database Model
> Since I received no pushback on my proposal, let's move on discussing the database model.

I see this model is good enough for certain aspects of the proposed 3.0, but not for all. We can't store the metadata in it and perform builds from that; there is clearly insufficient information. That said, I am more than happy to start on a 3.0 break-up by splitting the outputs from the presentation of those outputs via this model. I still need more information on the contents of ids (and such) to verify the model is correct, though.

Here are some initial reactions:

One thing I noticed you mentioned was a desire for this database model to allow Gump to be distributed. I like that goal. We can't assume one host can do all builds (although Brutus is doing a fine fine job), so perhaps we could allow different hosts to build and contribute data for individual aspects. Maybe this is a goal to work towards, not focus on now, but I believe that project ids that include a host are not correct (they ought be independent of the host). [Q: Are we comfortable with allowing remote hosts to connect to a central MySQL database, or do we need an intermediary representation and a more secure protocol for such?]

Do we need environment, i.e., kaffe or JDK 1.5 or whatever? Ought we have hosts/workspaces as mainly informational, with environment (which ought be the only differentiator for two builds of the same stuff at the exact same time) as the key to builds?

Do we need to allow build output to be optionally outside of the database, for those of us w/o terabytes to spare?

I like dependency within the database, but do we need more information (such as optional, etc.) on that? Also, one key piece of information in the current object model (which is used to document from) is cause: we didn't build this thing 'cos X failed to build. That, along with annotations (we built this, but w/o X 'cos it was an optional failed dependency), seems important. Personally I like all the information on this page being available: http://brutus.apache.org/gump/public/ant/ant/details.html

Maybe (as a transition) we generate simple pages from the existing object model, but generate a results database (with history) and migrate more and more to it over time.

Thanks, both, for putting this together.

regards Adam
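The questions raised in this message (environment as the key to a build, host as informational only, optional dependencies, and cause) can be made concrete with a small sketch. Nothing here reflects the actual model in the PDF: the classes `BuildKey`, `Dependency`, and `BuildResult`, their fields, and the helper `root_cause` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class BuildKey:
    """Identifies a build without reference to the host that ran it."""
    project: str
    environment: str   # e.g. "kaffe" or "jdk-1.5"
    started: str       # run timestamp


@dataclass
class Dependency:
    on: BuildKey
    optional: bool = False


@dataclass
class BuildResult:
    key: BuildKey
    state: str                          # e.g. "success", "failed", "prereq-failed"
    host: Optional[str] = None          # informational only, not part of the key
    dependencies: List[Dependency] = field(default_factory=list)
    cause: Optional[BuildKey] = None    # the build whose failure blocked this one
    annotations: List[str] = field(default_factory=list)


def root_cause(result, results_by_key):
    """Follow the cause chain back to the build that actually failed."""
    seen = set()
    current = result
    # The 'seen' set guards against accidental cycles in the recorded causes.
    while current.cause is not None and current.cause not in seen:
        seen.add(current.cause)
        current = results_by_key[current.cause]
    return current.key
```

With this shape, two builds of the same project at the same time differ only by `environment`, and a "not built because X failed" entry can be traced through `cause` to the original failure.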
Re: [RT] Gump 3.0 - Database Model
This is cool. FWIW, here's some bits from my experience implementing something similar in a MySQL database.

In my Build Results system, I have a schema that also includes a few additional things:

- arbitrary groupings of projects, which helps in organizing various forms of the presentation of the data
- the general notion of attributes associated with each:
  - build (instance)
  - project
  - group
  - the whole system

And since my system is focused on creating interaction between people about given built baselines, I have the notion of a notes history associated with any given build, in a similar spirit to the comment history of a given bug in bugzilla.

Like the notes table, I have separate tables for (references to) artifacts, and another for results, to support any arbitrary number of artifacts/results for a given build instance. This could be hidden in your diagram inside the builds entity/table, but wasn't explicit.

I've built a lot of generality into my schema, since I need to support many inputs into this database, from various (new and old) build systems. Thus things like the result table are kept very general within the database.

One area that is not very well thought out (in my case) is how results and/or build instances depend on each other, a core requirement for Gump, as it seems.

Hope this helps. Comments?

wade

Stefano Mazzocchi [EMAIL PROTECTED] wrote on 12/08/2004 06:32:34 PM:

> Since I received no pushback on my proposal, let's move on discussing the database model. I think the first step is to identify the entities that we want to model, their relationships and their respective cardinality. Here is what Leo and I came up with so far (attached as PDF). Comments/criticism/questions appreciated.
>
> -- Stefano.
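The additions described in this message (project groupings, generic attributes, a notes history, and separate artifact and result tables per build) might be laid out along the following lines. This is only a guess at the shape, using SQLite for the sake of a runnable example rather than MySQL: every table, column, and function name below is hypothetical and not taken from the actual Build Results system.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE project_groups (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE group_members (
    group_id   INTEGER NOT NULL REFERENCES project_groups(id),
    project_id INTEGER NOT NULL REFERENCES projects(id),
    PRIMARY KEY (group_id, project_id)
);
CREATE TABLE builds (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    started    TEXT NOT NULL
);
-- One generic table covers attributes of builds, projects, groups, and the system.
CREATE TABLE attributes (
    scope    TEXT NOT NULL CHECK (scope IN ('build', 'project', 'group', 'system')),
    scope_id INTEGER,              -- NULL for system-wide attributes
    name     TEXT NOT NULL,
    value    TEXT
);
-- Any number of notes, artifacts, and results per build instance.
CREATE TABLE notes (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    author   TEXT NOT NULL,
    body     TEXT NOT NULL
);
CREATE TABLE artifacts (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    uri      TEXT NOT NULL
);
CREATE TABLE results (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    kind     TEXT NOT NULL,
    value    TEXT
);
"""


def create_db(path=":memory:"):
    """Create a fresh database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_note(conn, build_id, author, body):
    """Append one note to a build's history."""
    conn.execute(
        "INSERT INTO notes (build_id, author, body) VALUES (?, ?, ?)",
        (build_id, author, body),
    )
    conn.commit()


def notes_for(conn, build_id):
    """Return a build's notes as (author, body) tuples, oldest first."""
    rows = conn.execute(
        "SELECT author, body FROM notes WHERE build_id = ? ORDER BY id",
        (build_id,),
    )
    return rows.fetchall()
```

The sketch deliberately leaves out dependencies between build instances, which the message identifies as the part still to be worked out for Gump.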
Re: [RT] Gump 3.0
On Mon, 06 Dec 2004, Stefano Mazzocchi [EMAIL PROTECTED] wrote:

> So, here is my first suggestion: split gump in three stages.
> 1) metadata aggregation
> 2) build
> 3) build data use

Sounds good.

We should be maintaining the metadata representation only for the projects that don't have that data integrated in their build system (like pure ant projects or make/configure projects). Even the latter may have it in some form, like RPM spec files; it may be worth looking into those (some time later) as well.

Stefan
Re: [RT] Gump 3.0 - Database Model
Stefano Mazzocchi wrote:

> Since I received no pushback on my proposal, let's move on discussing the database model. I think the first step is to identify the entities that we want to model, their relationships and their respective cardinality. Here is what Leo and I came up with so far (attached as PDF). Comments/criticism/questions appreciated.

Hmmm, trying again.

-- Stefano.
Re: [RT] Gump 3.0 - Database Model
Must be stripping attachments - maybe it can be put on the wiki or something?

On Wed, 08 Dec 2004 19:35:18 -0500, Stefano Mazzocchi [EMAIL PROTECTED] wrote:

> Stefano Mazzocchi wrote:
>> Since I received no pushback on my proposal, let's move on discussing the database model. I think the first step is to identify the entities that we want to model, their relationships and their respective cardinality. Here is what Leo and I came up with so far (attached as PDF). Comments/criticism/questions appreciated.
>
> Hmmm, trying again.
>
> -- Stefano.
Re: [RT] Gump 3.0 - Database Model
Stefano Mazzocchi wrote:

> Stefano Mazzocchi wrote:
>> Since I received no pushback on my proposal, let's move on discussing the database model. I think the first step is to identify the entities that we want to model, their relationships and their respective cardinality. Here is what Leo and I came up with so far (attached as PDF). Comments/criticism/questions appreciated.
>
> Hmmm, trying again.

Damn, it seems that my attachments get filtered out. All right, find it over here: http://tinyurl.com/4qt9a

-- Stefano.
Re: [RT] Gump 3.0
Stefano Mazzocchi wrote:

> Comments?

Not really. Most of it sounds obvious by now, actually :-D

More images related to this architecture are at: http://svn.apache.org/repos/asf/gump/trunk/src/xdocs/gump.pdf though I'm afraid some of the comments in the gump.ppt alongside there didn't make it into the PDF.

I'll also point out that your RT (probably on purpose) leaves out a *lot* of talk about (lifting) social limitations. The fun bit about the thinking there is that it tends to span all those stages and the database. That really needs to be written down as well at some point so some of the design decisions make more sense :-D

Finally (just to keep this e-mail short; really, there's a lot to say), one other thing to realize is that this DB-based architecture will help us move away from the batch-based approach we have right now.

- LSD