On Fri, Oct/27/2006 10:31:44AM, Josh Hursey wrote:
>
> On Oct 27, 2006, at 7:39 AM, Jeff Squyres wrote:
>
> > On Oct 25, 2006, at 10:37 AM, Josh Hursey wrote:
> >
> >> The discussion started with the bug characteristics of v1.2 versus the trunk.
> >
> > Gotcha.
> >
> >> It seemed from the call that IU was the only institution that can assess this via MTT, as no one else spoke up. Since people were interested in seeing things that were breaking, I suggested that I start forwarding the IU internal MTT reports (run nightly and weekly) to test...@open-mpi.org. This was met by Brian insisting that it would result in "thousands" of emails to the development list. I clarified that it is only 3 - 4 messages a day from IU. However, if all other institutions did this then it would be a bunch of email (where 'a bunch' would still be less than 'thousands'). That's how we got to the 'we need a single summary presented to the group' comment. It should be noted that we brought up IU sending to the 'test...@open-mpi.org' list as a band-aid until MTT could do it better.
> >
> > How about sending them to me and Ethan?
>
> Sure, I can add you both to the list if you like.
>
> >> This single summary can be an email or a webpage that people can check. Rich said that he would prefer a webpage, and no one else really had a comment. That got us talking about the current summary page that MTT generates. Tim M mentioned that on the current website it is difficult to figure out how to get the answers you need. I agree; usability-wise, it is hard for someone to go to the summary page and answer the question "So what failed from IU last night, and how does that differ from yesterday -- i.e., what regressed and what progressed yesterday at IU?". The website is flexible enough to do it, but having a couple of basic summary pages would be nice for basic users. What those should look like we can discuss further.
> >
> > Agreed; we aren't super-fond of the current web page, either. Do you guys want to have a teleconf to go over the current status of MTT, where you want it to go, etc.? I consider IU's input here quite important, since you're the ones pushing the boundaries, flexing MTT's muscles, etc.
>
> In my previous email I suggested a couple of questions that I would like a webpage to answer. A teleconf might be good to talk about some of the various items that IU is trying to do around MTT.
>
> >> The IU group really likes the emails that we currently generate: a plain-text summary of the previous run. I posted copies on the MTT bug tracker here:
> >>   http://svn.open-mpi.org/trac/mtt/ticket/61
> >> Currently we have not put in the work to aggregate the runs, so for each INI file that we run we get 1 email to the IU group. This is fine for the moment, but as we add the rest of the clusters and dimensions in the testing matrix we will need MTT to aggregate the results for us and generate such an email.
> >
> > Ok.
> >
> > We created another ticket yesterday to make a new MTT Reporter (our internal plugins) that duplicates this output format. It actually shouldn't be that hard -- we don't have to do any parsing to get the numbers that you're reporting; we have access to the actual data. So it's mostly caching the data, calculating the totals that you're calculating, and printing in your output format.
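FWIW, here is roughly the shape I am picturing for that Reporter -- just a sketch with made-up field and function names, not the actual MTT plugin API:

    # Sketch only: aggregate per-phase totals from the result data and
    # print a plain-text summary like the IU emails do.
    use strict;
    use warnings;

    sub print_summary {
        my ($results) = @_;    # arrayref of per-test result hashrefs
        my %totals;
        for my $r (@$results) {
            # phase is, e.g., "MPI Install", "Test Build", "Test Run"
            $totals{ $r->{phase} }{ $r->{result} }++;
        }
        for my $phase (sort keys %totals) {
            my $pass = $totals{$phase}{pass} || 0;
            my $fail = $totals{$phase}{fail} || 0;
            printf "%-12s  Pass: %4d  Fail: %4d\n", $phase, $pass, $fail;
        }
    }

No parsing, as you say -- just counting and a printf at the end.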
> > Ethan has some other short tasks to do before he gets to this, but it's near the top of the priority list. You can see the current workflow on the wiki (this is a living document; it keeps changing as requirements, etc. change):
> >
> >   http://svn.open-mpi.org/trac/mtt/wiki/TaskPlan
>
> Awesome, thanks! :)
>
> >> So I think the general feel of the discussion is that we need the following from MTT:
> >> - A 'basic' summary page providing answers to some general frequently asked queries. The current interface is too advanced for the current users.
> >
> > We have the summary.php page, but I personally have never found it too useful. :-)
> >
> > We're getting towards a full revamp of reporter.php (got some other tasks to complete first, but we're definitely starting to think about it) -- got any ideas / input? Our "haven't thought about it much yet" idea is to be more menu/Q&A driven, with a few common queries easily available (rather than a huge, complicated single screen).
>
> See my previous email for some general ideas. Tim M might have a few more that he would like to see, since he is the one at IU who watches the nightly results the closest.
>
> >> - A summary email [in plain text, preferably] similar to the one that IU generated, showing an aggregation of the previous night's results for (a) all reporters and (b) my institution [so I can track the failures down and file bugs].
> >
> > For the moment, we don't have the dynamic capability for you to log in to the web page, create a report, and say "mail this to me nightly". However, Ethan can make up custom reports on the server quite easily -- if you want some IU-specific reports, just file a ticket and Ethan can Make It So.
>
> Cool. We'll talk it over and see what we would like.
>
> >> - 1 email a day on the previous night's testing results.
> >
> > That's what we intended for the mails that are coming today, but it seemed to not be sufficient -- we ended up with 4 nightly mails: one for each relevant phase's failures, and a 4th showing the stderr of the MPI installs.
> >
> >> Some relevant bugs currently in existence:
> >>   http://svn.open-mpi.org/trac/mtt/ticket/92
> >>   http://svn.open-mpi.org/trac/mtt/ticket/61
> >>   http://svn.open-mpi.org/trac/mtt/ticket/94
> >>
> >> The other concern is that, given the frequency of testing, someone needs to make sure the bug tracker is updated as bugs appear from the testing. I think the group is unclear about how this is done. Meaning: when MTT identifies a test as failed, who is responsible for putting the bug in the bug tracker?
> >
> > At the moment, I've been manually examining the mails every day and firing off e-mails to those responsible. However, due to travel last week and this week, I've gotten quite behind. :-(
>
> I wonder if there is a way to do something more automated. Probably too advanced for MTT 2.0 or 3.0, but something to think about. Maybe tie it in with the bug tracker, so a "Bug Master Engineer" gets an aggregated list of failures that can be easily put into Trac. Dunno... just an idea to help take the burden off of you.

I think for starters, we want to at least tie in Trac links anywhere on the webpage an OMPI rev number is referenced. Then, given a changeset, we can figure out contact info ... ?

> >> The obvious solution is the institution that identified the bug. [Warning: my opinion] But then that becomes unwieldy for IU, since we have a large testing matrix and would need to commit someone to doing this every day (and it may take all day to properly track a set of bugs). Also, this kind of punishes an institution for testing more instead of providing an incentive to test.
> >
> > True. I don't know the proper answer to this, either -- I know the "Jeff looks at e-mail" solution doesn't scale well.
> >
> >> ------ Page Break -- Context switch ------
> >>
> >> In case you all want to know what we are doing here at IU, I attached to this email our planned MTT testing matrix. Currently we have BigRed and Odin running the complete matrix, less the BLACS tests. Wotan and Thor will come online as we get more resources to support them.
> >>
> >> In order to do such a complex testing matrix we have various .INI files that we use. And since some of the dimensions in the matrix are large, we break some of the tests into a couple of .INI files that are submitted concurrently to have them run in a reasonable time.
> >>
> >> <MTT-testing-matrix.txt>
> >
> > Awesome.
> >
> > I would like to schedule some phone time with you guys and Ethan and me to talk about what's working, what's not working, etc. One obvious question I have is: is the INI config file format suitable? Do we need to do something more complex that would allow consolidation of your various configurations? ...etc.
>
> Tim M and I spent the better part of two days revamping our current setup to do some more 'advanced' things (parallel builds, etc.). We are putting all of these scripts in ompi-tests/iu/mtt in case anyone wants to see how we are doing it and use them as an example for doing something similar.
>
> Basically our problems are:
>
> - Testing results come in at various times as they complete; we would really like a 'status report' at 8 am every day, finished or not.

For now, could you load up this webpage every day at 8 am?
   http://tinyurl.com/ydt777

> - Due to the combinatorial effect of MTT, this lends itself to some obvious parallelism. Can we harness that to reduce the time to complete the testing cycle?

I brought that up at the last MTT "Developers Conference" :) TET does it. Sounds like a 3.0 (or 4.0) thing.

> - We will soon have 4 clusters [Wotan, BigRed, Odin, Thor], each running 3 branches [trunk, v1.2, v1.1] in 2 different builds [64-bit gcc, 32-bit gcc], every night! That's 24 sets of the nightly tests, and we have biweekly tests in there as well :o. That means a lot of INI files that basically say the same thing.
>
> What we are trying to do:
>
> - Generalize the INI files with default sets that can be plugged in.

FWIW, I've been able to generalize INI sections by splitting them out into separate INI files. E.g., [Reporter], [Mpi get: trunk], [Test Run: trivial], etc. are always the same, but I have four different [MPI install] sections (one for each combination of 32/64-bit and Sparc/i386). I can then cat specific INI files (some containing single INI sections) into client/mtt with the '-' option for whichever configuration I'm testing. Also, I wonder if --[no]-section and command-line INI param overrides would help? (See http://svn.open-mpi.org/trac/mtt/attachment/ticket/61/do_mtt.pl.)
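To make that concrete, the layout looks something like this (the file names are just illustrative; the sections are exactly the ones named above):

    # One reusable section per file, e.g.:
    #   reporter.ini         -> [Reporter]
    #   mpi-get-trunk.ini    -> [Mpi get: trunk]
    #   test-run-trivial.ini -> [Test Run: trivial]
    #   install-64-sparc.ini -> one of the four [MPI install] sections
    #
    # Pick the combination being tested and cat it into the client:
    cat reporter.ini mpi-get-trunk.ini \
        install-64-sparc.ini test-run-trivial.ini \
        | client/mtt -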
> - Make the scripts more general so they can be used easily across all clusters.
> - Reduce the number of emails from the nightly runs to at most 2 per cluster [progress and final] -- we are now using SLURM, LL, and a hostlist in our runs.
> - Increase the parallelism per stage as much as possible, in as general a way as possible.
> - An 8 am (or 10 am) status report from our script, to check on the run as it goes.
>
> We already have a list of refinements that we would like to add to this new script setup, but those are a bit more advanced (e.g., using a manager/worker model to use allocations as they become available, using a queue to order the tests by importance, etc.).
>
> One thing that would be nice for MTT to do, but would initially be institution-specific, is a custom trigger for aggregation from the MTT server. The problem is that we currently get 2 emails from each cluster every night (this does not include the weekly runs), so that will be 8 emails a day, which can be a bit hard to parse. If we put the aggregation code close to the server (or just had a way for us to query the DB from the IU side via ODBC) then we could have the aggregation function generate 2 emails which include results from all clusters: 2 giant emails instead of 8 smaller ones. Just an idea, but if you gave me the information needed to send queries to the MTT database, I could mock something up and we could all experiment with it to see if we can generalize a bit. Obviously the 'guest' DB account that this aggregation function uses would only have read access, since we don't want it modifying the DB.
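That query-side mock-up could be pretty small -- a read-only connection plus the same plain-text formatting you already do. A sketch only; the DSN, host, table, and column names below are invented, not the real MTT schema:

    # Sketch: nightly cross-cluster aggregation via a read-only DB login.
    # DSN, table, and column names are guesses; adjust to the real schema.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mtt;host=db.example.org',
                           'guest', 'guest', { RaiseError => 1 });

    my $rows = $dbh->selectall_arrayref(
        q{SELECT cluster, phase,
                 SUM(CASE WHEN result = 'pass' THEN 1 ELSE 0 END),
                 SUM(CASE WHEN result = 'fail' THEN 1 ELSE 0 END)
            FROM results
           WHERE start_time > now() - interval '1 day'
           GROUP BY cluster, phase});

    for my $row (@$rows) {
        printf "%-8s %-12s Pass: %5d  Fail: %5d\n", @$row;
    }
    $dbh->disconnect;

We would of course need to sort out the read-only 'guest' account (and whether we expose the DB outside the server at all) before any of this.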
> So generally, yeah, I think we would like to have a teleconf to talk about our experiences with MTT and what we have done around it to fit our needs. We realize that we are pushing it a bit further than others, so we are fine with doing a home-brewed solution for a while until MTT is able to replicate the functionality.

When does everyone want to talk?

-Ethan

> Thanks!
> Josh
>
> > --
> > Jeff Squyres
> > Server Virtualization Business Unit
> > Cisco Systems
>
> ----
> Josh Hursey
> jjhur...@open-mpi.org
> http://www.open-mpi.org/