On Mon, Oct 25, 2010 at 4:23 PM, Mitesh Patel <qed...@gmail.com> wrote:
> On 10/25/2010 01:54 PM, William Stein wrote:
>>> Also, I was talking to Craig Citro about this and he had the
>>> interesting idea of creating some kind of a "test object" which would
>>> be saved and then could be loaded into future versions of Sage and
>>> re-run. The idea of saving the tests that are run, and then running the
>>> exact same tests (rather than worrying about correlation of files and
>>> tests) will make catching regressions much easier.
>>
>> Wow, that's an *extremely* good idea! Nice work, Craig.
>> Basically, we could have one object that has:
>>
>>    (a) list of tests that got run.
>>    (b) for each of several machines and sage versions:
>>          - how long each test took
>>
>> Regarding (a), this gets extracted from the doctests somehow for
>> starters, though we could have some other tests thrown in if we want.
>>
>> I could easily imagine storing the above as a single entry in a
>> MongoDB collection (say):
>>
>>    {'tests':[ordered list of input blocks of code that could be
>>              extracted from doctests],
>>     'timings':[{'machine':'sage.math.washington.edu',
>>                 'version':'sage-4.6.alpha3', 'timings':[a list of floats]},
>>                {'machine':'bsd.math.washington.edu',
>>                 'version':'sage-4.5.3', 'timings':[a list of floats]}]}
>>
>> Note that the ordered list of input blocks could be stored using GridFS,
>> since it's bigger than 4MB:
>>
>>    wst...@sage:~/build/sage-4.6.alpha3/devel/sage$ sage -grep "sage:" > a
>>    wst...@sage:~/build/sage-4.6.alpha3/devel/sage$ ls -lh a
>>    -rw-r--r-- 1 wstein wstein 9.7M 2010-10-25 11:41 a
>>    wst...@sage:~/build/sage-4.6.alpha3/devel/sage$ wc -l a
>>    133579 a
>>
>> Alternatively, the list of input blocks could be stored in its own
>> collection, which would just get named by the tests field:
>>
>>    {'tests':'williams_test_suite_2010-10-25'}
>>
>> The latter is nice, since it would make it much easier to make a web
>> app that allows for browsing through the timing results, e.g., sort
>> them from slowest to fastest, and easily click to get the input that
>> took a long time.
>>
>> Another option: have exactly one collection for each test suite, and
>> have all other data be in that collection:
>>
>> Collection name: "williams_test_suite-2010-10-25"
>>
>> Documents:
>>
>>   * A document with a unique id, starting at 0, for each actual test
>>        {'id':0, 'code':'factor(2^127+1)'}
>>
>>   * A document for each result of running the tests on an actual platform:
>>        {'machine':'bsd.math.washington.edu', 'version':'sage-4.5.3',
>>         'timings':{0:1.3, 1:0.5,...} }
>>     Here, the timings are stored as a mapping from ids to floats.
>
> This last option seems most "natural" to me, though identical inputs
> that appear in multiple suites would generally(?) get different ids in
> the collections. Would it be better to use a hash of the 'code' for the
> 'id', or can the database automatically ensure that different ids imply
> different inputs?

Yes, the database can automatically ensure that different ids imply
different inputs.

So your change is to store all inputs in a single collection, with a
unique id for each. Then in the collection corresponding to a given
test suite, you have one document that has an ordered list of ids of
inputs, and the rest is as before. Thus:

1. A collection named "tests" with documents of the form:

      {'id':5, 'code':'for i in range(100,120):\n factor(2^i+1)'}

   This collection can grow larger and larger over time, and could
   start with the input blocks of current sage doctests.

2. A collection named "timings" with documents of the form:

      {'machine':'sage.math.washington.edu', 'version':'sage-4.6.alpha3',
       'timings':{5:1.2, 7:1.1, 10:'crash', ...}}

and that's it. Given those two collections, one can do queries to
extract whatever we want, run tests using a subset of the test ids,
and also easily input new information.

3. Oh, there could be a third collection that groups together some of
   the tests, and it could be named test_groups, and could just contain
   stuff like:

      {'name':'sage-4.6.alpha3', 'ids':[1,2,5,18, 97]}

   and

      {'name':'boeing', 'ids':[95,96,97]}

   so that it would be easy to run just the tests in a given group, or
   to pick out those tests for a report.

 -- William

> Disclaimer: I'm not familiar with MongoDB. Here's a brief introduction:
>
>   http://www.mongodb.org/display/DOCS/Introduction
>
> In your experience, are queries fast? For example, if we wanted to see
> how timings vary across Sage versions and machines for a specific input?

In short, yes, if you know what you're doing. If you make an index,
then queries are superfast. Without one, they are what you would
expect (i.e., linear time...).
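
To make the layout above concrete, here is a rough sketch of the three
collections using plain Python lists and dicts in place of MongoDB, so
it runs without a database. The function names (add_test, record_run,
timings_for, build_timing_index) and the sample timings are made up for
illustration and are not part of Sage or MongoDB. The dict keyed on the
code text only stands in for whatever uniqueness mechanism the database
provides. Note also that in real MongoDB documents the keys of the
'timings' mapping would have to be strings, not integers.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the three collections.
tests = []        # {'id': int, 'code': str}
timings = []      # {'machine': str, 'version': str, 'timings': {id: seconds or 'crash'}}
test_groups = []  # {'name': str, 'ids': [int, ...]}

# Stand-in for a uniqueness guarantee on 'code': maps code text to its id.
_id_by_code = {}


def add_test(code):
    """Return the id for `code`, creating a new document only for unseen code."""
    if code not in _id_by_code:
        _id_by_code[code] = len(tests)
        tests.append({'id': _id_by_code[code], 'code': code})
    return _id_by_code[code]


def record_run(machine, version, results):
    """Store one run; `results` maps test id -> seconds, or 'crash'."""
    timings.append({'machine': machine, 'version': version,
                    'timings': dict(results)})


def timings_for(test_id):
    """No index: inspect every run document (linear in the number of runs)."""
    return [(run['machine'], run['version'], run['timings'][test_id])
            for run in timings if test_id in run['timings']]


def build_timing_index():
    """Built once: test id -> [(machine, version, result), ...]."""
    index = defaultdict(list)
    for run in timings:
        for test_id, result in run['timings'].items():
            index[test_id].append((run['machine'], run['version'], result))
    return index


# Made-up sample data, only to exercise the functions.
a = add_test('factor(2^127+1)')
b = add_test('for i in range(100,120):\n factor(2^i+1)')
record_run('sage.math.washington.edu', 'sage-4.6.alpha3', {a: 1.2, b: 'crash'})
record_run('bsd.math.washington.edu', 'sage-4.5.3', {a: 1.3})
test_groups.append({'name': 'boeing', 'ids': [b]})

index = build_timing_index()
same_answer = index[a] == timings_for(a)  # indexed and scanned lookups agree
```

Calling add_test twice with the same code returns the same id, which is
the "different ids imply different inputs" property asked about. The
index is the in-memory analogue of a database index: after it is built,
looking up one test's timings no longer requires a pass over every run.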

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org