On Mon, Oct 25, 2010 at 4:23 PM, Mitesh Patel <qed...@gmail.com> wrote:
> On 10/25/2010 01:54 PM, William Stein wrote:
>>> Also, I was talking to Craig Citro about this and he had the
>>> interesting idea of creating some kind of a "test object" which would
>>> be saved and could then be carried into future versions of Sage and
>>> re-run there. Saving the tests that are run, and then running the
>>> exact same tests (rather than worrying about correlating files and
>>> tests), will make catching regressions much easier.
>>
>> Wow, that's an *extremely* good idea!  Nice work, Craig.
>> Basically, we could have one object that has:
>>
>>     (a) list of tests that got run.
>>     (b) for each of several machines and sage versions:
>>             - how long each test took
>>
>> Regarding (a), this gets extracted from the doctests somehow for
>> starters, though we could throw in some other tests if we want.
>>
>> I could easily imagine storing the above as a single entry in a
>> MongoDB collection (say):
>>
>>    {'tests': [ordered list of input blocks of code that could be
>>               extracted from doctests],
>>     'timings': [{'machine': 'sage.math.washington.edu',
>>                  'version': 'sage-4.6.alpha3',
>>                  'timings': [a list of floats]},
>>                 {'machine': 'bsd.math.washington.edu',
>>                  'version': 'sage-4.5.3',
>>                  'timings': [a list of floats]}]}
>>
>> Note that the ordered list of input blocks could be stored using GridFS,
>> since it's bigger than 4MB:
>>
>> wst...@sage:~/build/sage-4.6.alpha3/devel/sage$ sage -grep "sage:" > a
>> wst...@sage:~/build/sage-4.6.alpha3/devel/sage$ ls -lh a
>> -rw-r--r-- 1 wstein wstein 9.7M 2010-10-25 11:41 a
>> wst...@sage:~/build/sage-4.6.alpha3/devel/sage$ wc -l a
>> 133579 a
>>
>> Alternatively, the list of input blocks could be stored in its own
>> collection, which would just get named by the tests field:
>>
>>     {'tests':'williams_test_suite_2010-10-25'}
>>
>> The latter is nice, since it would make it much easier to make a web
>> app that allows for browsing through the timing results, e.g., sorting
>> them from slowest to fastest and easily clicking to get the input that
>> took a long time.
>>
>> Another option:  have exactly one collection for each test suite, and
>> have all other data be in that collection:
>>
>> Collection name: "williams_test_suite-2010-10-25"
>>
>> Documents:
>>
>>   * A document with a unique id, starting at 0, for each actual test
>>        {'id':0, 'code':'factor(2^127+1)'}
>>
>>   * A document for each result of running the tests on an actual platform:
>>        {'machine':'bsd.math.washington.edu', 'version':'sage-4.5.3',
>> 'timings':{0:1.3, 1:0.5,...} }
>> Here, the timings are stored as a mapping from id's to floats.
>
>
> This last option seems most "natural" to me, though identical inputs
> that appear in multiple suites would generally(?) get different ids in
> the collections.  Would it be better to use a hash of the 'code' for the
> 'id', or can the database automatically ensure that different ids imply
> different inputs?

Yes, the database can automatically ensure that different ids imply
different inputs.
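
For comparison, here is a rough sketch of the hash-as-id alternative
raised above, in plain Python rather than MongoDB.  The names `input_id`
and `add_input`, the in-memory dict standing in for a collection, and
the choice of SHA-1 are all assumptions made for illustration; nothing
in this thread settles on them.

```python
import hashlib

def input_id(code):
    """Return a content-derived id: the SHA-1 hex digest of the input text."""
    return hashlib.sha1(code.encode("utf-8")).hexdigest()

# Hypothetical in-memory stand-in for the collection of inputs.
tests = {}

def add_input(code):
    """Store an input block once, keyed by its hash, and return its id."""
    key = input_id(code)
    # setdefault leaves an existing entry alone, so duplicates collapse.
    tests.setdefault(key, {"id": key, "code": code})
    return key

a = add_input("factor(2^127+1)")
b = add_input("factor(2^127+1)")   # same input appearing in another suite
c = add_input("factor(2^128+1)")   # a different input
```

With ids derived from the text, identical inputs get the same id no
matter which suite they came from, at the cost of longer ids.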

So your change is to store all inputs in a single collection, with a
unique id for each.  Then in the collection corresponding to a given
test suite, you have one document that has an ordered list of ids of
inputs, and the rest is as before.  Thus:

1. A collection named "tests" with documents of the form:

{'id':5, 'code':'for i in range(100,120):\n   factor(2^i+1)'}

This collection can grow larger and larger over time, and could start
with the input blocks of current sage doctests.

2. A collection named "timings" with documents of the form:

{'machine':'sage.math.washington.edu', 'version':'sage-4.6.alpha3',
'timings':{5:1.2, 7:1.1, 10:'crash', ...}}

and that's it.

Given those two collections, one can do queries to extract whatever we
want, run tests using a subset of the test id's, and also easily input
new information.
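
To make the kind of query concrete, here is a minimal sketch using
plain Python lists as stand-ins for the two collections (no MongoDB
involved).  The function names and the second run's timing values are
made up for illustration.  Note also that in an actual MongoDB document
the keys of the 'timings' mapping would have to be strings (e.g. '5'),
since BSON field names are strings; integer keys are used here only to
mirror the notation above.

```python
# Plain-Python stand-ins for the "tests" and "timings" collections.
tests = [
    {"id": 5, "code": "for i in range(100,120):\n   factor(2^i+1)"},
    {"id": 7, "code": "factor(2^127+1)"},
]
timings = [
    {"machine": "sage.math.washington.edu", "version": "sage-4.6.alpha3",
     "timings": {5: 1.2, 7: 1.1, 10: "crash"}},
    # The values in this second run are invented sample data.
    {"machine": "bsd.math.washington.edu", "version": "sage-4.5.3",
     "timings": {5: 1.4, 7: 0.9}},
]

def timings_for_input(test_id):
    """Return (machine, version, result) for every run that includes test_id."""
    return [(run["machine"], run["version"], run["timings"][test_id])
            for run in timings if test_id in run["timings"]]

def slowest_first(run):
    """Return (id, seconds) pairs for one run, slowest first.

    Non-numeric results such as 'crash' are skipped.
    """
    numeric = [(i, t) for i, t in run["timings"].items()
               if isinstance(t, float)]
    return sorted(numeric, key=lambda pair: pair[1], reverse=True)
```

For example, timings_for_input(5) lists how input 5 fared on each
machine and version, and slowest_first(timings[0]) orders one run's
results for a report.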

3. Oh, there could be a third collection that groups together some of
the tests, and it could be named test_groups, and could just contain
stuff like:

  {'name':'sage-4.6.alpha3', 'ids':[1,2,5,18, 97]}

and

  {'name':'boeing', 'ids':[95,96,97]}

so that it would be easy to run just the tests in a given group, or
to pull out just those tests for a report.
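
A small sketch of how such groups might be used, again with plain
Python data in place of a real collection.  The helper names
`ids_in_group` and `report` and the sample timings are hypothetical.

```python
# Stand-in for the "test_groups" collection, using the two examples above.
test_groups = [
    {"name": "sage-4.6.alpha3", "ids": [1, 2, 5, 18, 97]},
    {"name": "boeing", "ids": [95, 96, 97]},
]

def ids_in_group(name):
    """Return the list of test ids belonging to the named group."""
    for group in test_groups:
        if group["name"] == name:
            return group["ids"]
    raise KeyError(name)

def report(run_timings, name):
    """Restrict one run's id -> result mapping to the named group.

    Ids in the group that were not run show up as None.
    """
    return {i: run_timings.get(i) for i in ids_in_group(name)}

# Invented sample results for a single run.
run_timings = {5: 1.2, 95: 0.3, 97: 2.5}
boeing = report(run_timings, "boeing")
```

Here boeing contains only ids 95, 96 and 97, with 96 marked None
because that test has no recorded result in this run.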

 -- William



> Disclaimer: I'm not familiar with MongoDB.  Here's a brief introduction:
>
> http://www.mongodb.org/display/DOCS/Introduction
>
> In your experience, are queries fast?  For example, if we wanted to see
> how timings vary across Sage versions and machines for a specific input?

In short, yes, if you know what you're doing.  If you make an index,
then queries are superfast.  Otherwise they are what you would expect
(i.e., linear time...).
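
As an analogy only (this is plain Python, not MongoDB, and a dict
lookup is not how a database index is implemented), the difference
between scanning every document and consulting an index looks roughly
like this.  The sample runs and function names are made up.

```python
from collections import defaultdict

# Invented sample documents standing in for the "timings" collection.
runs = [
    {"machine": "sage.math.washington.edu", "version": "sage-4.6.alpha3"},
    {"machine": "bsd.math.washington.edu", "version": "sage-4.5.3"},
    {"machine": "sage.math.washington.edu", "version": "sage-4.5.3"},
]

def find_linear(machine):
    """Without an index: look at every document (linear time)."""
    return [run for run in runs if run["machine"] == machine]

# Build the "index" once: machine name -> list of matching documents.
by_machine = defaultdict(list)
for run in runs:
    by_machine[run["machine"]].append(run)

def find_indexed(machine):
    """With an index: jump straight to the matching documents."""
    return by_machine.get(machine, [])
```

Both functions return the same documents; the indexed version simply
avoids touching the ones that cannot match.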



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
