On Wed, Oct 20, 2010 at 5:33 PM, David Roe <r...@math.harvard.edu> wrote:
> There are a number of tickets in trac about performance regressions in
> Sage.  I'm sure there are far more performance regressions which we don't
> know about because nobody noticed.
>
> As someone writing library code, it's generally not obvious that one is
> about to introduce a performance regression (otherwise you'd probably not do
> it).  There have been instances where I've wondered whether a change I was
> making might have unintended consequences for performance, but testing all
> the ways in which this could manifest is laborious.
>
> Consequently, I've been thinking recently about how to detect performance
> regressions automatically.  There are really two parts to the problem:
> gathering timing data on the Sage library, and analyzing that data to
> determine if regressions have occurred (and how serious they are).

+1. This has been lacking for a long time.

> Data gathering:
>
> One could modify local/bin/sage-doctest to allow the option of changing each
> doctest by wrapping it in a "timeit()" call.  This would then generate a
> timing datum for each doctest line.
> * these timings vary from run to run (presumably due to differing levels of
> load on the machine).  I don't know how to account for this, but usually
> it's a fairly small effect (on the order of 10% error).
> * if you're testing against a previous version of sage, the doctest
> structure will have changed because people wrote more doctests.  And doctest
> lines depend on each other: you define variables that are used in later
> lines, so inserting a line could make timings of later lines incomparable
> to the exact same line without the inserted line.  We might be able to parse
> the lines and check that various objects are actually the same across
> different versions of sage (this would require either a version-invariant
> hash, or saving in one version, loading in the other and comparing; and you
> would have to do that for each variable that occurs in the line), but that
> seems to be getting too complicated...
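
To make the wrapping concrete, here's roughly what a per-line timer could
look like, prototyped with the stdlib doctest parser (a sketch only:
time_docstring is a made-up name, and real Sage lines would need to go
through preparse() first):

    import doctest
    import time

    def time_docstring(docstring, glbs):
        # Time each doctest line once, sharing state across lines the way
        # the doctest runner does.  Sage preparsing is omitted to keep
        # this sketch stdlib-only.
        parser = doctest.DocTestParser()
        timings = []
        for example in parser.get_examples(docstring):
            code = compile(example.source, '<doctest>', 'exec')
            t0 = time.process_time()
            exec(code, glbs)   # assignments persist for later lines
            timings.append((example.source.strip(),
                            time.process_time() - t0))
        return timings
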
>
> Many users will only have one version of sage installed.  And even with
> multiple versions, you need somewhere to PUT the data in order to compare
> against it later.
>
> One way of handling these problems is to create a relational database to put
> timings in.  This could also be interesting for benchmarketing purposes: we
> could have timings on various machines and highlight performance
> improvements, in addition to watching for performance regressions.
>
> So, here's a first draft for a database schema to put timing data into.
> I've included a description of each table, along with a description of
> columns I thought were non-obvious.  I'm definitely interested in
> suggestions for improving this schema.
>
> Table: Hosts
> # computer information; including identifying data to determine when running
> on same host
> col: id
> col: identifying_data # some way to uniquely identify the computer on which
> a test is being run. Presumably the output of some unix function, but I
> don't know what.
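
For identifying_data, one guess at something workable is to hash a few
stdlib values together (whether this is stable enough across reboots and
hardware changes is an open question):

    # Candidate host fingerprint; all stdlib, but how stable it is
    # across upgrades is untested.
    import hashlib
    import platform
    import uuid

    def host_fingerprint():
        raw = '|'.join([platform.node(),        # hostname
                        platform.machine(),     # e.g. 'x86_64'
                        platform.processor(),
                        hex(uuid.getnode())])   # MAC address (can change)
        return hashlib.sha1(raw.encode()).hexdigest()
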
>
> Table: Sage_version
> # a table giving each existing version of Sage an id
> # the ids for official sage releases should be consistent across databases;
> # users can also create their own temporary versions which use a different
> block of id numbers.
> # when a new version is created, the Files, Doctest_blocks and Doctest_lines
> tables are updated
> col: id
> col: version_name # string defining the version
> col: prev_version_id # versions should be totally ordered.  Since we want to
> reserve some id space for official sage versions and other space for users,
> this can't be done by just the numerical id.
>
> Table: Files
> # a table for all the files in sage that have doctests
> col: id
> col: filename # e.g. "sage/rings/integer.pyx"
> # we might also want some other data, like when the file was added (or
> deleted), whether it's a python vs cython file….
>
> Table: Doctest_blocks
> # a table for the triple-quote-encased strings that hold doctests
> # theoretically, these could be smaller groupings within such a larger
> block.  But while it's generally true that doctest blocks can be executed
> independently of each other, it's hard to identify subsets of the blocks
> that are independent of each other.
> col: id
> col: name # usually the name of the function, class or file to which this
> doctest belongs
> col: version_id # sage version in which this block lies
> col: file_id # sage file in which this block lies
> col: prev_id # doctest block id for analogous block in previous version of
> sage.  -1 indicates nonexistent
>
> Table: Doctest_lines
> # a table for lines beginning with "sage:" that are executed by the doctest
> framework.
> # lines that occur many times in the Sage library (e.g. "R.<x> = QQ[]") have
> # a single entry here, shared between blocks.
> col: id
> col: code # e.g. "R.<x> = QQ[]"
> # could theoretically support storing equivalent lines of code, such as
> "R.<x> = PolynomialRing(QQ)"
>
> Table: Block_line_incidences
> # a table to record the many-to-many relation between doctest_blocks and
> doctest_lines
> col: id
> col: block_id
> col: line_id
> col: line_number # which line in the block this is, e.g. 1 (or 0?) if this
> is the first doctest line in the block
>
> Table: Regression_runs
> # a table to record times (possibly on different machines) where data was
> entered into this database from a regression-test-run
> col: id
> col: host_id # the computer on which this was run
> col: version_id # the sage version
> col: start_time # the time this run was started
> col: duration # how long this run lasted (would end_time be more useful?)
> # doesn't necessarily have to be a run through the whole library; could only
> include a subset of the files.
> # could include other info, like system load, though I don't know how to
> measure this…
>
> Table: Timing_data
> # A table to record the timing data we're actually interested in
> col: id
> col: bli_id # the id for a block_line_incidence
> col: run_id # the regression run that this measurement was a part of
> col: cpu_time # the amount of time this took to run, as measured through
> timeit
> # presumably we require that the version_ids reached by tracing up through
> the bli and through the run are the same.
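
For concreteness, here's roughly how most of these tables could be
declared in sqlite (a sketch: types and constraints are guesses, and the
Files and Doctest_blocks tables are omitted for brevity):

    import sqlite3

    conn = sqlite3.connect('sage_timings.db')
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS hosts (
        id INTEGER PRIMARY KEY,
        identifying_data TEXT UNIQUE
    );
    CREATE TABLE IF NOT EXISTS sage_version (
        id INTEGER PRIMARY KEY,
        version_name TEXT,
        prev_version_id INTEGER REFERENCES sage_version(id)
    );
    CREATE TABLE IF NOT EXISTS doctest_lines (
        id INTEGER PRIMARY KEY,
        code TEXT
    );
    CREATE TABLE IF NOT EXISTS block_line_incidences (
        id INTEGER PRIMARY KEY,
        block_id INTEGER,   -- would reference a doctest_blocks table
        line_id INTEGER REFERENCES doctest_lines(id),
        line_number INTEGER
    );
    CREATE TABLE IF NOT EXISTS regression_runs (
        id INTEGER PRIMARY KEY,
        host_id INTEGER REFERENCES hosts(id),
        version_id INTEGER REFERENCES sage_version(id),
        start_time REAL,
        duration REAL
    );
    CREATE TABLE IF NOT EXISTS timing_data (
        id INTEGER PRIMARY KEY,
        bli_id INTEGER REFERENCES block_line_incidences(id),
        run_id INTEGER REFERENCES regression_runs(id),
        cpu_time REAL
    );
    """)
    conn.commit()
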
>
>
> There are also questions of how this would be distributed.  Presumably the
> data part wouldn't come standard with sage.  Maybe an optional spkg?  Of
> course, you're mainly interested in comparing DIFFERENT versions of sage on
> the SAME computer, which doesn't really fit with how we normally distribute
> code and data.
>
>
> Analysis:
> Once we have the database set up and have gathered some data, you can do all
> kinds of things with it.  I'm most interested in how to find speed
> regressions, and I can certainly imagine writing code to do so.  You have
> data from a previous version of sage (or your own, before you applied some
> patch(es)) and you run a regression test with "sage -rt" or something; this
> generates the same kind of timing data and then you look for slowdowns,
> either absolute or relative to the average ratio of your current run to the
> previous run for a given line of code.  The tool can then print out a sorted
> list of lines with slowdowns, or ones above a certain threshold, or...
>
> I'd like to hear more ideas on how to improve all of this.
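
As a sketch of that last step (using the median ratio rather than the
average, which should be more robust to outliers; the matching of lines
across runs is elided):

    def flag_slowdowns(pairs, threshold=1.2):
        # pairs: (code, old_time, new_time) triples for lines matched
        # across the two runs.  Normalize by the median new/old ratio so
        # uniform machine or version drift doesn't flag every line, then
        # report the outliers, worst first.
        ratios = sorted(new / old for _, old, new in pairs if old > 0)
        baseline = ratios[len(ratios) // 2]          # median ratio
        flagged = [(new / (old * baseline), code)
                   for code, old, new in pairs
                   if old > 0 and new / old > threshold * baseline]
        return sorted(flagged, reverse=True)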

timeit() might be a bit extreme--I think it performs a minimum of 5
loops. I don't look forward to 15-cpu-hour test runs; ideally we could
just collect (noisier) timing information by default. Just have it
spit out a sqlite database object (perhaps in a default location if
none is specified).

I would say you don't really need a complicated schema. Perhaps a
single table would suffice, with columns

OS info/id
Sage version
hg changeset (useful to tell if it's vanilla)
file
doctest identifier (fully qualified function name?)
doctest content
cpu time
wall time
run id (unique per run)

There'd be a lot of redundancy in some of the columns, but that's
OK--it'll just compress nicely. Perhaps it could even be a comma/tab
separated text file, to make things even simpler.
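
A minimal sketch of that table in sqlite (names and types are
illustrative):

    import sqlite3

    conn = sqlite3.connect('doctest_timings.db')
    conn.execute("""
        CREATE TABLE IF NOT EXISTS timings (
            os_info   TEXT,
            version   TEXT,   -- Sage version
            changeset TEXT,   -- hg changeset; tells if it's vanilla
            file      TEXT,
            test_id   TEXT,   -- fully qualified function name
            content   TEXT,   -- the doctest line itself
            cpu_time  REAL,
            wall_time REAL,
            run_id    TEXT    -- unique per run
        )
    """)
    conn.commit()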

A separate tool could be used to compare runs, flagging any egregious
(or favorable) differences. Matching could be done in several ways--on
the text content, on the method name, etc.
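
Matching on text content, the comparison tool could be little more than a
self-join on that table (a sketch; the 1.5x "egregious" threshold is
arbitrary):

    # Flag lines at least 1.5x slower in run B than in run A.
    query = """
        SELECT a.file, a.content, a.cpu_time AS old_t, b.cpu_time AS new_t
        FROM timings a
        JOIN timings b ON a.content = b.content AND a.file = b.file
        WHERE a.run_id = ? AND b.run_id = ?
          AND a.cpu_time > 0 AND b.cpu_time > 1.5 * a.cpu_time
        ORDER BY b.cpu_time / a.cpu_time DESC
    """
    for row in conn.execute(query, ('run_A', 'run_B')):
        print(row)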

- Robert
