Okay, the general vibe I got from the responses is:

* Report metadata isn't useful enough without the report text
* Data for old distributions matters for the people maintaining systems that 
use them

I think the best path forward might be to only do these things:

1. Archive the full report text for reports older than 5 years
2. Keep the full report text in the database for only 5 years
3. Keep metadata and statistics in the database forever

Then we can build a site that can read those old reports from the archive 
files. Since the most common use-case for a visitor (and correct me if I'm 
wrong) is to go look up the reports for a specific distribution on a specific 
Perl/platform, no functionality is lost. With some further development, we can 
even support some filtering / searching of the archived reports.

At the moment, keeping metadata forever should not be a huge issue: If I fix 
the metadata to remove some duplicate data and normalize it a bit better, I can 
even make it smaller.

The full data retention policy then becomes the following decision tree:

* Reports (full report data)
        * Reports submitted >5 years ago
                * Release on CPAN
                        - Report archived
                * Release not on CPAN
                        - Report archived
        * Reports submitted <5 years ago
                * Release on CPAN
                        + Report available
                * Release not on CPAN
                        + Report available
* Metadata (release, Perl version, Perl architecture, OS name/version, test 
reporter, date/time, pass/fail status)
        * Reports submitted >5 years ago
                * Release on CPAN
                        + Metadata available
                * Release not on CPAN
                        + Metadata available
        * Reports submitted <5 years ago
                * Release on CPAN
                        + Metadata available
                * Release not on CPAN
                        + Metadata available
* Statistics (release, pass/fail count)
        * Reports submitted >5 years ago
                * Release on CPAN
                        + Statistics available
                * Release not on CPAN
                        + Statistics available
        * Reports submitted <5 years ago
                * Release on CPAN
                        + Statistics available
                * Release not on CPAN
                        + Statistics available
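
To make the tree above concrete, here is a minimal sketch of it as a single lookup function. This is only an illustration, not the planned scripts: the function name, the argument names, the status strings, and the approximation of "5 years" as 5 * 365 days are all assumptions of mine, as is treating a report submitted exactly 5 years ago as still available.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "5 years" is approximated as 5 * 365 days (leap days ignored).
FIVE_YEARS = timedelta(days=5 * 365)


def retention_status(kind, submitted, on_cpan, now):
    """Return 'available' or 'archived' for one piece of data.

    kind      -- 'report', 'metadata', or 'statistics' (hypothetical labels)
    submitted -- datetime the report was submitted
    on_cpan   -- whether the release is still on CPAN; kept only to mirror
                 the tree, since it does not change any outcome
    now       -- datetime to measure the report's age against
    """
    if kind == "report":
        # Full report text leaves the database once it is older than 5 years.
        if now - submitted > FIVE_YEARS:
            return "archived"
        return "available"
    if kind in ("metadata", "statistics"):
        # Metadata and statistics stay in the database forever.
        return "available"
    raise ValueError(f"unknown kind: {kind!r}")


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
old = datetime(2015, 6, 1, tzinfo=timezone.utc)
recent = datetime(2022, 6, 1, tzinfo=timezone.utc)

print(retention_status("report", old, on_cpan=False, now=now))
print(retention_status("report", recent, on_cpan=True, now=now))
print(retention_status("metadata", old, on_cpan=False, now=now))
```

Written this way, it is easy to see that the only inputs that matter are the kind of data and the age of the report; whether the release is still on CPAN never changes the result.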

I'll start planning out the scripts needed to achieve this, and when I'm ready 
to do something, I'll make an announcement and give some time for additional 
comments.

Thanks,


Doug Bell
d...@preaction.me

