On Mon, Dec 15, 2025 at 3:35 AM Lucas Nussbaum <[email protected]> wrote:
> I've been working on orig-check, a service that tries to reproduce the
> generation of upstream tarballs (e.g. .orig.tar.gz) from what is
> described in the debian/watch file.
>
> See https://orig-check.debian.net/

I looked at the statistics page, at
https://orig-check.debian.net/statistics, as it exists at this instant
(my local time as I start to write this is now ~ 2:50 PM east coast USA
standard time, Dec 26, 2025). Comments and suggestions are below.
There's also a comment or two about the home page.

> At this point I consider the service to be in a reasonable state, and
> I'm mainly interested in feedback, requests for improvements, etc.

(1) Perhaps provide a "totals" line at the very bottom for each suite.
Presumably the percentage for the total for each suite would be 100%.

(1a) Oh! I assume the "totals" values for each suite from (1) would
always match the actual current number of packages in that suite (at
least modulo the last time the service ran on the packages from that
suite, e.g. the suite added a package after the service last processed
packages from that suite). If not, this is a bug, correct? Should such
a bug announce itself on the summary and/or home page?

(2) Perhaps provide a "subtotals" line and/or a separate summary
grouping table (with totals, of course!) for each logical group of
results. At this instant, it seems to me the logical groups are
900/901/910, 800/801, 700, 200 thru 290 (and perhaps a subsubtotal for
all the 280 values?), and 120. If a separate summary grouping table,
I'm not sure if it'd be best to put it at the top or bottom of the
page. I don't know if it would make sense (and/or be doable) to make
the logical groups be clickable into a detailed list as the individual
results classes are.

(2a) A separate summary groupings table as in (2) would also make sense
for the home page for this service. (Perhaps instead of putting it on
the main summary page?)
Again, I don't know if it would make sense / be doable to have the
logical groups be clickable.

(2b) Perhaps it would make sense to have a separate summary of the
"successful" logical groups (900 et al, 800 et al) compared to a total
of all successful groups (900 et al group is N% of all successful
groups) and/or all successful groups against all groups (successful
groups are N% of all groups). Presumably ideally the ratio of 800 et al
groups to 900 et al groups would shrink over time as fewer packages
require normalization (I assume this is deemed desirable)? And, also,
the total of all successful groups to all groups in total.

(3) Provide information, such as a timestamp, stating when the summary
page was generated; I assume the page is generated dynamically and/or
regenerated periodically as the service chugs away. (This would also be
applicable to the home page, since at least some of it is apparently
dynamically generated.)

(3a) Provide information about when the data that the summary page
summarizes was last generated or processed. (Also see (6) below.)

(4) Perhaps the statistics should be archived periodically, and/or as
they are regenerated. If they are generated on-demand and/or frequently
(say every 10 minutes), then say hourly for a day or three, daily for a
week or two, weekly for a month or two, monthly for a year or two, and
yearly indefinitely. (Obviously if they are not generated as
frequently, then archiving should not be as frequent as I propose
either.) This would allow for analysis of any trends that might show up
in this service over time. Archives should be retrievable somehow.

(5) Provide a link at the top back to the home page for the service
(although I suppose the "Back to All Results" button on the bottom
provides that) and other boilerplate.
Information about the contact person (you) and the source, copyright
and license of the results and data as applicable, thanks to DebianNet
(and if this ever transitions to an Official Service to DSA), etc. Some
of this would also be applicable on the home page (some of this is
already there of course).

(6) On the home page, provide information about the frequency and the
last time that the orig-check service runs and/or regenerates its data.
Is it run manually? As a cron job (when and/or how often)? When a .dsc
file is uploaded and/or changed? If when a .dsc file is uploaded /
changed, is there also a periodic instance of "dump everything and
reboot, start from scratch" processing?

That's everything I can think of off the top of my head. Nothing really
novel here, just bits and pieces, "nice to haves", rounding out the
bulk of what you've already provided (which does look interesting and
like it'll be useful to people).

Hope this is of some use, interest.

Thanks for your time.

Be well.

Joseph
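
To make the subtotals idea in (2) and the "successful" ratio in (2b) a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the counts are invented placeholders (not taken from the statistics page), `group_of` and `subtotals` are hypothetical names, and grouping by hundreds (so 120 falls into a "100" group) is just one simple way to form the logical groups listed above.

```python
from collections import defaultdict

# Placeholder counts keyed by result class. These numbers are invented
# for illustration only; they are NOT real orig-check statistics.
counts = {900: 50, 901: 20, 910: 10, 800: 8, 801: 2,
          700: 4, 280: 3, 200: 2, 120: 1}

def group_of(code):
    """Map a result class to its hundreds group, e.g. 901 -> 900."""
    return code // 100 * 100

def subtotals(counts):
    """Sum the per-class counts into one subtotal per logical group."""
    totals = defaultdict(int)
    for code, n in counts.items():
        totals[group_of(code)] += n
    return dict(totals)

groups = subtotals(counts)
total = sum(groups.values())
# (2b) treats the 900 and 800 groups as the "successful" ones.
successful = groups.get(900, 0) + groups.get(800, 0)

for group in sorted(groups, reverse=True):
    share = 100 * groups[group] / total
    print(f"{group}: {groups[group]:4d}  {share:5.1f}%")
print(f"successful: {100 * successful / total:.1f}% of all packages")
```

The same per-group dictionary could feed both a subtotals line on the statistics page and a compact summary table on the home page.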
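
As an illustration of the archive thinning described in (4), here is a minimal Python sketch. It is not how orig-check works; the tier cutoffs simply take the upper end of each range suggested above, "monthly" and "yearly" are approximated as 30 and 365 days, and the names `TIERS`, `bucket_size` and `snapshots_to_keep` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tiers: (maximum snapshot age, bucket size in seconds).
# None as the maximum age means "no limit".
TIERS = [
    (timedelta(days=3), 3600),          # hourly for three days
    (timedelta(weeks=2), 86400),        # daily for two weeks
    (timedelta(days=60), 7 * 86400),    # weekly for about two months
    (timedelta(days=730), 30 * 86400),  # roughly monthly for two years
    (None, 365 * 86400),                # roughly yearly, indefinitely
]

def bucket_size(age):
    """Return the bucket size (seconds) that applies to a snapshot age."""
    for max_age, size in TIERS:
        if max_age is None or age <= max_age:
            return size

def snapshots_to_keep(timestamps, now):
    """Keep only the newest snapshot in each time bucket."""
    kept = {}
    for ts in sorted(timestamps):
        size = bucket_size(now - ts)
        key = (size, int(ts.timestamp()) // size)
        kept[key] = ts  # later snapshots overwrite earlier ones
    return sorted(kept.values())

# Usage: two hours of 10-minute snapshots thin down to one per hour.
now = datetime(2025, 12, 26, 20, 0, tzinfo=timezone.utc)
ten_minutely = [now - timedelta(minutes=10 * k) for k in range(12)]
kept = snapshots_to_keep(ten_minutely, now)
print([ts.strftime("%H:%M") for ts in kept])
# prints ['18:50', '19:50', '20:00']
```

Running something like this each time the statistics are regenerated, and deleting whatever is not returned, would give the coarser-with-age archive that makes trend analysis possible without unbounded storage.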

