Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
On Wed, Nov 17, 2021 at 08:05:38AM +0800, Paul Wise wrote:
> On Tue, 2021-11-16 at 13:38 +0100, Bill Allombert wrote:
>
> > What is the idea exactly ?
>
> Bálint's idea was to ship popcon data in a popcon-stats-data package in
> the Debian archive. I suggested to instead ship that in the apt
> metadata present in the Packages files.
>
> > How often the popcon data are going to be refreshed ?
>
> I would assume with the same frequency as the existing data on the
> popcon.d.o website is refreshed. Anything faster than that would just
> be refreshing unchanged data. Anything slower than that would be
> providing outdated data. Outdated data is fine though, so maybe weekly.
>
> > Which exact set of data are going to be used ?
>
> Initially I thought similar to the QA per-package popcon data:
>
> https://qa.debian.org/popcon.php?package=iotop
>
> Package: iotop
> Popcon: 30314 7962 21197 1143 12
>
> If I massage the by_inst file into the same format as this, I calculate
> that the extra Popcon fields would add 3.7 MB to the Packages files and
> that data would change often, making the apt updating process slower.
> So probably the data should go into new files instead and there should
> be a config file snippet to enable downloading them, a tool to query
> and index them and a way for apt clients to get that data.
>
> Since the Debian repository splits the metadata by suite and component,
> these new statistics should probably do the same. So the raw popcon
> submissions would need to be individually mapped to a suite based on
> the popcon version in the submission, and then each item in the
> submission attributed to that suite/component. For popcon versions that
> don't match a suite, if they match a known Debian version, attribute
> them to the next highest suite and discard submissions with popcon
> versions that were never in Debian, or maybe attribute them to the
> relevant vendor separately. popcon submissions that don't have Debian
> as the vendor probably should be discarded, or maybe attribute them to
> the relevant vendor separately.

So the idea is to have a Popcon file for each suite?

Let's say bookworm is released today. What will bookworm/Popcon contain?
We release a new popularity-contest package. What will sid/Popcon contain?
The package migrates to testing. What will testing/Popcon contain?

As I understand it, the metadata for stable is only updated with point
releases. Would that be the same for stable/Popcon?

I still do not quite see how this would work...

We do not want to provide data generated from a very small subset of
reports, for accuracy and privacy reasons. The current
all-popcon-result.gz/stable-popcon-result.gz split is a middle ground
between competing constraints.

Why not instead write a tool to download all-popcon-result.gz or
stable-popcon-result.gz when needed, and cache them? The cached file can
then be processed by a tool that makes suggestions.

Cheers,
--
Bill.

Imagine a large red swirl here.
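[Editorial sketch] The download-and-cache tool suggested above could look roughly like the following. This is only a minimal illustration under stated assumptions: the function names, the cache path and the one-week maximum age are all hypothetical, and the URL of the result file is left for the caller to supply rather than guessed here.

```python
import os
import time
import urllib.request


def download(url):
    """Fetch the raw bytes at url (the network access happens only here)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_cached(url, cache_path, max_age_seconds, fetch=download):
    """Return the file contents, downloading only if the cache is stale.

    The cache is considered fresh while the cached file's modification
    time is less than max_age_seconds old. `fetch` can be replaced,
    for example in tests, to avoid touching the network.
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cached copy yet

    if age is not None and age < max_age_seconds:
        with open(cache_path, "rb") as cached:
            return cached.read()

    data = fetch(url)
    # Write to a temporary name first so that an interrupted download
    # never leaves a truncated cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, cache_path)
    return data


# Hypothetical usage (the URL and path are placeholders, not verified):
# data = fetch_cached(RESULT_URL, "/var/cache/popcon/results.gz", 7 * 24 * 3600)
```

Keeping the download behind a replaceable `fetch` argument is a design choice made for this sketch so that the caching logic can be exercised without network access.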
Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
On Tue, 2021-11-16 at 13:38 +0100, Bill Allombert wrote:

> What is the idea exactly ?

Bálint's idea was to ship popcon data in a popcon-stats-data package in
the Debian archive. I suggested to instead ship that in the apt
metadata present in the Packages files.

> How often the popcon data are going to be refreshed ?

I would assume with the same frequency as the existing data on the
popcon.d.o website is refreshed. Anything faster than that would just
be refreshing unchanged data. Anything slower than that would be
providing outdated data. Outdated data is fine though, so maybe weekly.

> Which exact set of data are going to be used ?

Initially I thought similar to the QA per-package popcon data:

https://qa.debian.org/popcon.php?package=iotop

Package: iotop
Popcon: 30314 7962 21197 1143 12

If I massage the by_inst file into the same format as this, I calculate
that the extra Popcon fields would add 3.7 MB to the Packages files and
that data would change often, making the apt updating process slower.
So probably the data should go into new files instead and there should
be a config file snippet to enable downloading them, a tool to query
and index them and a way for apt clients to get that data.

Since the Debian repository splits the metadata by suite and component,
these new statistics should probably do the same. So the raw popcon
submissions would need to be individually mapped to a suite based on
the popcon version in the submission, and then each item in the
submission attributed to that suite/component. For popcon versions that
don't match a suite, if they match a known Debian version, attribute
them to the next highest suite and discard submissions with popcon
versions that were never in Debian, or maybe attribute them to the
relevant vendor separately. popcon submissions that don't have Debian
as the vendor probably should be discarded, or maybe attribute them to
the relevant vendor separately.
--
bye,
pabs

https://wiki.debian.org/PaulWise
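[Editorial sketch] The "massage the by_inst file into the same format" step described above might be done along these lines. This is a hedged illustration, not the script actually used: it assumes each data row of by_inst has the columns rank, name, inst, vote, old, recent, no-files, followed by the maintainer in parentheses, and that comment, separator and total lines can be recognised by a non-numeric first column. The iotop numbers in the thread are consistent with that column order (7962 + 21197 + 1143 + 12 = 30314). The function name is hypothetical.

```python
def by_inst_to_stanzas(lines):
    """Convert by_inst data rows into 'Package:/Popcon:' stanzas.

    Assumed row layout (not confirmed by the thread):
        rank name inst vote old recent no-files (maintainer)
    """
    stanzas = []
    for line in lines:
        fields = line.split()
        # Skip comments, separators, totals and blank lines: only data
        # rows start with a numeric rank and carry at least 7 columns.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        name = fields[1]
        counts = fields[2:7]  # inst, vote, old, recent, no-files
        if not all(count.isdigit() for count in counts):
            continue
        stanzas.append(f"Package: {name}\nPopcon: {' '.join(counts)}\n")
    return stanzas


# The rank and maintainer below are made up for illustration; the five
# counts are the iotop figures quoted in the message.
sample = [
    "#rank name   inst  vote   old recent no-files (maintainer)",
    "1234  iotop  30314  7962 21197  1143    12 (Example Maintainer)",
    "------------------------------------------------------------",
]
print("\n".join(by_inst_to_stanzas(sample)))
```

Summing the byte length of the generated stanzas over a real by_inst file would be one way to check a size estimate such as the 3.7 MB mentioned above.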
Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
On Tue, Nov 16, 2021 at 09:34:00AM +0800, Paul Wise wrote:
> On Tue, 2021-11-16 at 08:38 +0800, Paul Wise wrote:
>
> > I think a better approach would be to ship this data in the Debian
> > apt repository metadata, either in the Packages files or in
> > Popularity files in the dists/ dir
>
> I note that debtags.debian.org uses this approach, data is gathered on
> the site, then uploaded to ftp-master, which integrates the data and
> distributes it via the Packages files. So it should work if the FTP
> Team and Popcon teams are willing to support the idea.

What is the idea exactly ?

Several questions come to mind:

How often the popcon data are going to be refreshed ?
Which exact set of data are going to be used ?

Cheers,
--
Bill.

Imagine a large red swirl here.
Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
On Sun, Nov 14, 2021 at 09:43:46PM +0100, Bálint Réczey wrote:
> Package: wnpp
> Severity: wishlist
>
> * Package name    : popcon-stats-data
>   Version         : 0.2024
> * URL or Web page : https://popcon.debian.org/
> * License         : Public Domain (data)
>   Description     : Debian's Popularity Contest statistics
>
> ---
>
> The shipped data would let package managers show the popularity of
> packages which could let users make more informed decisions when
> choosing between packages to install.
>
> I don't believe this will change the Vim vs. Emacs battle, but when I
> looked for a DICOM viewer I found a crazy amount of programs of
> various quality and knowing which ones were the most widely used would
> have sped up picking a good one.

The popularity of packages is heavily skewed by how the distribution is
structured, in particular by the set of packages installed by default,
so alas it is not always an indication of user preferences...

Cheers,
--
Bill.

Imagine a large red swirl here.
Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
On Tue, 2021-11-16 at 08:38 +0800, Paul Wise wrote:

> I think a better approach would be to ship this data in the Debian
> apt repository metadata, either in the Packages files or in
> Popularity files in the dists/ dir

I note that debtags.debian.org uses this approach, data is gathered on
the site, then uploaded to ftp-master, which integrates the data and
distributes it via the Packages files. So it should work if the FTP
Team and Popcon teams are willing to support the idea.

--
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
[Forwarded to and CCing the debian-popcon mailing list]

On Sun, 2021-11-14 at 21:43 +0100, Bálint Réczey wrote:

> The shipped data would let package managers show the popularity of
> packages which could let users make more informed decisions when
> choosing between packages to install.
...
> Ideally the stats would be shipped in a format from which APT and
> other package managers could efficiently look up the percentage of
> Debian systems a particular binary package was used.

This package would be very Debian specific and would give the wrong
data when installed in Ubuntu, I think a better approach would be to
ship this data in the Debian apt repository metadata, either in the
Packages files or in Popularity files in the dists/ dir (similar to the
Contents files used by apt-file) so that the data is directly available
to apt clients like aptitude/etc. This way Ubuntu and other derivatives
could also ship popularity data for their users too.

--
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
Package: wnpp
Severity: wishlist

* Package name    : popcon-stats-data
  Version         : 0.2024
* URL or Web page : https://popcon.debian.org/
* License         : Public Domain (data)
  Description     : Debian's Popularity Contest statistics

---

The shipped data would let package managers show the popularity of
packages which could let users make more informed decisions when
choosing between packages to install.

I don't believe this will change the Vim vs. Emacs battle, but when I
looked for a DICOM viewer I found a crazy amount of programs of
various quality and knowing which ones were the most widely used would
have sped up picking a good one.

Ideally the stats would be shipped in a format from which APT and
other package managers could efficiently look up the percentage of
Debian systems a particular binary package was used.

Cheers,
Balint
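[Editorial sketch] The percentage lookup wished for above could be illustrated as follows. Everything here is an assumption made for the example: the stanza format is the "Package:/Popcon:" form that appears elsewhere in this thread, the five counts are taken to be inst, vote, old, recent and no-files, "used" is interpreted as the vote count, and the total number of submissions (200000) is a made-up figure, not real popcon data. The function names are hypothetical.

```python
def parse_popcon_stanzas(text):
    """Build {package: (inst, vote, old, recent, no_files)} from stanzas."""
    index = {}
    package = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or anything that is not "Key: value"
        key, value = key.strip(), value.strip()
        if key == "Package":
            package = value
        elif key == "Popcon" and package is not None:
            index[package] = tuple(int(n) for n in value.split())
    return index


def usage_percentage(index, package, total_submissions):
    """Percentage of reporting systems that regularly use the package.

    Uses the vote column (assumption: second of the five counts).
    """
    if total_submissions <= 0:
        raise ValueError("total_submissions must be positive")
    inst, vote, old, recent, no_files = index[package]
    return 100.0 * vote / total_submissions


stanzas = "Package: iotop\nPopcon: 30314 7962 21197 1143 12\n"
index = parse_popcon_stanzas(stanzas)
# 200000 is a hypothetical submission total used only for illustration.
print(f"{usage_percentage(index, 'iotop', 200000):.2f}%")
```

A dictionary gives constant-time lookups once the data is loaded; a package manager that needs to avoid parsing the whole file on every run would presumably want an on-disk index instead, which this sketch does not attempt.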