Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics

2021-11-17 Thread Bill Allombert
On Wed, Nov 17, 2021 at 08:05:38AM +0800, Paul Wise wrote:
> On Tue, 2021-11-16 at 13:38 +0100, Bill Allombert wrote:
> 
> > What exactly is the idea?
> 
> Bálint's idea was to ship popcon data in a popcon-stats-data package in
> the Debian archive. I suggested to instead ship that in the apt
> metadata present in the Packages files.
> 
> > How often is the popcon data going to be refreshed?
> 
> I would assume with the same frequency as the existing data on the
> popcon.d.o website is refreshed. Anything faster than that would just
> be refreshing unchanged data. Anything slower than that would be
> providing outdated data. Outdated data is fine though, so maybe weekly.
> 
> > Which exact set of data is going to be used?
> 
> Initially I thought of something similar to the QA per-package popcon data:
> 
> https://qa.debian.org/popcon.php?package=iotop
> 
> Package: iotop
> Popcon: 30314 7962 21197 1143 12
> 
> If I massage the by_inst file into the same format as this, I calculate
> that the extra Popcon fields would add 3.7 MB to the Packages files and
> that data would change often, making the apt updating process slower.
> So probably the data should go into new files instead and there should
> be a config file snippet to enable downloading them, a tool to query
> and index them and a way for apt clients to get that data.
> 
> Since the Debian repository splits the metadata by suite and component,
> these new statistics should probably do the same. So the raw popcon
> submissions would need to be individually mapped to a suite based on
> the popcon version in the submission, and then each item in the
> submission attributed to that suite/component. Submissions whose popcon
> version doesn't match a suite but does match a known Debian version
> would be attributed to the next highest suite; submissions with popcon
> versions that were never in Debian would be discarded, or maybe
> attributed to the relevant vendor separately. Popcon submissions that
> don't have Debian as the vendor should probably be handled the same
> way: discarded, or attributed to the relevant vendor separately.

So the idea is to have a Popcon file for each suite?
So let's say bookworm is released today. What will bookworm/Popcon contain?
We release a new popularity-contest package. What will sid/Popcon contain?
The package migrates to testing. What will testing/Popcon contain?
As I understand it, the metadata for stable is only updated with point
releases. Would that be the same for stable/Popcon?

I still do not quite see how this would work...
We do not want to provide data generated from a very small subset of
reports for accuracy and privacy reasons.
The current all-popcon-result.gz/stable-popcon-result.gz split is a
middle ground between competing constraints.
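To make that constraint concrete, here is a minimal Python sketch of a minimum-report rule in the spirit of the current stable/all split. The threshold, the function name and the report counts are hypothetical, and it assumes the pooled "all" set is always large enough to publish.

```python
from typing import Dict, List


def datasets_to_publish(reports_per_suite: Dict[str, int], min_reports: int) -> List[str]:
    """Pick which result files to publish under a minimum-report threshold.

    Suites with at least `min_reports` reports get their own file. Every
    report is also pooled into 'all' (assumed to always clear the threshold),
    so smaller suites are only represented through 'all'.
    """
    separate = sorted(s for s, n in reports_per_suite.items() if n >= min_reports)
    return separate + ["all"]


# Hypothetical counts: only stable has enough reports for its own file.
print(datasets_to_publish({"stable": 150000, "testing": 4000, "unstable": 300}, 5000))
```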

Why not instead write a tool to download all-popcon-result.gz or
stable-popcon-result.gz when needed, and cache them?
This can then be processed by a tool that makes suggestions.
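A minimal Python sketch of such a tool, standard library only. It assumes the results file consists of lines of the form "Package: <name> <vote> <old> <recent> <no-files>" (other lines are ignored), leaves the URL to the caller since the exact location on popcon.debian.org is not spelled out here, and guesses a one-week cache lifetime. The function names and the `fetch` hook (which lets tests avoid the network) are hypothetical.

```python
import gzip
import os
import time
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

ONE_WEEK = 7 * 24 * 3600  # seconds; assumed refresh interval


def _download(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_cached(url: str, cache_file: Path, max_age: float = ONE_WEEK,
                 fetch: Optional[Callable[[str], bytes]] = None) -> bytes:
    """Return the cached copy of `url` if it is fresh enough, else download it."""
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < max_age:
        return cache_file.read_bytes()
    data = (fetch or _download)(url)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    tmp = cache_file.with_name(cache_file.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, cache_file)  # atomic rename: readers never see a partial file
    return data


def install_counts(gz_data: bytes) -> Dict[str, int]:
    """Map package -> vote + old + recent + no-files (assumed line format)."""
    counts: Dict[str, int] = {}
    for line in gzip.decompress(gz_data).decode("utf-8", "replace").splitlines():
        parts = line.split()
        if (len(parts) == 6 and parts[0] == "Package:"
                and all(x.isdigit() for x in parts[2:])):
            counts[parts[1]] = sum(int(x) for x in parts[2:])
    return counts


def suggest(candidates: Iterable[str], counts: Dict[str, int]) -> List[str]:
    """Order candidates from most to least installed; unknown packages last."""
    return sorted(candidates, key=lambda p: counts.get(p, -1), reverse=True)
```

A suggestion tool would then combine them, e.g. `suggest(candidates, install_counts(fetch_cached(url, cache_path)))`, and only hit the network once the cached copy is older than `max_age`.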

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 



2021-11-16 Thread Paul Wise
On Tue, 2021-11-16 at 13:38 +0100, Bill Allombert wrote:

> What exactly is the idea?

Bálint's idea was to ship popcon data in a popcon-stats-data package in
the Debian archive. I suggested to instead ship that in the apt
metadata present in the Packages files.

> How often is the popcon data going to be refreshed?

I would assume with the same frequency as the existing data on the
popcon.d.o website is refreshed. Anything faster than that would just
be refreshing unchanged data. Anything slower than that would be
providing outdated data. Outdated data is fine though, so maybe weekly.

> Which exact set of data is going to be used?

Initially I thought of something similar to the QA per-package popcon data:

https://qa.debian.org/popcon.php?package=iotop

Package: iotop
Popcon: 30314 7962 21197 1143 12
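The five numbers are presumably the same columns as popcon's by_inst file (inst, vote, old, recent, no-files). That mapping is an assumption here, though the iotop sample is consistent with it, since 7962 + 21197 + 1143 + 12 = 30314. A minimal Python sketch of how a client might parse such a field; all names are hypothetical:

```python
from typing import Dict, NamedTuple


class PopconCounts(NamedTuple):
    # Assumed column order, as in popcon's by_inst header:
    # inst vote old recent no-files
    inst: int
    vote: int
    old: int
    recent: int
    no_files: int


def parse_popcon_field(value: str) -> PopconCounts:
    """Parse the value of a hypothetical 'Popcon:' field into named counts."""
    parts = value.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 counts, got {len(parts)}: {value!r}")
    counts = PopconCounts(*(int(p) for p in parts))
    # Each installation is assumed to fall into exactly one of the other buckets.
    if counts.vote + counts.old + counts.recent + counts.no_files != counts.inst:
        raise ValueError(f"inconsistent counts: {value!r}")
    return counts


def parse_stanza(text: str) -> Dict[str, str]:
    """Parse one deb822-style stanza (simple fields only, no continuation lines)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, val = line.partition(":")
        fields[key.strip()] = val.strip()
    return fields


stanza = parse_stanza("Package: iotop\nPopcon: 30314 7962 21197 1143 12\n")
counts = parse_popcon_field(stanza["Popcon"])
print(stanza["Package"], counts.inst, counts.vote)  # iotop 30314 7962
```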

If I massage the by_inst file into the same format as this, I calculate
that the extra Popcon fields would add 3.7 MB to the Packages files and
that data would change often, making the apt updating process slower.
So probably the data should go into new files instead and there should
be a config file snippet to enable downloading them, a tool to query
and index them and a way for apt clients to get that data.
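The massaging step could look roughly like this Python sketch. It assumes by_inst data rows look like "rank name inst vote old recent no-files (maintainer)" with "#" comment lines, and it counts only the bytes of the added Popcon: line, since every stanza already has a Package: line. The sample row's rank and maintainer are made up, and this does not reproduce the 3.7 MB figure, which depends on the real file.

```python
import re
from typing import Iterable, Iterator

# Assumed by_inst row layout: rank name inst vote old recent no-files (maintainer)
ROW = re.compile(r"^\s*\d+\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def by_inst_to_stanzas(lines: Iterable[str]) -> Iterator[str]:
    """Yield a 'Package:/Popcon:' stanza for each data row of by_inst."""
    for line in lines:
        if line.startswith("#"):
            continue  # header and format comments
        m = ROW.match(line)
        if not m or m.group(1) == "Total":
            continue  # separator lines, totals, anything unexpected
        name, *counts = m.groups()
        yield f"Package: {name}\nPopcon: {' '.join(counts)}\n"


def extra_bytes(lines: Iterable[str]) -> int:
    """Bytes the Popcon: lines alone would add to a Packages file."""
    total = 0
    for stanza in by_inst_to_stanzas(lines):
        popcon_line = stanza.splitlines(keepends=True)[1]
        total += len(popcon_line.encode("utf-8"))
    return total


sample = [
    "#rank name   inst  vote   old recent no-files (maintainer)",
    "1     iotop  30314 7962 21197 1143 12 (Hypothetical Maintainer)",
    "------------------------------------------------------------",
]
print(extra_bytes(sample))  # 33
```

Note that the real Packages files are per architecture, so the same field would be repeated in each of them.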

Since the Debian repository splits the metadata by suite and component,
these new statistics should probably do the same. So the raw popcon
submissions would need to be individually mapped to a suite based on
the popcon version in the submission, and then each item in the
submission attributed to that suite/component. Submissions whose popcon
version doesn't match a suite but does match a known Debian version
would be attributed to the next highest suite; submissions with popcon
versions that were never in Debian would be discarded, or maybe
attributed to the relevant vendor separately. Popcon submissions that
don't have Debian as the vendor should probably be handled the same
way: discarded, or attributed to the relevant vendor separately.
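One reading of these attribution rules, as a Python sketch. Everything concrete in it is hypothetical: the suite-to-version table, the set of known Debian versions, the component map, and the dotted-integer version ordering, which stands in for dpkg's real version comparison. "Next highest suite" is taken to mean the oldest suite whose popularity-contest version is newer than the submission's.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical data; real values would come from the archive.
SUITES = [("oldstable", "1.67"), ("stable", "1.71"), ("testing", "1.74")]
KNOWN = {"1.66", "1.67", "1.70", "1.71", "1.72", "1.74"}
COMPONENT_OF = {"iotop": "main", "hypothetical-nonfree": "non-free"}


def version_key(version: str) -> Tuple[int, ...]:
    """Simplified ordering for dotted numeric versions (not dpkg's algorithm)."""
    return tuple(int(part) for part in version.split("."))


def attribute_suite(vendor: str, popcon_version: str,
                    suite_versions: List[Tuple[str, str]],
                    known_versions: Set[str]) -> Optional[str]:
    """Return the suite a submission counts towards, or None to discard it."""
    if vendor != "Debian":
        return None  # alternatively: count it under that vendor separately
    for suite, version in suite_versions:
        if version == popcon_version:
            return suite
    if popcon_version not in known_versions:
        return None  # never shipped in Debian: discard
    key = version_key(popcon_version)
    newer = [(version_key(v), s) for s, v in suite_versions if version_key(v) > key]
    return min(newer)[1] if newer else None


def tally(submissions: Iterable[Tuple[str, str, List[str]]],
          suite_versions: List[Tuple[str, str]], known_versions: Set[str],
          component_of: Dict[str, str]) -> Counter:
    """Count (suite, component, package) over (vendor, version, packages) tuples."""
    counts: Counter = Counter()
    for vendor, popcon_version, packages in submissions:
        suite = attribute_suite(vendor, popcon_version, suite_versions, known_versions)
        if suite is None:
            continue
        for pkg in packages:
            component = component_of.get(pkg)
            if component is not None:  # skip packages not in this archive
                counts[(suite, component, pkg)] += 1
    return counts
```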

-- 
bye,
pabs

https://wiki.debian.org/PaulWise


2021-11-16 Thread Bill Allombert
On Tue, Nov 16, 2021 at 09:34:00AM +0800, Paul Wise wrote:
> On Tue, 2021-11-16 at 08:38 +0800, Paul Wise wrote:
> 
> > I think a better approach would be to ship this data in the Debian
> > apt repository metadata, either in the Packages files or in
> > Popularity files in the dists/ dir
> 
> I note that debtags.debian.org uses this approach: data is gathered on
> the site, then uploaded to ftp-master, which integrates the data and
> distributes it via the Packages files. So it should work if the FTP
> and Popcon teams are willing to support the idea.

What exactly is the idea?

Several questions come to mind:

How often is the popcon data going to be refreshed?
Which exact set of data is going to be used?

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 



2021-11-16 Thread Bill Allombert
On Sun, Nov 14, 2021 at 09:43:46PM +0100, Bálint Réczey wrote:
> Package: wnpp
> Severity: wishlist
> 
> * Package name: popcon-stats-data
>   Version : 0.2024
> * URL or Web page : https://popcon.debian.org/
> * License : Public Domain (data)
>   Description : Debian's Popularity Contest statistics
> 
> ---
> 
> The shipped data would let package managers show the popularity of
> packages which could let users make more informed decisions when
> choosing between packages to install.
> 
> I don't believe this will change the Vim vs. Emacs battle, but when I
> looked for a DICOM viewer I found a crazy number of programs of
> varying quality, and knowing which ones were the most widely used would
> have sped up picking a good one.

The popularity of packages is heavily skewed by how the distribution is
structured, in particular by the set of packages installed by default,
so alas it is not always an indication of user preferences...

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 



2021-11-15 Thread Paul Wise
On Tue, 2021-11-16 at 08:38 +0800, Paul Wise wrote:

> I think a better approach would be to ship this data in the Debian
> apt repository metadata, either in the Packages files or in
> Popularity files in the dists/ dir

I note that debtags.debian.org uses this approach: data is gathered on
the site, then uploaded to ftp-master, which integrates the data and
distributes it via the Packages files. So it should work if the FTP
and Popcon teams are willing to support the idea.
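On the ftp-master side, the integration step could be as small as appending a field to matching stanzas. A Python sketch, assuming the Popcon field name used in this thread and plain blank-line-separated stanzas; the function name is hypothetical, and a real implementation would more likely use a proper deb822 parser such as the one in python-debian:

```python
from typing import Dict


def add_popcon_fields(packages_text: str, popcon: Dict[str, str]) -> str:
    """Append a 'Popcon:' field to each stanza whose package has data.

    Stanzas are assumed to be separated by single blank lines; all other
    lines, including multi-line Description continuations, are kept as-is.
    """
    out = []
    for stanza in packages_text.strip("\n").split("\n\n"):
        lines = stanza.splitlines()
        name = next(
            (line.split(":", 1)[1].strip() for line in lines if line.startswith("Package:")),
            None,
        )
        if name is not None and name in popcon:
            lines.append(f"Popcon: {popcon[name]}")
        out.append("\n".join(lines))
    return "\n\n".join(out) + "\n"


packages = (
    "Package: iotop\nArchitecture: amd64\n\n"
    "Package: hypothetical-pkg\nArchitecture: amd64\n"
)
print(add_popcon_fields(packages, {"iotop": "30314 7962 21197 1143 12"}))
```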

-- 
bye,
pabs

https://wiki.debian.org/PaulWise


2021-11-15 Thread Paul Wise
[Forwarded to and CCing the debian-popcon mailing list]

On Sun, 2021-11-14 at 21:43 +0100, Bálint Réczey wrote:

> The shipped data would let package managers show the popularity of
> packages which could let users make more informed decisions when
> choosing between packages to install.
...
> Ideally the stats would be shipped in a format from which APT and
> other package managers could efficiently look up the percentage of
> Debian systems on which a particular binary package was used.

This package would be very Debian-specific and would give the wrong
data when installed on Ubuntu. I think a better approach would be to
ship this data in the Debian apt repository metadata, either in the
Packages files or in Popularity files in the dists/ dir (similar to the
Contents files used by apt-file), so that the data is directly available
to apt clients like aptitude. This way Ubuntu and other derivatives
could also ship popularity data for their users.
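For the Popularity-files variant, apt can already be told to fetch extra index files through Acquire::IndexTargets, which is how apt-file enables Contents downloads. A hypothetical apt.conf snippet; the file name, the "Popularity" target name and the per-component layout are assumptions, and the archive would presumably also have to list the file in its Release file for apt to fetch it:

```
// Hypothetical /etc/apt/apt.conf.d/50popularity.conf, modelled on the
// Acquire::IndexTargets entry apt-file uses for Contents files.
Acquire::IndexTargets {
    deb::Popularity {
        MetaKey "$(COMPONENT)/Popularity";
        ShortDescription "Popularity";
        Description "$(RELEASE)/$(COMPONENT) Popularity";
    };
};
```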

-- 
bye,
pabs

https://wiki.debian.org/PaulWise


2021-11-14 Thread Bálint Réczey
Package: wnpp
Severity: wishlist

* Package name: popcon-stats-data
  Version : 0.2024
* URL or Web page : https://popcon.debian.org/
* License : Public Domain (data)
  Description : Debian's Popularity Contest statistics

---

The shipped data would let package managers show the popularity of
packages which could let users make more informed decisions when
choosing between packages to install.

I don't believe this will change the Vim vs. Emacs battle, but when I
looked for a DICOM viewer I found a crazy number of programs of
varying quality, and knowing which ones were the most widely used would
have sped up picking a good one.

Ideally the stats would be shipped in a format from which APT and
other package managers could efficiently look up the percentage of
Debian systems on which a particular binary package was used.
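A minimal in-memory Python sketch of such a lookup. The class and method names and the sample counts are hypothetical, not an existing APT interface, and the percentage is relative to the number of popcon submissions, since popcon only sees systems that opted in:

```python
from typing import Dict, Iterable, List, Optional


class PopconIndex:
    """Dictionary-backed lookup of install counts (hypothetical design)."""

    def __init__(self, submissions: int, inst_counts: Dict[str, int]):
        if submissions <= 0:
            raise ValueError("submissions must be positive")
        self.submissions = submissions
        self.inst_counts = dict(inst_counts)

    def percent_installed(self, package: str) -> Optional[float]:
        """Percentage of reporting systems with `package` installed, or None."""
        inst = self.inst_counts.get(package)
        if inst is None:
            return None
        return 100.0 * inst / self.submissions

    def most_used(self, candidates: Iterable[str]) -> List[str]:
        """Known candidates ordered from most to least installed."""
        known = [p for p in candidates if p in self.inst_counts]
        return sorted(known, key=self.inst_counts.__getitem__, reverse=True)
```

For the DICOM viewer case, `most_used()` over the candidate package names would put the most widely installed ones first, with Bill's caveat that default installs skew these numbers.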

Cheers,
Balint