On Sun, 26 Apr 2020 10:52:27 +0200
Michał Górny <mgo...@gentoo.org> wrote:

> Do you have any other idea for spam protection then?

What is the realistic risk here for spamming?

If the record is well formed, and pertains to known packages, the worst
I can currently imagine is astroturfing: a single individual attempting
to make a package seem more popular than it is.

Just generally IME, spamming aims to make a buck somehow. But if there
are no fields in the data set that can be used for that, and attempts
to stuff existing fields with spam prose get filtered because they
don't correlate to any known possible value, then the entire record is
simply invalid, and can be removed on that basis.

Conceptually, you could have a report with
"dev-foo/plz-sir-halp-me-I-have-money-and-an-a-nigerian-prince::nigeria-prince",
but for anybody to see that they'd have to be querying data about the
::nigeria-prince overlay, and that's assuming we even show data about
overlays we can't locate.

Trolling ::gentoo with packages that don't exist seems easy to eliminate.
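
To make that concrete, here is a rough sketch of the kind of filtering
I mean. Everything in it is made up for illustration: the "cat/pkg::repo"
report shape, KNOWN_REPOS, KNOWN_PACKAGES and the function names are
assumptions, not an existing API, and a real version would load the
known values from the actual repository metadata.

```python
import re

# Hypothetical lookup tables; a real service would build these from the
# repositories it actually knows about.
KNOWN_REPOS = {"gentoo"}
KNOWN_PACKAGES = {
    "gentoo": {"dev-lang/perl", "dev-lang/python"},
}

# Assumed report shape: "category/name::repo", e.g. "dev-lang/perl::gentoo".
ATOM_RE = re.compile(
    r"^(?P<cp>[A-Za-z0-9+_.-]+/[A-Za-z0-9+_.-]+)::(?P<repo>[A-Za-z0-9_-]+)$"
)


def is_valid_report(atom):
    """Accept an atom only if it is well formed and names a known package."""
    match = ATOM_RE.match(atom)
    if match is None:
        return False  # free-form spam prose fails here
    repo = match.group("repo")
    if repo not in KNOWN_REPOS:
        return False  # overlays we can't locate are dropped
    return match.group("cp") in KNOWN_PACKAGES.get(repo, set())


def filter_reports(atoms):
    """Keep only the reports that pass validation."""
    return [atom for atom in atoms if is_valid_report(atom)]
```

With those tables, "dev-lang/perl::gentoo" passes, while a made-up
package in ::gentoo, anything in an unknown overlay, and plain prose
are all rejected.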

I don't like that astroturfing could be a thing ... but like, I also
don't really care about that happening.

For instance, crates.io has per-crate and per-crate-version download
statistics.

That's super easy to rig, and you get lots of spiky noise in
infrequently used packages simply due to various automated services
fetching things.

But at scale, the data still turns out to be quasi-useful, as it allows
you to chart adoption and migration... because as soon as a new version
gets shipped, if people are using it, then you'll start to see an
uptick in reports from the new version.

The "change" and "change response" information is very useful, and a
very odd target for astroturfing.
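
A minimal sketch of what I mean by charting adoption, assuming each
report boils down to a (day, version) pair. That shape, the function
name, and the sample version strings are hypothetical, purely to show
the idea of watching a new version's share grow over time.

```python
from collections import Counter, defaultdict


def version_share_by_day(reports):
    """Return {day: {version: fraction of that day's reports}}.

    `reports` is an iterable of (day, version) tuples (assumed shape).
    """
    counts = defaultdict(Counter)
    for day, version in reports:
        counts[day][version] += 1

    shares = {}
    for day, per_version in counts.items():
        total = sum(per_version.values())
        shares[day] = {v: n / total for v, n in per_version.items()}
    return shares


# Made-up sample data: the new version goes from 1 in 4 reports to 3 in 4.
sample = (
    [("day1", "1.0")] * 3 + [("day1", "1.1")]
    + [("day2", "1.0")] + [("day2", "1.1")] * 3
)
shares = version_share_by_day(sample)
```

A handful of rigged reports shifts the absolute counts a bit, but the
shape of the uptick after a release is what carries the information.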

I for one would be greatly interested in "new perl version shipped,
explosion of results due to people upgrading", because then I can gauge
roughly how many people managed to upgrade perl without having to join
#gentoo and cry about it being broken.

(We could also designate a certain UUID flag for use by Gentoo infra,
possibly even a UUID per host, whose results would be invisible in the
public data but still visible to people with approved perms. We really
do value the ability to know which packages we have to be careful about
causing problems in, and where infra is at with upgrading various
things, before we remove the versions infra is using. Currently,
working out what infra is running requires lots of direct
communication.)
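
Roughly, the infra idea could look like the sketch below. The record
layout (a dict with a "uuid" key), the INFRA_UUIDS set, and the
viewer_is_approved flag are all assumptions for illustration; how perms
would actually be checked is a separate question.

```python
# Hypothetical placeholder UUIDs reserved for infra hosts.
INFRA_UUIDS = {
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
}


def visible_reports(reports, viewer_is_approved):
    """Hide infra-submitted reports unless the viewer has approved perms."""
    if viewer_is_approved:
        return list(reports)
    return [r for r in reports if r["uuid"] not in INFRA_UUIDS]


# Made-up example records.
example = [
    {"uuid": "00000000-0000-0000-0000-000000000001", "atom": "dev-lang/perl::gentoo"},
    {"uuid": "11111111-1111-1111-1111-111111111111", "atom": "dev-lang/perl::gentoo"},
]
public_view = visible_reports(example, viewer_is_approved=False)
infra_view = visible_reports(example, viewer_is_approved=True)
```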
