Hello,

We have mirrors. Almost 100 of them. Feel free to contact them all,
have them write code to count downloads which then sends the stats to
us, and then we can implement this.

What you suggest is absolutely not feasible at all.

That's too bad. I wanted to suggest counting downloads too, because I believe that the number of downloads of a particular version of a package would, after a while, correlate quite well with the number of users who actually use (i.e. upgrade) the package. It should more or less solve the problem mentioned earlier of people trying a package and removing it soon afterwards.

Anyway, I've been meaning to contribute some ideas on this topic for at least four days (since I read the first IRC log on Sunday), but unfortunately my job hasn't allowed it this week. I just want to do some thinking out loud about both methods (voting/pkgstats), for packages already in community as well as those that might get there in the future, from a regular user's point of view (also with regard to privacy/paranoia matters).

(1) pkgstats
The obvious problem with accuracy is that not everybody will use it (or even run it from time to time to update their "contribution" to the statistics). Some people don't know about it, some can't be bothered, and some might be concerned about privacy. Even though an IP address is not necessarily an identifier of a person, it is still "good enough" information. I more or less trust the Arch devs that only a hash of the IP is stored together with the package list, but I can hardly be sure, and there are users out there far more paranoid than I am. (Their problem doesn't have to be with privacy alone: when someone knows the packages you use, and even their exact versions, it becomes much easier to target an attack at the system.)

On the other hand, it can be used nicely to promote a package that is in unsupported. "Do you use this package? Do you want to see it in community? Have you run pkgstats on your system yet?" It would be nice to see the statistics in the AUR frontend, so that one could see how far a package is from the magic number that makes it a good candidate for community (whatever that number will be).

As for pruning community as it is now (if it is still an issue, I'm not quite sure anymore), how about this: pick a reasonable usage percentage by whatever criteria (number of packages to prune, number of MB to save, ...). It doesn't have to be the same number as the one for new packages entering community; it can be lower. Create a list of all the packages with usage below this number and split it into lists grouped by maintainer. Then send each maintainer their list with a note that they should consider whether these particular packages are really good material for community. At the same time put the full list on the web, announce it in the latest news and tell people that if they see packages they use and haven't yet run pkgstats, they should probably do so now, otherwise the packages might be removed from community. Then wait for some time and look at the change in the statistics (maybe there will be some, maybe there won't).
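
To make the pruning steps above a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the function name, the package and maintainer names, the usage numbers and the 1% threshold. It also assumes the pkgstats data is available as a simple mapping from package name to the fraction of reporting systems that have the package installed, which may not be how the real data looks.

```python
from collections import defaultdict


def prune_candidates(usage, maintainers, threshold):
    """Group packages whose usage share is below `threshold` by maintainer.

    usage:       hypothetical mapping, package name -> fraction of systems using it
    maintainers: hypothetical mapping, package name -> maintainer name
    """
    by_maintainer = defaultdict(list)
    # Sort by package name so every per-maintainer list is in a stable order.
    for pkg, share in sorted(usage.items()):
        if share < threshold:
            # Packages without a known maintainer end up under "orphan".
            by_maintainer[maintainers.get(pkg, "orphan")].append(pkg)
    return dict(by_maintainer)


# Made-up example data, not real statistics.
usage = {"foo": 0.12, "bar": 0.004, "baz": 0.0005, "qux": 0.03}
maintainers = {"foo": "alice", "bar": "bob", "baz": "alice", "qux": "bob"}

lists = prune_candidates(usage, maintainers, threshold=0.01)
for maintainer, pkgs in sorted(lists.items()):
    print(f"{maintainer}: {', '.join(pkgs)}")
```

The concatenation of all the per-maintainer lists is the full list that would go on the web. Running the same function again after the waiting period would show which packages have climbed above the threshold in the meantime.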

(2) votes
Again, not everybody uses it, especially since voting means that you have to have an AUR account. Today everybody has tons of accounts at different internet services, ideally each with its own password, and people don't like to create yet another account (I know I don't). Frankly, if I hadn't needed the roughly 15 packages I now maintain in unsupported (because I hadn't found them there), I wouldn't have created an AUR account either.

There's another problem with accuracy. Even users who have an account and vote don't vote for every single package they use. In particular, many people (myself included) have probably never voted for packages already in community. This makes the system usable for the transition unsupported -> community but not for the other way round. That, too, could be helped by an approach similar to the one above: find the packages with the fewest votes, put them on a list (or lists) and urge people to vote for the packages on it if they use them and want to keep seeing them in community.

The problem is that this way the privacy concerns will be even bigger. Right now, if someone looked up which packages I voted for, it wouldn't give them much of an idea which packages I actually use (because I only voted for packages in unsupported, and only for those where I had reason to believe that my vote might help push them to community). After applying the above suggestion, anyone who gained access to the AUR data would know more or less all the community packages that a certain nickname uses. That is much worse than knowing that a list of packages is used by someone with a certain hash of an IP address, which is the information pkgstats provides. Moreover, each nickname is associated with an e-mail address, which is in turn more or less associated with a particular person. Of course, the e-mail can be fake (or completely or almost unused). On the other hand, if you also want to maintain some packages in unsupported, you want to have a valid e-mail, so if you're paranoid you'd probably have to have two AUR accounts: one connected to you for maintaining packages and another one, as "anonymous" as possible, just for voting.

Conclusion
Unfortunately, I don't have a solution. Both systems can be made more accurate (and useful for pointing fingers at packages that really aren't all that much used) but at the price of some amount of privacy or even security. I still think that the best solution would be counting downloads, because it would be quite accurate and also quite anonymous (definitely more than pkgstats or voting) but sadly it's not an option.

I hope I haven't wasted too much of the time of those who have read it all. If so, I apologize :-), but having spent some time thinking about these matters on my way to work and back this week, I felt I should share the thoughts.

Ondřej


--
Cheers,
Ondřej Kučera
