On Mon, Jan 24, 2011 at 11:24:26AM -0500, Scott Howard wrote:
> Asking scientists for hints about analyzing data . . . sounds scary...

:-)

> It would be interesting to see a single number representing the
> "health" of a team over time. I propose using something like the
> Hirsch index [1] (an index which invokes widespread unease when
> applied to evaluating faculty candidates and promotion while at the
> same time invoking widespread use.)
>
> An h-index moving average (over a year, perhaps) shows if the team is
> growing both in size and in sharing of workload. If a single person is
> doing all the work, then the team isn't behaving like one, and would
> yield a low h-index. Similarly, a large team made up of one-time
> uploaders isn't healthy for long term stability.

Sounds interesting. Is anybody willing to code the SQL query for
implementing the h-index? I admit I have no real clue how to map the
"citations" mentioned in the Wikipedia reference to package uploads.
Uploading a new package version is quite different from creating a new
package - measuring both the same way is unfair. My "highest number of
uploads per year" was reached in a QA effort which did not cost a lot
of time - way less than if I had created a moderately complicated
package from scratch.

> I assume the data is only for the top ten uploaders since 2001, and
> that there are others that are new and haven't made enough uploads to
> make the top 10 over the past decade.

Yes, I forgot to mention this. It is the same idea as the one behind
the mailing list statistics.

> However, for the case of this
> example, I'll pretend like the 10 uploaders listed at [2] make up the
> entire group of uploaders to debian-med.

Don't underestimate the Debian Med team! :-)

    http://blends.debian.net/liststats/uploaders_debian-med_top20.png
    [ http://blends.debian.net/liststats/uploaders_debian-med_top20.txt ]

(but in the end some of these are probably NMUs - at least Matthias K.
and Moritz M. would not count themselves as part of the team). Also,
Dirk E. was basically doing some NMUs for R packages.

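On the mapping question, one possible reading (an assumption, not something settled in this thread) is to treat each uploader as a "paper" and that uploader's number of uploads in a given year as its "citations". The team's h-index for a year would then be the largest h such that h uploaders each made at least h uploads in that year. A minimal Python sketch of that reading follows. The function names and the (year, uploader) input format are hypothetical, and the sketch does not distinguish new packages from new versions or NMUs.

```python
from collections import defaultdict


def h_index(counts):
    """Return the largest h such that at least h entries in counts are >= h."""
    h = 0
    # Walk the counts from largest to smallest; rank is the 1-based position.
    for rank, n in enumerate(sorted(counts, reverse=True), start=1):
        if n >= rank:
            h = rank
        else:
            break
    return h


def team_h_index_by_year(uploads):
    """Compute a per-year team h-index.

    uploads: iterable of (year, uploader) pairs, one pair per upload.
    This input format is an assumption made for illustration only.
    """
    per_year = defaultdict(lambda: defaultdict(int))
    for year, uploader in uploads:
        per_year[year][uploader] += 1
    return {
        year: h_index(per_uploader.values())
        for year, per_uploader in sorted(per_year.items())
    }
```

Under this reading a single person doing all the work gives h_index([10]) = 1, and five one-time uploaders give h_index([1, 1, 1, 1, 1]) = 1 as well, which matches the intuition in the quoted text. Weighting new packages differently from new versions would need a different count per uploader, which this sketch leaves open.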
> med:
> 2001 2
> 2002 2
> 2003 2
> 2004 2
> 2005 2
> 2006 4
> 2007 4
> 2008 8
> 2009 6
> 2010 6
>
>
> I do not believe the above numbers are correct, because it is
> excluding uploaders who may have significantly contributed recently
> (e.g. ~15 uploads in each of 2009 and 2010), but did not make 30
> uploads over the decade to be represented in the data set. For
> example, if two such people existed, 2009 and 2010 would have
> h-indexes of 10 - clearly showing growth and improved team health over
> the past decade. The above data is an example and would have to be
> compiled using every uploader's data to give the correct number. It
> will probably yield higher numbers in 2009 and 2010 as more people
> contributed.

Yes. Those people do in fact exist, as the top20 graph shows.
Unfortunately, the graph becomes quite big if you take more than 10
people into account.

> That could be a useful metric for other Debian teams to identify if
> the team is behaving more or less team-like over time.

This is a good idea, but as I said I need a better algorithm than the
one on the Wikipedia page to accomplish this.

For completeness I also calculated the top20 for Debian Science (with a
URL you can guess). I have also calculated the graphs for some other
teams (as you might have seen once you found the *.txt file).

Thanks for your input

      Andreas.

> [1] http://en.wikipedia.org/wiki/H-index
> [2] http://blends.debian.net/liststats/uploaders_debian-med.txt

-- 
http://fam-tille.de