On 7/30/2012 1:42 PM, Patrick W. Gilmore wrote:
> I'm sorry Panashe is upset by this rule.  Interestingly, "Your search - Panashe 
> Flack nanog - did not match any documents."  So my guess is that a post from that 
> account has not happened before, meaning the post was moderated yet still made it through.
>
> Has anyone done a data mining experiment to see how many posts a month are from 
> "new" members?  My guess is it is a trivial percentage.


Ignoring many harder-to-determine things like "who has changed their email address" and reducing it to simple shell commands, I got this:

for i in $(grep txt ../nanog_archive_index.html | cut -f2 -d\"); do wget "http://mailman.nanog.org/pipermail/nanog/$i"; done

du -sh reports 41M (100M uncompressed). That seems small for all the mail since some point in 2007, but I'd rather use an official archive so people can duplicate results and refine things.

# posts per sender, most prolific first
grep -h "^From: " * | sort | uniq -c | sort -nr

First of all I will say Owen is winning by a fair margin:

   1562 From: owen at delong.com (Owen DeLong)
    929 From: randy at psg.com (Randy Bush)
    775 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu)
    688 From: morrowc.lists at gmail.com (Christopher Morrow)
    621 From: jbates at brightok.net (Jack Bates)
    558 From: jra at baylink.com (Jay Ashworth)
    480 From: gbonser at seven.com (George Bonser)
    450 From: patrick at ianai.net (Patrick W. Gilmore)
    446 From: cidr-report at potaroo.net (cidr-report at potaroo.net)

Total count:
grep -h "^From: " * | wc -l
54166

# Number of contributors at each post count below 10 (1 through 9)
for i in 1 2 3 4 5 6 7 8 9; do grep -h "^From: " * | sort | uniq -c | sort -nr | grep " $i" | wc -l; done
3129
1111
552
319
208
157
131
103
94

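Total contributors with fewer than 10 posts:  5804

Percentages:  5804/54166 = 10.7%. That divides a count of senders by a count of posts, so the share of posts from low contributors is higher still (each sender in the "2" bucket accounts for two posts, and so on).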
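
As a cross-check on the buckets above, here is a minimal sketch that compares the count field exactly instead of pattern-matching it. The pattern " $i" also matches larger counts that merely start with that digit (" 1" matches "   1562"), so the per-bucket numbers may be overstated. The function name low_contributors and its output format are my own invention for illustration. It reads archive text on stdin and prints three numbers: senders below the limit, the posts those senders account for, and total posts.

```shell
# Hypothetical helper: summarize senders with fewer than LIMIT posts.
# Reads mbox-style text on stdin.
# Prints: "<low senders> <posts by low senders> <total posts>"
low_contributors() {
    limit="${1:-10}"
    grep "^From: " | sort | uniq -c |
        awk -v limit="$limit" '
            { total += $1 }                              # every post counts toward the total
            $1 < limit { senders++; posts += $1 }        # exact numeric test on the count field
            END { printf "%d %d %d\n", senders, posts, total }'
}

# Usage against the downloaded archive files:
#   cat * | low_contributors 10
```

Dividing the second number by the third gives the share of posts from low contributors; dividing the first by the number of distinct senders gives the share of people.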

# shows the number of people who've contributed that number of times.
grep -h "^From: " * | sort | uniq -c | sort -nr | awk '{print $1}' | uniq -c | sort -nr

# another interesting thing to look at is posts by month per user (dropping the -h from grep):
grep "^From: " * | sort | uniq -c | sort -nr

# who posted the most in each month (grep -H keeps the file name in the output):
for i in *; do grep -H "^From: " "$i" | sort | uniq -c | sort -nr | head -n 1; done

# Per month, how many single post contributions happen/total. The numbers can be
# higher here since people who posted in a different month may still be counted
# as a new contributor.
for i in *; do echo -n "$i "; grep "^From: " "$i" | sort | uniq -c | sort -nr | grep " 1 " | wc -l | tr '\n' '/'; grep "^From: " "$i" | wc -l ; done
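
To get closer to the original question about "new" members, one option is to carry the set of already-seen senders across months, so that someone who posted in an earlier month is not counted again. The following is only a sketch under assumptions: the function name new_senders_by_month is made up, the files must be passed in chronological order (a plain * glob will not sort names like 2007-May.txt by date), and a sender is identified by the exact "From: " line, so a changed email address still looks like a new person.

```shell
# Hypothetical helper: for each archive file, given in chronological order,
# print "<file> <first-time senders>/<posts>". A sender counts as first-time
# only if no earlier file on the command line contained the same From: line.
new_senders_by_month() {
    awk '
        # At the first line of each file after the first, report the previous file.
        FNR == 1 && NR > 1 { printf "%s %d/%d\n", prev, fresh, posts; fresh = 0; posts = 0 }
        { prev = FILENAME }
        /^From: / {
            posts++
            if (!($0 in seen)) { seen[$0] = 1; fresh++ }
        }
        # Report the last file.
        END { if (NR > 0) printf "%s %d/%d\n", prev, fresh, posts }
    ' "$@"
}

# Usage (file names here are placeholders, listed oldest first):
#   new_senders_by_month 2007-May.txt 2007-June.txt 2007-July.txt
```

The first file will show everyone as new, since there is no earlier history to compare against, so the early months are best ignored.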


