perl script to pre-cache popular urls from apache logs

admintim Sun, 06 Oct 2013 06:47:30 -0700

Hi,

Thanks for memcached.


I wanted a quick script to pull URLs out of my Apache access log and then 
use wget to request the most common ones after a memcached restart, so that 
users don't have to wait on the first (uncached) requests. I didn't find 
anything specific when searching; perhaps it's too simple to post. Here is 
the (crude) script anyway. It reads the log from the files named on the 
command line (or from stdin). Improvements and other tips are welcome:

#!/usr/bin/perl -w

use strict;
use warnings;

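my $wait = 2;         # seconds to sleep between requests
my $min_count = 500;  # only pre-fetch URLs seen at least this many times
my $hist = 20;        # roughly how many of the most recent URLs to revisit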

my %url_counts;
my @urls;
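while (<>) {
    # Test the match itself: $1 keeps its old value when a match fails.
    # The leading .* is greedy, so the last quoted http:// URL on the line wins.
    if (/.*"(http:\/\/[^"]+)"/) {
        push(@urls, $1);
        $url_counts{$1}++;
    } else {
        print "Warning: No URL found on line: $_";
    }
}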
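# Most frequently seen URLs first; stop at the first one below $min_count.
foreach my $key (sort { $url_counts{$b} <=> $url_counts{$a} } keys %url_counts) {
    if ($url_counts{$key} >= $min_count) {
        print "Visiting common url: Count: $url_counts{$key} URL: $key\n";
        # List form of system() bypasses the shell, so quotes in a URL are harmless.
        system('wget', '-O', '/dev/null', $key);
        sleep $wait;
    } else {
        last;
    }
}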
# Also revisit the most recently logged URLs, newest first, up to $hist of them.
my $pos = 0;
foreach my $url (reverse @urls) {
    if ($pos < $hist) {
        print "Visiting reverse historical url $pos: $url\n";
        system('wget', '-O', '/dev/null', $url);
        sleep $wait;
        $pos++;
    } else {
        last;
    }
}
