Better way to mirror CPAN locally?

2012-05-07 Thread gvim

I currently use the script listed below, provided by Randal Schwartz, to
mirror CPAN locally, as I spend a lot of time Perl-ing without an internet
connection. With CPAN now totalling around 2GB, I'm wondering whether there
is a more efficient method, as this script doesn't use any kind of rsync
method and I end up just downloading the whole 2GB every time.

gvim

***
#!/usr/bin/perl -w
use strict;
$|++;

my $REMOTE = "http://mirror.bytemark.co.uk/CPAN/";

## warning: unknown files below this dir are deleted!
my $LOCAL = "/Users/gmac/cpmirror/mirror/";

my $TRACE = 1;


## core -
use File::Path qw(mkpath);
use File::Basename qw(dirname);
use File::Spec::Functions qw(catfile);
use File::Find qw(find);

## LWP -
use URI ();
use LWP::Simple qw(mirror RC_OK RC_NOT_MODIFIED);

## Compress::Zlib -
use Compress::Zlib qw(gzopen $gzerrno);

## first, get index files
my_mirror($_) for qw(
  authors/01mailrc.txt.gz
  modules/02packages.details.txt.gz
  modules/03modlist.data.gz
);

## now walk the packages list
my $details = catfile($LOCAL, qw(modules 02packages.details.txt.gz));
my $gz = gzopen($details, "rb") or die "Cannot open details: $gzerrno";
my $inheader = 1;
while ($gz->gzreadline($_) > 0) {
  if ($inheader) {
    ## the header ends at the first blank line
    $inheader = 0 unless /\S/;
    next;
  }
  my ($module, $version, $path) = split;
  next if $path =~ m{/perl-5};  # skip Perl distributions
  my_mirror("authors/id/$path", 1);
}

## finally, clean the files we didn't stick there
clean_unmirrored();
exit 0;

BEGIN {
  ## %mirrored tracks files already handled, keyed by local filename:
  ## 1 = present locally and checked, 2 = mirrored from the remote
  my %mirrored;

  sub my_mirror {
    my $path = shift;            # partial URL
    my $skip_if_present = shift; # true/false

    my $remote_uri = URI->new_abs($path, $REMOTE)->as_string; # full URL
    my $local_file = catfile($LOCAL, split "/", $path); # native absolute file
    my $checksum_might_be_up_to_date = 1;

    if ($skip_if_present and -f $local_file) {
      ## upgrade to checked if not already
      $mirrored{$local_file} = 1 unless $mirrored{$local_file};
    } elsif (($mirrored{$local_file} || 0) < 2) {
      ## upgrade to full mirror
      $mirrored{$local_file} = 2;
      mkpath(dirname($local_file), $TRACE, 0711);
      print $path if $TRACE;
      my $status = mirror($remote_uri, $local_file);

      if ($status == RC_OK) {
        $checksum_might_be_up_to_date = 0;
        print " ... updated\n" if $TRACE;
      } elsif ($status != RC_NOT_MODIFIED) {
        warn "\n$remote_uri: $status\n";
        return;
      } else {
        print " ... up to date\n" if $TRACE;
      }
    }

    if ($path =~ m{^authors/id}) { # maybe fetch CHECKSUMS
      my $checksum_path = URI->new_abs("CHECKSUMS", $remote_uri)->rel($REMOTE);
      if ($path ne $checksum_path) {
        my_mirror($checksum_path, $checksum_might_be_up_to_date);
      }
    }
  }

  sub clean_unmirrored {
    find sub {
      return unless -f and not $mirrored{$File::Find::name};
      print "$File::Find::name ... removed\n" if $TRACE;
      unlink $_ or warn "Cannot remove $File::Find::name: $!";
    }, $LOCAL;
  }
}


Re: Better way to mirror CPAN locally?

2012-05-07 Thread Jason Clifford
On Mon, 2012-05-07 at 13:21 +0100, gvim wrote:
> I currently use the script listed below, provided by Randal Schwartz, to
> mirror CPAN locally as I spend a lot of time Perl-ing without an internet
> connection. With CPAN now totalling around 2GB I'm wondering if there isn't a
> more efficient method as this script doesn't use any kind of rsync method and
> I end up just downloading the whole 2GB every time.

Why don't you use rsync?

http://www.cpan.org/misc/how-to-mirror.html#rsync
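
For a full mirror, the rsync approach on that page comes down to a single command. What follows is a minimal Perl sketch, not the page's own script. It assumes rsync is installed and that the rsync source `cpan-rsync.perl.org::CPAN` is still the one the page names (check before relying on it). The destination directory is hypothetical. The script only prints the command unless it is started with `--run`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed rsync source; check the how-to-mirror page for the current one.
my $SOURCE = 'cpan-rsync.perl.org::CPAN';

# Hypothetical destination; --delete removes local files that are not upstream.
my $DEST = '/Users/gmac/cpmirror/full/';

# Build the rsync argument list: archive mode, verbose, delete stale files.
sub rsync_command {
    my ($source, $dest) = @_;
    return ('rsync', '-av', '--delete', $source, $dest);
}

my @cmd = rsync_command($SOURCE, $DEST);
print "@cmd\n";

# Only touch the network when explicitly asked: perl thisscript.pl --run
if (@ARGV and $ARGV[0] eq '--run') {
    system(@cmd) == 0 or die "rsync failed: $?";
}
```

Because rsync transfers only files that changed, repeat runs are cheap, but the first run fetches all of CPAN, including old and developer releases.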



Re: Better way to mirror CPAN locally?

2012-05-07 Thread Fernando Corrêa de Oliveira


JAPH

On 07/05/2012, at 09:21, gvim gvi...@gmail.com wrote:

> I currently use the script listed below, provided by Randal Schwartz, to
> mirror CPAN locally as I spend a lot of time Perl-ing without an internet
> connection. With CPAN now totalling around 2GB I'm wondering if there isn't a
> more efficient method as this script doesn't use any kind of rsync method and
> I end up just downloading the whole 2GB every time.
>
> gvim



Re: Better way to mirror CPAN locally?

2012-05-07 Thread Mark Fowler

On Monday, 7 May 2012 at 13:21, gvim wrote: 
> I currently use the script listed below, provided by Randal Schwartz, to
> mirror CPAN locally as I spend a lot of time Perl-ing without an internet
> connection. With CPAN now totalling around 2GB I'm wondering if there isn't a
> more efficient method as this script doesn't use any kind of rsync method and
> I end up just downloading the whole 2GB every time.

The script you provided is designed to keep a mirror of only the parts of CPAN 
that you can install via the CPAN shell (i.e. not keep a copy of dev versions 
and old versions of distributions).  Assuming you want the same thing, rsync is 
not the answer as that'll suck down a whole host of stuff you don't want on 
disk.

However: The more modern CPAN::Mini https://metacpan.org/module/minicpan can do 
the same thing and subsequent runs only download indexes and distributions.  
It's probably what you're looking for.
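
As a rough illustration, here is a minimal sketch of driving CPAN::Mini from Perl. It assumes the module is installed and reuses the remote URL and local directory from the script in the first message. `update_minicpan` is a hypothetical wrapper name, and the mirror is only started when the script is run with `--run`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Settings reused from the script in the first message.
my %OPTIONS = (
    remote    => 'http://mirror.bytemark.co.uk/CPAN/',
    local     => '/Users/gmac/cpmirror/mirror/',
    skip_perl => 1,    # like the "next if m{/perl-5}" line in the old script
);

# Hypothetical wrapper around CPAN::Mini's update_mirror class method.
sub update_minicpan {
    my (%options) = @_;
    require CPAN::Mini;    # loaded lazily so the file compiles without it
    return CPAN::Mini->update_mirror(%options);
}

# Run the mirror only when asked: perl thisscript.pl --run
update_minicpan(%OPTIONS) if @ARGV and $ARGV[0] eq '--run';
```

The same module ships a command-line front end, so `minicpan -l /Users/gmac/cpmirror/mirror/ -r http://mirror.bytemark.co.uk/CPAN/` should be equivalent for the two path options.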

HTH

Mark.


Re: Better way to mirror CPAN locally?

2012-05-07 Thread Mark Overmeer
* Mark Fowler (m...@twoshortplanks.com) [120507 12:55]:
> However: The more modern CPAN::Mini https://metacpan.org/module/minicpan
> can do the same thing and subsequent runs only download indexes and
> distributions.  It's probably what you're looking for.

cpanmini or CPAN::Site
-- 
   MarkOv


       Mark Overmeer MSc                                MARKOV Solutions
       m...@overmeer.net                          soluti...@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net



Re: Better way to mirror CPAN locally?

2012-05-07 Thread gvim

On 07/05/2012 13:50, Mark Fowler wrote:


> The script you provided is designed to keep a mirror of only the
> parts of CPAN that you can install via the CPAN shell (i.e. not keep
> a copy of dev versions and old versions of distributions).  Assuming
> you want the same thing, rsync is not the answer as that'll suck down
> a whole host of stuff you don't want on disk.
>
> However: The more modern CPAN::Mini
> https://metacpan.org/module/minicpan can do the same thing and
> subsequent runs only download indexes and distributions.  It's
> probably what you're looking for.



If subsequent runs only download indexes and distributions what use is it for 
keeping non-distribution modules up to date? I may be ignorant of what you mean 
by distribution.
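
For reference, a distribution is the archive an author uploads, and one distribution can provide several modules. The packages index maps each module to the distribution file that contains it. Here is a minimal sketch of that mapping. The index lines are hypothetical, and `parse_index_line` is an illustrative helper rather than part of any CPAN tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one body line of 02packages.details.txt into its three columns
# and derive the distribution file name from the path column.
sub parse_index_line {
    my ($line) = @_;
    my ($module, $version, $path) = split ' ', $line;
    my ($dist_file) = $path =~ m{([^/]+)\z};
    return { module => $module, version => $version,
             path => $path, dist_file => $dist_file };
}

# Hypothetical index lines: two modules shipped in the same distribution.
my @lines = (
    'Foo::Bar        1.23  A/AU/AUTHOR/Foo-Bar-1.23.tar.gz',
    'Foo::Bar::Util  0.04  A/AU/AUTHOR/Foo-Bar-1.23.tar.gz',
);

my %modules_in;    # distribution file => list of modules it provides
for my $line (@lines) {
    my $entry = parse_index_line($line);
    push @{ $modules_in{ $entry->{dist_file} } }, $entry->{module};
}

for my $dist (sort keys %modules_in) {
    print "$dist provides: @{ $modules_in{$dist} }\n";
}
```

Mirroring the distributions listed in the index therefore brings every indexed module along with them.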

gvim


Re: Better way to mirror CPAN locally?

2012-05-07 Thread gvim

On 07/05/2012 13:50, Mark Fowler wrote:


> The script you provided is designed to keep a mirror of only the
> parts of CPAN that you can install via the CPAN shell (i.e. not keep
> a copy of dev versions and old versions of distributions).  Assuming
> you want the same thing, rsync is not the answer as that'll suck down
> a whole host of stuff you don't want on disk.
>
> However: The more modern CPAN::Mini
> https://metacpan.org/module/minicpan can do the same thing and
> subsequent runs only download indexes and distributions.  It's
> probably what you're looking for.



Ignore last message about distributions. Got it now and it seems to be what I'm 
looking for. Thanks.

gvim