Re: OAI::Harvester installation help

2011-05-18 Thread Saiful Amin

 I am attempting to use some code which depends on Net::OAI::Harvester,
 but my attempts to install OAI::Harvester are running into problems
 with:

 Any suggestions for getting this installed properly?  I'm assuming that
 this is a case where a simple force install isn't going to get me a
 working installation...


I've used Net::OAI::Harvester on both Ubuntu and Windows XP for my projects.
On XP I've used Strawberry Perl, in which installing using CPAN luckily
worked without any problem. In Ubuntu, I had to install from synaptic when
CPAN failed. If I remember correctly, the command was:
# sudo apt-get install libnet-oai-harvester-perl

It works better than cpan in managing dependencies in my experience.

Regards,
Saiful Amin


RE: OAI::Harvester installation help

2011-05-18 Thread Dave Sherohman
Thanks for the responses!  I was also contacted off-list by 
Net::OAI::Harvester's maintainer and we tracked the issue down to a rogue 
standalone release of XML::SAX::Base.  Removing that, so that the 
XML::SAX::Base included with XML::SAX was used instead, resolved my problems 
and I was then able to install Net::OAI::Harvester cleanly.

  I am not so familiar with the oai harvesting tools in Perl, so
  forgive me if I am giving you incorrect information. My vague
  recollection is that there are several oai harvesters for Perl.

There are, but the import tool I'm using is specifically built to use 
Net::OAI::Harvester.

 In Ubuntu, I had to install from synaptic when CPAN failed.
 If I remember correctly, the command was:
 # sudo apt-get install libnet-oai-harvester-perl
 
 It works better than cpan in managing dependencies in my experience.

Although my dev laptop is Ubuntu, the servers I deploy to tend to be running 
Very Old Releases of Fedora, so I try to stay within CPAN despite its warts, 
simply for the sake of broad compatibility.  The security of running 
installation tests is a great bonus, too, though - in this particular case, I 
expect that installing Ubuntu's packaged version would have gotten me a 
non-working installation because the rogue XML::SAX::Base would still be there, 
it just wouldn't have been caught during installation.

Re: OAI::Harvester installation help

2011-05-18 Thread Thomas Berger
Hello,

thanks to further input from the original reporter I (as co-maintainer of
N:O:H) have been able to sort out the issues illustrated by the report:

- one of the repositories used in the test suite changed its address
  recently, thus causing some tests to fail: fixed

- the LibXML family of parsers behaves very noisily when it comes to
  the test for illegal XML, making it hard to notice that the test
  actually succeeds (not fixed)

Two issues with XML::SAX might be of broader interest:

[not applicable to this thread:
- ParserDetails.ini
  XML::SAX (::ParserFactory) uses a text file ParserDetails.ini located
  in the folder SAX.pm resides in or (Debian only?) under /etc/perl/...
  This file contains the list of known parsers in this installation and
  their properties. For myself I have noticed several times that this
  file was not generated (because of non-interactive dependency installs?)
  and subsequently installed individual parsers were not registered.
  My impression is that the XML::SAX framework should fall back to
  XML::SAX::PurePerl installed by the package itself but this does
  not seem to happen in the Net::OAI::Harvester test suite (maybe
  because the parser is requested with a required feature).
  Status: not tackled yet
  cf.  http://perl-xml.sourceforge.net/faq/#parserdetails.ini 
]

- latest version of XML::SAX::Base
  Some sub-modules of Net::OAI::Harvester use the get_handler() method
  supplied by XML::SAX::Base as of version 1.04. This module is
  effectively hidden in the XML::SAX distribution (its source is generated
  by Makefile.PL to prevent indexing of the module on CPAN). There is a
  line of standalone versions of XML::SAX::Base on CPAN ending at
  version 1.02, which does not contain the method in question. (The README
  of this standalone module warns that you probably do not want to
  install this module but the complete XML::SAX framework.)
  When you explicitly request installation of XML::SAX::Base, there is
  a chance that this fetches version 1.02, which then takes precedence over
  or actually overwrites version 1.04 installed by XML::SAX, and there is
  absolutely no upgrade path: XML::SAX::Base 1.02 must then be
  uninstalled/removed for things to work again.
  For Net::OAI::Harvester I have refined the requirement of XML::SAX::Base
  to the specific version 1.04 and I'm awaiting the CPAN Testers reports:
  it might well be that more systems than before are lured into
  installing the wrong module when performing Build installdeps and thus
  effectively cut themselves off from executing the tests at all.
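
  To see which XML::SAX::Base a given perl actually picks up, something
  along these lines may help. It is only a sketch: module_report is a
  hypothetical helper name, not part of XML::SAX or Net::OAI::Harvester,
  and it merely reports the load path, the version, and whether
  get_handler() is present.

```perl
#!/usr/bin/perl
use strict;
use warnings;

## module_report is a hypothetical helper, not part of any CPAN module.
## It loads $module and reports the file it came from, its $VERSION and
## whether it provides $method. Returns undef if the module cannot be loaded.
sub module_report {
    my ($module, $method) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return {
        path       => $INC{$file},
        version    => $module->VERSION,
        has_method => $module->can($method) ? 1 : 0,
    };
}

my $report = module_report('XML::SAX::Base', 'get_handler');
if (not $report) {
    print "XML::SAX::Base could not be loaded\n";
}
else {
    my $version = defined $report->{version} ? $report->{version} : 'unknown';
    print "XML::SAX::Base $version loaded from $report->{path}\n";
    print $report->{has_method}
        ? "get_handler() is available\n"
        : "get_handler() is missing, a standalone 1.02 may be in the way\n";
}
```

  The reported path shows which copy comes first in @INC, so a standalone
  install shadowing the one shipped with XML::SAX should be visible there.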


Thomas Berger




Re: OAI::Harvester installation help

2011-05-17 Thread Thomas Krichel
  Dave Sherohman writes

 Hey, all!  Long-time Perl programmer, but new to the world of libraries,
 so I'm not all that familiar with all the data formats used in these
 parts.
 
 I am attempting to use some code which depends on Net::OAI::Harvester,
 but my attempts to install OAI::Harvester are running into problems
 with:
 
  I am not so familiar with the oai harvesting tools in Perl, so 
  forgive me if I am giving you incorrect information. My vague
  recollection is that there are several oai harvesters for Perl.
  The one I use is a different one, I think: HTTP::OAI. I suggest
  you try that. FWIW, I attach a script that I use to download
  OAI archives. I used to keep a collection of them, based on an
  OpenDOAR listing. I think I will soon stop it.

  I hope this is helpful.

  Cheers,

  Thomas Krichel                    http://openlib.org/home/krichel
                                http://authorclaim.org/profile/pkr1
                                               skype: thomaskrichel

#!/usr/bin/perl -w

use lib '/home/mamf/usr/share/perl/';

use strict;
use Data::Dumper;
use Data::Random qw(:all);
use List::Util qw(shuffle);
use File::Basename;
use File::Compare;
use File::Copy;
use File::Find;
use File::Path;
use File::Listing qw(parse_dir);
use File::Temp qw/ tempfile tempdir /;
use File::Touch;
use LWP::Simple;
use HTTP::OAI;
use Storable;
use XML::DOM;
use XML::LibXML;
use Time::Piece;
use Time::Seconds;
# use Sys::RunAlone;

## home-grown
use Mamf::Common;

## the size of the files, in terms of OAI_DC records
my $batch_size=100;


## directories
my $home=$ENV{'HOME'};
my $log_dir="$home/public_html/log";
my $amf_file="$home/amf/oa/oa.amf.xml";


## renewal time of 30 days
my $renewal_time=30*24*60*60;


## counters
my $collection_count=0;
my $no_oai_count=0;


## XML and standards constants
my $amf_ns='http://amf.openlib.org';
my $doar_ns='http://opendoar.org';
my $freelib_ns='http://3lib.org';
my $collection_prefix='info:3lib:oa:';


## run parameter
my $verbose=0;



##
## first argument will be an archive to do
##
my $to_do_archive=$ARGV[0];

##
## parse the amf file to find the already existing
## 3lib ids and the oai interfaces, recorded in doar
##


## gives  the oai_url for an id
my $oai_urls;
## gives the id for an oai_url
my $ids;
## gives the rID for an oai_url
my $rIDs;
## gives the metadata_formats for an id
my $metadata_format;


##
## open log file
##
my $date=`date -I`;
chomp $date;
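my $log_file="$log_dir/down_oa_$date.log";
## append mode is an assumption: the redirection operator is missing
## from the archived copy of this script
open(LOG, '>>', $log_file);
binmode(LOG,":utf8");

## populate these variables, deletes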
## archives not to get
parse_oa_amf();

## create in_dirs variable, that contains input
## directories
my @in_dirs;
## an indicator of the input directory
my $in_dir;

foreach my $archive (keys %{$metadata_format}) {
  my $format=$metadata_format->{$archive};
  if(not defined($in_dir->{$format})) {
    push(@in_dirs,"$home/opt/$format/oa/$archive");
  }
  ## double meaning array
  $in_dir->{$archive}="$home/opt/$format/oa/$archive";
}


if(not $to_do_archive) {
  harvest_all();
}
else {
  print "doing $to_do_archive\n";
  eval {
    harvest_to_dir($to_do_archive);
  };
}

exit;

##


##
## shuffle the oai_url, find what archives to download
## 
sub harvest_all {
  my @rand_ids=shuffle(keys %{$oai_urls});
  my $ineligible=get_ineligble_archives($renewal_time);
  foreach my $id (@rand_ids) {
    ## append mode assumed (operator lost in the archived copy)
    open(LOG, '>>', $log_file);
    binmode(LOG,":utf8");
    my $date=`date --rfc-3339=seconds`;
    chomp $date;
    print LOG "at: $date ";
    if($ineligible->{$id}) {
      print LOG "not renewing ".$id.", rID ".
        $rIDs->{$id}.", ".$ineligible->{$id}."\n";
      next;
    }
    ## try to catch errors if it bombs out
    print LOG " get: $id, rID $rIDs->{$id} from $oai_urls->{$id}\n";
    eval {
      harvest_to_dir($id);
    };
    if($@) {
      print LOG "error at id $id: $@\n";
      close LOG;
    }
  }
  close LOG;
}




sub get_ineligble_archives {
  ## maximum age, in seconds, before an archive is renewed
  my $max_ago=shift;
  ## result, a hash reference keyed by archive id
  my $r;
  my $count;
  foreach my $in_dir (@in_dirs) {
    if(not -d $in_dir) {
      print LOG "making $in_dir\n";
      mkdir $in_dir;
    }
    #foreach my $dir (`ls $format_dir`) {
    #  ## remove newline
    #  chomp $dir;
    #  ## it has to have 6-char names
    #  my $archive_dir="$format_dir/$dir";
    #  if($verbose) {
    #    print LOG "checking $archive_dir\n";
    #  }
    if(not $in_dir=~m|/([^/]{6})$|) {
      next;
    }
    my $id=$1;
    $r->{$id}=is_eligible($in_dir,$max_ago);
  }
  return $r;
}


## check for archiving time
sub is_eligible {
  ## list xml files, but report no error if they are 
  ## not there
  ## code kept as a transition
  my $archive_dir=shift;
  my $max_ago=shift;
  my $now=time();
  if(not -d $archive_dir) {
    print "no such dir: $archive_dir\n";
  }
  ## check if it is locked
  my $lock_file="$archive_dir/lock";
  if(-f $lock_file) {
## remove lock file if