Re: Timeout required for FTP

2003-07-21 Thread Paul Hoffman
I'm not sure this will fix your problem, but you can specify a timeout 
value in the constructor:

  $ftpClient = Net::FTP->new(
      $FTP_server,
      Timeout => $my_timeout,
      # etc.
  );

The only thing is, the default is 120 seconds, so unless the server is 
up but really, really slow (not *too* slow!) I don't see how it could 
be hanging.  Can you give us some more code?  I also suggest adding 
Debug => 1 (or possibly higher) to the constructor and looking at the 
debugging output.
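
If the Timeout option turns out not to cover every call, a common workaround is to put a hard limit around the whole operation with alarm.  The following is only a sketch, assuming a Unix-like system where alarm() interrupts blocking calls; with_timeout is a made-up helper name, not part of Net::FTP.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, but die if it takes longer than $seconds.
# Relies on alarm(), so it assumes a Unix-like platform.
sub with_timeout {
    my ($seconds, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out after $seconds seconds\n" };
        alarm $seconds;
        @result = $code->();
        alarm 0;
        1;
    };
    alarm 0;              # make sure no alarm is left pending
    die $@ unless $ok;    # re-throw the timeout (or any other error)
    return wantarray ? @result : $result[0];
}
```

It would be called as, for example, with_timeout(600, sub { $ftpClient->put($fn) }), with the die caught by an eval in the calling script so the run can log the failure and move on.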

HTH,

Paul.

On Monday, July 21, 2003, at 07:45  AM, G.B.Evans wrote:

Hi all,

I'm having a problem with FTP, where the remote site is only partially 
available and the connect/login/put/get doesn't report an error and 
doesn't time out.
This leads to the situation where a hanging run on a Friday evening 
locks up the whole process until the following Monday, when it can be 
manually cleared; however, in the meantime, the transactions from 
Friday, Saturday & Sunday have all been blocked and we haven't met our 
SLA.

What I should like to be able to do is to put a timeout value on the 
connect/login/put/get operation, to force it to timeout after (say) 10 
minutes - is there a simple way of doing this ?

These are the relevant calls that are being used :

	use Net::FTP;

	$ftpClient = Net::FTP->new($FTP_server);

	$return_value = $ftpClient->login($FTP_username, $FTP_password);

	$return_value = $ftpClient->put($fn);

	$return_value  = $ftpClient->get($fn, $uniqueLocalname);

Any suggestions would be much appreciated.

Geoff.



Geoff Evans
Technical Consultant
Talis Information Ltd.
Birmingham Research Park
Vincent Drive
Birmingham       Phone   +44 (0)121 471 1179
B15 2SQ          Fax     +44 (0)121 472 0298
United Kingdom   E-Mail  [EMAIL PROTECTED]
[...]
--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: newbie question: isbn -> marc?

2003-08-21 Thread Paul Hoffman
On Thursday, August 21, 2003, at 01:10  PM, Bill White wrote:

Is there an easy perl way to retrieve a MARC record for a given ISBN?
Yes, using Z39.50!  :-)

With Net::Z3950 you can do something like this (untested, error 
handling omitted):

  use Net::Z3950;
  use MARC::Record;
  my %options = (
      'databaseName' => $db_name,
      'querytype' => 'prefix',
      'preferredRecordSyntax' => 19,  # USMARC
      # ... other options ...
  );
  my %searches = (
      'isbn' => '@attr 1=7 @attr 4=1 "%s"',
      'issn' => '@attr 1=8 @attr 4=1 "%s"',
      # ... other types of search if desired ...
  );
  my $conn = Net::Z3950::Connection->new($host, $port, %options);
  my $result_set = $conn->search(sprintf $searches{isbn}, $isbn);
  foreach my $i (1..$result_set->size) {
      my $record = $result_set->record($i);
      my $marc = MARC::Record->new_from_usmarc($record->rawdata);
      # ... do something with the record ...
  }
  $conn->close;

You can use syntaxes other than prefix, but that's the one I know best.
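
The query-building step can be tried on its own, without a Z39.50 server.  This is a small sketch using the same two templates; pqf_query is a hypothetical helper name, and refusing terms that contain a double quote is just a cautious assumption so the quoted term cannot be cut short.

```perl
use strict;
use warnings;

# Bib-1 "use" attributes taken from the templates above: 7 = ISBN, 8 = ISSN.
my %searches = (
    isbn => '@attr 1=7 @attr 4=1 "%s"',
    issn => '@attr 1=8 @attr 4=1 "%s"',
);

# Hypothetical helper: fill in the template for one type of search.
sub pqf_query {
    my ($type, $term) = @_;
    my $template = $searches{$type}
        or die "Unknown search type '$type'\n";
    die "Search term must not contain a double quote\n" if $term =~ /"/;
    return sprintf $template, $term;
}

print pqf_query(isbn => '0596000278'), "\n";
```

The string it returns is what would be handed to $conn->search in the code above.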

FWIW, the specs on bib-1 searches in Z39.50 aren't too horribly 
complicated:

  ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt

Some good Z39.50 resources are available:

  http://www.ilrt.bris.ac.uk/discovery/z3950/resources/
  http://www.niso.org/standards/resources/Z3950_Resources.html
The Library of Congress keeps a list of servers that have been made 
available for testing purposes, including their own:

  http://lcweb.loc.gov/z3950/agency/resources/testport.html

Make sure you read the info here before using them:

  http://lcweb.loc.gov/z3950/agency/proced.html#hosts

FWIW, I'm working on a Perl module that makes bib-1 searching via 
Z39.50 a bit simpler:

  use Zoose;
  $z = Zoose->new(
      'host' => $host,
      'port' => $port,
      # ...
  );
  $rs = $z->search('isbn' => $isbn);
  foreach my $i (1..$rs->count) {
      $book = $rs->match($i);
      print $book->title_proper, "\n";
      # ... other MARC::Record operations here ...
  }

But it will be a little while till it's ready for release.  Bug me if 
you want me to hurry.  :-)

HTH,

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: newbie question: isbn -> marc?

2003-08-21 Thread Paul Hoffman
On Thursday, August 21, 2003, at 02:57  PM, Bill White wrote:

Newbieness continues... I've downloaded & installed Net::Z3950 but it
looks like I can't get through the firewall here at work.  Is there a
way to use a proxy with this module?
Well, I'm pretty clueless when it comes to proxies, but googling 
'Net::Z3950 AND firewall' turned up the same complaint in a message to 
the Koha list:

1. z3950 searches are not working for me.  I get the following errors 
in my /var/log/koha files.  Any ideas?

Event: '?? Connection.pm:138' died and then $Event::DIED died with: 
Can't call method "attributes" without a package or object reference 
at /usr/lib/perl5/site_perl/5.6.1/i386-linux/Event.pm line 88,  line 20.

The author later answered himself:

I figured this one out.  Damn firewall!  ;-)
Initially, I thought all the z3950 requests were going out on port 210,
but it turns out it is hitting the library of congress on port 7090
Here's the original post:

  http://lists.katipo.co.nz/public/koha/2002/001088.html

I don't know if this helps or not, but at least you're not alone.  :-)

OTOH, I've seen the same error message from Event.pm ("Can't call 
method "attributes" without a package or object reference") and I 
*think* it was the result of my using the wrong search syntax or 
options, or otherwise screwing up -- and I'm *not* behind a firewall.
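
One way to tell a firewall problem from a query problem is to test whether the port can be reached at all before involving Net::Z3950.  A rough sketch using only core modules; port_reachable is a made-up helper, and the host and port you pass it are whatever your target server uses.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Return true if a TCP connection to $host:$port succeeds within
# $timeout seconds (default 10), false otherwise.
sub port_reachable {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout || 10,
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}
```

If port_reachable($host, 210) fails but another port on the same host succeeds (7090 in the Koha post), the firewall is the likelier culprit.  A successful connection says nothing about whether the search syntax is right.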

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: newbie question: isbn -> marc?

2003-08-22 Thread Paul Hoffman
On Friday, August 22, 2003, at 03:29  AM, Ashley Sanders wrote:

Have you seen the perl binding of Zoom; an easy to use perl
interface to z39.50:
http://zoom.z3950.org/bind/perl/index.html
Yes, thanks.  (Is Net::Z3950 still the only Perl binding to [i.e., 
implementation of] ZOOM?)

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: NACO Normalization and Text::Normalize

2003-08-25 Thread Paul Hoffman
On Monday, August 25, 2003, at 03:29  PM, Brian Cassidy wrote:

The basis for this message is to get a feeling whether or not I should
submit a module that will do NACO normalization
(http://lcweb.loc.gov/catdir/pcc/naco/normrule.html) to CPAN.
[...]
So, away I went and came back with normalize() sub which does the 
trick.
Fabulous!  (Disclaimer: I'd never heard of NACO normalization before, 
but it sounds like it could be useful -- for MARC bib records, too.)

I now wonder if this code would have greater utility as a module on
CPAN.
Yes, please!  (You're not BRICAS on cpan.org, are you?)

And if I do decide to upload it to CPAN, perhaps a base class
(Text::Normalize) should be created to which NACO normalization could 
be
added as a subclass.
I would recommend putting it in the MARC::* namespace, since it's 
specific to MARC records -- maybe MARC::Transform::NACO or some such.

A class hierarchy rooted at MARC::Transform might be useful, if (for 
example) people wanted to apply arbitrary transformations to a single 
record:

   my @records = ... some MARC::Record objects ... ;
   my @transforms = (
       MARC::Transform::Delete9xx->new,
       MARC::Transform::StripInitialArticles->new,
       some_other_transforms(),
   );
   foreach my $t (@transforms) {
       $t->transform($_) foreach @records;
   }

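
To make the idea concrete, here is a self-contained sketch of such a hierarchy.  Everything in it is hypothetical: the Sketch::Transform::* names stand in for the MARC::Transform::* classes imagined above, and a record is just an array ref of [ tag, value ] pairs rather than a MARC::Record.

```perl
use strict;
use warnings;

# Stand-in for a record: an array ref of [ tag, value ] pairs.
# A real implementation would operate on MARC::Record objects instead.

package Sketch::Transform;
sub new       { my $class = shift; return bless {}, $class }
sub transform { die "subclass must implement transform()\n" }

package Sketch::Transform::Delete9xx;
our @ISA = ('Sketch::Transform');
sub transform {
    my ($self, $record) = @_;
    @$record = grep { $_->[0] !~ /^9/ } @$record;   # drop 9xx fields in place
    return $record;
}

package Sketch::Transform::StripInitialArticles;
our @ISA = ('Sketch::Transform');
sub transform {
    my ($self, $record) = @_;
    for my $field (@$record) {
        next unless $field->[0] eq '245';
        $field->[1] =~ s/^(?:The|An|A)\s+//;        # English articles only
    }
    return $record;
}

package main;

my @records = (
    [ [ '245', 'The Perl cookbook' ], [ '999', 'local note' ] ],
);
my @transforms = (
    Sketch::Transform::Delete9xx->new,
    Sketch::Transform::StripInitialArticles->new,
);
foreach my $t (@transforms) {
    $t->transform($_) foreach @records;
}
print "$_->[0] $_->[1]\n" for @{ $records[0] };
```

Each transform changes the record in place, so the order of @transforms matters whenever two transforms touch the same field.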
Thanks for your hard work.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Inserting a list into an object's attribute

2003-09-25 Thread Paul Hoffman
On Wednesday, September 24, 2003, at 05:44  PM, Eric Lease Morgan wrote:

How do I insert a list into an object's attribute?

[...]

I suppose I could stuff a reference to the @ids array into 
$self->{term_ids}
like this:

  if (@ids) { $self->{term_ids} = \@ids }
Exactly!

But then how do I dereference items in my object?

  sub foo {
      my ($self) = @_;
      my $ids = $self->term_ids;
      # How to dereference a variable that holds an array ref:
      foreach (@$ids) { ... }
  }
  sub bar {
      my ($self) = @_;
      # How to dereference an expression that returns an array ref:
      my @ids = @{ $self->term_ids };
      foreach (@ids) { ... }
  }

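
Here is the same thing as a complete script that can be run as-is.  Sketch::Term and its term_ids accessor are invented for the illustration; the original class isn't shown in the thread.

```perl
use strict;
use warnings;

package Sketch::Term;   # hypothetical class, for illustration only

sub new {
    my ($class, @ids) = @_;
    my $self = bless {}, $class;
    $self->{term_ids} = [ @ids ] if @ids;   # store a copy as an array ref
    return $self;
}

# Accessor: always returns an array ref, even when nothing was stored.
sub term_ids { return $_[0]{term_ids} || [] }

package main;

my $term = Sketch::Term->new(3, 5, 8);
my $ids  = $term->term_ids;
print "first id: $ids->[0]\n";                           # one element
print "all ids:  @{ $term->term_ids }\n";                # the whole list
print "count:    ", scalar @{ $term->term_ids }, "\n";   # number of ids
```

Note that [ @ids ] stores a copy of the list, whereas \@ids stores a reference to the caller's array, so later changes to that array would show up inside the object.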
HTH,

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: creating a standard Perl module

2003-10-01 Thread Paul Hoffman
On Tuesday, September 30, 2003, at 06:14  PM, Andy Lester wrote:

At 5:09 PM -0500 9/30/03, Eric Lease Morgan wrote:
How do I create a standard Perl module intended for distribution?
Steal.  Find a good distribution (say, MARC::Record) and adapt from 
there.

I never use h2xs any more.
I second that!  I would start out this way: download a distribution by 
a CPAN author who's got a few modules under their belt (say, INGY or 
ABW or our own PETDANCE) and delete everything but the following:

  Changes
  lib/  (delete its contents)
  Makefile.PL
  MANIFEST.SKIP
  README
  t/    (delete its contents)

Whip up a dead simple module Foo in lib/Foo.pm, something like this:

  package Foo;
  use strict;
  use vars qw($VERSION);
  $VERSION = '0.01';
  sub new { bless {}, shift }
  sub bar { rand }
  1;  # a module must end by returning a true value

Whip up a dead simple test in t/bar.t:

  use Test::More tests => 5;
  use_ok( 'Foo' );
  my $foo = Foo->new;
  ok( $foo );
  isa_ok( $foo, 'Foo' );
  my $rand = $foo->bar;
  ok( $rand >= 0.0, 'greater than or equal to 0.0' );
  ok( $rand < 1.0,  'less than 1.0' );
Just something simple to help you get the right distribution details 
taken care of without having to deal with your modules themselves (yet).

Hack on MANIFEST.SKIP if you like -- for example, I'm running Mac OS X 
so I make sure the following line is in there (.DS_Store files have no 
meaning except on Mac OS X [and darwin generally?]):

  \.DS_Store$

Hack on Makefile.PL (just delete anything that doesn't make sense), 
then do something like this:

  % perl Makefile.PL
  % make manifest
  % make distcheck
  % make skipcheck
  % perl Makefile.PL
  % make
  % make test
Once you've got `make' and `make test' running without errors, rewrite 
Changes and README, move in your modules, etc.
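
For the toy Foo distribution above, the pared-down Makefile.PL might end up looking something like this.  It is only a sketch: a real one usually carries a few more keys (author, abstract, and so on), and the prerequisite listed is just the module that t/bar.t happens to use.

```perl
# Makefile.PL -- minimal sketch for the hypothetical Foo distribution
use strict;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'Foo',
    VERSION_FROM => 'lib/Foo.pm',             # reads $VERSION out of the module
    PREREQ_PM    => { 'Test::More' => 0 },    # needed by t/bar.t
);
```

Running `perl Makefile.PL` in the distribution directory turns this into the Makefile that the `make` commands above rely on.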

Reading material:

  http://www.cpan.org/modules/00modlist.long.html#Part1-Modules:C
  http://www.cpan.org/modules/04pause.html
Try googling for "my first CPAN module" and see what comes up -- there 
may be some good advice for new CPAN authors.

Also, you might want to look into Module::Build at some point -- you 
can use it instead of, or in addition to, ExtUtils::MakeMaker.

HTH,

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Return values from MARC::Record

2003-11-06 Thread Paul Hoffman
On Thursday, November 6, 2003, at 01:14  PM, Leif Andersson wrote:

With the same BAD record we try $subfield = eval { 
$record->field($tag)->subfield($sub) }
This is the only case where we have to put the code in eval.
Should MARC:: take care of the eval for us? I am beginning to think so.
No, it can't.  Just add your own error checking, something like this 
for example:

   my $field = $record->field($tag) || die "No tag '$tag' in record";
   $subfield = $field->subfield($sub);
MARC::Record::field has no way of knowing that your code will invoke 
the method 'subfield' on the value it returns.  You could also do it 
like this if you want to keep things concise:

   $subfield = ($record->field($tag) || die "No tag '$tag' in record")->subfield($sub);

But that's getting a wee bit obfuscated.
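
A runnable illustration of the failure and the check, using tiny stand-in classes instead of the real MARC::Record and MARC::Field so that it needs nothing from CPAN.  The Sketch::* packages and get_subfield are invented for the example.

```perl
use strict;
use warnings;

# Tiny stand-ins for MARC::Record and MARC::Field, just enough to show the
# failure mode; the real classes are not used here.
package Sketch::Field;
sub new      { my ($class, %subs) = @_; return bless {%subs}, $class }
sub subfield { return $_[0]{ $_[1] } }

package Sketch::Record;
sub new   { my ($class, %fields) = @_; return bless {%fields}, $class }
sub field { return $_[0]{ $_[1] } }     # undef when the tag is absent

package main;

my $record = Sketch::Record->new(
    '245' => Sketch::Field->new(a => 'Perl cookbook'),
);

# Chained call on a missing field: field() returns undef, and it is the
# caller's ->subfield that dies. field() has already returned by then.
my $value = eval { $record->field('035')->subfield('a') };
print "chained call died: $@" unless defined $value;

# Explicit check instead of eval:
sub get_subfield {
    my ($rec, $tag, $code) = @_;
    my $field = $rec->field($tag) or return;   # no such field
    return $field->subfield($code);
}

print "245 \$a: ", get_subfield($record, '245', 'a'), "\n";
```

Whether a missing field should die, warn, or quietly return nothing is a decision for the calling code, which is why the check belongs there.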

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Return values from MARC::Record

2003-11-07 Thread Paul Hoffman
On Thursday, November 6, 2003, at 08:45  PM, Leif Andersson wrote:

Assume we have a record with two 035 fields
035 -- $91234567
035 -- $a(XX)12345678
Now, this code will get the 035 $9 subfield:
$subfield  = eval { $record->field('035')->subfield('9') };
@subfields = eval { $record->field('035')->subfield('9') };
But this it will fail getting 035 $a
$subfield  = eval { $record->field('035')->subfield('a') };
@subfields = eval { $record->field('035')->subfield('a') };
I don't see how this can be.  Am I missing something?

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Return values from MARC::Record

2003-11-07 Thread Paul Hoffman
On Friday, November 7, 2003, at 08:23  AM, Paul Hoffman wrote:

On Thursday, November 6, 2003, at 08:45  PM, Leif Andersson wrote:

Assume we have a record with two 035 fields
035 -- $91234567
035 -- $a(XX)12345678
Now, this code will get the 035 $9 subfield:
$subfield  = eval { $record->field('035')->subfield('9') };
@subfields = eval { $record->field('035')->subfield('9') };
But this it will fail getting 035 $a
$subfield  = eval { $record->field('035')->subfield('a') };
@subfields = eval { $record->field('035')->subfield('a') };
I don't see how this can be.  Am I missing something?
D'oh!  Of course it fails to find the 035 $a subfield -- the call to 
$record->field('035') in scalar context only returns the *first* 035 
field.

My bad.

As for the problem at hand, map and grep are your friends:

   my @fields = $record->field('035');
  # --> list of all 035 fields
   my @possible_subfields = map { $_->subfield('a') } @fields;
  # --> list of results of calling subfield('a') on all 035 fields
   my @subfields = grep { defined } @possible_subfields;
  # --> existing 035 $a subfields only
Or, more succinctly:

   @subfields = grep { defined } map { $_->subfield('a') } @fields;

Most Perl programmers are familiar with this idiom, so I wouldn't worry 
about it being hard to understand.
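
The difference between asking only the first field and asking all of them can be seen with the two 035 fields from the example.  Sketch::Field is a stand-in whose subfield() returns undef when the code is absent; it is an assumption for the demonstration, not the real MARC::Field.

```perl
use strict;
use warnings;

package Sketch::Field;   # stand-in for MARC::Field
sub new      { my ($class, %subs) = @_; return bless {%subs}, $class }
sub subfield { return $_[0]{ $_[1] } }   # undef if the code is absent

package main;

# The two 035 fields from the example record.
my @fields = (
    Sketch::Field->new('9' => '1234567'),
    Sketch::Field->new('a' => '(XX)12345678'),
);

# Looking only at the first field misses $a ...
my $first_only = $fields[0]->subfield('a');

# ... while map + grep visits every field and keeps the hits.
my @subfields = grep { defined } map { $_->subfield('a') } @fields;

print defined $first_only ? "first: $first_only\n" : "first 035 has no \$a\n";
print "all 035 \$a: @subfields\n";
```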

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Return values from MARC::Record

2003-11-07 Thread Paul Hoffman
On Friday, November 7, 2003, at 02:07  AM, Leif Andersson wrote:

At the top of MARC::Record we add:
use Want;
This would introduce a dependency on Perl 5.6, which Want requires.  
MARC::Record currently only requires 5.5, as far as I can tell.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Return values from MARC::Record

2003-11-07 Thread Paul Hoffman
On Friday, November 7, 2003, at 11:29  AM, Leif Andersson wrote:

You mean 5.005?
Maybe that's what is required.
Oops, yes, thanks for the correction.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Early Confusion with MARC::Record

2003-11-14 Thread Paul Hoffman
On Thursday, November 13, 2003, at 10:38  PM, Morbus Iff wrote:

Next up, I've been using the camel.usmarc file as the "file.dat"
equivalent in all the examples. When I ran the first example, I got:
  ActivePerl with ASP and ADO / Tobias Martinsson.
  ...
  Cross-platform Perl / Eric F. Johnson.
which confused me. Is the "title" of a book always considered the 
title AND
author?
Warning: IANAC[ataloger], but...

The 245 field ("Title statement") *must* have an $a subfield ("title" 
or "title proper" without subtitles, according to my copy of OCLC's 
Bibliographic Formats and Standards, 1993, which is normative only for 
WorldCat I believe).  All other subfields, including $c ("remainder of 
title page transcription/statement of responsibility") are technically 
optional but should be used whenever applicable.

Basically, from my imperfect understanding, if you can find a statement 
of responsibility on the title page, then the $c subfield should be 
used.  (Right?)  This may be an editor, in which case you'll have 
something like "$c edited by ...".  Of course, it's not always obvious 
what should be considered the title page.
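
To see how one 245 field yields both pieces, here is a sketch that takes a raw field apart by hand.  The field content is reconstructed from the output quoted above; the indicator values are a guess, and real code would use MARC::Record rather than split.

```perl
use strict;
use warnings;

# A 245 field roughly as it sits in a raw MARC record: two indicators, then
# subfields, each introduced by the subfield delimiter (0x1F) and a code.
my $raw_245 = "10\x1faCross-platform Perl /\x1fcEric F. Johnson.";

my ($indicators, @chunks) = split /\x1f/, $raw_245;
my %subfield = map { substr($_, 0, 1) => substr($_, 1) } @chunks;

print "title proper (\$a):       $subfield{a}\n";
print "responsibility (\$c):     $subfield{c}\n";
print "whole field, as printed: $subfield{a} $subfield{c}\n";
```

The last line reproduces what the first example printed, which is why the "title" appeared to include the author.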

Or is the "author" and "statement of responsibility" the .. . .
same thing?
Not exactly, since the statement of responsibility may designate an 
editor.

If there is a person (or more than one) responsible for the creation of 
the work, then their name (or other identifying phrase, e.g., "Author 
of 'Let's have a revolution!'") should go in a 100 field ("Main 
entry--personal name").  This excludes editors and translators.  In 
practical terms, as I understand it, if their name is on the title 
page, then they also belong in 245 $c.

I suggest looking at MARC records for works that you own, comparing the 
MARC record with the title page etc.  That should help you get a better 
feel for practical MARC usage more quickly than just reading 
documentation.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Early Confusion with MARC::Record

2003-11-14 Thread Paul Hoffman
On Friday, November 14, 2003, at 09:40  AM, Bryan Baldus wrote:

I turned "strict_off", to get through the entire file, but
the exported records have had "Invalid indicators forced to blanks." I 
would
like to get these records, and any others that generate errors, and 
save
them (as originally read) to a separate file, in order to correct the
errors.  Since I don't fully understand Perl,
Who does?  :-)

my solution was to modify
MARC::File, by adding a method, "skipget()", based on the existing 
"skip()",
but returning $rec : undef, instead of 1 : undef. My understanding is 
that
this should return the raw, unchanged marc string from the original 
file.
Just FYI, you can modify MARC::File without modifying MARC/File.pm 
itself.  The trick is simply to use a fully qualified name when 
defining the new MARC::File method:

  #!/usr/bin/perl -w
  use strict;
  $| = 1;
  use MARC::File;
  sub MARC::File::skipget {
      # ... your code here ...
  }
  # ... the rest of your script here ...

Better yet, since skipget() might end up in MARC::File some day, you 
can do this:

  if (UNIVERSAL::can('MARC::File', 'skipget')) {
      warn "Edit this script--MARC::File now has a skipget() method";
  } else {
      *MARC::File::skipget = sub {
          # ... your code here ...
      };
  }

These tricks sometimes come in handy.
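
The same pattern can be tried in one file, with a stand-in class in place of MARC::File.  Sketch::File and the body given to skipget() here are invented purely to have something runnable; they are not how MARC::File works internally.

```perl
use strict;
use warnings;

# Stand-in for a module you did not write (MARC::File in the message above).
package Sketch::File;
sub new  { return bless { queue => [ 'rec1', 'rec2' ] }, shift }
sub skip { my $self = shift; return shift @{ $self->{queue} } ? 1 : undef }

package main;

if (Sketch::File->can('skipget')) {
    warn "Sketch::File now has its own skipget(); remove this patch\n";
} else {
    no warnings 'once';
    *Sketch::File::skipget = sub {
        my $self = shift;
        return shift @{ $self->{queue} };   # hand back the record itself
    };
}

my $file  = Sketch::File->new;
my $first = $file->skipget;
print "$first\n";
```

Assigning a code ref to the glob happens at run time, which is what lets the can() check come first; a plain "sub Sketch::File::skipget { ... }" would be installed at compile time regardless.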

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: MARC::Record in CVS and testing

2003-11-25 Thread Paul Hoffman
On Monday, November 24, 2003, at 07:12  PM, Morbus Iff wrote:

Before committing code back to CVS please run the test suite:
If a test doesn't pass please figure out why. Or ask on the list. The
test suite currently fails after some recent changes that you made.
Sorry about that - I've never done test building before, so I'm
not yet in the habit of doing the above. I'll take a look shortly.
Are you familiar with Test::More?  It has some cool features that can 
be tricky (conditionally skipping tests, TODO tests, etc.), so holler 
if you have questions.  I haven't examined MARC::Record's test suite 
closely, but what I've seen looks very well done, so you should be able 
to learn from it.
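
For reference, the two trickier features mentioned above look roughly like this in a test file.  This is a generic sketch, not taken from MARC::Record's test suite, and Some::Optional::Module is a made-up name standing for any module that may or may not be installed.

```perl
use strict;
use warnings;
use Test::More tests => 3;

ok( 1 + 1 == 2, 'ordinary test' );

SKIP: {
    # Skip when an optional module is missing; the count passed to skip()
    # must match the number of tests inside the block.
    skip 'Some::Optional::Module not installed', 1
        unless eval { require Some::Optional::Module; 1 };
    ok( Some::Optional::Module->can('new'), 'optional module has new()' );
}

TODO: {
    # A known failure: it is reported, but does not fail the suite.
    local $TODO = 'feature not written yet';
    ok( 0, 'unfinished feature' );
}
```

Skipped and TODO tests still count toward the plan, so the number after "tests =>" stays the same whether or not the optional module is present.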

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Extracting data from an XML file

2004-01-05 Thread Paul Hoffman
On Monday, January 5, 2004, at 03:54  PM, Eric Lease Morgan wrote:

To create my HTML files with rich meta data, I need to extract bits and
pieces of information from the teiHeader of my originals. The snippet 
of
code below illustrates how I am currently doing this with XML::LibXML:

[...]

The code works, but is really slow. Can you suggest a way to improve 
my code
or use some other technique for extracting things like author, title, 
and id
from my XML?
Check out XML::Twig, which uses XML::Parser.  It gives you -- in tree 
form -- only those elements you're interested in.  From the README:

   One of the strengths of XML::Twig is that it let you work with files
   that do not fit in memory (BTW storing an XML document in memory as a
   tree is quite memory-expensive, the expansion factor being often
   around 10).

   To do this you can define handlers, that will be called once a
   specific element has been completely parsed.

I *think* your code would then look like this:

   use XML::Twig;

   my ($author, $title, $id);

   my $twig = XML::Twig->new('twig_roots' => {
       'teiHeader/fileDesc/titleStmt/author'     => sub { $author = $_[1] },
       'teiHeader/fileDesc/titleStmt/title'      => sub { $title  = $_[1] },
       'teiHeader/fileDesc/publicationStmt/idno' => sub { $id     = $_[1] },
   })->parsefile('/foo/bar.xml');

   $twig->purge;

This is totally untested -- I don't even have XML::Twig installed, I'm 
just going by the documentation on CPAN.

For more info (including a tutorial) see 
<http://www.xmltwig.com/xmltwig/>.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Extracting data from an XML file

2004-01-06 Thread Paul Hoffman
On Monday, January 5, 2004, at 10:27  PM, Eric Lease Morgan wrote:

Fourth, I tried both of these approaches plus my own, and timed them. 
I had
to process 1.5 MB of data in nineteen files. Tiny. Ironically, my 
original
code was the fastest at 96 seconds.
Yikes!

The XSLT implementation came in second
at 101 seconds,
Yikes again.

and the XML::Twig implementation, while straight-forward
came in last as 141 seconds. (See the attached code snippets.)
Did you try using 'twig_roots' instead of 'TwigHandlers' in the 
constructor?  Also, it might speed up if you purge the twig at the end 
of each handler; this is supposed to release memory.

  # using XML::Twig
  print "Processing $file...\n";
  my ($author, $title, $id);
  my $author_xpath = 'teiHeader/fileDesc/titleStmt/author';
  my $title_xpath = 'teiHeader/fileDesc/titleStmt/title';
  my $id_xpath = 'teiHeader/fileDesc/publicationStmt/idno';
  # Handlers are called with ($twig, $element), so purge via $_[0];
  # $twig itself is not yet in scope inside its own initializer.
  my $twig = XML::Twig->new('twig_roots' => {
      $author_xpath => sub { $author = $_[1]->text; $_[0]->purge },
      $title_xpath  => sub { $title  = $_[1]->text; $_[0]->purge },
      $id_xpath     => sub { $id     = $_[1]->text; $_[0]->purge },
  });
  $twig->parsefile($file);
  $twig->purge;
  print "  author: $author\n   title: $title\n  id: $id\n\n";

Have you considered using a regular expression to extract the teiHeader?
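
Something along these lines is what I have in mind.  It is a deliberately naive sketch: the sample document is made up, and it assumes each element appears once in the header with no nested markup, comments, or CDATA to trip it up.

```perl
use strict;
use warnings;

# Made-up sample; real TEI files will be larger and may use attributes,
# namespaces, or nested markup that this naive approach does not handle.
my $xml = <<'XML';
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Walden</title>
        <author>Thoreau, Henry David</author>
      </titleStmt>
      <publicationStmt><idno>walden-001</idno></publicationStmt>
    </fileDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI.2>
XML

# Grab only the header, then pick out the three elements by name.
my ($header) = $xml =~ m{<teiHeader\b[^>]*>(.*?)</teiHeader>}s
    or die "no teiHeader found\n";
my %meta;
for my $name (qw(author title idno)) {
    ($meta{$name}) = $header =~ m{<$name\b[^>]*>(.*?)</$name>}s;
}
print "author: $meta{author}\n title: $meta{title}\n    id: $meta{idno}\n";
```

Since the header sits at the top of the file, you could also stop reading as soon as the closing teiHeader tag has gone by and never touch the body text at all.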

Maybe the folks at PerlMonks would have some helpful suggestions.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: MARC.pm questions

2004-05-18 Thread Paul Hoffman
On Friday, May 14, 2004, at 09:14  PM, Eli Naeher wrote:
b. Is there a way to manually specify a 008 with MARC.pm? The 
following doesn't work:

$marc->addfield({record=>"$record", field=>"008",
  value=> [a=>$eight]});
You're stringifying $record unnecessarily; I doubt this is the problem, 
but you never know.  (What's in $record?)  Try 
$marc->addfield({record=>$record, ...}) instead.

(The examples in MARC.pm use unnecessary stringification, so I don't 
blame you!)
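
To show why what's in $record matters, here is what quoting does to a plain number versus a reference.  This is a generic Perl illustration, not MARC.pm code.

```perl
use strict;
use warnings;

my $number = 1;
my $ref    = { title => 'Walden' };

# Quoting a plain number is harmless ...
my $quoted_number = "$number";
# ... but quoting a reference turns it into a string like "HASH(0x...)",
# which can no longer be dereferenced.
my $quoted_ref = "$ref";

print "number: $quoted_number\n";
print "ref:    $quoted_ref\n";
print "still a reference? ", (ref $quoted_ref ? "yes" : "no"), "\n";
```

So if $record holds a number, the quotes change nothing; if it ever holds a reference, they break it.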

Paul.
--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Displaying diacritics in a terminal vs. a browser

2004-07-01 Thread Paul Hoffman
Unless I'm very much mistaken, Chris's code is outputting UTF-8 to
the terminal, not MARC-8.
The key is to find a terminal program that correctly displays UTF-8.
I doubt you'll have any trouble finding one -- for example, there
are at least two for Mac OS X alone (Terminal.app and iTerm).
Depending on your platform, freshmeat.net or tucows.com may be the
place to go.  This thread from the linux-utf8 list may also be
helpful (I googled for 'terminal UTF-8'):
http://mail.nl.linux.org/linux-utf8/2003-07/msg00231.html
Paul.
On Thursday, July 1, 2004, at 11:22  AM, Houghton,Andrew wrote:
From: Christopher Morgan [mailto:[EMAIL PROTECTED]
Sent: 01 July, 2004 10:50
Subject: Displaying diacritics in a terminal vs. a browser
I use the $cs->to_utf8 conversion from MARC::Charset to
display MARC Authority records in a browser, and the
diacritics display properly there.
But they don't display properly via STDOUT in my terminal
window (I get two characters instead of one -- one with the
letter and one with the accent mark). Am I doing something
wrong? I'm using:
binmode (STDOUT, ":utf8");
Is there any way around this problem, or is it a limitation
of terminal displays?
I'm not sure what MARC::Charset does internally, but MARC-8
defines the diacritic separate from the base character.  So
even using binmode(STDOUT,":utf8") will produce two characters,
one for the base character followed by the diacritic.  If you
want them combined then you need to combine them.
It just so happens that I have recently been converting MARC-XML
to RDF.  The RDF specification mandates Unicode Normal form C,
which means that the base character and the diacritic are
combined.  MARC-XML uses Unicode Normal form D, which means that
the base character is separate from the diacritic.  So I hacked
together some Perl scripts to convert Unicode NFD <-> Unicode NFC.
The scripts require Perl 5.8.0.
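
The scripts mentioned here are not shown in the thread.  As a rough idea of what such a conversion involves, the core Unicode::Normalize module (bundled with Perl since 5.8) can do it in a line or two; the sample string below is made up.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $decomposed = "e\x{301}";          # "e" + COMBINING ACUTE ACCENT (NFD)
my $composed   = NFC($decomposed);    # single code point U+00E9 (NFC)

printf "NFD: %d code point(s)\n", length $decomposed;
printf "NFC: %d code point(s)\n", length $composed;
print  "round trip ok\n" if NFD($composed) eq $decomposed;
```

Running NFC() over the output of a MARC-8 to UTF-8 conversion before printing is one thing to try when a terminal shows the letter and the accent as two separate characters.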
I was talking with a colleague, just yesterday, about whether we
should unleash these on the Net...  They need to be cleaned up a
little and need some basic documentation on how to run the Perl
scripts.
Andy.
Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm
--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: Syntax for Multiple conditions

2005-03-23 Thread Paul Hoffman
On Mar 21, 2005, at 1:47 PM, Bryan Baldus wrote:
See perldoc perlop for Equality Operators. I believe:
   if ($field_100 && (($fic == '1') and ($juvie == 'j'))) {
should be
   if ($field_100 && (($fic eq '1') and ($juvie eq 'j'))) {
Also, the parens aren't necessary:
   if ($field_100 and $fic eq '1' and $juvie eq 'j')
Or equivalently:
   if ($field_100 && $fic eq '1' && $juvie eq 'j')
Or equivalently (not that I would recommend this syntax):
   if ($field_100 && $fic eq '1' and $juvie eq 'j')
This is because 'a eq b' expressions are evaluated before 'c && d' 
expressions, which are evaluated before 'e and f' expressions.
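
A quick script to confirm both points, with made-up values for the three variables:

```perl
use strict;
use warnings;

my ($field_100, $fic, $juvie) = (1, '1', 'j');

# All three spellings group as:
#   $field_100 AND ($fic eq '1') AND ($juvie eq 'j')
my $with_parens = ($field_100 && (($fic eq '1') and ($juvie eq 'j'))) ? 1 : 0;
my $with_and    = ($field_100 and $fic eq '1' and $juvie eq 'j')      ? 1 : 0;
my $with_ampamp = ($field_100 && $fic eq '1' && $juvie eq 'j')        ? 1 : 0;

# Why == is wrong for strings: both sides are converted to numbers, and
# non-numeric strings become 0, so unrelated strings compare as equal.
my ($left, $right) = ('j', 'x');
my $numeric_compare = do { no warnings 'numeric'; ($left == $right) ? 1 : 0 };
my $string_compare  = ($left eq $right) ? 1 : 0;

print "parens=$with_parens and=$with_and ampersands=$with_ampamp\n";
print "'j' == 'x' gives $numeric_compare, 'j' eq 'x' gives $string_compare\n";
```

With warnings switched on, the == comparison would also complain that the argument isn't numeric, which is usually the first hint that eq was meant.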

Paul.
--
Paul Hoffman :: [EMAIL PROTECTED] :: http://www.nkuitse.com/


Re: History of MARC/Perl

2008-11-06 Thread Paul Hoffman
On Thu, Nov 06, 2008 at 10:19:48AM -0600, Hahn, Harvey wrote:
> Galen Charlton wrote:
> |As part of a class project, I and my colleagues Diana Weaver and Kevin
> |Ford plan to create a digital archive to record the history of
> |MARC/Perl (i.e., MARC.pm and MARC::Record).
> 
> As Ed Summers pointed out, "It seems that the archives for the original
> list at vims.edu are no longer available."  However, they *are*
> partially available on the Internet Archive.  If you go to
> <http://www.archive.org/> and search for "http://vims.edu/perl4lib"; (an
> educated guess on my part, based on Ed Summers' info; perhaps there were
> other later addresses along the way) on the Wayback Machine, you will
> find 26 snapshots of that address.  However, except for one circular
> reference (Apr 15, 2001), only the pages from Dec 6, 1999, to Jun 30,
> 2001, actually contain any information; the remaining pages show only an
> empty directory.
> 
> If you use the latest address (Jun 30, 2001), you will find viewable
> articles and viewable Perl4Lib archive messages from Jul 29, 1999 to Jul
> 30, 2000.  (Based on a very cursory examination, these include the
> seemingly earliest list messages relating to MARC.pm and its
> development.)  You can also find downloadable files of the earliest
> versions of MARC.pm available there via archived CPAN links.  (I checked
> the current CPAN site, and the oldest--that is, less than 1.0--are not
> there; but they *are* available through the IA Perl4Lib links to the
> IA-archived CPAN site.)  The link to the IA-archived MARC.pm SourceForge
> site also has some of the early/experimental versions (but not the
> earliest, as found on IA CPAN) of the software that you're seeking.
> 
> Hope this gives you some leads to the early history that you're
> documenting!
> 
> Harvey

You might also check BackPAN:

http://backpan.perl.org/
http://backpan.cpan.org/

The oldest I saw (from a cursory search) was MARC-0.81 from 1999-10-05:

http://backpan.perl.org/authors/id/B/BB/BBIRTH/

Paul.

-- 
Paul Hoffman <[EMAIL PROTECTED]>


Re: A library agnostic datastructure for MARC ?

2010-11-12 Thread Paul Hoffman
On Fri, Nov 12, 2010 at 01:55:32AM +0100, Marc Chantreux wrote:
> hi galen,
> 
> On Thu, Nov 11, 2010 at 07:40:35PM -0500, Galen Charlton wrote:
> > I don't see how a structure like this gets you anywhere closer to an
> > abstraction layer that would permit somebody to code in terms of
> > semantic concepts like title and author instead of MARC tags,
> 
> It doesn't: the fact is we're working on libraries to do that (see
> MARC::Mapper from my other mail) and i really would like to interact
> both with
> 
> - MARC::Record: it's heavily used in the koha ILS
> - the Frederic Demians's MARC lib which is much more modern

Do you mean marc-moose, a.k.a. Marc?  No offense to Frederic but it
seems like overkill to me.  But then I'm biased -- see below.

> - what we at biblibre call a SimpleRecord which is just a hash of non
>   ordered fields
> 
> i don't want to write a web of gateways for all those structures and
> those to come so i propose to have a common way to share between all of
> our works.
> 
> For example: MARC::Template and ISO2709 have internal code to build
> MARC::Records and SimpleRecords so they depends on MARC::Record. I
> really would like to drop this code for something more generic and
> simple.
> 
> > you're looking for a serialization or data structure that is more
> 
> i'm not talking about serialization at all, i'm talking about sharing
> data between marc related tools as PSGI does for the web thing. sorry if
> i wasn't clear.

You might be interested in my *non*-object-oriented module -- see
http://search.cpan.org/dist/MARC-Loop/ -- that parses records into a
structure like this:

($leader, \@fields)

Where @fields is a list of array refs, each like this:

[ $tag, \$rawfield, $delete_flag, $i1, $i2, @subfields ]

And @subfields is a list of array refs, each like this:

[ $subid, \$subval ]

No objects, just data, so you get to (read: have to) slice and dice the
data yourself.  And of course there's a function that goes the other way
-- and a looping function that was my main justification for writing the
thing in the first place:

use MARC::Loop qw(marcloop marcbuild TAG DELETE);
marcloop {
    my ($leader, $fields) = @_;
    # Strip all local fields and print the record if there were any
    my $changed;
    $changed = $_->[DELETE] = 1 for grep { $_->[TAG] =~ /9/ } @$fields;
    print marcbuild($leader, $fields) if $changed;
} \*STDIN;

It's meant to be simple (one module, three main functions, ca. 340 LOC
total), robust (won't barf on invalid UTF-8 or bad leaders), and *fast*
-- but if you prefer an object-oriented interface then it's probably not
a good fit.
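
To make the structure above a bit more concrete, here is a minimal sketch that walks it using nothing but core Perl.  The record is built by hand rather than parsed, the index constants are defined locally for illustration (the module exports its own), and subfield_values() is a made-up helper name, not part of MARC::Loop:

```perl
use strict;
use warnings;

# Positions within a field, following the layout described above.
# Defined locally for this sketch; an assumption, not the module's exports.
use constant { TAG => 0, VALREF => 1, DELETE => 2, IND1 => 3, IND2 => 4, SUBFIELDS => 5 };

# Return every value of subfield $subid found in undeleted fields tagged $tag.
sub subfield_values {
    my ($fields, $tag, $subid) = @_;
    my @values;
    for my $field (grep { $_->[TAG] eq $tag && !$_->[DELETE] } @$fields) {
        # Everything from index SUBFIELDS onward is a [ $subid, \$subval ] pair
        for my $sub (@$field[SUBFIELDS .. $#$field]) {
            push @values, ${ $sub->[1] } if $sub->[0] eq $subid;
        }
    }
    return @values;
}

# A hand-built record in the same shape: ($leader, \@fields)
my $leader = '00000nam a2200000 a 4500';
my ($id, $raw245, $title, $resp) = ('rec1', '', 'Small pieces, loosely joined', 'A. Author');
my @fields = (
    [ '001', \$id ],
    [ '245', \$raw245, 0, '1', '0', [ 'a', \$title ], [ 'c', \$resp ] ],
);

print join(', ', subfield_values(\@fields, '245', 'a')), "\n";
```

Because the subfield values are held by reference, changing ${ $sub->[1] } in place changes the record itself, which is the whole point of handing out plain data.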

Paul.

-- 
Paul Hoffman 


Re: MARC-in-JSON in MARC::Record

2011-01-18 Thread Paul Hoffman
On Tue, Jan 18, 2011 at 08:01:53AM -0500, Galen Charlton wrote:
> Hi,
> 
> On Mon, Jan 17, 2011 at 9:30 PM, Dueber, William wrote:
> > Note that as of this point,  the marc-in-json spec goes as high as the
> > Record object. A set of records could be represented by, say, the obvious
> > JSON-array of Record objects (which may necessitate a JSON pull-parser if
> > you’ve got a bunch of them) or using some sort of End-of-Record delimiter —
> > I’m a fan of eliminating optional-whitespace newlines from the JSON and
> > putting one JSON object on each line (“newline-delimited JSON”) for
> > easy-peasy processing.
> 
> I'll make MARC::Batch support both.  Newline-delimited JSON has the
> virtue of simplicity, but isn't correct, dagnabbit!  Fortunately,
> JSON::XS can fake a pull parser well enough for our purposes.

IMO this belongs in a separate module, not in MARC::Batch or
MARC::Record.  Small pieces, loosely joined!
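
For what it's worth, the newline-delimited variant needs very little code.  A minimal sketch, assuming JSON::PP (in core since Perl 5.14) and the MARC-in-JSON layout as I understand it (a "fields" array of one-key hashes, with data fields carrying "ind1", "ind2" and "subfields"); read_ndjson() and first_subfield() are names made up for this example:

```perl
use strict;
use warnings;
use JSON::PP;

# Decode newline-delimited JSON: one record object per non-blank line.
# Takes a filehandle and returns a list of decoded hash refs.
sub read_ndjson {
    my ($fh) = @_;
    my $json = JSON::PP->new->utf8;
    my @records;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;
        push @records, $json->decode($line);
    }
    return @records;
}

# First value of subfield $subid in any $tag field (layout is an assumption).
sub first_subfield {
    my ($record, $tag, $subid) = @_;
    for my $field (@{ $record->{fields} }) {
        my $content = $field->{$tag};
        next unless ref($content) eq 'HASH';    # skips control fields too
        for my $sub (@{ $content->{subfields} }) {
            return $sub->{$subid} if exists $sub->{$subid};
        }
    }
    return;
}

my $sample = <<'END';
{"leader":"00000nam a2200000 a 4500","fields":[{"001":"rec1"},{"245":{"ind1":"1","ind2":"0","subfields":[{"a":"First title"}]}}]}
{"leader":"00000nam a2200000 a 4500","fields":[{"001":"rec2"},{"245":{"ind1":"0","ind2":"0","subfields":[{"a":"Second title"}]}}]}
END
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print first_subfield($_, '245', 'a'), "\n" for read_ndjson($fh);
```

A real reader would hand back one record at a time instead of slurping the whole list, but the per-line decode is the same.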

Paul.

-- 
Paul Hoffman 


Re: MARC-in-JSON in MARC::Record

2011-01-18 Thread Paul Hoffman
On Tue, Jan 18, 2011 at 09:09:59AM -0500, Galen Charlton wrote:
> Hi,
> 
> On Tue, Jan 18, 2011 at 9:06 AM, Paul Hoffman wrote:
> > IMO this belongs in a separate module, not in MARC::Batch or
> > MARC::Record.  Small pieces, loosely joined!
> 
> MARC::Record and MARC::Batch are frameworks that invoke
> MARC::File::USMARC (and MARC::File::JSON and MARC::File::XML).  It is
> already modular.

Doy!  Sorry, I don't know what I was thinking...

Paul.

-- 
Paul Hoffman 


Re: Add 006 and 007 tags to records?

2011-05-12 Thread Paul Hoffman
On Thu, May 12, 2011 at 11:58:07AM -0400, Nolte, Jennifer wrote:
> I am working on a shell script that will process the MARC records that
> represent items from our e-serials management database, and the thing
> I am stuck on is adding brand new 006 and 007 fixed field tags to each
> record. I am using the marcedit.pl fieldaddtobeg function, but it
> corrupts the records and makes them unparseable.
> 
> Is there a quick and easy way to do this within a script? (I know Perl
> works best with MARC but I am open to any suggestions).

I've written a bunch of MARC manipulation scripts meant to be used at
the command line or in shell scripts; appending 006 and 007 fields would
go something like this:

$ f006='ebd...'
$ f007='aj cafua'
$ marcappend 006 [ "$f006" ] 007 [ "$f007" ] < "$file" | marcgroom -0

It should be pretty obvious what marcappend does; marcgroom reorders
fields within a group (-0 to reorder 0xx fields, -1 to reorder 1xx
fields, etc.).
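
To illustrate the reordering idea, here is a rough sketch in plain Perl.  It is not the actual marcgroom code: fields are reduced to [ $tag, $value ] pairs, groom_group() is a name invented for this example, and placing the sorted group where its first member stood is an assumption about what "reorder" should mean:

```perl
use strict;
use warnings;

# Pull out the fields whose tag starts with $digit, sort them by tag
# (ties keep their original order), and put them back as one block at the
# position where the first of them was found.  Other fields stay in order.
sub groom_group {
    my ($digit, @fields) = @_;
    my (@group, @out, $insert_at);
    for my $field (@fields) {
        if (substr($field->[0], 0, 1) eq $digit) {
            $insert_at = @out unless defined $insert_at;
            push @group, $field;
        }
        else {
            push @out, $field;
        }
    }
    return @out unless @group;
    my @order = sort { $group[$a][0] cmp $group[$b][0] || $a <=> $b } 0 .. $#group;
    splice @out, $insert_at, 0, @group[@order];
    return @out;
}

# 006 and 007 were appended at the end of the record, as marcappend would do
my @fields = (
    [ '001', 'rec1' ],
    [ '008', '...'  ],
    [ '245', 'Some title' ],
    [ '006', 'm' ],
    [ '007', 'cr' ],
);
print join(' ', map { $_->[0] } groom_group('0', @fields)), "\n";
```

Run against the sample above, the 0xx fields end up together and in tag order ahead of the 245.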

Holler if you're interested.  These two particular scripts only depend
on Getopt::Long -- I wrote them at a time when I preferred to include
MARC record parsing code in each script rather than using MARC::Record
or some such.

Paul.

-- 
(Insert cute and/or geeky .sig here)


Re: Anybody know what this USMARC.pm error is?

2011-05-23 Thread Paul Hoffman
On Sat, May 21, 2011 at 04:10:49PM -0500, Mike Barrett wrote:
> When I run the code below, it works fine for a couple thousand MARC records,
> then starts this:
> str outside of string at C:/Perl64/lib/bytes_heavy.pl line 11.
>  of uninitialized value in integer eq (==) at
> C:/Perl64/site/lib/MARC/File/USMARC.pm line 175.
>  of uninitialized value $tagdata in substr at
> C:/Perl64/site/lib/MARC/File/USMARC.pm line 178.
> str outside of string at C:/Perl64/lib/bytes_heavy.pl line 11.

Are those the exact error messages?

> It does that a few dozen times, then finally dies with:
> str outside of string at C:/Perl64/lib/bytes_heavy.pl line 11.
>  of uninitialized value in integer eq (==) at
> C:/Perl64/site/lib/MARC/File/USMARC.pm line 175.
>  of uninitialized value $tagdata in substr at
> C:/Perl64/site/lib/MARC/File/USMARC.pm line 178.
>  of uninitialized value $tagdata in split at
> C:/Perl64/site/lib/MARC/File/USMARC.pm line 195.
>  of uninitialized value $indicators in concatenation (.) or string at
> C:/Perl64/site/lib/MARC/File/USMARC.pm line 200.
> 't call method "as_string" on an undefined value at getsomefields.pl line
> 25.

Sorry, I can't help with the Perl code, but it sounds to me like a bad
record.
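
One way to check that is to look at the raw record structure before MARC::Record ever sees it.  A rough sketch, assuming plain MARC 21 (ISO 2709) records; record_problems() is a made-up name and the checks are only the most basic ones (leader length, base address, terminators):

```perl
use strict;
use warnings;

# Return a list of structural problems found in one raw MARC record,
# or an empty list if the basic structure looks sane.
sub record_problems {
    my ($raw) = @_;
    return ('shorter than a 24-byte leader') if length($raw) < 24;

    my $declared = substr($raw, 0, 5);     # leader/00-04: record length
    my $base     = substr($raw, 12, 5);    # leader/12-16: base address of data
    return ('non-numeric record length in leader') unless $declared =~ /^\d{5}$/;
    return ('non-numeric base address in leader')  unless $base     =~ /^\d{5}$/;

    my @problems;
    push @problems, sprintf('leader says %d bytes, record has %d', $declared, length($raw))
        if $declared != length($raw);
    push @problems, 'record does not end with the record terminator (0x1D)'
        if substr($raw, -1) ne "\x1d";
    push @problems, 'directory does not end just before the base address'
        if $base > length($raw) || substr($raw, $base - 1, 1) ne "\x1e";
    push @problems, 'directory length is not a multiple of 12'
        if ($base - 25) % 12;
    return @problems;
}

# Build one tiny well-formed record (a single 001 field) and a truncated copy
my $data  = "rec1\x1e";
my $dir   = '001' . sprintf('%04d%05d', length($data), 0) . "\x1e";
my $base  = 24 + length($dir);
my $total = $base + length($data) + 1;
my $good  = sprintf('%05dnam a22%05d a 4500', $total, $base) . $dir . $data . "\x1d";
my $bad   = substr($good, 0, 40);

for my $raw ($good, $bad) {
    my @problems = record_problems($raw);
    print @problems ? join('; ', @problems) : 'looks OK', "\n";
}
```

To run it over a file, set $/ to "\x1d" so that each read returns one record, and report the record number along with any problems.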

> Here's the record it appears to have choked on while pulling the 245:

Could you please repost the record as an attachment?

Paul.

-- 
Paul Hoffman 


Re: Invalid UTF-8 characters causing MARC::Record crash.

2011-06-17 Thread Paul Hoffman
Ed,

On Fri, Jun 17, 2011 at 10:53:00AM +0100, Edmund Chamberlain wrote:
> Firstly, hello! It's my first time posting and possibly somewhat 
> predictably with a call for help with Unicode stuff.

Ah, yes...

> I've just checked the archive and seen this thread and am having a 
> similar problem, a badly encoded character is causing a while loop 
> through MARC::Batch->next to crash out with:
> 
> utf8 "\x87" does not map to Unicode at 
> /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Encode.pm line 173.
> 
> I've tried pasting Al's modified decode subroutine and package to the 
> script, but it is still failing. One of the offending records is 
> isolated and attached.
> 
> Any suggestions welcome with regards to further modifying the sub or 
> alternatives to MARC::Batch->next would be welcome. For the scope of the 
> project, I'm limited to large batch files of Marc21.

You could try MARC::Loop, which doesn't care about (or even detect)
character (mis)codings in any way -- it just sees everything as raw
bytes.  Then you can go in and wipe out (or fix) bad byte sequences,
something like this:

use MARC::Loop qw(marcloop marcbuild);
marcloop {
    # This code is called once for each record read from standard input
    my ($leader, $fields, $rawref) = @_;
    if ($$rawref =~ /[^\x00-\x7f]/) {
        # The record contains one or more non-ASCII bytes
        foreach my $field (@$fields) {
            my ($tag, $valref) = @$field;
            # Operate on the field's raw value (bytes), via its reference
            $$valref =~ s{
                (   # Valid byte sequences for a single character:
                  [\x{00}-\x{7f}]
                | [\x{c2}-\x{df}][\x{80}-\x{bf}]
                | \x{e0}         [\x{a0}-\x{bf}][\x{80}-\x{bf}]
                | [\x{e1}-\x{ec}][\x{80}-\x{bf}][\x{80}-\x{bf}]
                | \x{ed}         [\x{80}-\x{9f}][\x{80}-\x{bf}]
                | [\x{ee}-\x{ef}][\x{80}-\x{bf}][\x{80}-\x{bf}]
                | \x{f0}         [\x{90}-\x{bf}][\x{80}-\x{bf}][\x{80}-\x{bf}]
                | [\x{f1}-\x{f3}][\x{80}-\x{bf}][\x{80}-\x{bf}][\x{80}-\x{bf}]
                | \x{f4}         [\x{80}-\x{8f}][\x{80}-\x{bf}][\x{80}-\x{bf}]
                )
                |
                (
                    # Oops! -- invalid byte sequence, we assume all
                    # non-ASCII bytes starting here are bad
                    [^\x00-\x7f]+
                )
            }{
                if (defined $2) {
                    # Substitute a "fixed" version of the bad byte sequence
                    print STDERR "Fixing bad byte sequence in field $tag of record $.\n";
                    fixed($2);
                }
                else {
                    # Leave it unchanged
                    $1;
                }
            }xeg;
        }
        print marcbuild($leader, $fields);
    }
    else {
        print $$rawref;
    }
} \*STDIN;
sub fixed {
    my ($str) = @_;
    # There are any number of actions you might take to "fix" invalid byte
    # sequences; this is just one
    return '?';
}

The downside is that MARC::Loop is a separate Perl module you'll have to
download from CPAN and install manually.  I wrote it, so I'm biased, but
I think it's good for people who prefer to (or have to) work closer to
the raw MARC record.  (Fixing miscoded records that MARC::Record et al.
couldn't handle was what motivated me to write it in the first place.)
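
If you only want to find out which records (or fields) are affected before deciding how to fix them, the core Encode module can do the checking.  A small sketch; is_valid_utf8() is a name made up for this example, and it uses the strict "UTF-8" encoding rather than Perl's lax "utf8":

```perl
use strict;
use warnings;
use Encode ();

# True if $bytes is well-formed UTF-8.  FB_CROAK makes decode() die on the
# first malformed sequence; LEAVE_SRC keeps it from modifying its argument.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

for my $bytes ("caf\xc3\xa9", "caf\x87") {
    printf "%s: %s\n", unpack('H*', $bytes),
        is_valid_utf8($bytes) ? 'ok' : 'invalid UTF-8';
}
```

The second string contains the same stray \x87 byte that Encode complained about in your error message.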

Paul.

-- 
Paul Hoffman 


Re: Anyone create MFHD records using MARC/Perl

2011-09-09 Thread Paul Hoffman
On Fri, Sep 09, 2011 at 02:55:03PM -0700, Mark Jordan wrote:
> Anyone know if there are any reasons that MARC::Record et al can't be
> used to create dumps of MFHD? For example, are there any
> leader/indicator values that are specific to MFHD that are illegal in
> MARC bib records that might cause MARC/Perl to puke?

There shouldn't be, but if you run into problems with them you might
consider trying MARC::Loop, which doesn't know or care what the data
inside a MARC record is just as long as the record has the correct
structure (24-byte leader, valid field lengths in the directory, two
bytes of indicators in each data field, etc.).

The interface is totally different from MARC::Record, though, so don't
bother with it unless you can't use MARC::Record for some reason and
(preferably) you grok things like this:

my ($rec_id) = map  { ${ $_->[VALREF] } }
   grep { $_->[TAG] eq '001' } @$fields;

Paul.

-- 
Paul Hoffman 


Re: Fwd: some transformations on file

2013-01-20 Thread Paul Hoffman
On Sun, Jan 20, 2013 at 06:43:38PM +0100, samuel desseaux wrote:
> *the goal is to join properly items with biblio records. 

Let's assume that you have these two files:

(B) Three MARC bibliographic records

1. 001 = 1029
2. 001 = 3884
3. 001 = 1650
(etc.)

(I) Seven MARC item records

1. 001 = 1029
2. 001 = 1650
3. 001 = 1029
4. 001 = 3884
5. 001 = 3884
6. 001 = 1650
7. 001 = 1650

Do you want to produce a *new* file of three records, like this?

1. I1 + I3
2. I4 + I5
3. I2 + I6 + I7

Is this really what you want to have in the end?

> As we have to separate files, it's a bit hard. With MarcEdit, if i 
> merge these two files, it's limited: marcedit doesn't understand that 
> one biblio record can have more than one item :-).  I won't say any 
> more about my library and his exotical old ils i've moved for koha.

It sounds as though what you *really* want in the end is a *single* file 
of three MARC records, like this:

B1 + I1 + I3
B2 + I4 + I5
B3 + I2 + I6 + I7

Is that right?  Here's a rough start in Perl:

---->8---->8---->8---->8----
use MARC::File::USMARC;
my ($file, %records);
# First pass: one list per bib record, keyed by its system number
$file = MARC::File::USMARC->in($bib_records_file);
while (my $bib_marc = read_next_record_from($file)) {
    my $sysnum = sysnum($bib_marc);
    $records{$sysnum} = [ $bib_marc ];
}
$file->close;
# Second pass: append each item record to the list for its bib record
$file = MARC::File::USMARC->in($item_records_file);
while (my $item_marc = read_next_record_from($file)) {
    my $sysnum = sysnum($item_marc);
    push @{ $records{$sysnum} }, $item_marc;
}
$file->close;
print @$_ for values %records;
---->8---->8---->8---->8----

Let us know if you need help writing read_next_record_from() or 
sysnum().
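
The grouping logic itself can be tried out without any MARC files.  In this sketch the records are just [ $sysnum, $label ] pairs standing in for real records, join_items() is a made-up name, and a separate list remembers the order of the bib records, since values %records comes back in no particular order:

```perl
use strict;
use warnings;

# Group item records under their bib record, keeping bibs in file order.
sub join_items {
    my ($bibs, $items) = @_;
    my (%group, @order);
    for my $bib (@$bibs) {
        my $sysnum = $bib->[0];
        push @order, $sysnum unless exists $group{$sysnum};
        push @{ $group{$sysnum} }, $bib;
    }
    for my $item (@$items) {
        my $sysnum = $item->[0];
        if (exists $group{$sysnum}) {
            push @{ $group{$sysnum} }, $item;
        }
        else {
            warn "No bib record for item with 001 = $sysnum\n";
        }
    }
    return map { $group{$_} } @order;
}

# The sample data from earlier in this message
my @bibs  = ( [ 1029, 'B1' ], [ 3884, 'B2' ], [ 1650, 'B3' ] );
my @items = ( [ 1029, 'I1' ], [ 1650, 'I2' ], [ 1029, 'I3' ], [ 3884, 'I4' ],
              [ 3884, 'I5' ], [ 1650, 'I6' ], [ 1650, 'I7' ] );

print join(' + ', map { $_->[1] } @$_), "\n" for join_items(\@bibs, \@items);
```

It also warns about items whose bib record is missing, which is worth knowing about before loading anything into Koha.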

Paul.

-- 
Paul Hoffman 


Re: reading and writing of utf-8 with marc::batch

2013-03-26 Thread Paul Hoffman
On Tue, Mar 26, 2013 at 04:22:03PM -0400, Eric Lease Morgan wrote:
> For the life of me I can't figure out how to do reading and writing of 
> UTF-8 with MARC::Batch.
> 
> I have a UTF-8 encoded file of MARC records. Dumping the records and 
> greping for a particular string illustrates the validity:
> 
>   $ marcdump und.marc | grep Sainte-Face

What is marcdump?

>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610 20 _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $ 
> 
> I then run a Perl script that simply reads each record and dumps it to 
> STDOUT. Notice how I define both my input and output as UTF-8:

Try *not* calling binmode and see what happens.  Or just call 
binmode(MARC) without the ':utf8' layer.

>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610_aArchiconfrérie de la Sainte-Face
>   13000 records
>   $

This looks like double-encoding:

0000  6c 27 41 72 63 68 69 63  6f 6e 66 72 c3 83 c2 a9  |l'ArchiconfrÃ.©|
0010  72 69 65                                          |rie|

LATIN SMALL LETTER E WITH ACUTE is supposed to be c3 a9 (as it is in the 
first marcdump output) not c3 83 c2 a9.
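
Here is a small demonstration of how those four bytes come about, using only the core Encode module.  The function names are made up for this example; the point is that bytes which are already UTF-8 get treated as Latin-1 characters and encoded a second time:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Misread UTF-8 bytes as Latin-1 characters, then encode them as UTF-8 again
sub double_encode {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Reverse the damage: decode once, then map each character back to one byte
sub undo_double_encoding {
    my ($bytes) = @_;
    return encode('ISO-8859-1', decode('UTF-8', $bytes));
}

my $correct  = "\xc3\xa9";                        # UTF-8 bytes for e-acute
my $doubled  = double_encode($correct);
my $repaired = undo_double_encoding($doubled);

print unpack('H*', $correct),  "\n";    # c3a9
print unpack('H*', $doubled),  "\n";    # c383c2a9
print unpack('H*', $repaired), "\n";    # c3a9
```

An output layer such as ':utf8' applied to data that is already UTF-8 bytes has the same effect as double_encode() here.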

Paul.

-- 
Paul Hoffman