Perl-based MARC record error checking update and questions

2004-06-23 Thread Bryan Baldus
I have updated my MARC related Perl modules and scripts again [1]. Most
notable this time, in addition to the updates to BBMARC.pm and
Lintadditions.pm, is the new module, Errorchecks.pm (MARC::Errorchecks) [2].
It has an associated calling program, lintallchecks.pl [3]. Both are
described below, after the questions.

Associated with the updates, I have a few questions/problems.

Question 1: As part of my updates to [MARC::] Lintadditions.pm [4] and
[MARC::] BBMARC.pm [5], I added validation checks against the MARC code
lists for languages, geographic areas, and countries. During validation,
for each field, I read data from the end of the module into an array (with
readcodedata(), in both BBMARC and Lintadditions), and then use grep to
search for a match, first in the valid codes, then in the invalid codes.
This seems to impose a significant amount of work, since readcodedata() is
called to populate the valid and invalid code arrays every time I call
validate008() or check_043 (or check_041, once it is completed).

I could make those arrays global variables, but if so, what is the best
place to put them?
Is there an easier/more efficient way to handle these validation checks?
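
One possible approach, offered only as a sketch: keep the codes in
file-scoped lexical hashes inside the module, fill them the first time they
are needed, and test membership with exists instead of grep. In the example
below, the readcodedata() stand-in, the load_codes() and code_status()
helpers, and the sample codes are all hypothetical placeholders, not the
real BBMARC or Lintadditions code or the real MARC lists.

```perl
use strict;
use warnings;

# Hypothetical stand-in for readcodedata(): returns valid and obsolete
# codes as two tab-separated strings. The counter only shows how often
# the data is actually read.
my $read_count = 0;
sub readcodedata {
    $read_count++;
    return ("eng\tfre\tger\tspa", "esk\tesp");
}

# File-scoped cache, filled on first use and shared by all later calls.
my (%valid_code, %obsolete_code);
my $codes_loaded = 0;

sub load_codes {
    return if $codes_loaded;
    my ($valid, $obsolete) = readcodedata();
    $valid_code{$_}    = 1 for split /\t/, $valid;
    $obsolete_code{$_} = 1 for split /\t/, $obsolete;
    $codes_loaded = 1;
    return;
}

# Returns 'valid', 'obsolete', or 'unknown' for a single code.
sub code_status {
    my ($code) = @_;
    load_codes();
    return 'valid'    if exists $valid_code{$code};
    return 'obsolete' if exists $obsolete_code{$code};
    return 'unknown';
}

print code_status($_), "\n" for qw(eng esk xxx);
```

Because the hashes are lexical to the file, they behave like module-level
globals without being visible to calling programs, and each lookup is a
single hash probe rather than a scan of the whole list.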


--
Question 2: My new module, [MARC::] Errorchecks.pm has a subroutine,
check_all_subs($record), which is designed to call all of the checking
subroutines in the module, compile a list/array of warnings/errors, and
return the array reference. For all of the checks except check_003 and
check_010, the following works:

push @errorstoreturn, (@{[subroutine_name]($record)});
([subroutine_name] being replaced with the appropriate call).

With check_003 and check_010, I get errors relating to uninitialized use of
empty array references. I believe this has to do with 
next-like returns from those subroutines, such as:

unless (($record->field('010')) &&
($record->field('010')->subfield('a'))) {return;}

As a workaround, I used the following:

my $errorsin003 = check_003($record);
push @errorstoreturn, @$errorsin003 if ($errorsin003);

Is this what I should do for all of my calls? Should I be returning
something in check_003? If I return a defined/non-zero value, it seems like
my @errorstoreturn array will contain a number of extra, non-error values.
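
A minimal sketch of two ways this might be handled, assuming each check is
meant to hand back an array reference: return an empty array reference
([]) on early exit, or guard the dereference with "|| []" at the call
site. An empty array reference adds nothing to the list when dereferenced,
so no extra non-error values appear. The subroutine names and the
hash-based records below are hypothetical stand-ins, not the real
Errorchecks code or MARC::Record objects.

```perl
use strict;
use warnings;

# Hypothetical check: always returns an array reference, which is empty
# when there is nothing to report.
sub check_with_early_exit {
    my ($record) = @_;
    return [] unless $record->{has_field};   # early exit: empty list, not undef
    return ["010: example warning"];
}

# Hypothetical check that still uses a bare return (undef in scalar context).
sub check_returning_undef { return; }

my @errorstoreturn;
for my $record ({ has_field => 0 }, { has_field => 1 }) {
    # Safe: the result is always an array reference.
    push @errorstoreturn, @{ check_with_early_exit($record) };
    # Safe: falls back to an empty array reference when undef comes back.
    push @errorstoreturn, @{ check_returning_undef($record) || [] };
}

print scalar(@errorstoreturn), " warning(s)\n";
```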

--
Question 3: With all of the checks, using lintallchecks.pl, which calls
everything in MARC::Lint, MARC::Lintadditions, and MARC::Errorchecks, the
program runs slowly. Without rewriting the code as object-oriented (which,
given my limited experience and knowledge, I do not anticipate doing any
time soon), what sort of optimizations could I make?

--
Thank you for any suggestions you might have.
--

As stated above and on my home page, I have updated the following:

New module:

Errorchecks.pm (MARC::Errorchecks): Collection of error checking subroutines
similar to MARC::Lint and MARC::Lintadditions. This is currently version
0.95 due to problems with the subroutine calls to check_003 and check_010.
Warnings by the interpreter indicate use of uninitialized Array references
(probably when the program gets to a record without one of those fields).
This module will be updated with additional subroutines, similar to the way
Lintadditions is updated. It is mainly for checking fields which require
data from other parts of the record (Lint's check_xxx subroutines seem to be
limited to single-field checking).

Associated script for using MARC::Errorchecks: lintallchecks.txt. This can
replace most of the error checking scripts, along with the checking portion
of the cleanup full record scripts. It should also work without changes as
Errorchecks.pm is updated with new subroutines.

Changes to my main modules:

Lintadditions.pm:

Version 1.01: Updated June 17, 2004; released June 20, 2004.

-Added validation of 043 against GAC list.
-Added check_082.
-Added checks for $b, $h, $n, and $p in 245.
-Other changes/fixes.

BBMARC.pm:

Version 1.04: Updated June 16, 2004; released June 20, 2004.

-Updated as_formatted2() to work with MARC::Record 1.38
(is_control_field() instead of is_control_tag()).
-Fixed bug in validate008 for visual materials running time (hyphen was not
escaped, so it was being interpreted as a range indicator).
-Added parse008date($) to allow user to enter yymmdd and get
\tmm\tdd\t$error string back (for other uses).
-Added DATA containing codes from the MARC lists for Countries, Geographic
Areas, and Languages, to 2003. Each code set is separated by tabs, and
Obsolete codes are given following each set of valid codes, in the same
format.
-Added readcodedata() subroutine for reading in the data and returning the
data in an array for use by validation code, such as in validate008()
-Modified validate008 subroutine to use the DATA to validate 

baffling perl/linux problem

2004-06-23 Thread Jon Legree
Desperately seeking the help of a perl/linux-meister

I consider myself an intermediate perl programmer, but this problem has had
me completely stumped for days. We've been using the Public Access Terminal
Control program (http://patc.sourceforge.net) to control internet access and
it has worked perfectly for us until last week. The perl script which
authenticates users suddenly stopped working properly, although there were
no changes made to the perl code.

The script uses regular expression matching and grep to find users in a flat
file database of card numbers, and in a daily-generated list of guest user
numbers. If a user attempts to login using a valid barcode that is in the
database, the script authenticates the user and logs them on as it should.
If the user types in an invalid barcode number, the script refuses the login
as it should.

THE PROBLEM:
If a user enters a single non-numeric character (any letter of the
alphabet) or any random sequence of letters, the script will authenticate
the user as a guest and log them on (guest user numbers start with a
letter).

THE QUESTION:
Is there anything in the system environment (Red Hat linux 7.1, perl 5.6.0 -
upgraded to 5.6.1 in an attempt to fix the problem, apache 1.3.x) that would
suddenly cause grep or regular expression functions, or simple file
access/reading functions to stop working or not work right? This system has
worked perfectly for 3 years, and nothing has been changed recently.
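
Since the script itself is not shown here, the following is only an
illustration of one thing that might be worth ruling out, not a diagnosis:
an unanchored pattern match succeeds on partial input, whereas a
whole-string comparison does not. The guest numbers and the loose_match()
and exact_match() helpers are hypothetical and are not taken from the PATC
code.

```perl
use strict;
use warnings;

# Hypothetical guest numbers; the real list is generated daily.
my @guests = ('G1001', 'G1002');

# Substring-style test: counts entries that contain the input anywhere.
sub loose_match {
    my ($input) = @_;
    return scalar grep { /$input/ } @guests;
}

# Exact test: counts entries that equal the whole input.
sub exact_match {
    my ($input) = @_;
    return scalar grep { $_ eq $input } @guests;
}

for my $input ('G', 'G1001') {
    printf "%-6s loose=%d exact=%d\n",
        $input, loose_match($input), exact_match($input);
}
```

If the script matches in the loose style, the contents of the card-number
file and the daily guest list (for example, unexpected or malformed lines)
could change what matches even though the code itself has not changed.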


Any suggestions, comments, assistance will be greatly appreciated.

TIA

Jon Legree
Library Technology Specialist
Yorba Linda Public Library
Yorba Linda, CA
http://www.ylpl.lib.ca.us