New version of MARC::Lint available

2014-07-21 Thread Bryan Baldus
I have posted a new version of MARC::Lint to CPAN [1]. This version applies the 
changes found in MARC 21 updates 17 [2] and 18 [3].

[1] http://search.cpan.org/~eijabb/MARC-Lint_1.48/
[2] http://www.loc.gov/marc/up17bibliographic/bdapndxg.html
[3] http://www.loc.gov/marc/bibliographic/bdapndxg.html

Thank you for your time.

Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/



RE: Customizing MARC::Errorchecks

2012-07-17 Thread Bryan Baldus
On Tuesday, July 17, 2012 4:34 PM, Shelley Doljack [sdolj...@stanford.edu] 
wrote:
I'm playing around with using MARC::Errorchecks for reviewing ebook records we 
get from vendors. I want to make some modifications to the module, but I find 
that if I do so in a similar manner described in the tutorial for customizing 
MARC::Lint, by making a subclass of the module, it doesn't work. Is this not 
possible with Errorchecks?

Indeed, MARC::Errorchecks was not written in the object-oriented style that 
MARC::Lint uses. Skimming through the code just now (I've not worked with it as 
regularly as I might like to be able to keep it fresh in my memory), I believe 
it is essentially a collection of subs with a wrapper sub to call each 
check--check_all_subs() calls each of the checking subroutines and returns the 
arrayref of any errors found. When I wrote it I was still early in learning 
Perl (and while I've gotten better since then, lack of recent practice working 
with it hasn't necessarily improved my knowledge of the language), so I'm sure 
it's not the most optimized code possible. check_all_subs() and the POD 
comments could serve as an index to each of the checks, with the SYNOPSIS 
showing examples of how to call the individual checks.
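
For what it's worth, a minimal way to drive the module from a script might look like the sketch below (the file name is a placeholder; individual check subs can be called the same way as check_all_subs()):

use strict;
use warnings;
use MARC::Batch;
use MARC::Errorchecks;

my $batch = MARC::Batch->new('USMARC', 'records.mrc');
while (my $record = $batch->next()) {
    #check_all_subs() returns an arrayref of error messages for this record
    my @errors = @{MARC::Errorchecks::check_all_subs($record)};
    #individual checks work the same way, e.g.:
    #push @errors, @{MARC::Errorchecks::check_bk008_vs_300($record)};
    print join("\n", $record->title(), @errors), "\n\n" if @errors;
}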

That said, if you have ideas for additions or changes, or other questions, I 
welcome hearing about them, either to add to the base module or to help with 
creating a related module of your own. I do know that I need to get working on 
the changes required for RDA records, but haven't really even started looking 
into the challenges those will pose (though that will likely result in a new 
module or more devoted just to RDA, and will also likely require 
changes/subclasses to MARC::Lint and MARC::Lintadditions).

Also of note, I have a newer version I've just uploaded to CPAN [1] with the 
following changes. (In addition to those listed below, I plan on removing 
MARC::Lint::CodeData from the Errorchecks distribution and then requiring 
MARC::Lint, which includes CodeData, to hopefully resolve issues with 
installing both module packages at the same time due to this shared file.)

Version 1.16: Updated May 16-Nov. 14, 2011. Released 7-17-2012.
 -Removed MARC::Lint::CodeData and require MARC::Lint
 -Turned off check_fieldlength($record) in check_all_subs()
 -Turned off checking of floating hyphens in 520 fields in 
findfloatinghyphens($record)
 -Updated validate008 subs (and 006) related to 008/24-27 (Books and Continuing 
Resources) for MARC Update no. 10, Oct. 2009; no. 11, 2010; no. 12, 
Oct. 2010; and no. 13, Sept. 2011.
 -Updated %ldrbytes with leader/18 'c' and redefinition of 'i' per MARC Update 
no. 12, Oct. 2010.

Version 1.15: Updated June 24-August 16, 2009. Released , 2009.

 -Updated checks related to 300 to better account for electronic resources.
 -Revised wording in validate008($field008, $mattype, $biblvl) language code 
(008/35-37) for '   '/zxx.
 -Updated validate008 subs (and 006) related to 008/24-27 (Books and Continuing 
Resources) for MARC Update no. 9, Oct. 2008.
 -Updated validate008 sub (and 006) for Books byte 33, Literary form, 
invalidating code 'c' and referring it to 008/24-27 value 'c' .
 -Updated video007vs300vs538($record) to allow Blu-ray in 538 and 's' in 07/04.

[1] While the CPAN indexer works on that: 
http://www.cpan.org/authors/id/E/EI/EIJABB/MARC-Errorchecks-1.16.tar.gz
, I've also posted the file to my website: 
http://home.comcast.net/~eijabb/bryanmodules/MARC-Errorchecks-1.16.tar.gz, 
with text versions of each file visible in:
http://home.comcast.net/~eijabb/bryanmodules/MARC-Errorchecks-1.16

#

Finally, I meant to mention it on this list earlier, but I've posted a new 
version of MARC::Lint, 1.45, to CPAN [2], with the current development version 
(as of now, same as CPAN's version), in SourceForge's Git repository [3]. 
Updates to that module include:
 - Updated Lint::DATA section with Update No. 10 (Oct. 2009) through Update No. 
14 (Apr. 2012)
 - Updated _check_article with the exceptions: 'A  ', 'L is '

#

[2] http://search.cpan.org/~eijabb/MARC-Lint-1.45/
[3] http://marcpm.git.sourceforge.net/git/gitweb.cgi?p=marcpm/marcpm;a=summary

I hope this helps,

Bryan Baldus
Cataloger
Quality Books Inc.
1-800-323-4241x402
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/

RE: MARC::Record / MARC::File::XML bug when fields contain newlines?

2012-01-12 Thread Bryan Baldus
On Thursday, January 12, 2012 11:59 AM, arvinport...@lycos.com 
[mailto:arvinport...@lycos.com]  wrote:
I could have sworn I have processed MARC records containing newlines with no 
problems in the past (I.e., not records converted from XML), though I've never 
tried to validate them with MARCEdit.
...
Looks like MARC::Record is doing its job correctly. Perhaps changing 
MARC::File::XML is in order.

MARC::File::USMARC includes a line in sub _next:

 # remove illegal garbage that sometimes occurs between records
$usmarc =~ s/^[ \x00\x0a\x0d\x1a]+//;

If I remember correctly, I believe this was added a few years ago in response 
to similar questions about new lines appearing in records (or after someone 
experienced problems with new lines and/or end-of-file characters in files of 
records--the new line removal may have always been there; I think I may have 
added 1A after finding it in some files I was working with).
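
As a small illustration of what that line does (the record content below is a 
stand-in, not real data), leading blanks, nulls, line endings, and 0x1A bytes 
ahead of the leader are stripped before the record is parsed:

my $raw = "\x0d\x0a\x1a" . "00214nam a2200097 a 4500...";   #stray bytes before the leader
$raw =~ s/^[ \x00\x0a\x0d\x1a]+//;   #same substitution as in MARC::File::USMARC's _next
print substr($raw, 0, 5), "\n";      #prints 00214 -- the leader starts the string again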

I'm not familiar with MARC::File::XML to know how it deals with end of line 
characters.



Bryan Baldus
Cataloger
Quality Books Inc.
The Best of America's Independent Presses
1-800-323-4241x402
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/


RE: Typo in MARC::Record tutorial.

2011-05-15 Thread Bryan Baldus
On Sunday, May 15, 2011 4:40 PM, Mike Barrett [coffeeisl...@gmail.com] wrote:

In the MARC::Batch example is this line:
5 my $batch = MARC::Batch('USMARC', 'file.dat');

I just found out it should be:
5 my $batch = MARC::Batch->('USMARC', 'file.dat');

Or (based on code in programs I've been using):

my $batch = MARC::Batch->new('USMARC', 'file.dat');
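
For context, a minimal version of the full tutorial snippet with the corrected 
constructor might look like this (the file name 'file.dat' is just the 
tutorial's placeholder):

#!perl
use strict;
use warnings;
use MARC::Batch;

my $batch = MARC::Batch->new('USMARC', 'file.dat');
while (my $record = $batch->next()) {
    print $record->title(), "\n";
}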

##

Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/

RE: Moose based Perl library for MARC records

2010-11-11 Thread Bryan Baldus
2010/11/11 Frédéric DEMIANS f.demi...@tamil.fr:
 Thanks all for your suggestions. I have to choose another name for sure.
 Marc::Moose seems to be a reasonable choice. But I'm very tempted by a
 shorter option: MarcX, MarcX::Record, MarcX::Parser, MarcX::Reader::Isis,
 etc. Any objection?


Since MARC is an acronym, I believe all of its letters should be capitalized. 
Trying to remember to lowercase some of them while coding would make me less 
likely to want to use your modules.

As for adding another top level instead of keeping MARC:: as the primary prefix 
for the modules, since the modules you are working on seem to be dealing with 
manipulating standard MARC records rather than something new called MarcX, 
I'd say MARC:: would be the place I'd expect to find such modules.

Thursday, November 11, 2010 8:28 AM Dueber, William [dueb...@umich.edu]:
I think we should revisit Biblio::. Yes, I know MARC isn't used only for 
bibliographic data, but it's sure as hell not used to speak of outside the 
library/museum world. 'Biblio' might not be perfect, but it's certainly not 
misleading in any meaningful way.

As mentioned above, MARC::* is where I'd be likely to look for modules related 
to manipulating MARC records. Maybe it's because I haven't needed any of the 
Biblio::* modules, but I'd be less likely to look there for MARC manipulation 
modules. Since the modules under discussion appear to be an alternative to the 
current standard modules for MARC manipulation, the MARC::Record family, it 
seems like something within MARC::* would be appropriate (as long as the names 
don't interfere with the existing modules but instead can be used in 
cooperation with them).


Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/



RE: MARC::Field->subfields function

2010-09-08 Thread Bryan Baldus
On Wednesday, September 08, 2010 3:51 PM, Justin Rittenhouse 
[mailto:jritt...@nd.edu] wrote:
I'm relatively new to Perl and very new to the MARC::Record module.  I'm 
trying to use the subfields function (my @subfields = $field->subfields();), 
but I'm getting an error:
Can't use an undefined value as an ARRAY reference at 
/usr/lib64/perl5/vendor_perl/5.8.8/MARC/Field.pm line 275.
I'm not familiar enough with Perl to figure out what the function is actually 
doing, so I can't figure out if this is a bug or if I missed something in the 
tutorial.  Other functions off of the $field variable work (I can pull the 
tag, indicator, and as_string functions).

It's difficult to say what went wrong without a little more context. In 
MARC::Lint, to access the subfields of a field, the following code appears 
fairly frequently to break down the subfields into code+data pairs in an array:

#where $field is a MARC::Field object

my @subfields = $field->subfields();
my @newsubfields = ();

while (my $subfield = pop(@subfields)) {
my ($code, $data) = @$subfield;
unshift (@newsubfields, $code, $data);
} # while

###
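
One possibility worth checking (an assumption on my part, not a confirmed 
diagnosis): calling subfields() on a control field (001-009), which carries 
data() rather than subfields, can produce that sort of undefined-value error. 
Guarding with is_control_field() avoids it:

for my $field ($record->fields()) {
    next if $field->is_control_field();   #00X fields have no subfields
    my @subfields = $field->subfields();
    #...work with the (code, data) pairs...
}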

What does your code look like in the area that is producing the error?

Thank you,

Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/



MARC::Lint 1.44

2009-08-29 Thread Bryan Baldus
MARC::Lint's most recent version is maintained in CVS at SourceForge, 
http://marcpm.cvs.sourceforge.net/viewvc/marcpm/marc-lint/, with 
less frequent updates to CPAN when a significant enough number of 
changes have been made. I have now posted 1.44 to CPAN, which should 
include MARC updates 8 and 9, as well as some other minor changes.


Please let me know of any problems.

Thank you,

Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.inwave.com/eija


RE: [Patch] Escape marc tag/code/indicators in Marc::File::XML

2009-07-22 Thread Bryan Baldus
On Wednesday, July 22, 2009 4:10 PM, Galen Charlton 
[galen.charl...@liblime.com] wrote:
Funny you should mention CVS.  I have a general question for the
MARC/Perl hackers: Ed mentioned a while back moving from CVS to a more
modern VCS such as Subversion or (my preference) Git.  I'm willing to
do the legwork to get the repositories moved.  Thoughts?

Speaking as a hobbyist programmer, I've only used CVS, and would hope that a 
move to a different system wouldn't make things more complicated or difficult 
to use. Until last November, my main development machine was (and still 
would be) a PowerMac 7500/G3 with MacOS 9. When I tried to update SourceForge 
CVS this May using my Mac, I believe my SSH login failed (it had worked fine in 
August 2008), so I switched to updating SourceForge CVS using WinCvs on my 
Windows Vista laptop (Nov. 2008). I'm not sure what changed to prevent the Mac 
from being able to get a SSH connection to SourceForge, but I chalked it up to 
being an age thing (SourceForge update making old operating systems obsolete; 
or some change to SSH that I couldn't figure out how to fix in the MacSSH 
client; it does seem like it took a little bit of work getting WinCvs set up, 
as well), and that from now on, the Windows machine will be what I need to use 
to be able to update anything on SourceForge. So, as long as there is an 
easy-to-use Windows-based client for the other version control systems, then I 
probably wouldn't have a problem with switching.

Thank you for your time,

Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.inwave.com/eija


MARC Errorchecks and Lint Module updates

2008-05-25 Thread Bryan Baldus
I have updated MARC::Errorchecks in CPAN, releasing version 1.14, and 
have updated MARC::Lint in CVS on SourceForge. Changes for each are 
listed below.


MARC::Errorchecks changes:

Version 1.14: Updated Oct. 21, 2007, Jan. 21, 2008, May 20, 2008. 
Released May 25, 2008.


 -Updated %ldrbytes with leader/19 per Update no. 8, Oct. 2007. Check 
for validity of leader/19 not yet implemented.
 -Updated _check_book_bytes with code '2' ('Offprints') for 
008/24-27, per Update no. 8, Oct. 2007.

 -Updated check_245ind1vs1xx($record) with TODO item and comments
 -Updated check_bk008_vs_300($record) to allow leaves of plates (as 
opposed to leaves, when no p. or v. is present), leaf, and 
column(s).
 -Updated test in Errorchecks.t to remove check for LCCN starting 
with year greater than the current year. This was at 2008, which is 
no longer later. A test may be implemented in the future that will be 
less likely to break with the passage of time.



MARC::Lint changes:

- Updated _check_article with the exception 'A to '
- Updated Lint::DATA section with Update No. 8 (Oct. 2007)




Please let me know of any problems, suggestions, etc.

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


MARC-Lintadditions 1.13 update

2007-10-21 Thread Bryan Baldus
I have posted version 1.13 of MARC::Lintadditions to my home page 
[1]. Changes are listed below. To install, place the .pm file in your 
Perl's site/lib/MARC, next to MARC::Lint, MARC::Record, etc.


Included in the tar.gz file (and unzipped version) is 
lintadditions.t.pl.txt, a test file that should pass if everything 
is installed properly.


Version 1.13: Updated Oct. 21, 2007. Released Oct. 21, 2007.

 -Updated check_100 (and by call, all check_1xx, check_7xx, and check_8xx):
 --Non-numeric reduced from non-digits to [0-5, 7, 9], since 6 and 8 
follow different rules.

 --Added check for punctuation preceding $e.
 -Updated check_260, check_440, and check_490 to deal with subfield 6 
being 1st when checking for subfield a as first subfield.


[1] http://home.inwave.com/eija/bryanmodules/

Please let me know of any problems, corrections, suggestions, or questions.

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


MARC::Lint and Errorchecks updated on CPAN

2007-10-03 Thread Bryan Baldus
I've posted updated versions to CPAN of MARC::Lint (v. 1.43) and 
MARC::Errorchecks (v. 1.13). I've also uploaded new versions of 
MARC::Lintadditions (v. 1.12) and a stand-alone copy of 
MARC-Lint-CodeData (v. 1.18) to my personal home page [1].


Note: MARC::Lintadditions is provided as a stand-alone module and 
must be installed manually (copy the .pm to the MARC:: folder, next 
to Lint, Record, Errorchecks, etc.). I still hope to integrate most 
of its checks into MARC::Lint, but progress so far has been rather 
slow due to other projects.


Other notes: The version of MARC::Lint::CodeData provided with Lint 
and Errorchecks should be identical. I've experienced difficulty 
installing both modules through PPM on Windows, perhaps due to 
CodeData being included with both modules.


Changes for each appear below:

MARC::Lint:

1.43    Wed October 3 19:36:00 CDT 2007

[THINGS THAT MAY BREAK YOUR CODE]

- Updated Lint::DATA section with Update No. 7 (Oct. 2006)

- MARC::Lint is incompatible with Business::ISBN versions 
2.00-2.02_01.

Business::ISBN versions below 2 and 2.02_02 or above should work.

- Updated check_record's treatment of 880 fields. Now if the tagno is
880, check_record attempts to look at subfield 6 for the linked tagno
and uses that as the basis for the tagno to be checked.

- Updated _check_article to account for 880, using subfield 6 linked
tagno instead.
- Updated _check_article to account for articles followed by parentheses,
apostrophes, and/or quotes. Also related bug fixes for counting
punctuation around the article.

- Subfield 6 should always be the 1st subfield according to MARC 21
specifications, so check_245 has been updated to account for subfield 6
being 1st, rather than requiring subfield a to be 1st.

- Added new test, test880and6.t for 880 field and for subfield 6.

- Added TODO concerning subfield 9. This subfield is not officially
allowed in MARC, since it is locally defined. Some way needs to be made
to allow messages/warnings about this subfield to be turned off.

- Added TODO concerning subfield 8. This subfield could be the 1st or
2nd subfield, so the code that checks for the 1st few subfields
(check_245, check_250) should take that into account.

- Updated MARC::Lint::CodeData with most recent version.

###

MARC::Errorchecks:

Version 1.13: Updated Aug. 26, 2007. Released Oct. 3, 2007.

 -Uncommented valid MARC 21 leader values in %ldrbytes to remove 
local practice. Libraries wishing to restrict leader values should 
comment out individual bytes to enable errors when an unwanted value 
is encountered.

 -Added ldrvalidate.t.pl and ldrvalidate.t tests.
 -Includes version 1.18 of MARC::Lint::CodeData.


###

MARC::Lintadditions:

Version 1.12: Updated Mar. 1-Aug 26, 2007. Released Oct. 3, 2007.

 -Updated check_042 with new code, ukblderived, from Technical Notice 
for Aug. 13, 2007.
 -Updated check_042 with new code, scipio, from Technical Notice for 
Mar. 1, 2007.
 -Updated check_xxx methods (check_250) to account for subfield '6' 
as 1st subfield.



###

MARC::Lint::CodeData.pm:

Versions 1.15 to 1.18: Updated Feb. 28, 2007-Aug. 14, 2007.


 -Added new source codes from Technical Notice of Aug. 13, 2007.
 -Added new source codes from Technical Notice of July 13, 2007.
 -Added new source codes from Technical Notice of Apr. 5, 2007.
 -Added new country and geographic codes from Technical Notice of 
Feb. 28, 2007.

 -Added 'yu ' to list of obsolete codes.

###

[1] http://home.inwave.com/eija/bryanmodules/

Please let me know of any problems, corrections, or suggestions.

Thank you for your assistance,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


MARC::Record record length and leader bug?

2007-09-28 Thread Bryan Baldus
A recent posting on the OCLC-CAT discussion list (Help needed de-duping,
editing and exporting a raw MARC file with Connexion) mentions difficulty
the poster is experiencing with using records with too many 949 fields for
MarcEdit to load. This led me to attempt to create a test MARC file to see
if I could replicate the problem. In the process, I believe I may have found
a problem with the way MARC::Record updates the leader for the record
length. Starting with a file containing a minimal raw MARC record (leader,
001 of '1', 008, and 245 of '.'), I ran the file through the loop:

while (my $record = $batch->next()) {
    for my $fieldno (0..4810) { #where 4810 was the approximate number
        #of fields needed to push the record length past 99999
        my $new_field = MARC::Field->new('949', '', '', a => $fieldno);
        $record->append_fields($new_field);
    } #for fields

    print OUT $record->as_usmarc(); #where OUT is an export file previously opened

} # while



The output file shows the start of the leader as "100032pam 22577931".
MARC::Record::set_leader_lengths has the line substr($self->{_leader},0,5) =
sprintf("%05d",$reclen);. Is this supposed to limit $reclen to 5
characters, or does sprintf "%05d" simply pad the number with leading zeros to
make sure the length is at least 5 digits? Since a record length over 99999 is
impossible, it might be good to have MARC::Record complain about exceeding
the record size limit if $reclen > 99999, and to not exceed 5 characters
when setting the record length.
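
A quick check of the sprintf behavior in question (plain Perl, nothing 
MARC-specific) suggests "%05d" pads short numbers with leading zeros but does 
not truncate longer ones:

printf("%05d\n", 42);      #prints 00042
printf("%05d\n", 123456);  #prints 123456; the field simply grows past 5 characters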

Please correct me if I am wrong. Thank you for your assistance,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


RE: Removing duplicate fields with MARC::record

2007-07-30 Thread Bryan Baldus
Note: my comments are untested and may not work without modification. Some
parts left to the reader to complete.

On Monday, July 30, 2007 2:16 PM, Michael Bowden wrote:
   @m856 = sort {$a cmp $b} @m856;

@m856 has MARC::Field objects. Comparing them as such is unlikely to
produce the desired results. Better might be @m856 = sort {$a->as_usmarc() cmp
$b->as_usmarc()} @m856, but then you lose the field object. Better still might
be to leave out that step and go on to:

   my %seen = ();
   my @new856 = ();

Instead of going through all fields in the record, you could go through the
856s you have gathered, add them to the %seen hash as usmarc (to facilitate
comparisons), and, as subsequent ones are already seen, delete the field.
After that, you could sort the fields, delete them, and then add back the
sorted fields.

  if (@m856) {
     foreach $f (@m856) {
       #add this field to seen fields if not seen
       unless ($seen{$f->as_usmarc}){
          $seen{$f->as_usmarc} = $f;
       }#unless seen this field's exact data
       else {
          #seen it, so delete current
          $record->delete_field($f);
       } #else seen this field
     } #foreach 856

my @new856 = (); #add values of %seen, sorted according to keys of %seen
###sort remaining/deduplicated 856 fields, delete existing fields, and then
###add sorted fields back.
###where @new856 contains the values of %seen, sorted according to the keys
###of %seen

   $record->insert_fields_ordered( @new856 );
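
A fuller sketch of that approach, in the same untested spirit as the caveat at
the top of this message, might be:

my %seen = ();
my @m856 = $record->field('856');
foreach my $f (@m856) {
    if ($seen{$f->as_usmarc()}) {
        $record->delete_field($f);      #exact duplicate, drop it
    }
    else {
        $seen{$f->as_usmarc()} = $f;    #first occurrence of this data
    }
}
#sort the surviving fields by their raw form, pull them out of the record,
#then add them back in sorted order
my @new856 = map { $seen{$_} } sort keys %seen;
$record->delete_field($_) for values %seen;
$record->insert_fields_ordered(@new856);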

#

I hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 



RE: Using MARC::Record to delete fields

2007-07-16 Thread Bryan Baldus
On Monday, July 16, 2007 8:41 AM, Michael Bowden wrote:
Our MARC records have several 035 fields.  I want to delete all of the 035s

except for the 1st one.

I've modified your code below, removing the foreach field loop. The modified
code remains unfinished, as I'll leave it to you to determine the best way
to remove $first035 from @m035.




while (my $record = $batch->next()) {
   #get first 035 to retain
   my $first035 = $record->field('035');
   #get all 035s
   my @m035 = $record->field('035');

   ###
   ###remove 1st 035 from @m035 array using array manipulation techniques
   ###

   #remove remaining 035s
   $record->delete_field(@m035);

   print $record->as_formatted(), "\n\n";
 
}
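
One possible way to fill in the step left to the reader above (a sketch only):
since field() returns the 035s in record order, the first element of @m035 is
the same field object as $first035, so shifting it off the array leaves just
the fields to delete.

   ###one possible completion of the marked section
   shift @m035;                          #drop the 035 we want to keep
   $record->delete_field($_) for @m035;  #delete the rest, one at a time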



I hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 



RE: conditional statement: indicator not blank

2007-04-30 Thread Bryan Baldus
On Monday, April 30, 2007 2:18 PM, Corey Harper wrote:
I tried the following variations to no avail:
* != '' (no space)
* != ' ' (with space)
* != undef
* != null
* != '#'
* != '_'

I ended up having to use the following, which achieved the desired 
effect with any of the above in the first slot:

if ($field_7xx->indicator(2) != '' || $field_7xx->indicator(2) == 0) {

Would it not be better to use string comparison operators, ne and eq, since
indicators may not necessarily be numeric?
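
Something along these lines might work (a sketch, assuming the goal is "second
indicator is not blank"):

if ($field_7xx->indicator(2) ne ' ') {
    #second indicator is something other than a single blank
}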

Please correct me if I am wrong.

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


RE: trailing $ on Cat source 040 field

2007-03-30 Thread Bryan Baldus
On Friday, March 30, 2007 12:29 PM, Jackie Shieh wrote:
I was working with a group of records that
the cataloging source for some of them ends
with a dollar sign (see attached records):

040   $aEL$$bspa$cEL$$dDLC
040   $QD$$cQD$

Sometimes, it helps to convert MARC file
to MARCMaker format to review. When I do,
curiously, MARC::Record is not able to read
it back in as_usmarc as the trailing dollar
immediately followed by another subfield has
caused MARC::Record treat it as an empty subfield.

When I open the .mrc file you attached, I see that the 040 reads
EL\x1F$bspa\x1FcEL$. In other words, you have a subfield $ instead of
subfield b. The .mkr file you attached has EL$$bspa$cEL$.

When I convert the .mrc file into a .mrk using MarcEdit, I get:
EL${dollar}bspa$cEL{dollar}, which is technically how the field appears. If
I convert the file with MARC::File::MARCMaker [1], I get:
EL$$bspa$cEL{dollar}. This points to a possible bug in MARCMaker, in that it
makes it impossible to reverse the process and produce an identical .mrc
file. I believe the reason it didn't change the dollar sign to {dollar} in
the 1st instance is because it only converts subfield data, not subfield
codes (assuming that the codes will be characters not needing to be escaped.
This may be a flawed assumption.).

Editing the 040 to have the dollar sign in subfield a followed by the
delimiter character produces EL{dollar}$bspa$cEL{dollar} when using both
MarcEdit and the Perl module.

[1] Latest version on
http://marcpm.cvs.sourceforge.net/marcpm/marc-marcmaker/, CPAN version on
http://search.cpan.org/~eijabb/MARC-File-MARCMaker-0.05/. SourceForge
version has recently updated mrc2mkr and mkr2mrc programs in bin/.

I hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


RE: MARC::Batch

2007-02-26 Thread Bryan Baldus
On Monday, February 26, 2007 11:38 AM, John D Thiesen wrote:
This is presumably an obvious question to most of you, but where do I 
get MARC::Batch?

MARC::Batch is included as part of the MARC::Record distribution. Version
2.0.0 was recently (Jan. 25, 2007) released on CPAN:
http://search.cpan.org/~mikery/MARC-Record-2.0.0/

The development/most recent version is available in CVS on SourceForge:
http://marcpm.cvs.sourceforge.net/marcpm/marc-record/
(http://sourceforge.net/cvs/?group_id=1254)

I hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


RE: Module update for MARC::Record

2007-01-18 Thread Bryan Baldus
I think, because of the number and size of the changes involved, it
would be good to stamp the next version of MARC::Record as 2.0.0.

I very much support its release as v. 2.0.0 (or anything starting with 2).
This distinguishes the new versions requiring modern Perl (post-5.8.0) from
the earlier versions. I haven't used v. 2.x much, but it doesn't seem to be
causing problems in the limited uses I've had with it (but then I'm lucky
enough not to need Unicode at the moment). The main problems I've
experienced have just had to do with the initial installation and updating
due to CPAN/PPM vs. SourceForge versions.

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


Re: MARC21 record to CIP block

2006-07-16 Thread Bryan Baldus

At 4:22 PM -0700 7/13/06, [EMAIL PROTECTED] wrote:

Hi there,

However, can anyone here tell me of any tool or tutorialof how to
create a CIP block out of a MARC 21 record?


I have posted a module, MARC::PCIPmaker.pm to:
http://home.inwave.com/eija/inprocess/. It should work to produce 
an ISBD/PCIP-block display from a raw MARC record (even one coded as 
full-level, though it is designed for CIP-level records). The code is 
based on a straight translation of code, originally in Visual Basic 
(with which I am rather unfamiliar, so the translation may be less 
than perfect), from the Library of Congress' CIP program.


The module is available in tar.gz or directly (with txt extension 
added for download/viewing on Web), but I haven't had a chance to 
create the necessary installation files, so it will need to be 
manually added to the same folder/directory as MARC::Record to be 
used.


Synopsis:

#open MARC file, get MARC::Record/MARC::Batch object
#$PCIPrecord = $batch->next in while loop, as generally constructed 
for most other MARC reading Perl programs


#convert MARC::Record into raw MARC (ISO 2709) format string
my $record_as_marc = $PCIPrecord->as_usmarc();

my ($PCIPblock, @errorsinPCIP) = MARC::PCIPmaker::makecard($record_as_marc);

if (@errorsinPCIP) {
print "The following errors were found in generating the PCIP block\n", join "\n", @errorsinPCIP, "\n";

} #if errors
else {
print OUT $PCIPblock;
}#else no errors



I hope this helps. Please let me know of any problems or suggestions.

Bryan Baldus
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: MARC21 record to CIP block

2006-07-14 Thread Bryan Baldus
I've been working on code to produce a CIP (P-CIP in my case) block from a
MARC record, using a very literal translation of Visual Basic code into
Perl. Currently, it should be able to produce the datablock, but does not
yet insert line breaks/formatting. The module is currently tailored for
QBI's PCIP, but if I have a chance, I may post some of the code to my site
this weekend.

Example of input and output:

MARC (MARCMaker format provided for readability):

=LDR  00880nam  22002898a 4500
=001  qbi02200951\
=002  006bb
=003  IOrQBI\\
=005  20030103071854.0
=008  021205s2003iluabf\\\b001\0deng\d
=010  \\$a  200199
=020  \\$a199649
=037  \\$a$bQBI
=040  \\$aIOrQBI$cIOrQBI
=999  \\$aPCIP for QBI Web pages
=050  \4$aBF575.H27$bS65 2002
=082  04$a158.1$221
=100  1\$aSmith, Rob$q(Robert Bobbie Bob),$d1966-
=245  14$aThe library, the phonebook, and the philosophical origins of
happiness /$cby Rob Smith and Bob Jones.
=250  \\$a1st ed.
=263  \\$a03--
=300  \\$ap. cm.
=504  \\$aIncludes bibliographical references and index.
=650  \0$aHappiness.
=650  \0$aLibraries$xPsychological aspects.
=650  \0$aTelephone$vDirectories$xPsychological aspects.
=700  1\$aJones, Bob$q(Bob Robert Rob),$d1981-


becomes:

Smith, Rob (Robert Bobbie Bob), 1966-
  The library, the phonebook, and the philosophical origins of happiness /
by Rob Smith and Bob Jones. -- 1st ed.
   p. cm.
  Includes bibliographical references and index.
  LCCN 200199
  ISBN 1-996-4-9
 1. Happiness. 2. Libraries--Psychological aspects. 3.
Telephone--Directories--Psychological aspects.  I. Jones, Bob (Bob Robert
Rob), 1981- II. Title. 
  BF575.H27S65 2002
  158.1--dc21
  qbi02200951 

-- 

Bryan Baldus
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


Re: MARC::Lint bug?

2006-06-18 Thread Bryan Baldus

At 6:04 PM -0400 6/16/06, Edward Summers wrote:

On Jun 16, 2006, at 1:27 PM, Bryan Baldus wrote:

MARC::Lint has been revised in SourceForge CVS so that $rules->{$repeatable}
is now $rules->{'repeatable'} for field repeatability.


Are you able to push this out to CPAN?

//Ed


I'll try to produce a CPAN upload this week.

Thank you,
Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: MARC::Lint bug?

2006-06-16 Thread Bryan Baldus
On Thursday, June 15, 2006 9:19 AM, I wrote:
I think I may have discovered a bug in the way MARC::Lint 
parses tag data.

[or sets rules for repeatability of fields vs. allowed subfields.]

MARC::Lint has been revised in SourceForge CVS so that $rules->{$repeatable}
is now $rules->{'repeatable'} for field repeatability.

Please let me know of any problems.
Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


MARC::Lint bug?

2006-06-15 Thread Bryan Baldus
I think I may have discovered a bug in the way MARC::Lint parses tag data.
In _parse_tag_rules:

my $rules = ($self->{_rules}->{$tagno} ||= {});
$rules->{$repeatable} = $repeatable;

then:
for my $line ( @lines ) {
my @keyvals = split( /\s+/, $line, 3 );
my $key = shift @keyvals;
my $val = shift @keyvals;
# Do magic for indicators
if ( $key =~ /^ind/ ) {
$rules->{$key} = $val;
#}

I think having $rules->{$repeatable} and $rules->{$key} (where $key is the
subfield code and $repeatable is passed in from the
tagno_repeatability_description line of the tag data) is causing $repeatable
to be added as an allowable subfield code. I discovered this when wondering
why an 082 $a[B]$ROU$214 did not report an error. I plan on looking at this
tonight or this weekend. What would you suggest as the best way to resolve
this problem? My current line of thinking would have me revising
$rules->{$repeatable} to $rules->{'repeatable'}, and leaving the subfields
as $rules->{$key}. Does this sound reasonable?
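
In code, the change I have in mind would be roughly (a sketch of the proposed
fix, not yet committed):

my $rules = ($self->{_rules}->{$tagno} ||= {});
$rules->{'repeatable'} = $repeatable;   #literal key, so the repeatability flag
                                        #no longer looks like a subfield code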

Thank you for your assistance,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


Module updates and HTML parsing question

2006-06-07 Thread Bryan Baldus
I have recently posted updates to several modules, as described below (and
at http://home.inwave.com/eija/).

As mentioned below I'm trying to write something to parse the lists of
updated name authority records with closed dates (posted regularly on OCLC's
website http://www.oclc.org/rss/feeds/authorityrecords/default.htm). I
don't have much experience working with HTML/XML, so I welcome any
suggestions you may have on the best way to parse these files into a plain
text, non-Unicode, tab-separated file of old_heading \t new_heading pairs.
I am not able to install any modules that require compiling, and would like
the solution to work on Mac (Classic) and Windows platforms without having
to be concerned much about character encodings. My plan is to bring each
.htm file up in my Web browser (IE), and then save as a Web page, HTML only,
with the default Unicode (UTF-8) encoding. After saving the files into a
directory, the parsing program will look at each .htm file, pull out the
changed names, and put them into the single plain text file described above.

Thank you for any assistance you may be able to provide,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija

#

Module updates:


MARC::Lint.pm:
(in CVS on SourceForge; not yet updated on CPAN)

  -DATA section updated recently to MARC Update no. 6 (Oct. 2005).

#

MARC::Errorchecks.pm:
(posted to CPAN and on my site)

Version 1.11: Updated June 5, 2006. Released June 6, 2006.

  -Implemented check_006($record) to validate 006 (currently only does 
length check).

  -Revised validate008($field008, $mattype, $biblvl) to use internal 
sub for material specific bytes (18-34)

  -Revised validate008($field008, $mattype, $biblvl) language code 
(008/35-37) to report new 'zxx' code availability when '   ' (3 blanks) 
is the code in the record.

  -Added 'mgmt.' to %abbexceptions for check_nonpunctendingfields($record).

#

MARC::Lint::CodeData.pm:

(Most current version is available through CVS on SourceForge with 
MARC::Lint. Also included in MARC::Errorchecks)

  -Versions 1.05-1.08 were updated with additions of codes from 
technical notices.

#

Lintadditions.pm:
(available on my site only, since I'm still trying to merge most of 
these checks into MARC::Lint, once I find time to write tests for 
each.)

Version 1.10: Updated Oct. 17, 2005-May 18, 2006. Released June 6, 2006.

  -Added check_024() for UPC and EAN validation. Uses 
Business::Barcode::EAN13 and Business::UPC for these checks.

  -check_042() updated with valid source codes from MARC list for sources.

  -check_050() updated to report cutters not preceded by period.

  -Misc. bug fixes, including turning off uninitialized warnings for 
short 007 bytes.

#

MARC::Global_Replace.pm:
(available only on my site, still in pre-alpha stage, so in 
/inprocess/ rather than /bryanmodules/)

Version 0.05--Updated May 1, 2006. Released June 6, 2006.

  -Revised identify_changed_hdgs($field, \%heading_data, 
\%changed_hdgs_sub_a) attempting to resolve problem of closed dates 
vs. open.

Version 0.04--Updated Feb. 13, 2006. Unreleased

  -Modified identify_changed_hdgs($field, \%heading_data, 
\%changed_hdgs_sub_a) to not report headings where new and old are 
identical.

  -Need to strip ending periods for match to work!!
  -Testing needed for sears heading changes--currently appears to fail to
match

#

Script updates:
(available only on my site, still in pre-alpha stage, so in /inprocess/)

LCSHchangesparserpl107.txt

Version 1.07: Updated May 8, 2006

  -Revised changed heading regex to include & (e.g. AT&T)

Version 1.06: Updated Oct. 5, 2005

  -Added 682 parsing

  -New_tag is set to 682 when headings are extracted from that field

  -Global_Replace will need to take these into account during parsing 
and comparison, since there is a chance that the parsing done by this 
script will produce unexpected/unreliable results.

  -682 parsing is incomplete and will likely fail on headings with
qualifiers.

Version 1.05: Updated Aug. 25, 2005


  -Revised parsing to account for some lines previously counted as bad.

#

parsedeathdateslists.pl.txt
(available only on my site, in pre-pre-alpha stage, so in /inprocess/)

No version. Very preliminary test code.

  -Help needed in stripping entities other than subfield delimiter.

  -Help needed in selecting best HTML/XML parser for OCLC's closed dates
lists.

  -Requires pure Perl solution (I have no ability to use a compiler or to 
install extra, non-Perl programs, so only modules that came with Perl 5.6 or
5.8.0 or that are simply pm files for the site/lib directory)

  -Cross-platform capable, non-Unicode/capable of stripping non-ASCII 
characters without worrying about Mac (Classic) vs. Windows character 
sets.

#
#



RE: Question about MARC::RECORD usage

2006-05-03 Thread Bryan Baldus
On Wednesday, May 03, 2006 9:28 AM, Ed @ Go Britain wrote:
In the 245 record it is 
possible to have numerous $n and $p fields which need to be 
output with formatting between the fields.

My knowledge of PERL isn't too good and I'm struggling to know 
how to extract these repeated subfields and place formatting 
between the subfields in the prescribed order $a, $b, $n, $p, 
$c. Both n and p could be repeated several times. 

There are times when the proper order would be $a, $n, $p, $b, $c, as well,
aren't there?

At the moment I take each field into a variable eg 

$Field245c = $record->subfield('245','c');

and then output these as follows

   if ($Field245c)
{
$EntryBody = $EntryBody . " -- " . $Field245c;
}

However, this approach assigns the first occurrence of a 
subfield and I haven't yet discovered a technique for 
accessing further subfields.


According to the POD in MARC::Field:
Or if you think there might be more than one you can get all of them by
calling in a list context:

my @subfields = $field->subfield( 'a' );

Alternatively, get all subfields in the field and parse as needed:

my $field245 = $record->field('245');
my @subfields = $field245->subfields();

while (my $subfield = pop(@subfields)) {
my ($code, $data) = @$subfield;
#do something with data 

#or add code and data to array
unshift (@newsubfields, $code, $data);
} # while


###
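
Putting those pieces together, one sketch of building the display string in
the order the subfields occur might be (the " -- " separator is only an
example; formatting could vary by subfield code as needed):

my $field245 = $record->field('245');
my $EntryBody = '';
for my $subfield ($field245->subfields()) {
    my ($code, $data) = @$subfield;
    #$code is 'a', 'b', 'n', 'p', 'c', etc.; $data is the subfield's contents
    $EntryBody .= ($EntryBody eq '' ? '' : ' -- ') . $data;
}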

I hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


RE: Deleting a subfield using MARC::Record

2006-05-01 Thread Bryan Baldus
OK -- here's the call for a vote. All interested perl4lib members are 
encouraged to participate by emailing the list.

+1

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


Re: Slowdown when fetching multiple records

2006-02-20 Thread Bryan Baldus

At 4:14 PM + 2/19/06, Tony Bowden wrote:

So, it presumably is an issue with the Library of Congress server.
Is there some sort of automatic throttling there? Or is there likely to
be some sort of option that I should be setting, but not?


I've not used Perl-based Z39.50 searching, but I think LC requires a 
6-second pause between searches, to reduce burden on their servers. 
While searching using MarcEdit 4.6 (or pre-5), we were locked out for 
the day for not following this new (Oct. 2005?) rule. MarcEdit 5 beta 
allows a 6-second pause for servers that might require it, like LC's.


I don't know how the Perl modules handle this pause for batch 
searching LC, but it might be why you are experiencing a delay.
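
If the search loop is under your control, a crude pause between queries might
be enough. This is only a sketch; fetch_record() is a placeholder for whatever
Z39.50 search call the script already makes, not a real API:

foreach my $query (@queries) {
    my $record = fetch_record($query);   #placeholder for the existing search call
    #...process $record...
    sleep 6;                             #LC reportedly wants about 6 seconds between searches
}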


I hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: MARC.pm unblessed reference

2005-11-22 Thread Bryan Baldus
 -Original Message-
 From: Aaron Huber [mailto:[EMAIL PROTECTED] 
 Sent: Monday, November 21, 2005 10:33 PM
 To: perl4lib@perl.org
 Subject: MARC.pm unblessed reference
 
 
 Hi All,
 
 I am a complete newbie to this and have been testing out  MARC.pm. 
 I'm trying to return just the ISBN values from a group of MARC
 records. It works fine when I specify the record number, but when I
 put it through the loop it returns the above error.

My recommendation would be that you try to switch from MARC.pm to the
MARC::Record distribution
(see a previous posting to this list
http://www.nntp.perl.org/group/perl.perl4lib/2166). 

Then the code below should accomplish what you want (though I have not
tested it as it appears in this message--see
http://home.inwave.com/eija/fullrecscripts/Extraction/extractisbn.txt for
the original version, for which you would need to also install my
MARC::BBMARC (from my website or MARC::Errorchecks from CPAN, which includes
that module)). In my modified version below, I extract only subfield 'a' of
the 020, and I clean the field to (hopefully) leave only the ISBN portion of
the field--removing any qualifiers. It should be easy enough to modify the
code to extract the entire 020 field as_string() and report it.

I hope this helps,
Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija



#!perl

=head2

Extracts ISBN from file of MARC records.

=cut

###
### Initialize includes ###
### and basic needs ###
###
use strict;
#MARC::Batch is installed as part of MARC::Record
use MARC::Batch;

#
### Start main program ##
#

my $inputfile= 'marc.dat';
#my $exportfile = 'out.dat';
#open(OUT, ">$exportfile") or die "Problem opening $exportfile, $!";

#initialize $batch as new MARC::Batch object
my $batch = MARC::Batch->new('USMARC', $inputfile);
## Start extraction #

my $runningrecordcount=0;
 Start while loop through records in file #
while ( my $record = $batch->next() ) {
$runningrecordcount++;

#get control number for reporting
#my $controlno = $record->field('001')->as_string() if ($record->field('001'));

### loop through each 020 field ###
for my $field020 ( $record->field('020') ) {
my $isbn = $field020->subfield('a') if
($field020->subfield('a'));

if (defined ($isbn)) {
#remove any hyphens
$isbn =~ s/\-//g;
#remove nondigits
$isbn =~ s/^\D*(\d{9,12}[X\d])\b.*$/$1/;

# Now report it
print $runningrecordcount, ": ", $isbn, "\n";

} # if isbn defined
} # for
} # while

##
### Main program done.  ##
##

#
### END OF PROGRAM ##
#
 



MARC-File-MARCMaker to CPAN

2005-10-31 Thread Bryan Baldus
Version 0.05 of MARC::File::MARCMaker has been released to CPAN
(http://search.cpan.org/~eijabb/MARC-File-MARCMaker-0.05/). It has no
internal changes from version 0.04, previously mentioned as being uploaded
to SourceForge, but is simply a version update for initial CPAN release.

Also, I've updated MARC::Doc::Tutorial.pod in CVS on SourceForge
(http://cvs.sourceforge.net/viewcvs.py/marcpm/marc-record/lib/MARC/Doc/Tutor
ial.pod?rev=1.30&view=log) with a section on MARCMaker.

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


RE: MARC-File-MARCMaker in CVS

2005-10-25 Thread Bryan Baldus
On Monday, October 24, 2005 9:53 PM, Edward Summers wrote:

Are you planning to release MARC::File::MARCMaker to CPAN? 

I'll plan on finding time to do so this weekend, unless there are
objections/reasons for not uploading.

Also, it might be worthwhile adding a section to MARC::Doc::Tutorial if you
have the energy.

I'll look into doing this.

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija

p.s. My announcement message should have read version 0.04 rather than
version 0.4. 


marclint update

2005-09-19 Thread Bryan Baldus
I have updated the marclint program in CVS on SourceForge to report 
errors encountered during the decoding process from raw MARC to 
MARC::Record objects. I also changed tabs to 4 spaces. Please let me 
know if this causes problems with anything.


Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: MARC::Record elementary question?

2005-08-29 Thread Bryan Baldus
When I try to run a program just to read through the file 
and display the 082 field of each successive record, I get this message:

Bareword "qr" not allowed while "strict subs" in use at 
C:\progra~1\perl\lib\MARC/Record.pm line 209.

I think this is a bug in v. 1.38 of the MARC::Record module. It appears to
be fixed in version 1.39_02 (which, though a developer release, seems stable
for how I've been using it--non-unicode records).

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 



marclint and decoding errors

2005-08-21 Thread Bryan Baldus
Recently, I was presented with a file of MARC 21 (ISO 2709) records 
containing indicators of hex 00. When these are decoded by 
MARC::Record, they generate error messages from MARC::Field->new():


Invalid indicator \x00 forced to blankInvalid indicator \x00 
forced to blank


If this message had come from MARC::File::USMARC, it would have read 
"Invalid indicators \"$indicators\" forced to blanks $location for 
tag $tagno\n".


Can the warning message from MARC::Field be updated with the tagno to 
help with identifying which field has the problem?


Also, I'm considering revising the marclint program included in bin/ 
of the MARC::Lint distribution to include any decoding errors 
encountered, as seen below. Is there anything I should consider 
before doing this, or, would this cause problems for anyone using 
marclint?


If this is not a problem, I'll also change the tabs to 4 spaces for 
indentation.


(changes indicated with # at the end of the line)

while ( my $marc = $file->next() ) {
    if ( not $marc ) {
        warn $MARC::Record::ERROR;
        ++$errors{$filename};
    } else {
        ++$counts{$filename};
    }

    #store warnings in @warningstoreturn #+
    my @warningstoreturn = (); #+

    #retrieve any decoding errors #+
    #get any warnings from decoding the raw MARC #+
    push @warningstoreturn, $marc->warnings(); #+

    $linter->check_record( $marc );

    #add any warnings from MARC::Lint #+
    push @warningstoreturn, $linter->warnings; #+

    if ( @warningstoreturn ) { #revised
        print join( "\n",
            $marc->title,
            @warningstoreturn, #revised
            "",
            "",
        );
        ++$errors{$filename};
    }
} # while

##

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


MARCMaker etc. updates

2005-08-15 Thread Bryan Baldus
I have updated my site with some in-process modules and a new version of the
LCSH changes parser script (in http://home.inwave.com/eija/inprocess/).

MARC::File::MARCMaker.pm is an early version of a module to convert files to
and from MARCMaker format (as used by MarcEdit and the LC tools
http://www.loc.gov/marc/makrbrkr.html. Much of the code was originally
part of the MARC.pm module (character conversion is essentially used
unmodified from that module). The current version appears to successfully
convert files both ways, but has not been fully tested. A future version of
the distribution should include a program similar to marcdump (from
MARC::Record)--1 or 2 programs to convert records both to and from the
format.

MARC::Global_Replace.pm is a very early-stage version of a module to
facilitate global subject heading changes. At present, it appears to
successfully identify changed headings in MARC records (using the included
global_replace_ident.pl script
http://home.inwave.com/eija/inprocess/MARC-Global_Replace0.03/bin/global_re
place_ident.txt (.txt for download)), but has not really been tested in any
serious way.

The LCSH changes parser script
http://home.inwave.com/eija/inprocess/LCSHchangesparserpl104.txt creates a
file (or set of files), allhash.txt, which is used by MARC::Global_Replace.
It takes a folder of LCSH weekly lists (saved as text from the LC site
http://www.loc.gov/catdir/cpso/ and produces files of the changed headings
(along with a bad.txt file containing headings not yet accounted for by the
script).

I have also posted a new version of MARC::Errorchecks (1.09) to CPAN.
Changes are listed there and on my site.

I welcome any comments and suggestions (to [EMAIL PROTECTED]).

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


Errorchecks etc. updates

2005-07-16 Thread Bryan Baldus

(Resending--with apologies for duplication or bouncing)

I have posted updates to several of my MARC-related modules and 
scripts. Changes to each of the main files are listed below.


The files have been posted to my home page 
(http://home.inwave.com/eija). Errorchecks (MARC::Errorchecks) is 
also available on CPAN. CodeData (MARC::Lint::CodeData) is included 
as part of Errorchecks, as well as part of MARC::Lint (in SourceForge 
CVS).


MARC::File::MARCMaker is a very preliminary version. I welcome any 
assistance in improving it. I haven't had time yet to develop working 
tests or test files, though I do have those used for MARC.pm.


The LCSH changes parser script will eventually be used with a new 
module I'm working on, tentatively named MARC::Global_Replace. The 
module is intended to automatically update headings from their old 
form to the newest form. Currently it is almost capable of 
identifying subfield 'a' 6xx headings that have an old heading in the 
LCSH weekly lists.


Module updates:

Lintadditions.pm:

Version 1.09: Updated Mar. 31-Apr., 2005. Released July 16, 2005.

 -check_260() updated to report error if subfield 'a' and 'b' are not present.
 -More '==' etc. changed to 'eq' etc. for indicators.
 -check_082() updated to set $dewey to empty string if no 082$a is 
present before checking for 3 digits.


Errorchecks.pm:

Version 1.08: Updated Feb. 15-July 11, 2005. Released July 16, 2005.

 -Added 008errorchecks.t (and 008errorchecks.t.txt) tests for 008 validation
 -Added check of current year, month, day vs. 008 creation date, 
reporting error if creation date appears to be later than local time. 
Assumes 008 dates of 00mmdd to 70mmdd represent post-2000 dates.
 --This is a change from previous range, which gave dates as 00-06 as 
200x, 80-99 as 19xx, and 07-79 as invalid.
 -Added _get_current_date() internal sub to assist with check of 
creation date vs. current date.
 -findemptysubfields($record) also reports error if period(s) and/or 
space(s) are the only data in a subfield.
 -Revised wording of error messages for validate008($field008, 
$mattype, $biblvl)

 -Revised parse008date($field008string) error message wording and bug fix.
 -Bug fix in video007vs300vs538($record) for gathering multiple 538 fields.
 -added check in check_5xxendingpunctuation($record) for 
space-semicolon-space-period at the end of 5xx fields.

 -added field count check for more than 50 fields to check_fieldlength($record)
 -added 'webliography' as acceptable 'bibliographical references' 
term in check_bk008_vs_bibrefandindex($record), even though it is 
discouraged. Consider adding an error message indicating that the 
term should be 'bibliographical references'?

 -Code indenting changed from tabs to 4 spaces per tab.
 -Misc. bug fixes including changing '==' to 'eq' for tag numbers, 
bytes in 008, and indicators.


MARC::Lint::CodeData:

Version 1.02: Updated June 21-July 12, 2005. Released (to CPAN) with 
new version of MARC::Errorchecks.


 -Added GAC and Country code changes for Australia (July 12, 2005 update)
 -Added 6xx subfield 2 source code data for June 17, 2005 update.
 -Updated valid Language codes to June 2, 2005 changes.


Module in process:

MARC::File::MARCMaker.pm: (zipped and uncompressed as /marc-marcmaker/)

Version 0.02: Updated July 12-13, 2005. Released July 16, 2005.

 -Preliminary version of encode() for fields and records
 -Appears to work when no special chars are present (including dollar signs).
 -See TODO.txt and readme0.02.txt for list of items still needing to 
be done and other notes.
 -Note: This is a pre-alpha release, and little testing has been done 
on the results of decode() or encode().


Added and changed scripts:

Updated LCSH Changes Parser script, LCSHchangesparserpl103.txt:

 -Now creates files with tab-separated lines: old_tag \t old_hdg \t 
new_tag \t new_hdg.

 -Better parsing of weekly files.

#

I welcome any comments and suggestions.

Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: LC call number sorting utilities

2005-03-29 Thread Bryan Baldus
On Sunday, March 27, 2005 7:09 PM, Michael Doran wrote:
I recently converted a Library of Congress (LC) call number
normalization routine (that I had written for a shelf list application)
into a couple of Perl LC call number sorting utilities.  

Thank you for this. It seems to work well (45000+ numbers sorted, a quick
scroll-through seems to show everything sorted correctly). However, as
written, it seems to bog down on my machine after a few thousand numbers. 

Instead of:

@input_list = (@input_list, $call_no); 
and 
@sorted_list = (@sorted_list, $call_no_array{$key});

perhaps:

push @input_list, $call_no;
and
push @sorted_list, $call_no_array{$key};

might help to speed things up (it did in my case).

I hope this helps. Thank you,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


RE: Sort with MARC::Record

2005-01-31 Thread Bryan Baldus
Has anyone sorted a file of hundreds of records by 001?

I haven't done sorting yet, but you may want to see if my MARC::BBMARC [1]
module's MARC::BBMARC::updated_record_hash() sub may be of use. It reads a file
of MARC records and stores them in a hash with the 001 as key, the raw MARC
as value. It should be fairly simple, then, to use this to output the
desired records in the proper order. It should work ok on small files of
MARC records, but depending on your system's memory, may die a horrible
death on large record sets.

My extractbycontrolno script [2] reads a file of control numbers (using
BBMARC's updated_record_array() to save memory) and a separate file of MARC
records, and outputs the matching records. It doesn't do any sorting, so it
depends on the order it finds the records.
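
For what it's worth, a sketch of the hash-based idea without MARC::BBMARC
(small files only, for the memory reasons mentioned above; file names are
placeholders):

use strict;
use warnings;
use MARC::Batch;

my $batch = MARC::Batch->new('USMARC', 'in.mrc');
my %records;
while (my $record = $batch->next()) {
    my $controlno = $record->field('001')->as_string();
    $records{$controlno} = $record->as_usmarc();   #raw MARC keyed by 001
}
open(OUT, '>sorted.mrc') or die "Problem opening sorted.mrc, $!";
binmode OUT;
print OUT $records{$_} for sort keys %records;
close OUT;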

[1] MARC::BBMARC is available directly from my homepage at:
http://home.inwave.com/eija/bryanmodules/, 
or bundled with MARC::Errorchecks on CPAN
http://search.cpan.org/~eijabb/MARC-Errorchecks-1.06/
[2]
http://home.inwave.com/eija/fullrecscripts/Extraction/extractbycontrolno.txt

Hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 



RE: MARC::Lint update

2005-01-24 Thread Bryan Baldus
There were some uninitialized variable warnings, nothing serious. 'make
test' will run perl under the warnings pragma, so 'use warnings' in your
module will help you catch this sort of thing early.

I generally 'use warnings' or use the -w flag in the modules and scripts
I've been writing. I didn't notice it was missing. I need to add strict and
warnings to CodeData, as well. In modules/package files, is it practice to
leave out the shebang (#!perl) line, since the file is not generally
executed directly? If so, is that the reason for 'use warnings' vs. -w? 

 I don't know what editor you use, but it's been the norm for marc/perl 
module folks to not embed tabs in source code for indentation. vim and 
emacs both support mapping a tab to spaces when you hit the tab key.

I use BBEdit Lite, which has a good global search/replace function. In the
future, I'll try to remember to convert the indentation tabs to 4 spaces per
tab. Are non-indentation tabs OK? In MARC::Lint::CodeData, I used split on
"\t" to split the codes into a hash. Since some codes have or need spaces,
splitting on "\s" would probably not work as well.
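
For example (a made-up line, not the module's actual DATA layout):

my $line = "yu \tobsolete";               #the code 'yu ' keeps its trailing space
my ($code, $status) = split "\t", $line;  #splitting on \s+ would eat that space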

Thank you for your assistance,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


MARC::Lint update

2005-01-23 Thread Bryan Baldus
The SourceForge CVS version of MARC::Lint has been updated with new 
checks (041, 043), revisions to check_245, a new internal 
_check_article method, the addition of MARC::Lint::CodeData (for 041, 
043, etc.), and 2 new tests. Watch for further added check_xxx 
methods in the near future, as I move them out of MARC::Lintadditions 
into MARC::Lint.

Thank you,
Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: Ignoring Diacritics accessing Fixed Field Data

2005-01-11 Thread Bryan Baldus
On Tuesday, January 11, 2005 2:13 PM, Michael Doran wrote:

Assuming that you asking how to strip out the MARC-8 combining diacritic
characters, try inserting the substitution commands listed (as shown below)
just prior to the substr commands:
 my $ME = $field->subfield('a');
  $ME =~ s/[\xE1-\xFE]//g;
 my $four100 = substr( $ME, 0, 4 );

 my $TITLE = $field->subfield('a');
  $TITLE =~ s/[\xE1-\xFE]//g;
 my $four245 = substr( $TITLE, 0, 4 );


You might want to change the procedure for getting the title to skip
articles (untested, may need corrections):

#given $record being the MARC::Record object, and exactly 1 245 field being
present, as required by MARC21 rules
my $titleind2 = $record->field('245')->indicator(2);
my $TITLE = $record->field('245')->subfield('a');
$TITLE =~ s/[\xE1-\xFE]//g;
my $four245 = substr( $TITLE, 0+$titleind2, 4 ) if $titleind2 =~/^[0-9]$/;
#the if statement should be unnecessary, since 245 2nd indicator should
always be some number, but just in case.

Hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


Re: Module to read Isis data

2005-01-10 Thread Bryan Baldus
: '579a7c6901c654bdeac10547a98e5b71'
not ok 60 - md5 4
# Failed test (RAM Disk:002_isis.t at line 119)
#  got: 'e605bf7847b50064459fe1071bb8b4df'
# expected: '7d2adf1675c83283aa9b82bf343e3d85'
not ok 61 - md5 5
# Failed test (RAM Disk:002_isis.t at line 119)
#  got: '0e27001d65f9a7d7be485c5f13e17bb8'
# expected: 'daf2cf86ca7e188e8360a185f3b43423'
ok 62 - The object isa Biblio::Isis
ok 63 - count is 5
ok 64 - read_cnt
ok 65 - returns 2 elements
ok 66 - cnt 1 ORDN same
ok 67 - cnt 1 ABNORMAL same
ok 68 - cnt 1 N same
ok 69 - cnt 1 LIV same
ok 70 - cnt 1 K same
ok 71 - cnt 1 ORDF same
ok 72 - cnt 1 FMAXPOS same
ok 73 - cnt 1 NMAXPOS same
ok 74 - cnt 1 POSRX same
ok 75 - cnt 2 ORDN same
ok 76 - cnt 2 ABNORMAL same
ok 77 - cnt 2 N same
ok 78 - cnt 2 LIV same
ok 79 - cnt 2 K same
ok 80 - cnt 2 ORDF same
ok 81 - cnt 2 FMAXPOS same
ok 82 - cnt 2 NMAXPOS same
ok 83 - cnt 2 POSRX same
ok 84 - fetch 1
ok 85 - MFN 1 702:0 ^aHolder^bElizabeth
ok 86 - MFN 1 990:0 2140
ok 87 - MFN 1 990:1 88
ok 88 - MFN 1 990:2 HAY
ok 89 - MFN 1 675:0 ^a159.9
ok 90 - MFN 1 210:0 ^aNew York^cNew York University press^dcop. 1988
ok 91 - MFN 1 801:0 ^aFFZG
ok 92 - fetch 2
ok 93 - MFN 2 215:0 ^aIX, 275 str.^d23 cm
ok 94 - MFN 2 200:0 ^aPsychoanalysis and psychology^eminding the 
gap^fStephen Frosh
ok 95 - MFN 2 990:0 2140
ok 96 - MFN 2 990:1 89
ok 97 - MFN 2 990:2 FRO
ok 98 - MFN 2 210:0 ^aNew York^cUniversity press^d1989
ok 99 - MFN 2 700:0 ^aFrosh^bStephen
ok 100 - fetch 3
ok 101 - MFN 3 200:0 ^aPsychoanalitic politics^eJacques Lacan and 
Freud's French Revolution^fSherry Turkle
ok 102 - MFN 3 990:0 2140
ok 103 - MFN 3 990:1 92
ok 104 - MFN 3 990:2 LAC
ok 105 - MFN 3 210:0 ^aLondon^cFree Associoation Books^d1992
ok 106 - MFN 3 700:0 ^aTurkle^bShirlie
ok 107 - MFN 3 686:0 ^a2140
ok 108 - MFN 3 686:1 ^a2140
ok 109 - fetch 4
ok 110 - MFN 4 200:0 ^aKey studies in psychology^fRichard D. Gross
ok 111 - MFN 4 210:0 ^aLondon^cHodder & Stoughton^d1994
ok 112 - MFN 4 10:0 ^a0-340-59691-0
ok 113 - MFN 4 700:0 ^aGross^bRichard
ok 114 - fetch 5
not ok 115 - MFN 5 200:0 1\#^aPsychology^fCamille B. Wortman, 
Elizabeth F. Loftus, Mary E. Marshal
# Failed test (RAM Disk:002_isis.t at line 104)
#  got: 1
# expected: 0
not ok 116 - MFN 5 225:0 1\#^aMcGraw-Hill series in Psychology
# Failed test (RAM Disk:002_isis.t at line 104)
#  got: 1
# expected: 0
not ok 117 - md5 1
# Failed test (RAM Disk:002_isis.t at line 119)
#  got: 'fbaa4b35c85b289e9fec15ba0f99b14a'
# expected: 'f5587d9bcaa54257a98fe27d3c17a0b6'
not ok 118 - md5 2
# Failed test (RAM Disk:002_isis.t at line 119)
#  got: '14f828e2049a5d8523b6301c7009a3fe'
# expected: '3be9a049f686f2a36af93a856dcae0f2'
not ok 119 - md5 3
# Failed test (RAM Disk:002_isis.t at line 119)
#  got: '67d92a83434115acd98c4cb28b2784ec'
# expected: '3961be5e3ba8fb274c89c08d18df4bcc'
not ok 120 - md5 4
# Failed test (RAM Disk:002_isis.t at line 119)
#  got: 'e605bf7847b50064459fe1071bb8b4df'
# expected: '5f73ec00d08af044a2c4105f7d889e24'
not ok 121 - md5 5
# Failed test (RAM Disk:002_isis.t at line 119)
#  got: '0e27001d65f9a7d7be485c5f13e17bb8'
# expected: '843b9ebccf16a498fba623c78f21b6c0'
ok 122 - deleted found
ok 123 - MFN 3 is deleted
ok 124 - deleted not found
ok 125 - MFN 3 is deleted
# Looks like you planned 110 tests but ran 15 extra.

##

Hope this helps,
Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: MARC::Record tests

2005-01-06 Thread Bryan Baldus
There is code in MARC::File::MicroLIF::_get_chunk that handles DOS
(\r\n) and Unix (\n) line endings, but not Mac (\r).

This is true, and it seems to work. Unfortunately, it is not reached by the
test, since the test calls decode() directly, instead of going through
_next() or _get_chunk.

Perhaps the:
# for ease, make the newlines match this platform
$lifrec =~ s/[\x0a\x0d]+/\n/g if defined $lifrec;

in _next() should be moved (or added as duplicate code) to decode() just
between the lines:
my $marc = MARC::Record->new();
### $text =~ s/[\x0a\x0d]+/\n/g if defined $text;
my @lines = split( /\n/, $text );
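
To illustrate what that buys, the substitution collapses any mix of line
endings before the split (made-up sample text, just to show the effect):

my $text = "line one\x0dline two\x0d\x0aline three\x0a"; #Mac, DOS, and Unix endings
$text =~ s/[\x0a\x0d]+/\n/g if defined $text;
my @lines = split( /\n/, $text );
print scalar(@lines), " lines\n"; #prints "3 lines"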

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija


MARC::Errorchecks and other updates

2004-10-19 Thread Bryan Baldus
I have updated my modules again. Changes are listed below. I have also
uploaded MARC::Errorchecks to CPAN [1].
I welcome any comments, questions, or suggestions. I have included
MARC::BBMARC in the MARC::Errorchecks CPAN distribution (though I have no
idea whether the Makefile.PL will install it automatically).

Changes (Oct. 17, 2004)

MARC::Errorchecks:

Version 1.03: Updated Aug. 30-Oct. 16, 2004. Released Oct. 17. First CPAN
version.

 -Moved subs to MARC::QBIerrorchecks
 --check_003($record)
 --check_CIP_for_stockno($record)
 --check_082count($record)
 -Fixed bug in check_5xxendingpunctuation for first 10 characters.
 -Moved validate008() and parse008date() from MARC::BBMARC (to make
MARC::Errorchecks more self-contained).
 -Moved readcodedata() from BBMARC (used by validate008)
 -Moved DATA from MARC::BBMARC for use in readcodedata() 
 -Remove dependency on MARC::BBMARC
 -Added duplicate comma check in check_double_periods($record)
 -Misc. bug fixes
 Planned (future versions):
 -Account for undetermined dates in matchpubdates($record).
 -Cleanup of validate008
 --Standardization of error reporting
 --Material specific byte checking (bytes 18-34) abstracted to allow 006
validation.

MARC::Lintadditions:
Version 1.05: Updated Aug. 30-Oct. 16, 2004. Released Oct. 17, 2004.

 -Moved institution-specific code from check_040 to MARC::QBIerrorchecks.
 --check_040 still present to check $b language (currently commented-out)
 -Moved check_037 to MARC::QBIerrorchecks.
 -Updated check_082 to ensure decimal after 3rd digit in numbers longer than
3 digits.
 -Moved validate007([EMAIL PROTECTED]) from MARC::BBMARC (to make
MARC::Lintadditions more self-contained).
 -Fixed problem in 6xx check for subfield _2 (changed '==' to 'eq').
 -Updated validate007([EMAIL PROTECTED]) (bug fixes, misc. revisions)
 -Updated check_050 to check for unfinished cutters (single capital letter
followed by space or nothing)

MARC::BBMARC:

Version 1.07: Updated Aug. 30-Oct. 16, 2004. Released Oct. 17, 2004.
Included with MARC::Errorchecks upload to CPAN.

 -Moved subroutine getcontrolstocknos() to MARC::QBIerrorchecks
 -Moved validate007() to Lintadditions.pm
 -Moved validate008() and related subs to Errorchecks.pm
 --(Left readcodedata() in BBMARC, but it is now duplicated in
Errorchecks.pm, along with a modified version in Lintadditions.pm).
 --Also left parse008date, which may have uses outside of error checking.
 -Updated read_controlnos([$filename]) with minor changes. 
 --This subroutine could be rewritten in a more general way, since it simply
reads all lines from a file into an array and returns that array.

 

[1] Distribution on CPAN:
http://search.cpan.org/~eijabb/MARC-Errorchecks-1.03/ or 
http://www.cpan.org/modules/by-module/MARC/MARC-Errorchecks-1.03.tar.gz

Thank you,

Bryan Baldus
Cataloger
Quality Books Inc.
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: DOS EOF character in MARC files

2004-09-14 Thread Bryan Baldus
I don't know if it is correct, but I tried to write a test for the 
DOS EOF removal in MARC::File::USMARC [1]. This is the first test 
file I've tried to write, so please let me know where I went wrong.

 From the dosEOFtest.t file description:

Checks t/dosEOF.usmarc, which contains one record,
which is just sample1.usmarc with \x1a added as a final character.

When writing tests, should one take into account the file 
systems, or is the test end-user expected to deal with this? For 
example, in order to use  dosEOFtest.t under MacPerl, I need to 
change:
my $record_file_name = '/t/dosEOF.usmarc';
to
my $record_file_name = ':t:dosEOF.usmarc';
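
One way around the path problem might be File::Spec, which builds the path
with whatever separator the current platform uses. An untested, stripped-down
version of the test along those lines (the real dosEOFtest.t checks more than
a record count):

use strict;
use warnings;
use Test::More tests => 2;
use File::Spec;
use MARC::File::USMARC;

#File::Spec->catfile() should yield 't/dosEOF.usmarc' on Unix/Windows and
#':t:dosEOF.usmarc' under MacPerl
my $record_file_name = File::Spec->catfile('t', 'dosEOF.usmarc');
my $file = MARC::File::USMARC->in($record_file_name);
isa_ok($file, 'MARC::File::USMARC');

my $count = 0;
$count++ while $file->next();
is($count, 1, 'one record read despite the trailing DOS EOF character');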

 From the previous message:

(BTW, when modifying existing code, use the same conventions,
in this case \x1a rather than \x1A.)

Sorry about that. I wasn't sure if case mattered, so I used \x1A 
(which BBEdit Lite gave me when I copied in the raw character).

In MARC::File::USMARC, the constants:

use constant SUBFIELD_INDICATOR => "\x1F";
use constant END_OF_FIELD       => "\x1E";
use constant END_OF_RECORD      => "\x1D";

have capital letters, while the garbage removal uses lowercase (as do two
comments in decode()):
$usmarc =~ s/^[ \x00\x0a\x0d]+//;

my $dir = substr( $text, LEADER_LEN, $data_start - LEADER_LEN - 1 ); 
# -1 to allow for \x1e at end of directory

# character after the directory must be \x1e
(substr($text, $data_start-1, 1) eq END_OF_FIELD

#

[1] http://home.inwave.com/eija/bryanmodules/ , file 
dosEOFtest.tar.gz or the directory dosEOFtest for uncompressed versions.

Thank you,
Bryan Baldus
Cataloger (Quality Books Inc.)
http://home.inwave.com/eija/


New versions of MARC error checking modules

2004-08-24 Thread Bryan Baldus
I have once again updated my error checking modules. I believe I have
finished adding most of the new checks I wanted, though I have a few in mind
still. Among these are rewriting the 007 and 008 validation subroutines; adding
006 validation; additional punctuation checks (before title subfields in 6xx
and 7xx); a check for underscores in fields (which should not appear unless they
stand for the subfield delimiter); and a check of geographic coding in 6xx vs.
topical coding, using a list of common geographic headings (e.g. if "United
States" appears in 650 subfield a, report an error).
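
As a rough sketch of one of these, the underscore check might look something
like the following (illustrative only, not the eventual Errorchecks code):

sub find_underscores {
    my $record = shift;
    my @warnings;
    foreach my $field ($record->fields()) {
        next if $field->is_control_field();
        foreach my $subfield ($field->subfields()) {
            my ($code, $data) = @$subfield;
            push @warnings, $field->tag().": subfield $code contains an underscore."
                if $data =~ /_/;
        }
    }
    return \@warnings;
}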

I am also working on a global subject heading replacement program, in
conjunction with my LCSH Weekly List Changes Parser.

I am considering distributing my modules (and scripts--probably in the same
tar.gz file) on CPAN, once I figure out how to do so (in an easy manner
using MacPerl). Before I do this, I am thinking my MARC::BBMARC module may
need a new name. Right now it is just named BBMARC for my initials+MARC. It
is a collection of functions with little in common, other than that they
help my MARC related .pls and .pms function. I should probably move the
validate007 and 008 functions to Lintadditions.pm and Errorchecks.pm, making
those modules more self-contained. Would it be advisable to change BBMARC's
name, and if so, do you have suggestions for a new name?

Changes:

(Aug. 22, 2004):

Module updates:

Errorchecks.pm (http://home.inwave.com/eija/bryanmodules/):

Version 1.02: Updated Aug. 11-22, 2004. Released Aug. 22, 2004.

-Implemented VERSION (uncommented) 
-Added check for presence of 040 (check_040present($record)).
-Added check for presence of 2 082s in full-level, 1 082 in CIP-level
records (check_082count($record)). 
-Added temporary (test) check for trailing punctuation in 240, 586, 440,
490, 246 (check_nonpunctendingfields($record)) 
--which should not end in punctuation except when the data ends in such. 
-Added check_fieldlength($record) to report fields longer than 1870 bytes. 
--This should be rewritten to use the length in the directory of the raw
MARC. 
-Fixed workaround in check_bk008_vs_bibrefandindex($record) (Thanks again to
Rich Ackerman). 

Lintadditions.pm (http://home.inwave.com/eija/bryanmodules/):

Version 1.04: Updated Aug. 10-22, 2004. Released Aug.22, 2004.

-Implemented VERSION (uncommented) 
-Revised check_050 exception (Thank you to all who posted about this). 
-Moved VERSION HISTORY to end of module. 
-Added preliminary checking of 245 2nd indicator in check_245 (Thanks to Ian
Hamilton). 

BBMARC.pm (http://home.inwave.com/eija/bryanmodules/):

Version 1.06: Updated Aug. 10-22, 2004. Released Aug. 15, 2004.

-Implemented VERSION (uncommented) 
-Added subroutine getcontrolstocknos() 
-General readability cleanup (added tabs) 
-Bug fix in validate008 for date2 check 

Planned (next release):

-Cleanup of validate008 (and validate007) 
--Standardization of error reporting 
--Material specific byte checking (bytes 18-34) abstracted to allow 006
validation. 

Added and changed scripts:

-Updated LCSH Changes Parser script, LCSHchangesparser2.txt
(http://home.inwave.com/eija/inprocess/LCSHchangesparser2.txt):
--Adds 500 to tag number if it is 1xx, so that it becomes 600-655, in
preparation for use in global replacement. 
--Misc. fixes.
-lintwithadditionsselective.txt
(http://home.inwave.com/eija/fullrecscripts/lintwithadditionsselective.txt)
--Similar to lintwithadditions, but designed to call only specific check_xxx
functions in either MARC::Lint or MARC::Lintadditions.
--This has been tested only minimally, but may see future use as a basis for
test files.
-- 

As usual, I welcome comments, suggestions, questions, etc.

Thank you for your assistance,

Bryan Baldus
Cataloger
Quality Books, Inc.
[EMAIL PROTECTED]
http://home.inwave.com/eija


RE: Warnings during decode() of raw MARC

2004-08-18 Thread Bryan Baldus
How do you typically do the install? MARC::Record is included at the
ActiveState PPM Repository, so it should do these things on a Windows
platform...assuming nmake or some sort of make variant is being used. 

At home (on the Mac), I just drop the MARC folder in my site_lib folder in
the MacPerl folder (MacPerl adds site_perl to @INC automatically, I
believe). This is after I expand the tar.gz files with Stuffit Expander.
There used to be an installer.pl for Mac, but it no longer works with the
current version of MacPerl (5.8.0a). Since the MacPerl documentation seems to
indicate that it is not compatible with MakeMaker, drag-and-drop installation
seems to be the easiest alternative. To convert line endings, I
use either BBEdit Lite, a 3rd party program, or a script I just wrote that
should convert line endings and change the Type and Creator (to TEXT and
BBEdit).
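
That conversion script amounts to something like the following (an untested
sketch from memory, not the actual script; 'R*ch' as BBEdit's creator code may
need checking):

use strict;
use warnings;

foreach my $filename (@ARGV) {
    local $/ = undef; #slurp the whole file
    open my $in, '<', $filename or die "$filename: $!";
    my $text = <$in>;
    close $in;
    $text =~ s/\x0d\x0a|\x0a/\x0d/g; #DOS or Unix endings become Mac \r
    open my $out, '>', $filename or die "$filename: $!";
    print $out $text;
    close $out;
    if ($^O eq 'MacOS') {
        require MacPerl;
        MacPerl::SetFileInfo('R*ch', 'TEXT', $filename); #creator BBEdit, type TEXT
    }
}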

In Windows, I take the folder from home, convert the line endings from Mac
to DOS, and then drop the MARC folder in C:\Perl\site\lib\. 

This (dragging and dropping) seems to work fine for most stand-alone modules
(where a C compiler is not needed). In some cases, I do look at the
Makefile.PL, for example with MARC::Charset where it was necessary to create
a database file of EastAsian character sets. Of course, once I got that
installed (through drag-dropping), it gave a number of errors (when I ran
the tests), probably because of my operating system (MacOS 9.2.2) not
working well with Unicode?

I do generally try running each of the test files when I first install a new
module, just to make sure they work ok, but I've not usually bothered to
look at how the tests or the Makefile.PL work. This is one reason I haven't
tried to distribute my modules through CPAN.

 
Bryan Baldus
http://home.inwave.com/eija/
(http://home.inwave.com/eija/readme.htm)


MARC error checking with Perl updates and question

2004-08-09 Thread Bryan Baldus
I have once again updated my error checking modules [1], (MARC::)Errorchecks.pm
and (MARC::)Lintadditions.pm. I am running out of new things to check
for, though I do have a few ideas in mind, including attempting to find
miscoded geographical headings and topical headings (e.g. if "United States"
appears in a 6xx subfield other than 651$a or 6xx$z, it may be miscoded
(though not always), or if "Dogs" is in 651$a or 6xx$z, it is probably
miscoded), as well as the items in the "Current planned in progress tasks"
section of my site.
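
The geographic check might start out as something like this (illustrative
sketch only; the list of headings is a placeholder):

my %geographic = map { ($_ => 1) } ('United States', 'Great Britain', 'Canada');

sub find_miscoded_geog {
    my $record = shift;
    my @warnings;
    foreach my $field ($record->field('650')) {
        my $suba = $field->subfield('a');
        next unless defined $suba;
        $suba =~ s/[.,\s]+$//; #drop trailing punctuation before the lookup
        push @warnings, "650: $suba is coded as topical but appears to be geographic."
            if $geographic{$suba};
    }
    return \@warnings;
}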

I have added a question concerning grep at the end of this message. Thank
you for any assistance you may be able to provide. 

Changes:

(Aug. 8, 2004):

Module updates:

Errorchecks.pm:
Version 1.01: Updated July 20-Aug. 7, 2004. Released Aug. 8, 2004.

-Temporary (or not) workaround for check_bk008_vs_bibrefandindex($record)
and bibliographies.
-Removed variables from some error messages and cleanup of messages.
-Code readability cleanup.
-Added subroutines
--check_240ind1vs1xx($record) -- Reports errors based on whether 240 and 1xx
are both present and first indicator is 1 or 0.
--check_041vs008lang($record) -- Compares first code in subfield 'a' of 041
vs. 008 bytes 35-37.
--check_5xxendingpunctuation($record) -- Looks for final punctuation in
several of the 5xx fields.
--findfloatinghyphens($record) -- Looks for space-hyphen-space in each field
(in a list of given fields)
--video007vs300vs538($record) -- In video records, compares 007 values vs.
300 and 538 fields. Limited to VHS, DVD, and Video CD.
--ldrvalidate($record) -- Checks for valid bytes in the user-changeable
leader bytes.
--geogsubjvs043($record) -- Reports missing 043 if 651 or 6xx$z is present.
has list of exceptions (e.g. English-speaking countries)
--findemptysubfields($record) -- Looks for empty subfields (e.g.
$x$xPsychology.)

Changed subroutines:
-check_bk008_vs_300:
--added cross-checking for codes a, b, c, g (ill., map(s), port(s)., music)
--added checking for 'p. ' or 'v. ' or 'leaves ' in subfield 'a'
--added checking for 'cm.', 'mm.', 'in.' in subfield 'c'
--revised check for 'm', phono. (which QBI doesn't currently use)
--Added check in check_bk008_vs_bibrefandindex($record) for 'Includes
index.' (or indexes) in 504
---This has a workaround I would like to figure out how to fix

Lintadditions.pm:

version 1.03: Updated July 20-Aug. 7, 2004. Released Aug. 8, 2004.

-Added check_1xx and check_7xx sets.
-Added checks for non-filing indicator in 130, 630, 730, 740 and 830.
-Added indicator check for 700--ind1 == 3 - error.
-Added validation of 041 against MARC Code List for Languages.
-Added check_028 and check_037.
-Removed some variables from warning messages.
-Added check_050.
-Added check_040 (IOrQBI specific).
-Added check_440 and check_490.
-Added check_246.
-Changed check_245 ending punctuation errors based on MARC21 rule change vs.
LCRI 1.0C from Nov. 2003.
-Added check for square brackets in 245 $h.
-Added check for 260 ending punctuation.

Added and changed scripts:

Most of these are test scripts created while writing the subroutines listed
above.

The subroutines in the modules may have code not in the scripts, so it is
best to use the module rather than the script for those checks (the last 3
full record scripts).

Full record:

-fieldsubfieldcounts.txt -- Field and subfield count--will report totals for
each tag and subfield.
--First version: Field tag counts only.
-testnewerrorchecks.txt -- Test script to call new subroutines in
Errorchecks.pm (MARC::Errorchecks).
-ldrvalidatescript.txt -- In Errorchecks.pm
-viddvdvsvhs.txt -- In Errorchecks.pm.
-findemptysubfields.txt -- Looks for empty subfields. Skips 037 in CIP-level
records. In Errorchecks.pm.

Cleanup:
-
-find050doubleperiod.txt -- Test regex for finding pattern in 050$a.
Preliminary code for MARC::Lintadditions::check_050()
-removetitlefromlintrpt.txt -- Removes titles from lintallchecks' output
file.
-findmissing300apunctuation.txt -- Looks for missing period after p or v in
300a extract file. Initial step for
MARC::Errorchecks::check_bk008_vs_300($record) code.

- 
Question:

In the following code, is there a more efficient way to write the grep for
"Includes index(es)." to get the same result?

### workaround ###
my @indexin504 = grep {$_ =~ /Includes(.*)index(es)?(\.)*/; push
    @indexalone, $1.$3; $_ =~ /Includes(.*)index(\.)*/;} @fields504;
#look for 'Includes index.' in 504
foreach my $indexalonein504 (@indexalone) {
    #report error if have only space between 'Includes' and
    #'index' (followed by period)
    if ($indexalonein504 =~ /^ \.$/) {
        push @warningstoreturn, ("504: 'Includes index' should be 500.");
    } #if index is alone in 504
} #foreach index alone
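
(For what it is worth, the condition I am really after is a 504 that consists
only of 'Includes index.' or 'Includes indexes.'; something like the untested
lines below might do the same job, assuming @fields504 holds the text of each
504 as above, but I have not convinced myself it is equivalent:)

foreach my $field504 (@fields504) {
    push @warningstoreturn, ("504: 'Includes index' should be 500.")
        if $field504 =~ /^Includes index(es)?\.$/;
}
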
--- 
[1] My home page: http://home.inwave.com/eija/

Thank you,

Bryan Baldus
Cataloger
Quality Books, Inc

RE: retain repeatable subfields

2004-08-06 Thread Bryan Baldus
I may be wrong (as I am new to Perl), but I believe $r534->subfield('n') is
being called in a scalar context, so it retrieves only the first instance of
subfield n. Perhaps:


my @subfields = $r534->subfields();
my @newsubfields = ();
#break subfields into code-data array (so the entire field is in one array)
while (my $subfield = pop(@subfields)) {
    my ($code, $data) = @$subfield;
    unshift (@newsubfields, $code, $data);
} # while

would work better? Then parse the array for the desired subfields?
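
For example, to pull every subfield 'n' out of that array (untested):

#walk the code/data pairs built above and keep each subfield 'n' value in order
my @subfield_n;
for (my $i = 0; $i < @newsubfields; $i += 2) {
    push @subfield_n, $newsubfields[$i + 1] if $newsubfields[$i] eq 'n';
}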


Please correct me if I am wrong,

Hope this helps,

Bryan Baldus
Cataloger
Quality Books, Inc.
The Best of America's Independent Presses
[EMAIL PROTECTED]


Perl-based MARC record error checking update and questions

2004-06-23 Thread Bryan Baldus
 language and
country codes.

Version 1.03: Updated June 10, not released.

-Contained many of the changes in 1.04, but 1.04 contains the update to
validate008, so I wanted a new version.


--

[1] My home page: http://home.inwave.com/eija
[2] Link to Errorchecks current version:
http://home.inwave.com/eija/bryanmodules/MARC-Errorchecks-0.95/Errorchecks.pm.txt
(try http://home.inwave.com/eija/bryanmodules/ if the above fails)
[3] lintallchecks.pl:
http://home.inwave.com/eija/fullrecscripts/lintallchecks.txt
[4] Link to Lintadditions current version: 
http://home.inwave.com/eija/bryanmodules/MARC-Lintadditions-1.01/Lintadditions.pm.txt
(try http://home.inwave.com/eija/bryanmodules/ if the above fails)
[5] Link to BBMARC current version: 
http://home.inwave.com/eija/bryanmodules/MARC-BBMARC-1.04/BBMARC.PM.txt
(try http://home.inwave.com/eija/bryanmodules/ if the above fails)

I welcome any suggestions, questions, and comments (to this address, or to
that listed on my site).

Thank you,

Bryan Baldus
Cataloger
Quality Books Inc.
[EMAIL PROTECTED]
http://home.inwave.com/eija


BBMARC updated

2004-05-03 Thread Bryan Baldus
I have updated my MARC::BBMARC module. The new version is available at
http://home.inwave.com/eija/mac/MARC-BBMARC-1.01/BBMARC.pm.txt
(also http://home.inwave.com/eija/unix/MARC-BBMARC-1.01/BBMARC.pm.txt and
http://home.inwave.com/eija/win/MARC-BBMARC-1.01/BBMARC.pm.txt).

New to this version is validate008, which reads an 008 field (actually a
string of bytes) and reports back any invalid characters, in a tab-separated
scalar reference. It also returns a hash reference containing named
character positions, and a cleaned version of the initial string (probably
not useful, since little or no cleaning occurs in the validate008
subroutine). 

To use the new subroutine, I wrote 008checker.pl.txt (available
http://home.inwave.com/eija/mac/templatified/008checker.pl.txt).

Other changes to BBMARC were minor, and some are listed in a changes section
of the module.

I have also updated my home page with information about changes and planned
projects (http://home.inwave.com/eija/).

I welcome any comments and corrections you may have.

Bryan Baldus
Cataloger
Quality Books, Inc.
The Best of America's Independent Presses
[EMAIL PROTECTED]


RE: unsubsribe

2004-02-25 Thread Bryan Baldus
Subscription information may be found at http://perl4lib.perl.org/, which
states:

To subscribe, unsubscribe, or contribute to the list use one of the
following addresses: 

[EMAIL PROTECTED]
[EMAIL PROTECTED]

Hope this helps,

Bryan Baldus
Cataloger
Quality Books, Inc.
The Best of America's Independent Presses
1-800-323-4241x460
[EMAIL PROTECTED]

-Original Message-
From: Holly Bravender [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 25, 2004 10:56 AM
To: [EMAIL PROTECTED]
Subject: unsubsribe


Take me off your list!  Thank you.
Holly Bravender
Reference & Instruction Librarian
Paul V. Galvin Library 
Illinois Institute of Technology
35 W. 33rd Street
Chicago, IL  60616
www.gl.iit.edu
(312) 567-3373
[EMAIL PROTECTED] 



MARC-related scripts and code

2004-01-20 Thread Bryan Baldus
As you may recall, I am a cataloger with limited programming experience, and
I have been teaching myself Perl, using the MARC::Record modules. I have
posted the code I have been working on to my (hastily created) home page, at
http://home.inwave.com/eija/.

One of the files included is BBMARC.pm, which is designed to go in the MARC
folder/directory of MARC::Record. This file contains a number of
subroutines, including validate007() for checking that each byte of an 007
is within the range of valid values and that there is not extra data after
the format's limit. I believe the section on Motion Pictures is unfinished
(we don't have any, so I didn't go to the trouble of updating the section),
but the logic should follow that in Electronic Resources. 

Please send me any comments you might have.

Thank you,

Bryan Baldus
Cataloger
Quality Books, Inc.
The Best of America's Independent Presses
[EMAIL PROTECTED]