Re: sending marc records into a script that uses MARC::Batch
I think you have to check for warnings as you read each record, so try
moving your error handling code to right after the $batch->next() call.

But Robin's suggestion is good advice, and is probably a more robust way
to handle the crud that can show up in a file of MARC records.

-Tim

On Fri, May 30, 2014 at 5:20 AM, Stefano Bargioni wrote:

> If I'm not wrong,
>
>     $batch->strict_off();
>
> will keep your loop from printing warnings and stopping the processing
> of records.
> HTH. Stefano
>
> On 29 May 2014, at 23:13, John E Guillory wrote:
>
> > Thanks Timothy for your help.
> >
> > When processing about 5 million records I would expect some crazy
> > records. The new script (incorporating Timothy’s suggestions) exited
> > prematurely on record 85,877 with: “Warnings detected: Entirely empty
> > subfield found in tag 260”. I know 260 is publication stuff but it’s
> > not “required”. I’m deliberately printing warnings but again the
> > script exited prematurely.
> >
> > Thanks for assistance.
> >
> > John
> >
> > *From:* Timothy Prettyman [mailto:timo...@umich.edu]
> > *Sent:* Thursday, May 29, 2014 11:23 AM
> > *To:* John E Guillory
> > *Cc:* perl4lib@perl.org
> > *Subject:* Re: sending marc records into a script that uses MARC::Batch
> >
> > For your first question, instead of:
> >
> >     $batch = MARC::Batch->new('USMARC', <STDIN>);
> >
> > use:
> >
> >     $batch = MARC::Batch->new('USMARC', STDIN);
> >
> > For your second, the error is likely caused when a field you're using
> > as_string() on doesn't exist in the record.
> >
> > So, you could do something like the following:
> >
> >     $field = $record->field('008');
> >     $field or do {                          # check for existence of field
> >         print "no 008 field for record\n";  # no field
> >         next;                               # skip this record (or whatever)
> >     };
> >     $field_008 = $field->as_string();
> >
> > Hope this helps
> >
> > -Tim
> >
> > Timothy Prettyman
> > LIT/Library Systems
> > University of Michigan
> >
> > On Thu, May 29, 2014 at 12:08 PM, John E Guillory wrote:
> >
> > Hello,
> >
> > Two questions please:
> >
> > 1. I’ve written a script that opens a marc file for reading using
> > this syntax:
> >
> >     $file = $ARGV[0];
> >     $batch = MARC::Batch->new('USMARC',$file);
> >
> > It then loops thru the records using this syntax:
> >
> >     while ( $record = $batch->next()) {
> >         …..check position 6, 7 of leader and position 23 of 008 and
> >         make some changes
> >     }
> >
> > This works great. However, instead of accessing the file this way, I
> > want to pipe the output of a previously run marc dump command
> > directly into this script via the pipe.
> >
> > I understand that this can be done using this syntax:
> >
> >     while ($line = <STDIN>) { … }
> >
> > but I don’t understand how to use that STDIN with
> > “MARC::Batch->new('USMARC',$file);”. This does not work:
> >
> >     $batch = MARC::Batch->new('USMARC', <STDIN>);
> >
> > 2. My current script successfully reads and processes a marc file of
> > over 5 gigs! but exits entirely on record 160,585 with the error from
> > MARC::Batch, “Can't call method "as_string" on an undefined value at
> > ./marc_batch.pl”. Documentation on using MARC::Batch says that to
> > tell it to continue processing even when errors are encountered one
> > should use strict_off(), then print/report warnings at the bottom of
> > the script. I don’t think my particular error is being handled by the
> > strict_off() setting. Does anybody know what causes/how to fix the
> > “Can’t call method as_string” error? Full script below, it’s pretty
> > short, thanks to MARC::Batch.
> >
> > Thanks for insights!
> >
> >     use MARC::Batch;
> >
> >     $file = $ARGV[0];
> >     chomp($file);
> >
> >     $batch = MARC::Batch->new('USMARC',$file);
> >     $batch->strict_off();  # otherwise script exits when it encounters errors
> >
> >     open(OUT,'>new_marc');
> >
> >     while ( $record = $batch->next()) {
> >         $leader = $record->leader();
> >         $leader_pos_6 = substr($leader,6,1);
> >         $leader_pos_7 = substr($leader,7,1);
> >
> >         $field = $record->field('008');
> >         $field_008 = $field->as_string();
> >         $field_008_position_23 = substr($field_008,23,1);
Re: sending marc records into a script that uses MARC::Batch
For your first question, instead of:

    $batch = MARC::Batch->new('USMARC', <STDIN>);

use:

    $batch = MARC::Batch->new('USMARC', STDIN);

For your second, the error is likely caused when a field you're using
as_string() on doesn't exist in the record.

So, you could do something like the following:

    $field = $record->field('008');
    $field or do {                          # check for existence of field
        print "no 008 field for record\n";  # no field
        next;                               # skip this record (or whatever)
    };
    $field_008 = $field->as_string();

Hope this helps

-Tim

Timothy Prettyman
LIT/Library Systems
University of Michigan

On Thu, May 29, 2014 at 12:08 PM, John E Guillory wrote:

> Hello,
>
> Two questions please:
>
> 1. I’ve written a script that opens a marc file for reading using
> this syntax:
>
>     $file = $ARGV[0];
>     $batch = MARC::Batch->new('USMARC',$file);
>
> It then loops thru the records using this syntax:
>
>     while ( $record = $batch->next()) {
>         …..check position 6, 7 of leader and position 23 of 008 and
>         make some changes
>     }
>
> This works great. However, instead of accessing the file this way, I
> want to pipe the output of a previously run marc dump command directly
> into this script via the pipe.
>
> I understand that this can be done using this syntax:
>
>     while ($line = <STDIN>) { … }
>
> but I don’t understand how to use that STDIN with
> “MARC::Batch->new('USMARC',$file);”. This does not work:
>
>     $batch = MARC::Batch->new('USMARC', <STDIN>);
>
> 2. My current script successfully reads and processes a marc file of
> over 5 gigs! but exits entirely on record 160,585 with the error from
> MARC::Batch, “Can't call method "as_string" on an undefined value at
> ./marc_batch.pl”. Documentation on using MARC::Batch says that to tell
> it to continue processing even when errors are encountered one should
> use strict_off(), then print/report warnings at the bottom of the
> script. I don’t think my particular error is being handled by the
> strict_off() setting. Does anybody know what causes/how to fix the
> “Can’t call method as_string” error? Full script below, it’s pretty
> short, thanks to MARC::Batch.
>
> Thanks for insights!
>
>     use MARC::Batch;
>
>     $file = $ARGV[0];
>     chomp($file);
>
>     $batch = MARC::Batch->new('USMARC',$file);
>     $batch->strict_off();  # otherwise script exits when it encounters errors
>
>     open(OUT,'>new_marc');
>
>     while ( $record = $batch->next()) {
>         $leader = $record->leader();
>         $leader_pos_6 = substr($leader,6,1);
>         $leader_pos_7 = substr($leader,7,1);
>
>         $field = $record->field('008');
>         $field_008 = $field->as_string();
>         $field_008_position_23 = substr($field_008,23,1);
>
>         if ( ($leader_pos_6 eq "a") && ($leader_pos_7 eq "m") &&
>              ($field_008_position_23 eq "o") || ($field_008_position_23 eq "s") ) {
>
>             $control_num = $record->field('001');
>             $control_num = $control_num->as_string();
>
>             print "008 position 23: $field_008_position_23 \n";
>             print "OLD leader: $leader \n";
>             $old_leader = $leader;
>             substr($leader,6,1) = 'm';
>             print "NEW leader: $leader \n";
>
>             print OUT $record->as_usmarc();
>             print "$control_num|$old_leader|$leader|$field_008\n";
>
>         } else {  # not a match so just print this one unchanged…
>             print OUT $record->as_usmarc();
>         }
>     }
>
>     # handles errors:
>     if (@warnings = $batch->warnings()) {
>         print "\n Warnings detected: \n", @warnings;
>     }
>
>     close(OUT);
>     close(LOG);
>
> John Guillory
> Louisiana Library Network
> 225.578.3758
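
The advice in this thread (pass a filehandle rather than a filename, check warnings right after each next() call, and test that a field exists before calling as_string() on it) can be combined into one loop. What follows is only a minimal sketch under assumptions: scan_marc and its two arguments are hypothetical names, not part of MARC::Batch, and the per-record processing is left as a comment.

```perl
use strict;
use warnings;
use MARC::Batch;

# Hypothetical helper (not from the thread): read every record from an
# already-open filehandle and report problems as they occur.
# Returns the number of records read.
sub scan_marc {
    my ($in_fh, $log_fh) = @_;

    my $batch = MARC::Batch->new('USMARC', $in_fh);
    $batch->strict_off();    # keep reading when a record has problems

    my $count = 0;
    while (my $record = $batch->next()) {
        $count++;

        # Check warnings right after next(), so each message can be
        # tied to the record that was just read.
        for my $warning ($batch->warnings()) {
            print {$log_fh} "record $count: $warning\n";
        }

        # field() returns undef when the tag is absent, so test the
        # result before calling as_string() on it.
        my $field = $record->field('008');
        if (!$field) {
            print {$log_fh} "record $count: no 008 field\n";
            next;
        }
        my $field_008 = $field->as_string();
        # ... work with $field_008 here ...
    }
    return $count;
}

# In a pipeline the handles would be the standard ones, for example:
#   scan_marc(\*STDIN, \*STDERR);
```

Passing \*STDIN (a reference to the glob) rather than the bareword STDIN keeps the call valid under "use strict". The 001 lookup in the original script could be guarded the same way as the 008.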
Re: reading and writing of utf-8 with marc::batch
Do your records have the UTF-8 encoding byte set in the LDR? (Byte 9
should be 'a' for UTF-8.)

-Tim

Timothy Prettyman
University of Michigan Library/LIT

On Tue, Mar 26, 2013 at 4:22 PM, Eric Lease Morgan wrote:

> For the life of me I can't figure out how to do reading and writing of
> UTF-8 with MARC::Batch.
>
> I have a UTF-8 encoded file of MARC records. Dumping the records and
> grepping for a particular string illustrates the validity:
>
>     $ marcdump und.marc | grep Sainte-Face
>     und.marc
>     1000 records
>     2000 records
>     3000 records
>     4000 records
>     5000 records
>     6000 records
>     7000 records
>     8000 records
>     9000 records
>     10000 records
>     11000 records
>     12000 records
>     245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>     610 20 _aArchiconfrérie de la Sainte-Face
>     13000 records
>     $
>
> I then run a Perl script that simply reads each record and dumps it to
> STDOUT. Notice how I define both my input and output as UTF-8:
>
>     #!/shared/perl/current/bin/perl
>
>     # configure
>     use constant MARC => './und.marc';
>
>     # require
>     use strict;
>     use MARC::Batch;
>
>     # initialize
>     binmode ( MARC, ":utf8" );
>     my $batch = MARC::Batch->new( 'USMARC', MARC );
>     $batch->strict_off;
>     $batch->warnings_off;
>     binmode( STDOUT, ":utf8" );
>
>     # read & write
>     while ( my $marc = $batch->next ) { print $marc->as_usmarc }
>
>     # done
>     exit;
>
> But my output is munged:
>
>     $ ./marc.pl > und.mrc
>     $ marcdump und.mrc | grep Sainte-Face
>     und.mrc
>     1000 records
>     2000 records
>     3000 records
>     4000 records
>     5000 records
>     6000 records
>     7000 records
>     8000 records
>     9000 records
>     10000 records
>     11000 records
>     12000 records
>     245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>     610_aArchiconfrérie de la Sainte-Face
>     13000 records
>     $
>
> What am I doing wrong!?
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
> 574/631-8604
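
The check Tim asks about can be done on the leader string alone. In MARC 21, leader position 09 is 'a' for UCS/Unicode and blank for MARC-8. The sketch below is an illustration only: leader_says_utf8 is a hypothetical helper and both leaders are made-up sample values. With MARC::Record the string would come from $record->leader().

```perl
use strict;
use warnings;

# Hypothetical helper: true when leader position 09 is 'a' (UCS/Unicode).
sub leader_says_utf8 {
    my ($leader) = @_;
    return 0 unless defined $leader && length($leader) >= 10;
    return substr($leader, 9, 1) eq 'a' ? 1 : 0;
}

# Illustrative 24-character leaders, differing only at position 09.
my $marc8_leader = '00860nam  22002531  4500';   # position 09 is blank
my $utf8_leader  = '00860nam a22002531  4500';   # position 09 is 'a'

print leader_says_utf8($marc8_leader), "\n";   # prints 0
print leader_says_utf8($utf8_leader),  "\n";   # prints 1
```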
Re: Deleting a subfield using MARC::Record
+1

I like Leif's proposal. It also might be useful to allow "code" to
accept multiple values.

-Tim

Timothy Prettyman
LIT/Library Systems
University of Michigan

On May 1, 2006, at 5:41 PM, Leif Andersson wrote:

> +1
>
> "count" can possibly be complemented or replaced with occurrence as
> suggested. It'd be nice to be able to denote last occurrence [-1].
> And I suppose the indexing should be based on ordinary perl subscript
> indexing - i.e. governed by the value of special variable $[
>
>     $field->delete_subfield(
>         code  => $code,     # of course
>         occur => [0,2,3],   # "occur" or "pos" or whatever...
>         match => qr/pat/,   # doesn't need to be repeatable
>     );
>
> Leif
>
> ==
> Leif Andersson, Systems Librarian
> Stockholm University Library
> SE-106 91 Stockholm
> SWEDEN
> Phone : +46 8 162769
> Mobile: +46 70 6904281
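
To make the proposed semantics concrete, here is a rough sketch on a plain list of code/value pairs instead of a MARC::Field object. Everything in it is an assumption made for illustration: the function name delete_subfields is hypothetical, "occur" is read as 0-based indexes among the subfields that carry the given code (negative values counting from the end), and "match" narrows the selection further. It is not the MARC::Field implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: filter a flat (code, value, code, value, ...) list.
#   code  - subfield code to consider
#   occur - optional arrayref of 0-based occurrence indexes among the
#           subfields with that code; negative values count from the end
#   match - optional regex the value must match to be deleted
# Returns the remaining flat list.
sub delete_subfields {
    my ($subfields, %args) = @_;

    my @pairs;
    for (my $i = 0; $i < @$subfields; $i += 2) {
        push @pairs, [ $subfields->[$i], $subfields->[$i + 1] ];
    }

    # Positions in @pairs of the subfields carrying the requested code.
    my @hits = grep { $pairs[$_][0] eq $args{code} } 0 .. $#pairs;

    # Start from the requested occurrences, or from every hit.
    my %doomed;
    if ($args{occur}) {
        for my $n (@{ $args{occur} }) {
            my $idx = $n < 0 ? $n + @hits : $n;
            $doomed{ $hits[$idx] } = 1 if $idx >= 0 && $idx < @hits;
        }
    }
    else {
        %doomed = map { $_ => 1 } @hits;
    }

    # Keep subfields whose value does not match the pattern.
    if ($args{match}) {
        delete $doomed{$_}
            for grep { $pairs[$_][1] !~ $args{match} } keys %doomed;
    }

    return map { @{ $pairs[$_] } } grep { !$doomed{$_} } 0 .. $#pairs;
}

# Delete only the last subfield a.
my @left = delete_subfields(
    [ a => 'one', b => 'two', a => 'three' ],
    code  => 'a',
    occur => [-1],
);
print join(',', @left), "\n";   # prints a,one,b,two
```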
Re: deconstruct and reconstruct a field
In your foreach loop, you write a new 534 for every 500 that you find.
The first 534 has the subfield a from the first 500, the second has the
subfield a values from the first and second, and so on.

What you really want to do is something like:

    if (@f500s) {
        my @subfields = ();
        my @used      = ();
        foreach my $f500 (@f500s) {
            my $note = $f500->subfield('a');
            defined $note and do {
                push(@subfields, 'n', $note);   # collect each note as a $n
                push(@used, $f500);             # remember which 500s to remove
            };
        }
        (@subfields) and do {
            my $new534 = MARC::Field->new('534', '', '', @subfields);
            # insert while the last 500 is still in the record, then delete
            $record->insert_fields_after($used[-1], $new534);
            $record->delete_field($_) foreach @used;
        };
    }

Tim Prettyman
University of Michigan Library

--On Tuesday, August 3, 2004 10:24 AM -0400 Jackie Shieh <[EMAIL PROTECTED]> wrote:

> I am deconstructing multiple 500 note fields and putting the data in
> subfield n of 534 (Notes to original version).
>
> When there is more than one 500 note, this script creates two 534
> fields, one with the first former 500 note in subfield n, and the
> other with both 500 notes, instead of one single 534 with as many
> subfield n's as there are original 500 note fields.
>
> Can someone help decipher this for me? Where did it go wrong?!
>
>     ## if present, put data in variable to use in 534$n
>     if ( @f500s ) {
>         my @subfields = ();
>         ## examine each one
>         foreach my $f500 (@f500s) {
>             my $n534=$f500->subfield('a');
>             push(@subfields, 'n', $n534);
>             ##begin to put 500 a to 534 n
>             my $new534 = MARC::Field->new( '534','','', @subfields);
>             $record->insert_fields_after($f500, $new534);
>             $record->delete_field($f500);
>         }
>     }
>     # if 500 is not present, create a brand new 534
>     else {
>         my $new534 = MARC::Field->new( '534', '','', 'p', 'Transcribed from: ');
>         $record->insert_fields_after($f260, $new534);
>     }
>
> Input file contains:
>
>     500 _a"London : Spottiswoodes and Shaw"--T.p. verso.
>     500 _a"Notes" (p. [31]-39) contain extracts from Sir J. Malcolm's
>         "Central India" regarding Ahalya Baee.
>
> Output file results:
>
>     534 _n"London : Spottiswoodes and Shaw"--T.p. verso.
>     534 _n"London : Spottiswoodes and Shaw"--T.p. verso.
>         _n"Notes" (p. [31]-39) contain extracts from Sir J. Malcolm's
>         "Central India" regarding Ahalya Baee.
>
> Thanks for any help!!
>
> Best,
> --Jackie
>
> | Jackie Shieh
> | Special Projects & Collections Team
> | Harlan Hatcher Graduate Library
> | University of Michigan
> | 920 North University
> | Ann Arbor, MI 48109-1205
> | Phone: 734.936.2401 FAX: 734.615.9788
> | E-mail: [EMAIL PROTECTED]
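
The difference between the two loops in this thread can be seen without any MARC modules. The sketch below is a stand-in using plain arrays: the note strings are invented sample data, and each array reference plays the role of one 534 field.

```perl
use strict;
use warnings;

my @notes = ('first note', 'second note');   # stand-ins for the 500 $a values

# Pattern from the question: build an output field inside the loop.
my (@growing, @built_inside);
for my $note (@notes) {
    push @growing, 'n', $note;
    push @built_inside, [@growing];   # one new "field" per iteration
}

# Pattern from the reply: collect everything first, then build once.
my @subfields   = map { ('n', $_) } @notes;
my @built_after = @subfields ? ([@subfields]) : ();

print scalar(@built_inside), " field(s) when built inside the loop\n";   # 2
print "  @$_\n" for @built_inside;
print scalar(@built_after), " field(s) when built after the loop\n";     # 1
print "  @$_\n" for @built_after;
```

Built inside the loop, the second "field" repeats the first note before adding the second, which is the shape of the duplicated output reported in the question.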
Re: Net::Z3950 and diacritics
(I'm sending this again, because I think my formatted record may have
gotten messed up in the process of being cut/pasted. My apologies.)

I don't see how you can get a result for your search if you're using
@attr 1=7. 7 is the USE attribute for an ISBN search, and your term is
the local system number, I think (use attribute=12).

When I do that search (@attr 1=12 3118006) against the LC bib file,
using Net::Z3950 in a program essentially the same as yours, the USMARC
record returned is 860 bytes long. Here's a formatted dump of the
record:

    LDR 00860nam  22002531  4500
    001    3118006
    005    19740417000000.0
    008    731207s1967    nyuabf   b    000 0beng
    035    |9(DLC)   67029856
    906    |a7|bcbc|corignew|du|eocip|f19|gy-gencatlg
    010    |a   67029856
    040    |aDLC|cDLC|dDLC
    050 00 |aND588.D9|bR85
    082 00 |a759.3
    100 1  |aRussell, Francis,|d1910-
    245 14 |aThe world of D?urer, 1471-1528,|cby Francis Russell and the editors of Time-Life Books.
    260    |aNew York,|bTime, inc.|c[1967]
    300    |a183 p.|billus., maps, col. plates.|c32 cm.
    490 0  |aTime-Life library of art
    504    |aBibliography: p. 177.
    600 10 |aD?urer, Albrecht,|d1471-1528.
    710 2  |aTime-Life Books.
    991    |bc-GenColl|hND588.D9|iR85|tCopy 1|wBOOKS
    991    |bc-GenColl|hND588.D9|iR85|p00034015107|tCopy 2|wCCF

The "?" in the 245 and 600 fields are 0xE8, the MARC-8 code for
combining umlaut/diaeresis.

It's puzzling that the record you got has a different length--not sure
what's going on there.

Tim Prettyman
University of Michigan Library

>     # define some constants
>     my $DATABASE = 'voyager';
>     my $SERVER   = 'z3950.loc.gov';
>     my $PORT     = '7090';
>
>     # create a LOC (Voyager) 001 query
>     my $query = "\@attr 1=7 3118006";
>
>     # create a z39.50 object
>     my $z3950 = Net::Z3950::Manager->new(databaseName => $DATABASE);
>
>     # assign the object some z39.50 characteristics
>     $z3950->option(elementSetName => "f");
>     $z3950->option(preferredRecordSyntax => Net::Z3950::RecordSyntax::USMARC);
>
>     # connect to the server and check for success
>     my $connection = $z3950->connect($SERVER, $PORT);
>
>     # search
>     my $results = $connection->search($query);
>
>     # get the found record and turn it into a MARC::Record object
>     my $record = $results->record(1);
>     $record = MARC::Record->new_from_usmarc($record->rawdata());
>
>     # create a file name
>     my $id = time;
>
>     # write the record
>     open MARC, "> $id.marc";
>     print MARC $record->as_usmarc;
>     close MARC;
>
> This process works just fine for records that contain no diacritics,
> but when diacritics are in the records extra characters end up in my
> saved files, like this:
>
>     00901nam 22002651 ^^^ 45100080005001780080041000250350021000669060045000870
>     1000170013204000180014905000180016708200100018512900195245009
>     200224260003400316347003504900029003975040026004266340045
>     27100021004869910044005079910055005510990029006063118006
>     1974041700.0731207s1967nyuabf b000 0beng 9(DLC) 67029856
>     a7bcbccorignewdueocipf19gy-gencatlg a 67029856
>     aDLCcDLCdDLC00aND588.D9bR8500a759.31 aRussell,
>     Francis,d1910-14aThe world of D®urer, ^^^ 1471-1528,cby Francis
>     Russell and the editors of Time-Life Books. aNew York,bTime,
>     inc.c[1967] a183 p.billus., maps, col. plates.c32 cm.0 aTime-Life
>     library of art aBibliography: p. 177.10aD®urer,
>     Albrecht,d1471-1528.2 ^^^ aTime-Life Books.
>     bc-GenCollhND588.D9iR85tCopy 1wBOOKS
>     bc-GenCollhND588.D9iR85p00034015107tCopy 2wCCF
>     arussell-world-1071495663
>
> Notice how Dürer got munged into D®urer, twice, and consequently the
> record length is not 901 but 903 instead.
>
> Some people say I must be sure to request a specific character set
> from the LOC when downloading my MARC records, specifically MARC-8 or
> MARC-UCS. Which one of these character sets do I want and how do I
> tell the remote database which one I want?
>
> --
> Eric "The Ugly American Who Doesn't Understand Diacritics" Morgan
> University Libraries of Notre Dame
>
> (574) 631-8604
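
The thread does not establish why the saved record is two bytes longer than its leader says. One mechanism that would add exactly one byte per diacritic is the 0xE8 byte being treated as the character U+00E8 and written back out as UTF-8. That is only an assumption about this case, not something confirmed above. The sketch uses the core Encode module on a six-byte sample string to show the growth.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "D", the MARC-8 combining umlaut (0xE8), then "urer": six bytes.
my $marc8_bytes = "D\xE8urer";

# Encoding to UTF-8 reads 0xE8 as the character U+00E8, which takes
# two bytes (0xC3 0xA8) in UTF-8.
my $reencoded = encode('UTF-8', $marc8_bytes);

printf "%d bytes before, %d bytes after\n",
    length($marc8_bytes), length($reencoded);
# prints "6 bytes before, 7 bytes after"
```

Two such bytes in a record would account for a difference of two between the length stored in the leader and the length of the file on disk.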