MARC::File::XML 0.88
Hi all,

Just a quick announcement regarding an update of MARC::File::XML to version 0.88. From the Changelog:

  0.88 Wed Nov 28 2007
      - String test for subfield code to avoid dropping $0 (Galen Charlton)

  0.88_1 Tue Oct 23 2007
      - Fixed a typo in M::F::X that could be the origin of the test failure (miker)
      - Removed some useless (and confusing) code that throws away some character set mapping information in the 066 (miker)

Hopefully, this version will fix the install problems that some have had with 0.87 ... if not, let us know.

Cheers,

--
Joshua Ferraro                      SUPPORT FOR OPEN-SOURCE SOFTWARE
President, Technology      migration, training, maintenance, support
LibLime                               Featuring Koha Open-Source ILS
[EMAIL PROTECTED] | Full Demos at http://liblime.com/koha | 1(888)KohaILS
Re: More Fun With MARC::File::XML: Solutions
Hi everyone,

Just providing an update on this issue. As you may recall, I've been putting the MARC::Record suite, specifically MARC::File::XML and MARC::Charset, through some fairly rigorous tests, including a 'roundtrip' test, which converts binary MARC-8 records to MARCXML / UTF-8 and then back to binary MARC encoded as UTF-8. The test is available here: http://liblime.com/public/roundtrip.pl

I discovered a number of bugs or issues, not in the MARC::* stuff, but in the back-end SAX parsers. I'll just summarize my discoveries here for posterity:

1. MARC::File::XML, if it encounters unmapped encoding in a MARC-8 encoded binary MARC file (in as_xml()), will drop the entire subfield where the improper encoding exists. The simple solution is to always use MARC::Charset->ignore_errors(1); if you expect your data will have improper encoding.

2. The XML::SAX::PurePerl parser cannot properly handle combining characters. I've reported this bug here: http://rt.cpan.org/Public/Bug/Display.html?id=19543 At the suggestion of several, I tried replacing my default system parser with expat, which caused another problem:

3. Handing valid UTF-8 encoded XML to new_from_xml() sometimes causes the entire record to be destroyed when using XML::SAX::Expat as the parser (with PurePerl the same records seem to work). It fails with the error:

  not well-formed (invalid token) at line 23, column 43, byte 937 at /usr/lib/perl5/XML/Parser.pm line 187

I haven't been able to track down the cause of this bug. I eventually found a workaround that didn't result in the above error, but instead silently mangled the resulting binary MARC record on the way out:

4. Using incompatible versions of XML::SAX::LibXML and libxml2 will cause binary MARC records to be mangled when passed through new_from_xml() in some cases. The solution here is to make sure you're running compatible versions of XML::SAX::LibXML and libxml2. I run Debian Sarge, and when I switched to the package maintainer's versions the bug went away.
It's unclear to me why the binary MARC would be mangled; this may indicate a problem with MARC::*, but I haven't had time to track it down, and since installing compatible versions of the parser back-end solves the problem, I can only assume it's the fault of the incompatible parsers.

Issues #3 and #4 above can be replicated by running the following batch of records through the roundtrip.pl script above: http://liblime.com/public/several.mrc If you want to test #2, try running this record through roundtrip.pl: http://liblime.com/public/combiningchar.mrc

BTW: you can change your default SAX parser by editing the ParserDetails.ini file ... mine is located in /usr/local/share/perl/5.8.4/XML/SAX/ParserDetails.ini

So the bottom line is, if you want to use MARC::File::XML in any serious application, you've got to use compatible versions of the libxml2 parser and XML::SAX::LibXML. Check the README in the Perl package for documentation on which versions are compatible... Maybe a note somewhere in the MARC::File::XML documentation pointing out these issues would be useful. Also, it wouldn't be too bad to have a few tests to make sure that the system's default SAX parser is capable of handling these cases. Just my two cents.

Cheers,
Joshua
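For reference, here's roughly what that ParserDetails.ini looks like (the path and the set of installed parsers will differ per system; as I understand the XML::SAX::ParserFactory behavior, the *last* parser section listed is the one picked as the default):

```ini
; XML/SAX/ParserDetails.ini -- each installed SAX parser gets a section;
; the last section in the file is used as the system default parser.
[XML::SAX::PurePerl]
http://xml.org/sax/features/namespaces = 1

[XML::SAX::Expat]
http://xml.org/sax/features/namespaces = 1
```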
More Fun With MARC::File::XML
Hi everyone,

Thanks for all the help thus far ... Things are running much smoother since I installed XML::SAX::Expat. However, I'm still having a problem I haven't been able to work around. I have the following batch of 5 records: http://liblime.com/public/several.mrc They don't throw any errors in marcdump.

I run them through roundtrip.pl (http://liblime.com/public/roundtrip.pl), which has a new feature ... if new_from_xml() fails, it dumps both binary and XML into two files. So ... running roundtrip on the above records dies (at least on both of the Linux boxes I'm working on) like this:

  $ ./roundtrip.pl several.mrc several.utf8.mrc error.xml error.mrc
  #4 has a problem:  at ./roundtrip.pl line 30.
  not well-formed (invalid token) at line 23, column 43, byte 937 at /usr/lib/perl5/XML/Parser.pm line 187

When I run marcdump on error.mrc it throws an error. In several.mrc, record #4, I see B9 (British pound sign), which is in the LOC codetables:

  <code>
    <marc>B9</marc>
    <ucs>00A3</ucs>
    <utf-8>C2A3</utf-8>
    <name>BRITISH POUND / POUND SIGN</name>
  </code>

Looking at error.xml, line 23, column 43, I see hex value C2 followed by A3 -- Zvon has them as:

  http://www.zvon.org/other/charSearch/PHP/search.php?request=c2&searchType=3
  http://www.zvon.org/other/charSearch/PHP/search.php?request=a3&searchType=3

I have no idea why C2 is in there ... A3 is the correct replacement for B9, so that's all good. I have MARC::Charset->ignore_errors(1); set, so I would expect any encoding problems to warn me as before and then just continue on... so I suspect it might be another system configuration problem, or some other problem with the source record that I just haven't been able to spot ... any suggestions?

Thanks,
Joshua
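One way to check what those two bytes actually are, using only core Perl's Encode module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Encode U+00A3 (POUND SIGN) as UTF-8 and inspect the raw bytes.
my $bytes = encode('UTF-8', "\x{00A3}");
my $hex   = uc unpack('H*', $bytes);

# Prints C2A3: the C2 is the UTF-8 lead byte for this two-byte
# sequence, matching the <utf-8>C2A3</utf-8> codetable entry.
print "UTF-8 bytes for U+00A3: $hex\n";
```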
Re: Possible bug in XML::SAX causing new_from_xml() to croak
Hi all,

Right ... I was using the PurePerl parser. As soon as I installed XML::SAX::Expat, the combining characters problem went away. Thanks!

-- Joshua

On Fri, May 19, 2006 at 09:57:34PM -0500, Edward Summers wrote:
> On May 19, 2006, at 7:59 PM, Joshua Ferraro wrote:
> > I've attached a small script that reproduces the same error we're getting in the new_from_xml() method. Try it out and see what it does for you.
>
> Works ok for me, at least it doesn't crash :-)
>
> > So ... Is there a workaround that we can use to fix the MARC::Record suite so that it won't crash when dealing with records that have combining characters ... shall I bug the maintainer?
>
> While harvesting OAI data in Perl I've routinely had the LibXML parser fail. Do you happen to know what underlying parser you are using? You can check by:
>
>   use XML::SAX::ParserFactory;
>   $parser = XML::SAX::ParserFactory->new();
>   print $parser;
>
> I've found XML::SAX::Expat to be much more reliable.
>
> //Ed
Re: MARC Records, XML, and encoding
Hi all,

Here is an OCLC record: http://liblime.com/public/oclc1.dat

I feed it into the as_xml() method and I get what appears to be valid XML: http://liblime.com/public/oclc1.xml

When I take that XML and feed it to the new_from_xml() method and print it to a file, I get the error:

  Cannot decode string with wide characters at /usr/local/lib/perl/5.8.4/Encode.pm line 188.

The script is killed and nothing gets written to disk. I realize this problem may have to do with improper encoding in the original binary MARC file, but apparently there are lots of these records in circulation (I've got about a dozen from OCLC), meaning there are cataloging clients out there that can edit these records and ILSes that can import/export them. Ideally, the MARC::Record suite would also be able to gracefully handle these records.

So ... any suggestions for tracking down this problem? ... and what about ideas for handling these records 'in the wild' that have encoding problems... what do other MARC libraries do?

Cheers,
Joshua
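For what it's worth, that particular Encode.pm error usually means a string that has already been decoded into Perl characters is being run through decode() a second time. A minimal core-Perl sketch of the failure mode (not MARC-specific, so the real cause here may differ):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# A string that is already decoded Perl characters (one char > 0xFF):
my $chars = "\x{263A}";

# decode() expects raw octets; feeding it an already-decoded string
# produces "Cannot decode string with wide characters":
my $ok = eval { decode('UTF-8', $chars); 1 };
print $ok ? "decoded twice?!\n" : "double decode died, as expected\n";

# Decoding raw bytes exactly once works fine:
my $bytes   = encode('UTF-8', $chars);    # raw octets
my $decoded = decode('UTF-8', $bytes);
print length($decoded), " character\n";   # 1 character
```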
Re: MARC Records, XML, and encoding
Hi Ed,

Interesting ... when I run marcdump I get:

  Recs  Errs  Filename
  ----  ----  --------
   192     0  sample.mrc

Here's the file posted on a web server (maybe a problem with the list truncating the attachment?): http://liblime.com/public/sample.mrc Could you try downloading from there and running marcdump again?

Thanks,
Joshua

On Thu, May 18, 2006 at 10:00:19AM -0500, Edward Summers wrote:
> On May 18, 2006, at 6:48 AM, Joshua Ferraro wrote:
> > Anyway, if anyone can shed some light on this I'd be grateful.
>
> I believe the data loss you are seeing is due to your source records -- not to do with character translation. Just running marcdump on them generates a ton of errors (see below). I don't have the time to investigate what the errors are with your MARC records found in the wild... but if you have the time to look at the errors, and have suggestions on how to improve MARC::Record's handling of them, please send your suggestions here.
>
>   Recs  Errs  Filename
>   ----  ----  --------
>    192    67  sample.mrc
>
> //Ed
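When two copies of a file give different marcdump counts like this, one quick MARC-module-free sanity check is to count ISO 2709 record terminators (0x1D) in each copy; if the counts differ, the transfer itself mangled the file. A minimal sketch (it assumes each record ends with exactly one terminator):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count ISO 2709 record terminators (0x1D) in a blob of binary MARC
# records -- each record ends with exactly one, so the count should
# match marcdump's record count if the file arrived intact.
sub count_marc_records {
    my ($data) = @_;
    return scalar( () = $data =~ /\x1D/g );
}

# Tiny demo with two fake "records" (only the terminators matter here):
my $blob = "record one\x1Drecord two\x1D";
print count_marc_records($blob), " records\n";   # 2 records
```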
Re: MARC Records, XML, and encoding
Thanks everyone for the help thus far. Ed and I have been chatting on code4lib ... it seems there are two problems. One is with the 9C character, which I now have a workaround for. I added the following to Charset.pm at line 151:

  if ($marc8 =~ /\x{9C}/) {
      $utf8 .= ' ';
      $index += 1;
      next CHAR_LOOP;
  }

It's not ideal, but it gets rid of that problem well enough for me.

The next problem happens with the following record (number 54 in the original batch I posted): http://liblime.com/public/prob2.mrc When I run the roundtrip conversion script I get the following error:

  Cannot decode string with wide characters at /usr/local/lib/perl/5.8.4/Encode.pm line 188.

This time, the script just dies completely and nothing is written to disk. The record passes marcdump's tests. Ed, I'm still waiting for SF to update so I can nab that test script. In the meantime, any ideas how to track this one down?

Cheers,
Joshua

On Thu, May 18, 2006 at 11:16:52AM -0500, Edward Summers wrote:
> So I got curious (thanks to your convo in #code4lib). I isolated the problem to one record: http://www.inkdroid.org/tmp/one.dat Your roundtrip conversion complains:
>
>   no mapping found at position 8 in Price : 9c 7.99;Inv.# B 476913;Date 06/03/98; Supplier : Dawson UK; Recd 20/03/98; Contents : 1. The problem : 1. Don't bargain over positions; 2. The method : 2. Separate the people from the problem; 3. Focus on interests, not positions; 4. Invent options for mutual gain; 5. Insist on using objective criteria; 3. Yes, but : 6. What if they are more powerful? 7. What if they won't play? 8. What if they use dirty tricks? 4. In conclusion; 5. Ten questions people ask about getting to yes; g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/lib/perl5/site_perl/5.8.7/MARC/Charset.pm line 126.
>
> So I took a look at that position in the MARC record and found a 0x9C character at that position, as the error message indicates. I can't find a 0x9C in either of the mapping tables that this record purports to use:
>
>   Basic Latin (ASCII): http://lcweb2.loc.gov/cocoon/codetables/42.html
>   Extended Latin (ANSEL): http://lcweb2.loc.gov/cocoon/codetables/45.html
>
> Looks like you might want to preprocess those records before translating. Since this character routinely occurs in the 586 field, you could use MARC::Record to remove the offending character before writing as XML. Hope that helps somewhat. This character conversion stuff is a major pain.
>
> //Ed
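Ed's preprocessing suggestion can also be done at the byte level, before the data ever reaches the charset translation. A minimal sketch (the substitution mirrors the Charset.pm workaround above, swapping the unmappable 0x9C byte for a space):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the unmappable 0x9C byte with a space in raw MARC-8 field
# data before handing it to the MARC-8 -> UTF-8 translation.
sub scrub_marc8 {
    my ($data) = @_;
    $data =~ s/\x9C/ /g;
    return $data;
}

my $field = "Price :\x9C7.99";
print scrub_marc8($field), "\n";   # the 0x9C becomes a plain space
```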
Re: Code For Web Based MARC Creation
Hi Aaron,

You may also want to take a look at Koha's MARC editor. It's form-based, has support for plugins and authorized values, and allows specification of 'MARC Frameworks' for cataloging different types of materials, etc. Through the frameworks, you can hide or display the MARC tags/subfields you want to show up in a given editor 'template'. If you want to try it out, you can use the LibLime demos:

  http://koha.liblime.com/cgi-bin/koha/acqui.simple/addbiblio.pl

To try out the plugins, click on the '...' on the right-hand side of some of the fields (except for the 005; for that, all you need to do is click in the input box and it will update the timestamp). There's really no limit to the number of plugins you can set up; the demo just has a sampling of a few of them. Likewise with the number of tags/subfields to display in the template.

As for the backend, there is a MARC Frameworks database in MySQL that is used to keep track of which tags/subfields should be visible in a given template. When the form is submitted, it is turned into a MARC::File::XML object, then converted to a MARC::Record object and, finally, fed to Koha's import routines.

I should mention that the current version doesn't support repeatable subfields and subfield reordering, though the backend framework does. Those features will be in 2.2.6, however, which should be out in about a month (the 2.2.6 MARC editor will also probably support indicator plugins ...).

Cheers,
Joshua

On Tue, Feb 28, 2006 at 08:40:36PM -0500, Aaron Huber wrote:
> Hi All, I would like to make a CGI form that will create MARC records using MARC::Record. Does anyone know of a project already doing something like this, or know of somewhere where I can see code?
> Thanks, Aaron
Re: [Zebralist] MARC and utf-8 question
Hi Paul,

This is a problem with the CPAN version of MARC::Record. For UTF-8 outside the normal ASCII range, it's not calculating the directory offsets properly. If you upgrade MARC::Record to the SourceForge version:

  http://sourceforge.net/project/showfiles.php?group_id=1254

the problem will just 'go away'. (You can test this by unpacking the SourceForge version to a local dir and adding a 'use lib' line pointing to it before you overwrite the CPAN stuff in your perl4lib.)

Cheers,
Joshua

On Tue, Feb 14, 2006 at 05:50:25PM +0100, Paul POULAIN wrote:
> I have some questions with UTF-8 + zebra + MARC::Record + MARC::XML (but I'm not sure which tool is responsible for my problem). In my zebra config file I have:
>
>   recordType: grs.xml
>   encoding utf-8
>
> MARC::Record is the sourceforge 2.0 version (installed today), zebra version is 1.3.32, MARC::XML version is 0.7, YAZ version: 2.1.12. The following code (some lines removed):
>
>   $Zconn->option(cqlfile => C4::Context->config("intranetdir")."/zebra/pqf.properties");
>   $Zconn->option(preferredRecordSyntax => "xml");
>   my $rs = $Zconn->search($q);
>   for (my $i = $offset-1; $i <= $maxrecordnum-1; $i++) {
>       my $record = MARC::Record->new_from_xml($rs->record($i)->raw());
>       warn "REC2 = ".$record->as_formatted;
>   }
>
> shows in the log:
>
>   Dictionnaire fran\xc3\xa7ais-anglais des termes relatifs \xc3\xa0 l'\xc3\xa9lectronique, l'\xc3\xa9lectrotechnique,
>
> \xc3\xa7 is a ç, it should be 00E7; \xc3\xa0 is a à, it should be 00E0; \xc3\xa9 is a é, it should be 00E9. (Or I'm wrong somewhere; I must admit I'm a newbie at UTF-8, you'll let me know.) (If I directly dump the XML record returned by zebra, I get the same result, so the problem is probably not in MARC::Record or MARC::XML.) Could someone help me find the origin of the problem?
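Worth noting: the byte pairs in that log are in fact the correct UTF-8 encodings of those characters (\xC3\xA7 decodes to U+00E7, and so on); the log is just showing raw bytes rather than decoded characters. A quick check with core Perl's Encode module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The raw byte pairs from the log, decoded as UTF-8:
for my $bytes ("\xC3\xA7", "\xC3\xA0", "\xC3\xA9") {
    my $char = decode('UTF-8', $bytes);
    printf "bytes %s -> U+%04X\n", uc(unpack('H*', $bytes)), ord($char);
}
# Prints U+00E7 (c-cedilla), U+00E0 (a-grave), U+00E9 (e-acute)
```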
Re: MARC::Record and Unicode?
Great ... thanks Bryan. MARC::Record 2.0RC1 fixed my problems.

In case anyone's wondering why I asked: I just finished version 0.01 of Open-ILS's [1] new Z39.50 server. A bug in MARC::Record 1.x was holding me back. The server is based on Index Data's excellent Net::Z3950::SimpleServer [2]. It also uses MARC::Record to convert MARCXML (which is how Open-ILS stores bib records) to MARC21. The problem I was having with MARC::Record 1.x happened because some of the MARC records in GPLS's data were encoded with UTF-8 outside the ASCII range (at least I _think_ that was the problem) and MARC::Record wasn't calculating the directory offsets properly (again, I _think_). MARC::Record 2.0RC1 handled the data fine, and no errors were thrown by SimpleServer or my Z-client (Yaz) when I tested. So kudos to whoever's responsible for the UTF-8 support.

[1] http://openils.org
[2] http://indexdata.dk/simpleserver

Cheers,
Joshua

On Mon, Jan 30, 2006 at 11:19:54AM -0600, Bryan Baldus wrote:
> On Monday, January 30, 2006 11:04 AM, Joshua Ferraro wrote:
> > Hi there, I've heard there is a unicode-friendly version of MARC::Record, just wondering whether it can be found in SourceForge or CPAN.
>
> It is in SourceForge [1]. I don't believe version 2 has been released to CPAN yet.
>
> [1] http://cvs.sourceforge.net/viewcvs.py/marcpm/marc-record/
>
> I hope this helps,
> Bryan Baldus
> http://home.inwave.com/eija
MARC::Record and Unicode?
Hi there,

I've heard there is a unicode-friendly version of MARC::Record; just wondering whether it can be found on SourceForge or CPAN.

Thanks,
Joshua
MARC separators stripped out (encoding prob?)
Ed et al.,

I've run into a strange problem with a Z39.50 server that I wrote using Index Data's SimpleServer module. The Z-server is for the Koha library system, and I use a subroutine in Koha called MARCgetbiblio to generate MARC records (MARCgetbiblio uses MARC::Record) that I put into a string and send off to the Z-client via SimpleServer's built-in hash $args->{RECORD}.

Everything seemed to be working fine while I was using my copy of BookWhere as the Z-client; but when I tried to use the Yaz client to connect to my server, I get the following error when retrieving the MARC records:

  Z> show 1
  Sent presentRequest (1+1).
  Records: 1
  []Record type: USmarc
  001 <!-- no separator at end of field -->
  000 <!-- no separator at end of field -->
  700 <!-- no separator at end of field -->
  (...snip)

In fact, a number of Z-clients have this same problem when attempting to retrieve records from my Z-server. When I print the string to the Z-server log (which is printed on the terminal the server was executed from), the separators are also missing. However, if I dump the MARC string into a file before it gets passed to the SimpleServer hash, the separators are there, and a MARC-testing program called dumpmarc.pl says the records are fine. Also, I can see ^s in the file when I open it in vi that aren't there in the Z-server screen log.

The Z-server can be reached at: koha.athenscounty.lib.oh.us: (no database name necessary). Could it be that the encoding is somehow wrong? Is there something unique about MARC separators? I'm out of ideas... any suggestions? I've attached sample records in both formats (with and without separators) as well as my fetch_handler code and Koha's MARCgetbiblio subroutine.
Thanks,
Joshua Ferraro
Nelsonville Public Library

Record with separators:

00786 00265 001000700030005700500170001200800410002901000130 0070020002100083035001300104090001700117124001342450062001582500012002202600 0530023233500285440002500320531003455200037665000370039672300433 942002500456952003900481^^128616^^ACLS^^20030516103009.0^^970724s1997caua 001 0 eng d^^ ^_a97208359^^ ^_a1565922840 (pa.)^^ ^_a97208359^^ ^_c99 294^_d99294^^1 ^_aSchwartz, Randal L.^^10^_aLearning Perl /^_cRandal L. Schwartz and Tom Christiansen.^^ ^_a2nd ed.^^ ^_aSebastopol, CA :^_bO'Reilly Associa tes,^_cc1997.^^ ^_axxix, 269 p. :^_bill. ;^_c24 cm.^^ 2^_aA Nutshell handbook.^ ^ ^_aUnix programming--Cover.^^ ^_aIncludes index.^^ 0^_aPerl (Computer prog ram language)^^1 ^_aChristiansen, Tom.^^ ^_aACLS^_cNF^_k005.133 Sc^^ ^_bAPL^_p 3200095485^_r29.95^_u208596^^^]

Same record without separators:

00786 00265 00100070003000570050017000120080041000290100013000700200021000830350013001040900017001171240013424500620015825000120022026000530023233500285440002500320531003455200037665000370039672300433942002500456952003900481128616ACLS20030516103009.0970724s1997 caua 001 0 eng d a97208359 a1565922840 (pa.) a97208359 c99294d992941 aSchwartz, Randal L.10aLearning Perl /cRandal L. Schwartz and Tom Christiansen. a2nd ed. aSebastopol, CA :bO'Reilly Associates,cc1997. axxix, 269 p. :bill. ;c24 cm. 2aA Nutshell handbook. aUnix programming--Cover. aIncludes index. 0aPerl (Computer program language)1 aChristiansen, Tom. aACLScNFk005.133 Sc bAPLp3200095485r29.95u208596

Here's my Z-server fetch_handler, which calls the MARCgetbiblio subroutine, places the resulting MARC record into a string, and passes it to SimpleServer's hash (and off it goes to the client):
  sub fetch_handler {
      my ($args) = @_;
      # warn "in fetch_handler";  ## troubleshooting
      my $offset = $args->{OFFSET};
      $offset -= 1;  ## $args->{OFFSET} starts on 1
      ## Set the bibid to be used
      chomp (my $bibid = $bib_list[$offset]);
      my $MARCRecord = MARCgetbiblio($dbh, $bibid);
      my $recordstring = $MARCRecord->as_usmarc();
      ## Print MARC record to the Z-Server log
      print "here is my record: $recordstring\n";
      ## Dump MARC record to a file
      open (MARC, '>', '/root/marc.dump') or die "can't open dump file: $!";
      print MARC $recordstring;
      close MARC;
      ## Return the record string to the client
      $args->{RECORD} = $recordstring;
  }

Here's the MARCgetbiblio subroutine:

  sub MARCgetbiblio {
      # Returns MARC::Record of the biblio passed in parameter.
      my ($dbh, $bibid) = @_;
      my $record = MARC::Record->new();
      # TODO : the leader is missing
      $record->leader('');
      my $sth = $dbh->prepare("select bibid,subfieldid,tag,tagorder,tag_indicator,subfieldcode,subfieldorder,subfieldvalue,valuebloblink from
MARC::Record leader
Hello everyone,

I am new to this list. I'm also very new to Perl, so please bear with me :-). I am working on a Z39.50 server for my library (which is using Koha for its ILS), and I am having trouble generating MARC records using MARC::Record. I am generating the records from a MySQL database, and I don't know how to determine on the fly what the record length and base address in the leader should be (also, I'm not sure how to use the set_leader_lengths access method). My code is below... any suggestions?

Thanks,
Joshua Ferraro
Nelsonville Public Library

Here is my code (this sub should build one record from Koha's marc_subfield_table using one bibid stored in @bib_list):

  sub fetch_handler {
      my ($args) = @_;
      # warn "in fetch_handler";  ## troubleshooting
      my $offset = $args->{OFFSET};
      $offset -= 1;  ## because $args->{OFFSET} 1 = record #1
      chomp (my $bibid = $bib_list[$offset]);
      my $sql_query = "SELECT tag, subfieldcode, subfieldvalue FROM marc_subfield_table WHERE bibid=?";
      my $sth_get = $dbh->prepare($sql_query);
      $sth_get->execute($bibid);
      ## create a MARC::Record object
      my $rec = MARC::Record->new();
      ## create the fields
      while (my @data = $sth_get->fetchrow_array) {
          my $tag           = $data[0];
          my $subfieldcode  = $data[1];
          my $subfieldvalue = $data[2];
          my $field = MARC::Field->new($tag, '', '', $subfieldcode => $subfieldvalue);
          $rec->append_fields($field);
      }
      ## build the marc string and put it into the SimpleServer hash
      my $record = $rec->as_usmarc();
      $args->{RECORD} = $record;
  }
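For what it's worth, as_usmarc() fills in the record length and base address for you (via set_leader_lengths(), as I understand it), so you normally never compute them by hand. The arithmetic itself, per ISO 2709, is easy to sketch in plain Perl if you ever want to sanity-check a record (this is an illustrative sketch, not MARC::Record's actual code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ISO 2709 / MARC21 structure: leader (24 bytes), then the directory
# (12 bytes per field plus a 1-byte terminator), then the field data
# (each field ends in a field terminator, and the whole record ends
# in a record terminator).
sub base_address {
    my ($num_fields) = @_;
    return 24 + 12 * $num_fields + 1;
}

sub record_length {
    my ($num_fields, $total_field_bytes) = @_;
    # $total_field_bytes already includes each field's terminator;
    # add 1 for the final record terminator.
    return base_address($num_fields) + $total_field_bytes + 1;
}

# A record with 3 fields whose data (incl. terminators) totals 100 bytes:
printf "base address:  %d\n", base_address(3);         # 61
printf "record length: %d\n", record_length(3, 100);   # 162
```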