Re: [CODE4LIB] Roman-script to Hebrew-script automation
Mark A. Matienzo wrote:
> This is great news! I'd love to see you share the code with the greater community. This may prove particularly useful for the automated addition of non-Roman data into authority records for NACO members (see [1]; see also [2]).

Okay, y'all can check out the code at
http://code.google.com/p/lc-hebrew-detransliteration/source/browse/#svn/trunk
or
svn checkout http://lc-hebrew-detransliteration.googlecode.com/svn/trunk/ lc-hebrew-detransliteration-read-only

The more reusable file is the .class.php file. hebrify.php is a messier script that pulls certain fields out of MarcEdit-broken (.mrk) files and spits out the XML- and III/OPAC-encoded renditions I mentioned earlier. I'll have to clean up the Expect scripts later.

--
Yitzchak Schaffer
Systems Librarian
Touro College Libraries
33 West 23rd Street
New York, NY 10010
Tel (212) 463-0400 x5230
Fax (212) 627-3197
[EMAIL PROTECTED]
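hebrify.php's own code is in the repository. As a rough, hypothetical illustration of the same plumbing (read a MarcEdit .mrk dump, pull one field, and write both a tab-delimited file for a loader and an XML file for proofing), here is a short Python sketch. It assumes MarcEdit's usual MRK layout: `=TAG`, two spaces, two indicator characters, `$`-delimited subfields, and blank lines between records. The choice of 245 $a/$b, the 001 as the record key, and the element names `proof`/`field`/`roman`/`hebrew` are invented for the example and are not taken from hebrify.php.

```python
import re
import xml.etree.ElementTree as ET

# MRK field line: "=" + tag, two spaces, then the body, e.g.
#   =245  10$aSefer {dotb}Hatan Torah
# The first two body characters are the indicators (MarcEdit writes a blank
# indicator as a backslash); subfields follow, each introduced by "$".
MRK_LINE = re.compile(r"^=(\w{3})  (.*)$")


def parse_mrk(text):
    """Split an MRK dump into blank-line-separated records; each record is a
    list of (tag, indicators, [(code, value), ...]) tuples."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = []
        for line in block.splitlines():
            m = MRK_LINE.match(line)
            if not m:
                continue
            tag, body = m.groups()
            if tag == "LDR" or tag < "010":
                # Leader and control fields have no indicators or subfields.
                fields.append((tag, "", [("", body)]))
            else:
                # Assumes "$" never occurs literally inside the data.
                subs = [(c[0], c[1:]) for c in body[2:].split("$") if c]
                fields.append((tag, body[:2], subs))
        records.append(fields)
    return records


def extract(records, tag="245", codes="ab"):
    """Yield (record key, text) for each occurrence of `tag`, joining the
    subfields listed in `codes`. The 001 is used as the key only as a
    placeholder for whatever record number the loader actually needs."""
    for fields in records:
        key = next((subs[0][1] for t, _, subs in fields if t == "001"), "")
        for t, _, subs in fields:
            if t == tag:
                yield key, " ".join(v.strip() for c, v in subs if c in codes)


def write_outputs(rows, convert):
    """Return (tab-delimited text for a loader, XML string for proofing).
    `convert` is the romanized-to-Hebrew function supplied by the caller."""
    root = ET.Element("proof")
    lines = []
    for key, roman in rows:
        hebrew = convert(roman)
        lines.append(f"{key}\t{hebrew}")
        item = ET.SubElement(root, "field", id=key)
        ET.SubElement(item, "roman").text = roman
        ET.SubElement(item, "hebrew").text = hebrew
    return "\n".join(lines), ET.tostring(root, encoding="unicode")
```

Passing `convert` in as a parameter keeps the file handling separate from the transliteration rules. Any string function (even `str.upper`) is enough to check the plumbing before the real converter is wired in.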
[CODE4LIB] Roman-script to Hebrew-script automation
BSD

Greetings all:

It occurs to me now that I should have checked the lists for existing work before I did this, but anyway: we are in the final stages of creating scripts that automatically convert a library's existing romanized MARC Hebrew fields (e.g. Sefer {dotb}Hatan Torah) into Hebrew script and add them to the records already in the ILS. It's quite accurate. It's not bulletproof, but at least it's a way to quickly get Hebrew script into thousands of Roman-only records, where many Hebrew readers (including staff) may not fully know the transliteration rules.

The Hebrew conversion itself is done by a PHP script (I haven't finished learning Perl) acting on a MARC dump of Roman-only Hebrew records in MRK (MarcEdit "broken" MARC) format. It outputs two files of converted fields: an XML file for proofing, and a tab-delimited text file for the input script to consume. The input is done by an Expect script driving the character-based ILS client; we are an III shop.

This could presumably be adapted easily enough for another ILS, whether by using Expect or by manipulating the database tables directly. (I'm not volunteering, though...) It would probably be easy enough to adapt to another language as well, assuming that language's romanization is at least as predictable in MARC as Hebrew's. (Hebrew is pretty good: in preliminary testing, my list of manual override words that the algorithm botches totals about 35.)

Note that I can't imagine automating the other direction, Hebrew to Roman script, unless there's some algorithm for this already floating around out there.

If anyone's interested, I'll clean up the code and open-source it.

Cheers,
Shabbat shalom,
--
Yitzchak Schaffer
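The actual conversion rules are in the project's .class.php file. For readers who want the general idea of "rules plus a manual-override list," here is a toy Python sketch. It is not the author's algorithm; it assumes a heavily simplified subset of ALA-LC Hebrew romanization, handles only the MARC-8 mnemonic `{dotb}` (dot below, as in the {dotb}Hatan example), and ships a one-word, purely illustrative override table. The rules: a word-initial vowel gets an alef, medial a/e are left unwritten, i becomes yod and o/u become vav, word-final a/e become he, doubled consonants (dagesh) are written once, and kaf/mem/nun/pe/tsadi take their final forms at the end of a word.

```python
import re

# Hypothetical manual-override list: whole words (lowercased, MARC-8
# mnemonics included) that the rules below get wrong.
OVERRIDES = {
    "moshe": "משה",
}

# {dotb} puts a dot below the next letter: {dotb}h -> het, {dotb}t -> tet.
DOT_BELOW = {"h": "ח", "t": "ט"}
DIGRAPHS = {"sh": "ש", "ts": "צ", "kh": "כ"}
CONSONANTS = {
    "b": "ב", "v": "ב", "g": "ג", "d": "ד", "h": "ה", "z": "ז",
    "y": "י", "k": "כ", "l": "ל", "m": "מ", "n": "נ", "s": "ס",
    "f": "פ", "p": "פ", "r": "ר", "t": "ת",
}
FINAL_FORMS = {"כ": "ך", "מ": "ם", "נ": "ן", "פ": "ף", "צ": "ץ"}
MNEMONIC = re.compile(r"\{(\w+)\}")


def word_to_hebrew(word: str) -> str:
    lowered = word.lower()
    if lowered in OVERRIDES:  # manual overrides win over the rules
        return OVERRIDES[lowered]
    out = []
    i, n = 0, len(lowered)
    while i < n:
        m = MNEMONIC.match(lowered, i)
        if m:
            i = m.end()
            if m.group(1) == "dotb" and lowered[i:i + 1] in DOT_BELOW:
                out.append(DOT_BELOW[lowered[i]])
                i += 1
            continue  # any other mnemonic is ignored in this sketch
        pair = lowered[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = lowered[i]
        if ch in "aeiou":
            if not out:  # word-initial vowel: alef (+ yod/vav for i/o/u)
                out.append("א" + {"i": "י", "o": "ו", "u": "ו"}.get(ch, ""))
            elif i == n - 1 and ch in "ae":  # word-final a/e: he
                out.append("ה")
            elif ch == "i":
                out.append("י")
            elif ch in "ou":
                out.append("ו")
            # medial a/e are left unwritten
        elif not (i > 0 and lowered[i - 1] == ch):  # doubled consonant: once
            out.append(CONSONANTS.get(ch, ""))  # unknown characters dropped
        i += 1
    text = "".join(out)
    if text and text[-1] in FINAL_FORMS:
        text = text[:-1] + FINAL_FORMS[text[-1]]
    return text


def to_hebrew(phrase: str) -> str:
    """Convert a romanized phrase word by word."""
    return " ".join(word_to_hebrew(w) for w in phrase.split())
```

With these rules, `to_hebrew("Sefer {dotb}Hatan Torah")` gives ספר חתן תורה. "Moshe" is in the override table because the rules alone would produce מושה; growing that table is the cheap fix for each word the rules botch.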
Re: [CODE4LIB] Roman-script to Hebrew-script automation
Here is a bit of existing work I know of in this area.

MARC::Detrans: De-transliterate text and MARC records
http://search.cpan.org/dist/MARC-Detrans/

There is a paper, "Cyril: expanding the horizons of MARC21," by Jacobs, Jane W.; Summers, Ed; Ankersen, Elizabeth. Library Hi Tech, Volume 22, Number 1, 2004, pp. 8-17(10). Good discussion of the issues this creates. The Cyril software doesn't seem to be available still.

Sincerely,
David Bigwood
[EMAIL PROTECTED]
Catalogablog http://catalogablog.blogspot.com
Twitter LPI_Library