Re: [CODE4LIB] Roman-script to Hebrew-script automation

2008-08-17 Thread Yitzchak Schaffer

Mark A. Matienzo wrote:


This is great news! I'd love to see you share the code with the
greater community. This may prove particularly useful for the
automated addition of non-Roman data into authority records for NACO
members (see [1]; see also [2]).


Okay, y'all can check out the code at

http://code.google.com/p/lc-hebrew-detransliteration/source/browse/#svn/trunk
or
svn checkout http://lc-hebrew-detransliteration.googlecode.com/svn/trunk/ lc-hebrew-detransliteration-read-only


The more reusable file is the .class.php file; hebrify.php is a messier 
file that pulls certain fields out of MARC-broken files and spits out 
the XML- and III/OPAC-encoded renditions I mentioned earlier.


I'll have to clean up the Expect scripts later.

--
Yitzchak Schaffer
Systems Librarian
Touro College Libraries
33 West 23rd Street
New York, NY 10010
Tel (212) 463-0400 x5230
Fax (212) 627-3197
[EMAIL PROTECTED]


[CODE4LIB] Roman-script to Hebrew-script automation

2008-08-15 Thread Yitzchak Schaffer

BSD

Greetings all:

It occurs to me now that I might have checked for existing work on the 
lists before I did this, but anyway -- we are in the finishing stages of 
creating scripts that will automatically convert a library's existing 
Romanized MARC Hebrew fields (e.g. Sefer {dotb}Hatan Torah) into 
Hebrew-script, and add them to the records already in the ILS.  It's 
quite accurate; not bulletproof, but at least it's a way to quickly get 
Hebrew script into thousands of Roman-only records, where many Hebrew 
users (including staff) may not understand the transliteration rules 100%.


The Hebrew conversion itself is done by a PHP script (haven't finished 
learning Perl) acting on a MARC dump of Roman-only Hebrew records in MRK 
(MarcEdit's "broken" MARC) format.  This outputs two files of converted fields: 
an XML file for proofing, and a tab-delimited text file for the 
inputting script to devour.  This inputting is done by an Expect script 
using the character-based ILS client.
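
As a rough illustration of the first half of that pipeline (reading MRK lines and writing a tab-delimited file), here is a minimal sketch in Python rather than the project's PHP. It assumes the usual MarcEdit mnemonic layout of "=TAG", two spaces, two indicator characters, then "$"-delimited subfields. The function names, the choice of field 245, and the sample record are all hypothetical and are not taken from hebrify.php.

```python
import csv
import io
import re

# "=245  10$a..." : tag, two spaces, two indicator characters, subfields.
FIELD_RE = re.compile(r"^=(\d{3})  (.)(.)(.*)$")

# Made-up sample record for illustration only.
SAMPLE_MRK = (
    "=LDR  00000nam  2200000 a 4500\n"
    "=001  12345\n"
    "=245  10$aSefer {dotb}Hatan Torah.\n"
    "=260  \\\\$aNew York.\n"
)


def parse_mrk_field(line):
    """Return (tag, indicators, [(code, value), ...]), or None for
    the leader, control fields, and anything that is not a data field."""
    m = FIELD_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    tag, ind1, ind2, body = m.groups()
    if tag < "010":  # control fields carry no indicators or subfields
        return None
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("$")[1:] if chunk]
    return tag, ind1 + ind2, subfields


def extract_fields(mrk_text, wanted_tags=("245",)):
    """Collect (record number, tag, subfield code, value) rows."""
    rows = []
    record_no = 0
    for line in mrk_text.splitlines():
        if line.startswith("=LDR"):  # each leader starts a new record
            record_no += 1
            continue
        parsed = parse_mrk_field(line)
        if parsed and parsed[0] in wanted_tags:
            tag, _indicators, subfields = parsed
            for code, value in subfields:
                rows.append((record_no, tag, code, value))
    return rows


def to_tab_delimited(rows):
    """Render rows as tab-delimited text, one subfield per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tab_delimited(extract_fields(SAMPLE_MRK)), end="")
```

The real scripts also produce an XML proofing file and feed the tab-delimited output to Expect; neither step is shown here.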


We are an III shop.  This could presumably be adapted easily enough for 
another ILS, whether using Expect or direct manipulation of database 
tables.  (I'm not volunteering, though...) It would probably be easy 
enough to adapt to another language also, assuming that language were at 
least as predictable in MARC as Hebrew.  (It's pretty good - my list of 
manual override words that the auto-algorithm botches is now totaling 
about 35 in preliminary testing.)
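
To make the "auto-algorithm plus manual override list" idea concrete, here is a toy sketch in Python. It is not the project's algorithm: the letter table is a small hypothetical subset rather than the full ALA-LC Hebrew table, it simply drops the vowels a, e and i, and the single override entry is an invented example, not one of the 35 or so words mentioned above. It assumes the dot-below diacritic appears as the MarcEdit mnemonic {dotb} before the letter it modifies.

```python
# Toy tables: a small, hypothetical subset, NOT the full ALA-LC scheme.
DOTTED = {"h": "ח", "t": "ט", "k": "ק", "v": "ו"}   # letters after {dotb}
DIGRAPHS = {"sh": "ש", "ts": "צ", "kh": "כ"}
SINGLE = {
    "b": "ב", "v": "ב", "g": "ג", "d": "ד", "h": "ה", "z": "ז",
    "y": "י", "k": "כ", "l": "ל", "m": "מ", "n": "נ", "s": "ס",
    "p": "פ", "f": "פ", "r": "ר", "t": "ת",
    "o": "ו", "u": "ו",  # o/u written with vav; a, e, i are dropped
}
FINAL_FORMS = {"כ": "ך", "מ": "ם", "נ": "ן", "פ": "ף", "צ": "ץ"}

# Words the naive rules get wrong. This entry is illustrative only.
OVERRIDES = {"mosheh": "משה"}

DOTB = "{dotb}"


def convert_word(word, overrides=OVERRIDES):
    """Convert one Romanized word, consulting the override list first."""
    key = word.lower()
    if key in overrides:
        return overrides[key]
    out = []
    i = 0
    while i < len(key):
        if key.startswith(DOTB, i):
            # The mnemonic precedes the letter it modifies.
            letter = key[i + len(DOTB): i + len(DOTB) + 1]
            out.append(DOTTED.get(letter, ""))
            i += len(DOTB) + 1
        elif key[i:i + 2] in DIGRAPHS:
            out.append(DIGRAPHS[key[i:i + 2]])
            i += 2
        else:
            out.append(SINGLE.get(key[i], ""))  # unmapped characters are dropped
            i += 1
    hebrew = "".join(out)
    # Swap in the word-final letter form where one exists.
    if hebrew and hebrew[-1] in FINAL_FORMS:
        hebrew = hebrew[:-1] + FINAL_FORMS[hebrew[-1]]
    return hebrew


def convert_phrase(text, overrides=OVERRIDES):
    """Convert a whitespace-separated phrase word by word."""
    return " ".join(convert_word(w, overrides) for w in text.split())


if __name__ == "__main__":
    print(convert_phrase("Sefer {dotb}Hatan Torah"))
    print(convert_word("Mosheh", overrides={}), convert_word("Mosheh"))
```

With these tables, the example phrase from the first paragraph comes out as expected, while "Mosheh" shows why an override list is needed: the naive rules insert a vav for the "o" that the standard spelling does not have.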


Note that I can't imagine automating the other direction, Hebrew- to 
Roman-script, unless there's some algorithm for this already floating 
around out there.


If anyone's interested, I'll clean up the code and open-source it.

Cheers, Shabbat shalom,

--
Yitzchak Schaffer
Systems Librarian
Touro College Libraries
33 West 23rd Street
New York, NY 10010
Tel (212) 463-0400 x5230
Fax (212) 627-3197
[EMAIL PROTECTED]


Re: [CODE4LIB] Roman-script to Hebrew-script automation

2008-08-15 Thread Bigwood, David

Here is a bit of existing work I know of in this area.

MARC::Detrans: de-transliterate text and MARC records
http://search.cpan.org/dist/MARC-Detrans/

There is also a paper, "Cyril: expanding the horizons of MARC21," by
Jane W. Jacobs, Ed Summers, and Elizabeth Ankersen, Library Hi Tech,
Volume 22, Number 1, 2004, pp. 8-17. It has a good discussion of the
issues this creates.

The Cyril software no longer seems to be available.

Sincerely,
David Bigwood
[EMAIL PROTECTED]
Catalogablog http://catalogablog.blogspot.com
Twitter LPI_Library
