Hi Andy,

> From: Houghton,Andrew [mailto:[EMAIL PROTECTED] 
>
> It just so happens that I have recently been converting 
> MARC-XML to RDF.  The RDF specification mandates Unicode 
> Normalization Form C (NFC), which means that the base 
> character and the diacritic are combined.

That's rather unfortunate, since Unicode includes the precomposed characters
largely for backward compatibility and the preferred 

> So I hacked together some Perl scripts to convert 
> Unicode NFD <-> Unicode NFC.
> 
> I was talking with a colleague, just yesterday, about whether 
> we should unleash these on the Net...  They need to be 
> cleaned up a little and need some basic documentation on how 
> to run the Perl scripts.

The W3C provides a Perl app that (I think) purports to do that [1].  I don't
know how much overlap there may be with your script, but just in case you
were not already aware of the W3C script, you may want to see if there is a
duplication of effort.
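
For anyone following along, here is a minimal sketch of the NFD <-> NFC
round trip being discussed. It is in Python rather than Perl and relies only
on the standard unicodedata module; it is not Andy's script and not charlint,
and the helper names to_nfc and to_nfd are made up for illustration.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose base characters and combining marks where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into base character plus combining marks."""
    return unicodedata.normalize("NFD", text)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (the decomposed form)
decomposed = "e\u0301"

# NFC combines the pair into the single code point U+00E9
composed = to_nfc(decomposed)
print(len(decomposed), len(composed))

# NFD splits it back into base character plus diacritic
print(to_nfd(composed) == decomposed)
```

Real MARC data would presumably need to be decoded to Unicode first (for
example from MARC-8) before a step like this applies.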

[1] "Charlint - A Character Normalization Tool" 
    http://www.w3.org/International/charlint/

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-239-5368 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/ 
