[EMAIL PROTECTED] said:
> Can someone give me a few quick examples of creating Encode::XS
> objects to do simple transcoding, from XS?

Have you read the "enc2xs" man page that comes with perl 5.8? I've used
it myself (having never done this before), and the part that potentially
takes longest is preparing the code-point mapping table that enc2xs uses
as input to create a character-set module for Encode.

In my case, I noticed that the 5.8.0 release installed on our local
server had an "iso-8859-6" (Arabic) module that converted ASCII digits
into Arabic-Indic digits when converting to Unicode. Rather than do
extra scripting every time I use this module (in order to undo the digit
conversion), I created an alternate version, "iso-8859-6-nd". Not only
does this version leave ASCII digits alone when converting from 8859-6
to Unicode, but when I convert Unicode back to 8859-6, any Arabic-Indic
digits in the Unicode data are converted to ASCII.

First I needed a ucm file, which was simply the unicode/iso-8859-6
character map with an extra data column, where I set the digit character
correspondences the way I wanted (and left everything else as-is). In
that column, |0 marks a round-trip mapping and |1 marks a fallback that
applies only when encoding from Unicode:

#
# iso-8859-6-nd.ucm : Unicode Character Map for 8-bit Arabic
#
# This version differs from the iso-8859-6.ucm provided with the
# standard Perl-5.8 Encode module by virtue of the way it treats
# digit characters. 8859-6 does not include Arabic-Indic digits
# and instead uses ASCII digits for all numeric strings; the
# standard Encode module for this character set maps all ASCII
# digits to Arabic-Indic numerals (\x{0660} - \x{0669}). The
# following table leaves all ASCII digits unmodified.
#
<code_set_name> "iso-8859-6-nd"
<code_set_alias> "iso-arabic"
<mb_cur_min> 1
<mb_cur_max> 1
<subchar> \x3f
#
CHARMAP
...
<U0030> \x30 |0 # DIGIT ZERO
<U0031> \x31 |0 # DIGIT ONE
<U0032> \x32 |0 # DIGIT TWO
<U0033> \x33 |0 # DIGIT THREE
<U0034> \x34 |0 # DIGIT FOUR
<U0035> \x35 |0 # DIGIT FIVE
<U0036> \x36 |0 # DIGIT SIX
<U0037> \x37 |0 # DIGIT SEVEN
<U0038> \x38 |0 # DIGIT EIGHT
<U0039> \x39 |0 # DIGIT NINE
...
<U0660> \x30 |1 # ARABIC-INDIC DIGIT ZERO
<U0661> \x31 |1 # ARABIC-INDIC DIGIT ONE
<U0662> \x32 |1 # ARABIC-INDIC DIGIT TWO
<U0663> \x33 |1 # ARABIC-INDIC DIGIT THREE
<U0664> \x34 |1 # ARABIC-INDIC DIGIT FOUR
<U0665> \x35 |1 # ARABIC-INDIC DIGIT FIVE
<U0666> \x36 |1 # ARABIC-INDIC DIGIT SIX
<U0667> \x37 |1 # ARABIC-INDIC DIGIT SEVEN
<U0668> \x38 |1 # ARABIC-INDIC DIGIT EIGHT
<U0669> \x39 |1 # ARABIC-INDIC DIGIT NINE

The enc2xs docs explain how to set up this file and then give a simple
cook-book sequence of operations to process it and produce a module that
you install into your "@INC" path (or into some path you can address
with "-I/path").

Hope that helps.

Dave Graff
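
For reference, here is a rough sketch of that cook-book sequence,
assuming the quick-guide steps in the enc2xs man page (enc2xs -M
followed by the usual Makefile.PL build). It is not taken from the post
itself, so check it against your local documentation. The module name
"ArabicND" and the RUN dry-run switch are made up for this example; the
.ucm file name is the one used in the post.

```shell
#!/bin/sh
# Sketch of the enc2xs build sequence for a custom .ucm file.
# "ArabicND" is a hypothetical module name chosen for this example.
# By default every command is only printed (dry run). To execute them,
# run the script with RUN set to an empty string:  RUN= sh build.sh
RUN=${RUN-echo}

build_encoding() {
    module=$1   # name handed to enc2xs -M
    ucm=$2      # path to the character map file

    # Generate Makefile.PL and the module sources from the map,
    # then run the standard MakeMaker build, test and install steps.
    $RUN enc2xs -M "$module" "$ucm" &&
    $RUN perl Makefile.PL &&
    $RUN make &&
    $RUN make test &&
    $RUN make install
}

build_encoding ArabicND iso-8859-6-nd.ucm
```

To install into a private directory rather than the default location,
the usual MakeMaker options to Makefile.PL (PREFIX, for instance) should
apply, after which perl can be pointed at that directory with -I.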