> Is it possible to get a list of all mappings between characters with
> diacriticals and their "flattened" ASCII equivalents?
>
> Similarly, is there a way to extend or modify this mapping in the
> current version of MarkLogic?
I'm new to MLS, so I don't know if there is a way to do it there. Hopefully there is; I think it'd be a really nice feature with regard to content management (e.g., being able to build pages friendly to older browsers without having to go through a lot of external processing steps).

Failing that, in theory I think one could do this using either Java by itself, or Java paired with XSLT or perhaps XQuery.

http://www.w3.org/TR/xslt20/#element-output
http://en.wikipedia.org/wiki/Unicode_normalization
http://unicode.org/reports/tr15/#Decomposition

What I'm thinking is that you could create a map of the characters you want to flatten and process them with something that writes two copies of each special character into attribute fields, outputting everything in the NFD normalization form. You then postprocess that output with a program operating at the byte level, which strips the 'diacritic' characters out of one of the fields. That leaves you with a mapping from each accented character to its flattened version.

Attached below are two files, pmap.xsl and pmap.xml. The .xml was built by running Saxon against the .xsl file. It should, I think, be possible to then use Java (or C or C++, I suppose) to read in the .xml file as a byte stream and, upon hitting the 'flattened' attribute, start dropping bytes that aren't in the range 1-127 until you hit the closing quote.

It may or may not be possible to do that in a nicer fashion using a real XML parser. I suspect most parsers will consume the NFD form and turn it into UTF-16 or whatever they use internally to represent the characters.

Jim
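
As a rough illustration of the decompose-then-strip idea above, here is a minimal Java sketch. It is not the attached pmap.xsl/pmap.xml approach: instead of running XSLT and then stripping bytes from the output file, it assumes you are happy to do the NFD step in memory with the JDK's java.text.Normalizer and to drop every character outside the range 1-127 afterwards. The class name FlattenMap, the method names, and the code point range used in main are made up for the example.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative, not from the attachments.
final class FlattenMap {

    // Decompose to NFD, then keep only characters in the range 1-127.
    // This mirrors the "drop bytes outside 1-127" step, but on chars in memory.
    public static String flatten(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(nfd.length());
        for (int i = 0; i < nfd.length(); i++) {
            char c = nfd.charAt(i);
            if (c >= 1 && c <= 127) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Build a map from each letter in [from, to] to its flattened form.
    // Letters that do not flatten to anything (no canonical decomposition
    // onto an ASCII base) are skipped rather than mapped to "".
    public static Map<String, String> buildMap(int from, int to) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isLetter(cp)) {
                continue;
            }
            String original = new String(Character.toChars(cp));
            String flat = flatten(original);
            if (!flat.isEmpty() && !flat.equals(original)) {
                map.put(original, flat);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Assumed range for the demo: Latin-1 Supplement through Latin Extended-A.
        for (Map.Entry<String, String> e : buildMap(0x00C0, 0x017F).entrySet()) {
            System.out.printf("U+%04X -> %s%n", e.getKey().codePointAt(0), e.getValue());
        }
    }
}
```

With this sketch, a precomposed "é" (U+00E9) decomposes to "e" plus a combining acute accent, so it maps to "e". Characters with no canonical decomposition, such as "æ", "ø" or "ß", are left out of the map; they would need hand-written entries if you want them flattened as well.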
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - James A. Robinson [EMAIL PROTECTED] Stanford University HighWire Press http://highwire.stanford.edu/ +1 650 7237294 (Work) +1 650 7259335 (Fax)
