Hello Regina,

On Wed, Apr 08, 2015 at 09:02:06PM +0200, Regina Henschel wrote:
> Hi all,
>
> I'm going to improve the MathML type detection. Currently there exist files,
> that can be opened or imported fine, when the type detection would allow it.
> https://bz.apache.org/ooo/show_bug.cgi?id=126230
>
> I have attached a C++ file to show what I want to do.
> The problem is, that MathML does not need to be encoded in utf-8 but can
> have any other encoding. For example MS Windows "Math Input Control" exports
> formulas in utf-16.
>
> So my question is, which kind of string can I use, that is able to
> detect/use utf-16 and has the needed methods similar to C++ string methods
> find, rfind, insert, substring, clear, erase? Does AOO has such kind of
> string?
You can use OpenOffice's rtl string and string buffer classes
(rtl::OUString and rtl::OUStringBuffer), together with the lower-level
text conversion from
https://www.openoffice.org/api/docs/cpp/ref/names/o-textcvt.h.html

> It is possible to get the encoding from the MathML file or set default
> utf-8, in case that information is needed for to instantiate a string
> object.

If the file has no information about its encoding, you will have to
perform some kind of encoding detection; see Writer's ASCII filter, for
example:

bool SwIoSystem::IsDetectableText
main/sw/source/filter/basflt/iodetect.cxx

used in

sal_uLong SwASCIIParser::ReadChars()
main/sw/source/filter/ascii/parasc.cxx

Searching for rtl_convertTextToUnicode in OpenGrok might give other
useful hints.

Regards
-- 
Ariel Constenla-Haile
La Plata, Argentina
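
To illustrate the encoding detection step mentioned above, here is a minimal sketch of a byte-order-mark check followed by UTF-16 decoding. It uses only standard C++ (std::u16string) rather than the rtl classes, and it is not taken from the Writer filter. The names Encoding, detectByBom and decodeUtf16 are hypothetical, and falling back to UTF-8 when no BOM is found is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names for illustration only; none of this is OpenOffice API.
enum class Encoding { Utf8, Utf16LE, Utf16BE };

// Guess the encoding from a leading byte-order mark (BOM).
// bomLength receives the number of BOM bytes to skip.
// Assumption: data without a BOM is treated as UTF-8.
Encoding detectByBom(const std::vector<std::uint8_t>& bytes, std::size_t& bomLength)
{
    bomLength = 0;
    if (bytes.size() >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) {
        bomLength = 3;
        return Encoding::Utf8;
    }
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        bomLength = 2;
        return Encoding::Utf16LE;
    }
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
        bomLength = 2;
        return Encoding::Utf16BE;
    }
    return Encoding::Utf8;
}

// Assemble UTF-16 code units from raw bytes, starting at `offset`.
// A trailing odd byte, if any, is ignored.
std::u16string decodeUtf16(const std::vector<std::uint8_t>& bytes,
                           std::size_t offset, bool littleEndian)
{
    std::u16string out;
    for (std::size_t i = offset; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = littleEndian ? bytes[i] : bytes[i + 1];
        const unsigned hi = littleEndian ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

The resulting std::u16string offers find, rfind, insert, substr, clear and erase. In OpenOffice code, the decoding step would more likely go through the rtl text converter, with the result held in the rtl string classes.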