Hello Regina,

On Wed, Apr 08, 2015 at 09:02:06PM +0200, Regina Henschel wrote:
> Hi all,
> 
> I'm going to improve the MathML type detection. Currently there exist files,
> that can be opened or imported fine, when the type detection would allow it.
> https://bz.apache.org/ooo/show_bug.cgi?id=126230
> 
> I have attached a C++ file to show what I want to do.
> The problem is, that MathML does not need to be encoded in utf-8 but can
> have any other encoding. For example MS Windows "Math Input Control" exports
> formulas in utf-16.
> 
> So my question is, which kind of string can I use, that is able to
> detect/use utf-16 and has the needed methods similar to C++ string methods
> find, rfind, insert, substring, clear, erase? Does AOO has such kind of
> string?

You can use OpenOffice's rtl string and string buffer classes, together
with the lower-level text conversion API from
https://www.openoffice.org/api/docs/cpp/ref/names/o-textcvt.h.html
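
To illustrate the kind of searching you describe, here is a minimal sketch.
It assumes the file content has already been converted to UTF-16 code
units, which is also what rtl::OUString stores. std::u16string is used
only as a stand-in so the sketch compiles without the AOO headers, and
extractMathElement is a hypothetical helper, not existing AOO code. With
rtl::OUString the corresponding calls would be, as far as I know,
indexOf, lastIndexOf and copy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of AOO): returns the text from the first
// "<math" up to and including the last "</math>" in a buffer of UTF-16
// code units, or an empty string if no such pair is found.
std::u16string extractMathElement(const std::u16string& text)
{
    const std::u16string openTag = u"<math";
    const std::u16string closeTag = u"</math>";

    const std::size_t start = text.find(openTag);   // first opening tag
    const std::size_t end = text.rfind(closeTag);   // last closing tag
    if (start == std::u16string::npos ||
        end == std::u16string::npos ||
        end < start)
    {
        return std::u16string();
    }

    // Length covers everything up to the end of the closing tag.
    return text.substr(start, end + closeTag.size() - start);
}
```

Once the text is in UTF-16 form, the original encoding of the file no
longer matters for find, rfind, substr and similar operations.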

> It is possible to get the encoding from the MathML file or set default
> utf-8, in case that information is needed for to instantiate a string
> object.

If the file has no information about its encoding, you will have to
perform some kind of encoding detection; see Writer's ASCII filter, for
example:

bool SwIoSystem::IsDetectableText
main/sw/source/filter/basflt/iodetect.cxx

used in sal_uLong SwASCIIParser::ReadChars()
main/sw/source/filter/ascii/parasc.cxx

Searching for rtl_convertTextToUnicode in OpenGrok might give other
useful hints.
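
As a rough illustration of the simplest form of encoding detection, the
sketch below checks for a byte order mark and decodes UTF-16 bytes into
code units. It is not what IsDetectableText does; it uses only the C++
standard library, and the names Bom, detectBom and decodeUtf16 are
hypothetical. In AOO itself the conversion would go through the text
converter mentioned above rather than a hand-written loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Look at the first bytes of the file for a byte order mark.
// Assumption: UTF-32 is not expected, so FF FE is treated as UTF-16LE.
Bom detectBom(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() >= 3 &&
        bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return Bom::Utf16LE;
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}

// Convert UTF-16 bytes (with a leading 2-byte BOM) into code units.
// Returns an empty string for any other Bom value; a trailing odd byte
// is silently dropped.
std::u16string decodeUtf16(const std::vector<std::uint8_t>& bytes, Bom bom)
{
    std::u16string out;
    if (bom != Bom::Utf16LE && bom != Bom::Utf16BE)
        return out;

    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2)
    {
        const std::uint8_t lo = (bom == Bom::Utf16LE) ? bytes[i] : bytes[i + 1];
        const std::uint8_t hi = (bom == Bom::Utf16LE) ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

A file without a BOM yields Bom::None, and that is the case where you
need either the encoding declared in the XML prolog or a heuristic like
the one in the ASCII filter.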


Regards
-- 
Ariel Constenla-Haile
La Plata, Argentina
