Hi all,
I'm going to improve the MathML type detection. Currently there are
files that could be opened or imported fine if the type detection
allowed it. https://bz.apache.org/ooo/show_bug.cgi?id=126230
I have attached a C++ file to show what I want to do.
The problem is that MathML does not need to be encoded in UTF-8 but can
have any other encoding. For example, the MS Windows "Math Input Control"
exports formulas in UTF-16.
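To illustrate the encoding problem, here is a minimal sketch in plain standard C++ (not AOO code) of how UTF-16 input could be recognised by its byte order mark and converted to UTF-8, so that byte-oriented string searches work on it. The helper names readUnit and utf16BytesToUtf8 are made up for illustration, and little-endian is only an assumption for input without a BOM.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not an AOO API: read one 16-bit code unit at nPos.
static unsigned long readUnit(const std::string& rBytes, std::size_t nPos,
                              bool bBigEndian)
{
    const unsigned long b0 = static_cast<unsigned char>(rBytes[nPos]);
    const unsigned long b1 = static_cast<unsigned char>(rBytes[nPos + 1]);
    return bBigEndian ? ((b0 << 8) | b1) : ((b1 << 8) | b0);
}

// Hypothetical helper: convert raw UTF-16 bytes to a UTF-8 std::string.
std::string utf16BytesToUtf8(const std::string& rBytes)
{
    bool bBigEndian = false;   // assumption: little-endian if there is no BOM
    std::size_t nPos = 0;
    if (rBytes.size() >= 2)
    {
        const unsigned long nFirst = readUnit(rBytes, 0, true);
        if (nFirst == 0xFEFF)        // bytes FE FF: big-endian BOM
        {
            bBigEndian = true;
            nPos = 2;
        }
        else if (nFirst == 0xFFFE)   // bytes FF FE: little-endian BOM
        {
            nPos = 2;
        }
    }

    std::string sOut;
    while (nPos + 1 < rBytes.size())
    {
        unsigned long cp = readUnit(rBytes, nPos, bBigEndian);
        nPos += 2;
        // combine a high surrogate with a following low surrogate
        if (cp >= 0xD800 && cp <= 0xDBFF && nPos + 1 < rBytes.size())
        {
            const unsigned long lo = readUnit(rBytes, nPos, bBigEndian);
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                nPos += 2;
            }
        }
        // an unpaired surrogate becomes U+FFFD (replacement character)
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;

        // encode the code point as UTF-8
        if (cp < 0x80)
        {
            sOut += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            sOut += static_cast<char>(0xC0 | (cp >> 6));
            sOut += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            sOut += static_cast<char>(0xE0 | (cp >> 12));
            sOut += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            sOut += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            sOut += static_cast<char>(0xF0 | (cp >> 18));
            sOut += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            sOut += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            sOut += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return sOut;
}
```

After such a conversion the usual std::string methods (find, rfind, substr and so on) can be applied to the result, because all XML markup characters are single bytes in UTF-8.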
So my question is: which kind of string can I use that is able to
detect/use UTF-16 and has methods similar to the C++ string
methods find, rfind, insert, substr, clear, erase? Does AOO have such
a kind of string?
It is possible to get the encoding from the MathML file, or to set
UTF-8 as the default, in case that information is needed to instantiate
a string object.
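As a rough sketch of what "get the encoding from the MathML file" could look like, the following standard C++ function reads the encoding pseudo-attribute from a leading XML declaration and falls back to "utf-8" otherwise. The function name declaredEncoding is hypothetical, and the sketch assumes the buffer is in an ASCII-compatible encoding; a UTF-16 file would have to be recognised by its byte order mark and converted before this can work.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not an AOO API: return the encoding named in a
// leading XML declaration, or "utf-8" when none is declared.
// Assumes rBuffer holds ASCII-compatible bytes (e.g. UTF-8, ISO-8859-1).
std::string declaredEncoding(const std::string& rBuffer)
{
    const std::string sDefault("utf-8");

    // skip an optional UTF-8 byte order mark
    std::size_t nStart = 0;
    if (rBuffer.compare(0, 3, "\xEF\xBB\xBF") == 0)
        nStart = 3;

    // the XML declaration, if present, must come first
    if (rBuffer.compare(nStart, 5, "<?xml") != 0)
        return sDefault;
    const std::size_t nEnd = rBuffer.find("?>", nStart);
    if (nEnd == std::string::npos)
        return sDefault;

    // look for 'encoding' inside the declaration only
    const std::size_t nKey = rBuffer.find("encoding", nStart + 5);
    if (nKey == std::string::npos || nKey > nEnd)
        return sDefault;

    // the value is delimited by matching single or double quotes
    const std::size_t nQuote = rBuffer.find_first_of("\"'", nKey + 8);
    if (nQuote == std::string::npos || nQuote > nEnd)
        return sDefault;
    const std::size_t nClose = rBuffer.find(rBuffer[nQuote], nQuote + 1);
    if (nClose == std::string::npos || nClose > nEnd)
        return sDefault;

    return rBuffer.substr(nQuote + 1, nClose - nQuote - 1);
}
```

The returned name is passed back exactly as written in the file, so a caller would still need to compare it case-insensitively and map it to whatever encoding identifier the string class expects.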
Kind regards
Regina
// detect MathML
#include <cstddef>
#include <iostream>
#include <string>

int main ()
{
    // to be used in starmath/source/smdetect.cxx
    // dummy sFragment with minimal MathML; will be variable 'aBuffer' later on
    const std::string sFragment(
        "<my:math xmlns:my = \"http://www.w3.org/1998/Math/MathML\" ></my:math>");
    // ruler to read off character positions in the fragment
    std::cout << "012345678901234567890123456789012345678901234567890123456789"
              << "\n";
    std::cout << sFragment << "\n";

    bool bIsMathML = false;
    // does it have a MathML namespace attribute? First look for URL.
    const std::size_t posURL = sFragment.find("http://www.w3.org/1998/Math/MathML");
    if (posURL != std::string::npos)
    {
        // URL needs to be an attribute value, look for "="
        const std::size_t posEQ = sFragment.rfind("=", posURL);
        if (posEQ != std::string::npos)
        {
            // attribute needs to be 'xmlns'
            const std::size_t posXMLNS = sFragment.rfind("xmlns", posEQ);
            if (posXMLNS != std::string::npos)
            {
                // look whether a prefix to 'math' is specified;
                // this is the text between 'xmlns' and '='
                std::string sPrefix = sFragment.substr( posXMLNS+5,
                                                        posEQ-(posXMLNS+5) );
                if (sPrefix.length() > 0)
                {
                    // remove any whitespace
                    const std::string sWhitespace = " \t\n\r";
                    std::size_t i = 0;
                    while (i < sPrefix.length())
                    {
                        if ( sWhitespace.find(sPrefix.at(i)) != std::string::npos )
                        {
                            // do not advance: the next character has moved
                            // to position i
                            sPrefix.erase(i, 1);
                        }
                        else
                        {
                            ++i;
                        }
                    }
                    // strip the leading ':' of 'xmlns:prefix'
                    if ( sPrefix.length() > 0 && sPrefix.find(":") == 0 )
                    {
                        sPrefix.erase(0, 1);
                    }
                    else
                    {
                        // don't know what the SAX parser does on XML that is
                        // not well-formed; for now simply ignore the trash
                        sPrefix.clear();
                    }
                }
                // In last step look for <math or <prefix:math
                std::string sMath = "<math";
                if (sPrefix.length() > 0)
                {
                    sMath.insert(1, ":");
                    sMath.insert(1, sPrefix);
                }
                // the element start must come before the xmlns attribute
                const std::size_t posMATH = sFragment.rfind(sMath, posXMLNS);
                bIsMathML = (posMATH != std::string::npos);
            }
        }
    }
    // report the result for every path, including the early failures
    std::cout << (bIsMathML ? "MathML file" : "Not a MathML") << "\n";
    return 0;
}