Hi all,

I'm going to improve the MathML type detection. Currently there are files that could be opened or imported without problems if the type detection allowed it. See https://bz.apache.org/ooo/show_bug.cgi?id=126230

I have attached a C++ file to show what I want to do.
The problem is that MathML does not need to be encoded in UTF-8 but can use any other encoding. For example, the MS Windows "Math Input Control" exports formulas in UTF-16.

So my question is: which kind of string can I use that can handle UTF-16 and has methods similar to the C++ string methods find, rfind, insert, substr, clear, and erase? Does AOO have such a string class?
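
To illustrate the kind of interface I mean, here is a minimal sketch with the standard C++11 std::u16string. This is only an assumption for illustration, not an AOO class, and the helper names hasMathMLNamespace and rawPrefix are made up. It assumes the bytes have already been decoded into UTF-16 code units.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: looks for the MathML namespace
// URL in text that is already held as UTF-16 code units.
bool hasMathMLNamespace(const std::u16string& sText)
{
    const std::u16string sURL = u"http://www.w3.org/1998/Math/MathML";
    return sText.find(sURL) != std::u16string::npos;
}

// Hypothetical helper: returns the raw text between "xmlns" and the last "="
// in front of the URL, or an empty string if any of the parts is missing.
std::u16string rawPrefix(const std::u16string& sText)
{
    const std::size_t posURL = sText.find(u"http://www.w3.org/1998/Math/MathML");
    if (posURL == std::u16string::npos) return std::u16string();
    const std::size_t posEQ = sText.rfind(u"=", posURL);
    if (posEQ == std::u16string::npos) return std::u16string();
    const std::size_t posXMLNS = sText.rfind(u"xmlns", posEQ);
    if (posXMLNS == std::u16string::npos) return std::u16string();
    // 5 is the length of "xmlns"
    return sText.substr(posXMLNS + 5, posEQ - (posXMLNS + 5));
}
```

The same find, rfind, and substr calls as in the attached file work here unchanged; only the string type and the u"..." literals differ.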

It is possible to get the encoding from the MathML file, or to set UTF-8 as the default, in case that information is needed to instantiate a string object.
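
As a rough idea of how the encoding could be guessed from the first bytes of the file, here is a minimal sketch. The function name guessEncoding is hypothetical, and it only checks the byte order mark and the position of the first "<". It does not read the encoding declaration in the XML prolog, and it would take a UTF-32LE byte order mark for UTF-16LE.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: guesses the encoding of a raw
// byte buffer and falls back to UTF-8 if nothing else matches.
std::string guessEncoding(const std::string& sBytes)
{
    const std::size_t n = sBytes.size();
    // byte order marks
    if (n >= 3 && sBytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return "UTF-8";
    if (n >= 2 && sBytes.compare(0, 2, "\xFF\xFE") == 0) return "UTF-16LE";
    if (n >= 2 && sBytes.compare(0, 2, "\xFE\xFF") == 0) return "UTF-16BE";
    // no byte order mark: a leading '<' stored in two bytes hints at UTF-16
    if (n >= 2 && sBytes[0] == '<' && sBytes[1] == '\0') return "UTF-16LE";
    if (n >= 2 && sBytes[0] == '\0' && sBytes[1] == '<') return "UTF-16BE";
    return "UTF-8"; // default
}
```

The result could then decide how the buffer is converted before the string search runs.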

Kind regards
Regina



// detect MathML
#include <iostream>
#include <string>

int main ()
{
// to be used in starmath/source/smdetect.cxx
// dummy sFragment with minimal MathML; will be variable 'aBuffer' later on
  const std::string sFragment("<my:math xmlns:my = \"http://www.w3.org/1998/Math/MathML\" ></my:math>");
  // ruler to read off character positions in the output below
  std::cout << "012345678901234567890123456789012345678901234567890123456789" << "\n";
  std::cout << sFragment.c_str() << "\n";

// does it have a MathML namespace attribute? First look for URL.  
  std::size_t posURL = sFragment.find("http://www.w3.org/1998/Math/MathML");
  if (posURL != std::string::npos)
  {
    // URL needs to be an attribute value, look for "="
    std::size_t posEQ = sFragment.rfind("=",posURL);
    if (posEQ != std::string::npos)
    {
        // attribute needs to be 'xmlns'
       std::size_t posXMLNS = sFragment.rfind("xmlns",posEQ);
       if (posXMLNS != std::string::npos)
       {
            // look whether a prefix to 'math' is specified
          std::string sPrefix = sFragment.substr( posXMLNS+5 , posEQ-(posXMLNS+5) );
          if (sPrefix.length() > 0)
          {
             // remove any whitespace
             const std::string sWhitespace="\x0020\x0009\x000A\x000D";
             std::size_t i = 0;
             while (i < sPrefix.length())
             {
                if ( sWhitespace.find(sPrefix.at(i)) != std::string::npos )
                {
                   // do not advance: the next character has moved to position i
                   sPrefix.erase(i,1);
                }
                else
                {
                   ++i;
                }
             }
             // trim
             if ( sPrefix.length() > 0 && sPrefix.find(":") == 0)
             {
                sPrefix.erase(0,1);
             }    
             else
             {
                // don't know what the SAX parser does with XML that is not well-formed
                // for now simply ignore the trash
                sPrefix.clear(); 
             }
          }
          // In last step look for <math or <prefix:math
          std::string sMath = "<math";
          if (sPrefix.length() >0 )
          {
            sMath.insert(1,":");
            sMath.insert(1,sPrefix);
          }
          std::size_t posMATH = sFragment.rfind(sMath, posXMLNS);
          if (posMATH != std::string::npos)
          {
              std::cout << "MathML file" << "\n";
          }
          else
          {
              std::cout << "Not a MathML" << "\n";
          }
              
       }   
        
    }    
  }
  return 0;
}