https://bz.apache.org/ooo/show_bug.cgi?id=103308
dam...@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Latest|---                         |4.2.0-dev
    Confirmation in|                            |
                 CC|                            |dam...@apache.org

--- Comment #5 from dam...@apache.org ---
(In reply to h...@apache.org from comment #2)
> Fixing the method "sal_Unicode CSS1Parser::GetNextChar()" in
> sw/source/filter/html/parcss1.cxx is
> probably a good starting point.

Yes, but that is only the CSS parsing. The rest of the HTML parsing is in
main/svtools/source/svhtml/parhtml.cxx, which, sadly like most of our
codebase, also operates on one UTF-16 code unit at a time, retrieved from
SvParser::GetNextChar().

The function

    inline sal_uInt16 GetCharSize() const;

got my hopes up: does it tell us the code point size?

    inline sal_uInt16 SvParser::GetCharSize() const
    {
        return (RTL_TEXTENCODING_UCS2 == eSrcEnc) ? 2 : 1;
    }

No, it only returns the number of bytes per BMP character for the current
encoding, which is a useless statistic here.

SvParser does not have any functions for code points. We'd have to add them
and change a lot of code - not just HTML parsing - to use them.
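
To illustrate what reading by code point rather than by code unit would
involve, here is a minimal sketch. It is not existing SvParser code: the names
nextCodePoint and codeUnitCount are hypothetical, char16_t stands in for
sal_Unicode, and std::u16string stands in for whatever buffer the parser
actually reads from. It only shows the surrogate-pair arithmetic, not how such
a function would be wired into GetNextChar().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of SvParser: decode the code point that
// starts at text[pos] and advance pos past it (by 1 or 2 code units).
// char16_t is used here as a stand-in for sal_Unicode.
inline char32_t nextCodePoint(const std::u16string& text, std::size_t& pos)
{
    assert(pos < text.size());
    const char16_t first = text[pos++];

    // A high (lead) surrogate followed by a low (trail) surrogate encodes
    // a single code point outside the BMP.
    if (first >= 0xD800 && first <= 0xDBFF && pos < text.size())
    {
        const char16_t second = text[pos];
        if (second >= 0xDC00 && second <= 0xDFFF)
        {
            ++pos;
            return 0x10000
                 + ((static_cast<char32_t>(first) - 0xD800) << 10)
                 + (static_cast<char32_t>(second) - 0xDC00);
        }
    }

    // BMP character, or an unpaired surrogate passed through unchanged.
    return first;
}

// Hypothetical counterpart to GetCharSize(): the number of UTF-16 code
// units (not bytes) that a given code point occupies, either 1 or 2.
inline std::size_t codeUnitCount(char32_t codePoint)
{
    return codePoint >= 0x10000 ? 2 : 1;
}
```

Passing an unpaired surrogate through unchanged is one possible policy; a real
parser might prefer to substitute U+FFFD instead. Either way, every caller
that currently compares a single sal_Unicode against the next character would
need to be reviewed.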