The issue has been submitted to Bugzilla https://issues.apache.org/ooo/show_bug.cgi?id=125495
When importing an RTF file with Chinese numbering created by MSO, the numbering suffixes are changed to strange characters (such as B, i), as shown in the attached image file.

After tracing the code in gdb, I saw that the encoding of the parser state reverts to the default partway through the file, so that multibyte strings are treated as ANSI strings. Because code page control words such as \ansicpg950 appear after the first opening brace '{', the first parsing state has already been pushed onto the stack before the correct encoding is set. When a later state is popped, the encoding is restored from that earlier state and becomes the default again, even though \ansicpg950 has already been seen. As a consequence, the multibyte string conversion for text tokens goes wrong.

The fix is to call setEncoding instead of setSrcEncoding when an encoding-related control word is seen. The updated code also overwrites the encoding of the state on top of the stack. Since setEncoding exists but nothing calls it, I wonder whether this was a typo by the original author.

The patch has been verified to work in my environment. In theory, all documents encoded with multibyte characters are affected. Please help to review and merge if possible.

-- Mark Hung
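
For reviewers, the push/pop behaviour described above can be reproduced with a small standalone sketch. This is not the actual parser code: ToyParser, Encoding, openGroup, closeGroup and encodingAfterNestedGroup are hypothetical names, and the restore-from-enclosing-state logic is an assumption based on the description in this report. Only the two method names setSrcEncoding and setEncoding are borrowed from it.

```cpp
#include <cassert>
#include <stack>

// Hypothetical stand-in for the real text encodings.
enum class Encoding { Default, Big5 };

// Toy model of a group-based parser (assumption: not the real class).
class ToyParser {
public:
    // '{' : remember the encoding in effect when the group opens.
    void openGroup() { saved_.push(current_); }

    // '}' : drop the closed group's state, then restore the encoding
    // recorded for the enclosing group (or the default if none is left).
    void closeGroup() {
        if (saved_.empty()) return;
        saved_.pop();
        current_ = saved_.empty() ? Encoding::Default : saved_.top();
    }

    // Changes only the live encoding; the saved state stays stale.
    void setSrcEncoding(Encoding e) { current_ = e; }

    // Changes the live encoding AND the state on top of the stack.
    void setEncoding(Encoding e) {
        if (!saved_.empty()) saved_.top() = e;
        current_ = e;
    }

    Encoding current() const { return current_; }

private:
    std::stack<Encoding> saved_;
    Encoding current_ = Encoding::Default;
};

// Simulates "{ \ansicpg950 { ... } text" and returns the encoding
// that would be used to convert "text".
Encoding encodingAfterNestedGroup(bool useFix) {
    ToyParser p;
    p.openGroup();                       // outer '{' saves Default
    if (useFix) p.setEncoding(Encoding::Big5);
    else        p.setSrcEncoding(Encoding::Big5);
    p.openGroup();                       // nested '{'
    p.closeGroup();                      // nested '}'
    return p.current();
}
```

In this model, encodingAfterNestedGroup(false) falls back to Encoding::Default after the nested group closes, which matches the symptom seen in gdb, while encodingAfterNestedGroup(true) keeps Encoding::Big5.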