http://bugzilla.lyx.org/show_bug.cgi?id=4439
This is a (hopefully correct) patch for a rather nasty bug that has been in LyX since 1.5.0, but was only recently uncovered by Bo's encoding error dialog work. I have marked it a blocker, since it breaks compilation of Hebrew documents as of rev. 22235 of branch (and also in trunk). However, after digging into it, I think it is more severe than we thought at first, because it could result in data loss.

The symptom of the bug is that the Hebrew character tav (ת) is not known to LyX in the cp-1255 encoding, as reported on bugzilla. Thus all Hebrew documents in "standard" encoding that use this character fail to export.

My research on the Internet revealed the following: for some encodings, cp-1255 being one of them, iconv holds back characters in the stream and waits for the next character, in order to check whether it is a combining character (in the case of cp-1255, this seems to be necessary to support nikud). Now the tav character is the last one we check when setting up the cp-1255 encoding (in Encoding::init()), and it simply remains in iconv's stream, because iconv waits for the next character. Thus it is not added to our list of encodable characters, which manifests itself in the described symptom.

The solution (and what the patch does) is to flush the stream by passing a NULL input to iconv at the very end of the conversion process. This procedure is advised by the iconv maintainer (B. Haible), e.g. in this bugzilla report, which outlines the problem very clearly:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124

However, I'm no iconv or encoding expert, and this is a change in a very sensitive area, so please have a look and tell me if my analysis makes sense.

Thanks,
Jürgen
Index: src/support/unicode.cpp
===================================================================
--- src/support/unicode.cpp	(Revision 22305)
+++ src/support/unicode.cpp	(working copy)
@@ -135,6 +135,11 @@
 	int res = iconv(pimpl_->cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
 
+	// flush out remaining data. This is needed because the iconv sometimes
+	// holds back chars in the stream, waiting for a combination character
+	// (see e.g. http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124)
+	iconv(pimpl_->cd, NULL, NULL, &outbuf, &outbytesleft);
+
 	//lyxerr << std::dec;
 	//lyxerr << "Inbytesleft: " << inbytesleft << endl;
 	//lyxerr << "Outbytesleft: " << outbytesleft << endl;
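For anyone who wants to check the analysis outside of LyX, here is a minimal standalone sketch of the same flush. It is not LyX code: `ConvResult` and `convert_and_flush` are hypothetical names made up for this example, and it assumes a platform where `iconv()` takes `char **` for the input buffer (as glibc does) and where the iconv implementation knows the name "CP1255". It converts the input, records what has been written so far, then calls `iconv()` with a NULL input to flush and records the output again.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <string>

// Hypothetical helper for illustration only (not part of LyX).
struct ConvResult {
    bool opened = false;       // false if iconv_open() rejected the encoding pair
    bool ok = false;           // true if neither iconv() call reported an error
    std::string before_flush;  // output produced by the normal conversion call
    std::string after_flush;   // output after the additional flush call
};

// Converts `input` from encoding `from` to encoding `to`, then flushes the
// conversion descriptor by passing a NULL input, as the patch does.
ConvResult convert_and_flush(const char* from, const char* to, std::string input)
{
    ConvResult r;
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return r;
    r.opened = true;

    char out[64];              // large enough for the short inputs used here
    char* inbuf = &input[0];
    size_t inbytesleft = input.size();
    char* outbuf = out;
    size_t outbytesleft = sizeof(out);

    // Normal conversion. A stateful converter may hold back the last
    // character here while it waits for a possible combining character.
    size_t res = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
    r.before_flush.assign(out, outbuf - out);

    // Flush: a NULL input asks iconv to emit whatever it is still holding.
    size_t flushed = iconv(cd, nullptr, nullptr, &outbuf, &outbytesleft);
    r.after_flush.assign(out, outbuf - out);

    r.ok = (res != (size_t)-1) && (flushed != (size_t)-1);
    iconv_close(cd);
    return r;
}
```

Calling `convert_and_flush("CP1255", "UTF-8", "\xFA")` feeds in the single cp-1255 byte for tav. If the analysis above is right, an iconv that buffers should leave `before_flush` empty and only deliver the UTF-8 bytes for tav in `after_flush`; an implementation that does not buffer would already show them in `before_flush`.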
