http://bugzilla.lyx.org/show_bug.cgi?id=4439

This is a (hopefully correct) patch for a rather nasty bug that has been in LyX 
since 1.5.0, but was only recently uncovered by Bo's encoding error dialog work. 
I have marked it as a blocker, since it breaks compilation of Hebrew documents 
as of rev. 22235 of branch (and also in trunk). However, after digging into 
it, I think it is more severe than we thought at the beginning, because it 
could result in data loss.

The symptom of the bug is that the Hebrew character tav (ת) is not known to 
LyX in the cp-1255 encoding, as reported on bugzilla. Thus all Hebrew 
documents in "standard" encoding that use this character fail to export.

My research on the Internet revealed the following: for some encodings, 
cp-1255 being one of them, iconv holds back a character in the stream and 
waits for the next one, in order to check whether that is a combining 
character (in the case of cp-1255, this seems to be necessary to support 
nikud, the Hebrew vowel points).

Now the tav character is the last one we check when setting up the cp-1255
encoding (in Encoding::init()), and it simply remains in iconv's stream, 
because iconv waits for the next character. Thus it is never added to our list 
of encodable characters, which manifests itself in the described symptom.

The solution (and what the patch does) is to flush the stream by passing a 
NULL input to iconv at the very end of the conversion process. This 
procedure is advised by the iconv maintainer (B. Haible), e.g. in this 
bugzilla report, which outlines the problem very clearly:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124
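
To make the flush step concrete outside of LyX, here is a minimal standalone
sketch. It is not LyX code: the helper name cp1255_to_utf8 and its flush
parameter are made up for illustration, and it converts to UTF-8 rather than
to the UCS-4 that LyX uses internally, only to keep the example short. It
assumes an iconv implementation that provides a "CP1255" converter.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <string>
#include <vector>

// Hypothetical helper (illustration only, not part of LyX):
// decode CP1255 bytes into UTF-8. Returns false if the converter
// is not available or the conversion fails.
bool cp1255_to_utf8(std::string const & in, std::string & out, bool flush = true)
{
	iconv_t cd = iconv_open("UTF-8", "CP1255");
	if (cd == (iconv_t)(-1))
		return false;

	// iconv wants mutable buffers; keep at least one byte in each
	std::vector<char> inbuffer(in.begin(), in.end());
	inbuffer.push_back('\0');
	std::vector<char> outbuffer(in.size() * 4 + 16);

	char * inbuf = &inbuffer[0];
	size_t inbytesleft = in.size();
	char * outbuf = &outbuffer[0];
	size_t outbytesleft = outbuffer.size();

	size_t res = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);

	// Flush: a NULL input asks iconv to write out whatever it is still
	// holding back (e.g. a base character waiting for a combining one).
	if (res != (size_t)(-1) && flush)
		res = iconv(cd, NULL, NULL, &outbuf, &outbytesleft);

	iconv_close(cd);
	if (res == (size_t)(-1))
		return false;

	assert(outbytesleft <= outbuffer.size());
	out.assign(&outbuffer[0], outbuffer.size() - outbytesleft);
	return true;
}
```

Calling cp1255_to_utf8("\xFA", result) should give the UTF-8 encoding of tav
once the flush has run. With flush set to false, an implementation that
buffers the last character, as described above, may leave result empty, which
is the symptom we see in Encoding::init().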

HOWEVER, I'm no iconv or encoding expert, and this is a change in a very 
sensitive area, so please have a look and tell me if my analysis makes sense.

Thanks,
Jürgen
Index: src/support/unicode.cpp
===================================================================
--- src/support/unicode.cpp	(Revision 22305)
+++ src/support/unicode.cpp	(Arbeitskopie)
@@ -135,6 +135,11 @@
 
 	int res = iconv(pimpl_->cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
 
+	// flush out remaining data. This is needed because the iconv sometimes
+	// holds back chars in the stream, waiting for a combination character
+	// (see e.g. http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124)
+	iconv(pimpl_->cd, NULL, NULL, &outbuf, &outbytesleft);
+
 	//lyxerr << std::dec;
 	//lyxerr << "Inbytesleft: " << inbytesleft << endl;
 	//lyxerr << "Outbytesleft: " << outbytesleft << endl;
