Patrick Georgi spake unto us the following wisdom:
> but skipping a character should be possible:
> - build another iconv state that translates input encoding into input
>   encoding (unless that enables a fast-path, which I'm not sure of -
>   alternative might be some encoding that is the ultimate superset, if
>   such an encoding exists)
> - push first unknown byte into it. if that creates a response already,
>   discard (as it might be some header sequence) and restart with the same
>   byte in the next step, otherwise start at the next byte
> - until iconv emits a response, push byte after byte into it
> - skip that many bytes in the input, replace with one "?"
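
For illustration, the skipping scheme quoted above might look roughly like
the sketch below. It is not iconv: Python's incremental decoders stand in
for the "input encoding into input encoding" iconv state, and the names
next_char and transcode_with_skips are made up for this example. It also
starts a fresh decoder for every character, so it only makes sense for
stateless encodings and does not handle header sequences specially.

```python
import codecs


def next_char(data: bytes, start: int, encoding: str):
    """Feed bytes one at a time until the decoder emits something.

    Returns (decoded_text, number_of_bytes_consumed). Raises
    UnicodeDecodeError on an invalid sequence and ValueError if the
    input ends before a full character has been seen.
    """
    decoder = codecs.getincrementaldecoder(encoding)()  # strict by default
    for end in range(start + 1, len(data) + 1):
        out = decoder.decode(data[end - 1:end])
        if out:
            return out, end - start
    raise ValueError("input ends in the middle of a character")


def transcode_with_skips(data: bytes, src: str, dst: str) -> bytes:
    """Convert data from src to dst, emitting '?' for anything that fails."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        try:
            char, length = next_char(data, pos, src)
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            # Not valid in the source encoding: drop one byte and restart.
            out += b"?"
            pos += 1
            continue
        try:
            out += char.encode(dst)
        except UnicodeEncodeError:
            # Valid input character with no equivalent in dst:
            # skip all of its bytes and emit a single '?'.
            out += b"?"
        pos += length
    return bytes(out)
```

For example, transcode_with_skips("aé€b".encode("utf-8"), "utf-8",
"latin-1") skips the three bytes of the euro sign as a unit and writes a
single "?" in their place.
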

This is more or less what we do in Gaim, for some of our fallback
attempts. This can still lead to junk in your output, particularly given
that a) there are non-UTF-8 character sets whose output looks just like
valid UTF-8 (e.g., ISO-2022-{JP,KR}), and b) there are character sets
which will accept any byte as valid, even when the data was never in that
character set (e.g., ISO-8859-*).

The bottom line, though, is that if the user (or operating system) has
not successfully communicated the character set used for some chunk of
data, you _cannot_ do the right thing -- the best you can do is try not
to mess it up too much. For us, this basically means filtering out
anything that isn't UTF-8 before it gets to the user (normally replacing
invalid sequences with one or more '?' characters), as our UI is
guaranteed to be UTF-8 by design.

With monotone you aren't given this guarantee, but a similar approach
seems reasonable: try to convert the data to whatever character set the
locale (LC_CTYPE) recommends, restarting one byte at a time and replacing
any bytes which fail to convert with '?'.

Ethan

-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils]. They disarm only those who are neither inclined nor
determined to commit crimes.
        -- Cesare Beccaria, "On Crimes and Punishments", 1764
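
P.S. To make the "filter out anything that isn't UTF-8" idea concrete,
here is a minimal sketch in Python. It is not Gaim's actual code; the
function name sanitize_utf8 is invented for this example, and it assumes
the goal is simply a valid UTF-8 string with one '?' per offending byte.

```python
def sanitize_utf8(data: bytes) -> str:
    """Decode data as UTF-8, replacing each undecodable byte with '?'."""
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            # Everything from pos onward is valid: take it and stop.
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is the offset of the bad byte within data[pos:].
            bad = pos + err.start
            # Keep the valid prefix, emit one '?', and restart decoding
            # one byte past the point of failure.
            pieces.append(data[pos:bad].decode("utf-8"))
            pieces.append("?")
            pos = bad + 1
    return "".join(pieces)
```

A truncated or otherwise broken multi-byte sequence comes out as several
'?' characters, one per byte that fails to decode, which matches the "one
or more" behaviour described above. Latin-1 text fed through this gets a
'?' wherever an accented character used to be, so it really is only a way
of not messing things up too much.
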
_______________________________________________
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel