Hi John,

On Tue, Feb 05, 2008 at 07:27:11PM +0000, John McCreesh wrote:
> Christian Lohmaier wrote:
> >On Tue, Feb 05, 2008 at 07:34:50AM -0000, John McCreesh wrote:
> >>On the Marketing Project planet, http://marketing.openoffice.org/planet
> >>there are posts appearing with "non-displayable" characters.
> >[...]
> >As Ben's own site (and feed) is already UTF-8 and displays properly,
> >there is a misconfiguration in the aggregator "planetplanet" used (or a
> >bug), or the page is corrupted somewhere else in between.
>
> Hmm, as you say, Ben's feed is clean,
Yes. It is correctly declared with charset UTF-8, and the contents actually
are UTF-8. (Well, it uses a single quotation mark as an apostrophe and
similar things, but that's a typography problem, unrelated to the encoding
or display problems.)

> it passes through the Planet
> aggregator cleanly, it looks ok on my server, but once it gets into cvs
> on the site it's corrupt.

I doubt that. Could you provide a link to the page on your server?

> Is there something in cvs that could mangle characters?

No, cvs shouldn't mangle anything here.

> Should I be
> saving the files as binary in cvs? (maybe a question for the native-lang
> people...)?

I don't think that will help. To me the broken result looks like it has
been passed through a conversion routine twice (latin1 -> utf8), the
problem being that the text was already UTF-8, not latin1.

The reason his other posts are not mangled is that they use numbered
entities (numeric character references) instead of the actual characters,
and those don't get touched by any charset conversion.

So either your planet is at fault, or whatever editor you use to merge the
feed into the site is doing "clever" things to the data.

If you suspect cvs of doing bad things to the file, compare the md5sums of
the file you committed and of the file you get from a fresh checkout.

ciao
Christian
--
NP: Korn - Fake
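A minimal Python sketch of the suspected failure, assuming the cause is as
described above: text that is already UTF-8 gets one extra latin1 -> utf8
conversion. The function names are hypothetical and are not part of Planet
or cvs. It also shows why numbered entities survive the same step, and
gives a small hashlib-based stand-in for md5sum to compare the committed
file with a checked-out copy.

```python
import hashlib


def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical stand-in for a latin1 -> utf8 conversion step: every
    byte is read as a Latin-1 character and written out again as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def undo_latin1_to_utf8(data: bytes) -> bytes:
    """Reverse one spurious latin1 -> utf8 pass (only correct if exactly
    one extra conversion happened)."""
    return data.decode("utf-8").encode("latin-1")


def file_md5(path: str) -> str:
    """Hex MD5 digest of a file, the same value the md5sum tool prints."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    feed = "caf\u00e9 it\u2019s".encode("utf-8")  # already valid UTF-8
    mangled = latin1_to_utf8(feed)                 # converted a second time
    print(mangled.decode("utf-8"))                 # the é now shows up as "Ã©"
    print(undo_latin1_to_utf8(mangled) == feed)    # True

    # Numbered entities are pure ASCII, so the same conversion
    # leaves them byte-for-byte unchanged.
    entities = b"caf&#233; it&#8217;s"
    print(latin1_to_utf8(entities) == entities)    # True
```

To check cvs itself, run file_md5() on the file you committed and on the
same file from a fresh checkout; equal digests mean the bytes were not
changed in between.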
