Hi, I have made changes to the sitemap and the DWN generation script so that they fix the Japanese pages, as described below. Please rebuild the pages if needed, and poke me if I made any mistakes.
cvs diff looks good, so I committed:

/cvsroot/webwml/webwml/english/sitemap.wml,v  <--  sitemap.wml
new revision: 1.41; previous revision: 1.40
/cvsroot/webwml/webwml/english/News/weekly/dwn-to-rdf.pl,v  <--  News/weekly/dwn-to-rdf.pl
new revision: 1.11; previous revision: 1.10

=================================================

I thought we had completed a good UTF-8 transition after regenerating some news pages. Alas... I found an issue:

http://www.debian.org/sitemap.ja.html

Each line ends with the "ESC ( B" sequence. This is an ISO 2022 (http://en.wikipedia.org/wiki/ISO/IEC_2022) escape sequence indicating a switch to ASCII (1 byte per character). It made sense when this page used 7-bit ISO 2022, but it does not make sense in a UTF-8 page.

I do not know how to fix it. I have updated japanese/.wmlrc as follows:

-D CUR_LANG=Japanese
-D CUR_ISO_LANG=ja
-D CUR_LOCALE=ja_JP.UTF-8
-D CHARSET=utf-8
-D HOME~.
-D INTRO~intro
-D DEVEL~devel
-D DOC~doc
-D DISTRIB~distrib
-D MISC~misc
-D BUGS~Bugs
-D PICS~Pics
-D STYLE~style
-D VOTE~vote

This sequence is clearly being added by webwml when it generates sitemap.ja.html from each file's header.

.... aha... sitemap.wml has a funny special case. I am removing it now.

I checked the English source with: grep -R "Japanese" *

english/News/weekly/dwn-to-rdf.pl has some oddly encoded Japanese text too. It is in EUC-JP. It should be "セキュリティ上の更新。" ("Security updates.") in UTF-8. This is difficult to edit since the file mixes encodings. Since Vim is too smart for this, I used mcedit, a dumb, 8-bit-clean editor.

Osamu
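Both encoding problems described above (stray "ESC ( B" designations left in UTF-8 output, and EUC-JP bytes inside an otherwise ASCII/UTF-8 script) can be checked mechanically instead of by eye. The following is a minimal Python sketch, not part of webwml and not what was done here (the fix above was a manual edit with mcedit). The function names are hypothetical, and the per-line "valid UTF-8, otherwise EUC-JP" rule is a heuristic assumption that fails on lines mixing both encodings.

```python
import re
from typing import List

# ISO-2022 designation escapes commonly seen in ISO-2022-JP text:
#   ESC ( B -> ASCII            ESC ( J -> JIS X 0201 Roman
#   ESC $ @ -> JIS C 6226-1978  ESC $ B -> JIS X 0208-1983
ISO2022_ESCAPE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")
ESC_TO_ASCII = b"\x1b(B"


def find_iso2022_escapes(data: bytes) -> List[int]:
    """Hypothetical helper: byte offsets of ISO-2022 escapes in data."""
    return [m.start() for m in ISO2022_ESCAPE.finditer(data)]


def strip_ascii_designations(data: bytes) -> bytes:
    """Hypothetical helper: drop stray ESC ( B from otherwise UTF-8 output.

    Switching to ASCII is meaningless in UTF-8, so these bytes can go.
    Any other escape means real ISO-2022 data that needs converting,
    so it is deliberately left untouched here.
    """
    return data.replace(ESC_TO_ASCII, b"")


def line_to_text(raw: bytes) -> str:
    """Hypothetical helper: decode one line of a mixed UTF-8/EUC-JP file.

    Assumption: a line that decodes as UTF-8 is UTF-8; otherwise it is
    EUC-JP. A line containing both encodings still raises an error.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("euc_jp")


def normalize_mixed_file(data: bytes) -> str:
    """Decode a mixed-encoding file line by line into one str."""
    return "".join(line_to_text(line) for line in data.splitlines(keepends=True))


if __name__ == "__main__":
    page = b"foo\x1b(B\nbar\x1b(B\n"
    print(find_iso2022_escapes(page))      # offsets of the stray escapes
    print(strip_ascii_designations(page))  # page without them
```

To repair a file such as dwn-to-rdf.pl with this approach, one would read it in binary mode, pass the bytes to normalize_mixed_file, and write the result back with encoding="utf-8".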

