As I mentioned in the BeanUtils vote, its RELEASE-NOTES.txt was in character set ISO-8859-1 instead of, say, UTF-8 (to represent the name "Tommy Tynjä").
However, the RELEASE-NOTES are special in that they go into git/svn and thus the release zip/tar.gz, but we also copy them into the dist download area - see for instance

  http://www.apache.org/dist/commons/collections/RELEASE-NOTES-4.0.txt

which (if you search for COLLECTIONS-8) should correctly say, with Norwegian O-slash:

> Thanks to Rune Peter Bjørnstad.

but instead might (as in my Chromium browser) be shown incorrectly in "WTF8":

> Thanks to Rune Peter BjÃ¸rnstad.

This is because the file (at least from www.apache.org) is served as just:

  Content-Type: text/plain

i.e. with no charset declared, which is effectively treated as ISO 8859-1 (Latin 1).

(Different mirrors might have a different AddDefaultCharset set - http://www.apache.org/info/how-to-mirror.html does not mandate any.)

I think we should correctly cater for any non-ASCII names in our release notes - people should be thanked by their real names -- not everyone wants to legally change their name to an ASCII-compatible version (says formerly "Stian Søiland").

So I had a look at the immediate files in dist, and found these non-ASCII text files:

stain@biggiebuntu:~/src/95/commons$ find . -type f | grep -v .svn | xargs file | grep -v ASCII
./bcel/RELEASE-NOTES.txt: UTF-8 Unicode text
./email/RELEASE-NOTES.txt: UTF-8 Unicode text
./codec/RELEASE-NOTES.txt: ISO-8859 text, with CRLF line terminators
./logging/RELEASE-NOTES.txt: UTF-8 Unicode text
./cli/RELEASE-NOTES.txt: ISO-8859 text
./beanutils/RELEASE-NOTES.txt: C++ source, ISO-8859 text
./collections/RELEASE-NOTES.txt: UTF-8 Unicode text
./collections/RELEASE-NOTES-4.0.txt: UTF-8 Unicode text
./compress/RELEASE-NOTES.txt: UTF-8 Unicode text
./lang/RELEASE-NOTES.txt: ISO-8859 text

I propose we add a default commons/.htaccess which sets something like:

  AddCharset UTF-8 .txt .html

..and convert the ISO-8859 ones to UTF-8 (checking manually, they are Latin 1 and not any of the other Latin variants). We should fix both in dist and git/svn to avoid regression.
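
For illustration, here is a minimal sketch of the conversion step and of how the garbled name arises. It assumes Python 3 is available; the function names are hypothetical and this is not necessarily how the files will actually be converted. It also assumes that any file which is not valid UTF-8 is Latin 1, which is why the manual check mentioned above still matters.

```python
from pathlib import Path


def latin1_to_utf8(path):
    """Re-encode a file from ISO-8859-1 to UTF-8 in place.

    Returns True if the file was rewritten, False if it already
    decoded cleanly as UTF-8 (pure ASCII files also end up here).
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Assumption: the file is Latin 1. Every byte sequence decodes as
    # ISO-8859-1, so this step cannot detect other Latin variants.
    text = raw.decode("iso-8859-1")
    # Work on bytes so CRLF line terminators are preserved as-is.
    Path(path).write_bytes(text.encode("utf-8"))
    return True


def as_seen_without_charset(utf8_bytes):
    """Decode UTF-8 bytes as Latin 1, as a client might do when the
    Content-Type header carries no charset."""
    return utf8_bytes.decode("iso-8859-1")


if __name__ == "__main__":
    # The two bytes of UTF-8 "ø" (0xC3 0xB8) show up as "Ã¸".
    print(as_seen_without_charset("Bjørnstad".encode("utf-8")))
```

Running latin1_to_utf8 a second time on the same file returns False and leaves it untouched, so it should be safe to apply across a whole checkout.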
Various .htaccess files are already in operation across dist (I found at least 20, including under httpd), so I think this should be OK.

For the BeanUtils 1.9.3 release I thus added such an .htaccess - then we can see if that breaks anything on the mirrors. So far so good:

stain@biggiebuntu:~/src/95$ curl -s -I http://www.apache.org/dist/commons/beanutils/RELEASE-NOTES.txt | grep Content-Type
Content-Type: text/plain; charset=utf-8

Views..?

--
Stian Soiland-Reyes
http://orcid.org/0000-0001-9842-9718
