As I mentioned in the BeanUtils vote, its RELEASE-NOTES.txt was in
character set ISO-8859-1, instead of say UTF-8 (to represent the name
"Tommy Tynjä").

However, the RELEASE-NOTES are special in that they go into git/svn and
thus the release zip/tar.gz, but we also copy them into the dist
download area - see for instance


http://www.apache.org/dist/commons/collections/RELEASE-NOTES-4.0.txt

which (if you search for COLLECTIONS-8) should correctly say, with a
Norwegian o-slash:

> Thanks to Rune Peter Bjørnstad.

but instead might (as in my Chromium browser) be shown incorrectly in "WTF8":

>  Thanks to Rune Peter BjÃ¸rnstad.


This is because the file (at least from www.apache.org) is served as just:

Content-Type: text/plain

i.e. with no charset parameter, so the browser falls back to character
set ISO 8859-1 (Latin 1).
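
To make the failure mode concrete, here is a minimal sketch that
reproduces the garbling locally. It assumes a POSIX shell and an iconv
that knows ISO-8859-1 and UTF-8; the variable names are only
illustrative. The UTF-8 bytes for "ø" are deliberately misread as
ISO-8859-1, which is roughly what a browser does when no charset is
declared.

```shell
# "ø" is the two bytes 0xC3 0xB8 in UTF-8 (octal \303 \270).
utf8_name=$(printf 'Bj\303\270rnstad')

# Misread those bytes as ISO-8859-1 and re-encode the result as UTF-8
# so that a UTF-8 terminal can display what the browser would show.
misread=$(printf '%s' "$utf8_name" | iconv -f ISO-8859-1 -t UTF-8)

printf '%s\n' "$misread"    # prints: BjÃ¸rnstad
```

Each byte of the two-byte sequence is shown as its own Latin 1
character, hence the extra "Ã".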


(Different mirrors might have a different AddDefaultCharset set -
http://www.apache.org/info/how-to-mirror.html does not mandate any)


I think we should correctly cater for any non-latin1-names in our
release notes - people should be thanked by their real names -- not
everyone wants to legally change their name to an ASCII-compatible
version (says formerly "Stian Søiland").


So I had a look at the immediate files in dist, and found these
non-ASCII text files:

stain@biggiebuntu:~/src/95/commons$ find . -type f | grep -v .svn | xargs file | grep -v ASCII

./bcel/RELEASE-NOTES.txt:             UTF-8 Unicode text
./email/RELEASE-NOTES.txt:            UTF-8 Unicode text
./codec/RELEASE-NOTES.txt:            ISO-8859 text, with CRLF line terminators
./logging/RELEASE-NOTES.txt:          UTF-8 Unicode text
./cli/RELEASE-NOTES.txt:              ISO-8859 text
./beanutils/RELEASE-NOTES.txt:        C++ source, ISO-8859 text
./collections/RELEASE-NOTES.txt:      UTF-8 Unicode text
./collections/RELEASE-NOTES-4.0.txt:  UTF-8 Unicode text
./compress/RELEASE-NOTES.txt:         UTF-8 Unicode text
./lang/RELEASE-NOTES.txt:             ISO-8859 text
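
The file(1) output above is heuristic (note the "C++ source" guess), so
a stricter cross-check could be useful. The following is only a sketch:
classify_encoding is a hypothetical helper name, and it assumes an
iconv that exits non-zero on input that is not valid UTF-8 (as GNU
iconv does). It cannot tell the ISO-8859 variants apart; it only
separates ASCII, valid UTF-8 and everything else.

```shell
# classify_encoding FILE: print "ASCII", "UTF-8" or "other".
# (Hypothetical helper, not part of any existing tooling.)
classify_encoding() {
    # Nothing left after deleting bytes 0x00-0x7F means pure ASCII.
    if [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]; then
        echo "ASCII"
    # A UTF-8 to UTF-8 round trip only succeeds for valid UTF-8 input.
    elif iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        echo "UTF-8"
    else
        echo "other"
    fi
}

# Possible use over a checkout (left commented out in this sketch):
# find . -type f -name '*.txt' | grep -v .svn | while read -r f; do
#     printf '%s: %s\n' "$f" "$(classify_encoding "$f")"
# done
```

Files reported as "other" are the candidates for conversion, after a
manual look at which Latin variant they really use.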


I propose we add a default commons/.htaccess which sets something like:

    AddCharset UTF-8 .txt .html

..and convert the ISO-8859 ones to UTF-8 (after checking manually that
they are Latin 1 and not one of the other Latin variants). We should
fix them in both dist and git/svn to avoid regressions.
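
The conversion step could look something like the sketch below. It
assumes the manual check has already confirmed the file is ISO-8859-1;
latin1_to_utf8 is a hypothetical helper name. Files that are already
valid UTF-8 (which includes pure ASCII) are skipped, so running it
twice does not double-encode anything.

```shell
# latin1_to_utf8 FILE: rewrite FILE in place as UTF-8.
# Assumption: anything that is not valid UTF-8 is ISO-8859-1.
latin1_to_utf8() {
    # Already valid UTF-8: leave the file untouched.
    if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Convert into a temporary file first so a failed conversion
    # cannot truncate the original.
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Possible use (left commented out in this sketch):
# latin1_to_utf8 lang/RELEASE-NOTES.txt
```

Line terminators are not touched, so a CRLF file stays CRLF.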


Various .htaccess files are already in operation across dist (I
found at least 20, including under httpd), so I think this should be
OK.


For the BeanUtils 1.9.3 release I thus added such an .htaccess - then
we can see if that breaks anything on the mirrors. So far so good:

stain@biggiebuntu:~/src/95$ curl -s -I http://www.apache.org/dist/commons/beanutils/RELEASE-NOTES.txt | grep Content-Type
Content-Type: text/plain; charset=utf-8
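
To repeat that check against several mirrors, a small filter along
these lines might help. This is a sketch only: charset_of is a
hypothetical helper name, and it does nothing more than pull the
charset parameter out of response headers read on stdin.

```shell
# charset_of: read HTTP response headers on stdin and print the charset
# declared in Content-Type (lowercased), or "(none)" if there is none.
# (Hypothetical helper for illustration.)
charset_of() {
    cs=$(tr -d '\r"' |
         tr '[:upper:]' '[:lower:]' |
         sed -n 's/^content-type:.*charset=\([^;[:space:]]*\).*$/\1/p' |
         head -n 1)
    printf '%s\n' "${cs:-(none)}"
}

# Possible use (needs network access, so left commented out):
# curl -s -I http://www.apache.org/dist/commons/beanutils/RELEASE-NOTES.txt | charset_of
```

A mirror that ignores the .htaccess would show up as "(none)" or as
whatever its own default charset is.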



Views..?

-- 
Stian Soiland-Reyes
http://orcid.org/0000-0001-9842-9718
