[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
[Expired for apache2 (Ubuntu) because there has been no activity for 60 days.]

** Changed in: apache2 (Ubuntu)
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to apache2 in Ubuntu.
https://bugs.launchpad.net/bugs/1258546

Title:
  Apache2 defaults to the wrong character set, it should be UTF-8

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/1258546/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
Thank you for taking the time to report this bug and helping to make Ubuntu better.

I have checked both Precise and Trusty, and can find no windows-1252 default that you refer to. I used wget -S to see the headers returned by the Apache server, and it did not specify a character set.

Could you please provide exact steps to reproduce this bug on a freshly installed Ubuntu Server system, and detail what you are expecting and what happens on your system instead? Please use wget -S or similar to demonstrate that the problem is actually with Apache, and not with a client, or the web page source or similar.

Without a test case, there isn't enough information here for a developer to confirm this issue is a bug, or to begin working on it, so I am marking this bug Incomplete for now.

If you can provide exact steps so that a developer can reproduce the original problem, then please add them to this bug and change the status back to New.

** Changed in: apache2 (Ubuntu)
       Status: New => Incomplete
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
I can do the Server system, too, but right now the steps I have followed to get the problem are:

1. install Ubuntu 12.04 desktop, or Lubuntu 14.04devel desktop (it occurs on both)
2. install Apache2, leaving default configuration settings
3. load an html page from the server in a browser (in 12.04 or 14.04devel)
4. check page info regarding Encoding

Adding AddDefaultCharset utf-8 to the configuration file makes the problem go away.

But could this be a problem with the browser anyway?

$ wget -S http://xx.yy.zz.aa
--2013-12-09 14:38:34--  http://xx.yy.zz.aa/
Connecting to xx.yy.zz.aa:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Mon, 09 Dec 2013 12:38:34 GMT
  Server: Apache/2.2.22 (Ubuntu)
  Last-Modified: Sat, 07 Dec 2013 14:39:28 GMT
  ETag: 222742-b1-4ecf2bae66f2c
  Accept-Ranges: bytes
  Content-Length: 177
  Vary: Accept-Encoding
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html
Length: 177 [text/html]

$ wget -S http://xx.yy.zz.bb
--2013-12-09 14:39:46--  http://xx.yy.zz.bb/
Connecting to xx.yy.zz.bb:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Mon, 09 Dec 2013 12:39:46 GMT
  Server: Apache/2.4.6 (Ubuntu)
  Last-Modified: Mon, 25 Nov 2013 16:12:19 GMT
  ETag: b1-4ec02a0e06c9c
  Accept-Ranges: bytes
  Content-Length: 177
  Vary: Accept-Encoding
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html
Length: 177 [text/html]
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
The one browser is:

Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0
HTTP_ACCEPT Headers:
  text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  gzip, deflate
  en,en-us;q=0.7,sv;q=0.3

The other is:

Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:25.0) Gecko/20100101 Firefox/25.0
HTTP_ACCEPT Headers:
  text/html, */*
  gzip, deflate
  en-US,en;q=0.5
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
I've done a fresh installation from the ubuntu-12.04.3-server-i386.iso image and installed Apache2. The Firefox web browser still shows that the pages being served are encoded in windows-1252 instead of UTF-8, which is what the locale is set to, or ISO-8859 which would be the old standard.
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
I believe browsers typically try to guess. If Apache serves a page that doesn't have any non-ASCII characters in it, then browsers can guess, and windows-1252 would still be correct, since the document was a strict subset of this charset.

What happens if you serve a UTF-8 encoded file? What does the browser do then?

If you want Apache to assume that everything in /var/www is UTF-8 by default, and explicitly set that in every response, then I can understand such a request, but I think it needs to be coordinated with the Debian packaging, perhaps also including upstream's view on a suitable default.
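The point about ASCII-only pages can be checked directly. The following is a minimal Python sketch, not anything from the Apache or Firefox code: the two sample page bodies and the helper name `decodes_identically` are made up for illustration. It compares how the same UTF-8 bytes read when a client assumes windows-1252.

```python
# Sample page bodies; both are hypothetical, not the reporter's actual file.
ascii_page = "<html><body>It works!</body></html>"
utf8_page = "<html><body>Schrödinger</body></html>"


def decodes_identically(text: str) -> bool:
    """Return True if the UTF-8 bytes of `text` read the same as windows-1252."""
    raw = text.encode("utf-8")
    try:
        return raw.decode("cp1252") == text
    except UnicodeDecodeError:
        # A few byte values are unassigned in Python's cp1252 codec.
        return False


print(decodes_identically(ascii_page))  # ASCII-only bytes are unambiguous
print(decodes_identically(utf8_page))   # bytes above 0x7F are not
```

For a page with only ASCII bytes, a guess of windows-1252 is harmless; the guess only matters once the file contains non-ASCII characters.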
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
If I serve a UTF-8 encoded file *AND* set the default myself in Apache, then everything is fine.

If the default encoding is left alone, Apache serves it up as windows-1252 and then UTF-8 encoded letters come out as garbage like this: åäöÅÄÖéÉ

As seen from the browser HTTP_ACCEPT Headers, it seems to be the web server making the choice.

Apache has a default encoding. It should be a standard, UTF-8 or ISO-8859; having non-standard windows-1252 in the default configuration just makes a mess.

It's easy to fix by adding AddDefaultCharset to the configuration. However, it would be great if Apache worked with non-English languages out of the box, especially when the locale is set so.
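The garbage quoted above is what you get when UTF-8 bytes are interpreted as windows-1252, whichever side makes that choice. A small Python sketch, assuming the intended letters were åäöÅÄÖéÉ (inferred from the garbled output, not stated in the report):

```python
# Assumed original text, inferred from the garbled letters in the report.
original = "åäöÅÄÖéÉ"

# Bytes as they would sit in a UTF-8 encoded file on disk.
raw = original.encode("utf-8")

# What a client displays if it assumes windows-1252 for those same bytes.
misread = raw.decode("cp1252")
print(misread)

# Decoding with the correct charset recovers the text unchanged.
roundtrip = raw.decode("utf-8")
print(roundtrip == original)
```

This shows the symptom only; it does not say whether the server or the browser picked windows-1252.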
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
> If the default encoding is left alone, Apache serves it up as windows-1252 and then UTF-8 encoded letters come out as garbage like this: åäöÅÄÖéÉ

I do not see this behaviour:

root@trusty:/var/www# xxd test.txt
000: 5363 6872 c3b6 6469 6e67 6572 2773 2043  Schr..dinger's C
010: 6174 0a                                  at.
root@trusty:/var/www# wget -S -O/dev/null http://localhost/test.txt
--2013-12-09 15:26:28--  http://localhost/test.txt
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Mon, 09 Dec 2013 15:26:28 GMT
  Server: Apache/2.4.6 (Ubuntu)
  Last-Modified: Mon, 09 Dec 2013 12:19:37 GMT
  ETag: 13-4ed1902654840
  Accept-Ranges: bytes
  Content-Length: 19
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/plain
Length: 19 [text/plain]
Saving to: ‘/dev/null’

100%[=] 19 --.-K/s in 0s

2013-12-09 15:26:28 (1.52 MB/s) - ‘/dev/null’ saved [19/19]

root@trusty:/var/www#

Here, Apache is just not setting an encoding. It never claims windows-1252.

> Apache has a default encoding.

As you can see from the headers, this does not appear to be true. I can understand that perhaps it does in other circumstances that I haven't been able to test. If this is true, please can you provide steps to reproduce?

> It's easy to fix by AddDefaultCharset to the configuration. However, it would be great if Apache worked with non-English languages out of the box, especially when the locale is set so.

I appreciate that there is a case to perhaps provide a default AddDefaultCharset that matches the system locale, but unfortunately it's not simple since the system locale may not match the encoding of the files you expect to serve from /var/www. This is a tricky issue, and one I think would be better addressed in Debian or upstream than for Ubuntu to diverge from Debian and upstream on this.
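For anyone repeating this test, the hex dump can be cross-checked without Apache. A short Python sketch, assuming test.txt contains the line "Schrödinger's Cat" followed by a newline (reconstructed from the xxd output, where the two dots stand for the bytes c3 b6); it needs Python 3.8 or later for the grouped `hex()` call:

```python
# Assumed content of test.txt, reconstructed from the xxd dump.
content = "Schrödinger's Cat\n"
raw = content.encode("utf-8")

# Byte count, to compare with the Content-Length header.
size = len(raw)
print(size)

# Hex in two-byte groups counted from the left, similar to xxd's layout.
dump = raw.hex(" ", -2)
print(dump)
```

The byte count agrees with the Content-Length of 19 in the response, and the c3b6 pair is the UTF-8 encoding of ö, so the file served was UTF-8 while the response carried no charset at all.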
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
If wget is not seeing the wrong encoding then it may be a problem with Firefox instead. However, the steps to reproduce are:

1. install Ubuntu 12.04 desktop, or Lubuntu 14.04devel desktop (it occurs on both)
2. install Apache2, leaving default configuration settings
3. load an html page from the server in Firefox (in 12.04 or 14.04devel)
4. check page info regarding Encoding with ctrl-i
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
Sorry, your test case involving Firefox isn't sufficient to determine validity of a bug in Apache. What is Apache actually sending to Firefox in your case?
[Bug 1258546] Re: Apache2 defaults to the wrong character set, it should be UTF-8
It looks like the problem is Firefox then.

If no default is set, then Apache sends wget 'Content-Type: text/html'.

If the default is set to utf-8, then Apache sends wget 'Content-Type: text/html; charset=utf-8'.
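The difference between the two responses is just the presence of a charset parameter. A minimal Python sketch of that check, using the standard library's header parser; the helper name `declared_charset` is hypothetical and this is not how wget or Firefox parse the header:

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# The two header values reported in this thread.
without_default = declared_charset("text/html")
with_default = declared_charset("text/html; charset=utf-8")
print(without_default, with_default)
```

When the result is None, the server has declared nothing and the client is left to pick an encoding by its own rules.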