Re: Why can't ap_send_error_response() count on charset?

2002-08-14 Thread Carlo Perassi

On Tue, Aug 13, 2002 at 12:52:25PM -0400, Greg Ames sent those random bytes:
> in the html.  I am curious to hear what the W3C Validator people say.

Well, my message to W3C generated a thread of ten emails.
This is a short report of their thoughts.
1 - There is no need to specify a meta charset in HTML documents if the
charset is given in the Content-Type header.

But there may be an additional complication: Some 404s may be in
other encodings than iso-8859-1. In that case, the header would
be wrong. As long as this is just for the built-in 'last resort'
error message that doesn't change, it's okay. But in case it's
tagged onto any arbitrary error message, it's a problem.
(So with Greg's fix Apache should be fine - Carlo)
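
To illustrate point 1, here is a minimal sketch in C of checking whether a
Content-Type value already carries a charset parameter. It is not Apache or
Validator code: the helper name has_charset_param is hypothetical, and the
parsing is simplified (it ignores quoted parameter values that might contain
a ';').

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Apache or the W3C Validator.
 * Returns 1 if a Content-Type value such as
 * "text/html; charset=iso-8859-1" carries a charset parameter,
 * 0 otherwise. The parameter name is matched case-insensitively. */
int has_charset_param(const char *content_type)
{
    const char *name = "charset=";
    const char *p;

    assert(content_type != NULL);
    p = strchr(content_type, ';');

    while (p != NULL) {
        size_t i = 0;

        p++;                              /* skip the ';' */
        while (*p == ' ' || *p == '\t')
            p++;                          /* skip optional whitespace */

        /* compare the parameter name against "charset=" */
        while (name[i] != '\0' && tolower((unsigned char)p[i]) == name[i])
            i++;
        if (name[i] == '\0')
            return 1;                     /* whole name matched */

        p = strchr(p, ';');               /* move on to the next parameter */
    }
    return 0;
}
```

With the header from Greg's fix, "text/html; charset=iso-8859-1", this
returns 1; for a bare "text/html" it returns 0, which is the case where a
meta charset in the document would still matter.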

BTW, a related problem is the directive 'AddDefaultCharset'.
This adds a 'charset' parameter to *every* Content-Type that
doesn't already have one. This means that if you have some
gifs, they get served as Content-Type: image/gif; charset=foo.
This is quite useless.

(About the AddDefaultCharset problem noted by Duerst)
The Apache documentation implies that, but it isn't actually the case in
my testing with Apache 1.3.26.  The charset parameter only seems to be
added for text/html and text/plain.  It's not added for image/* or
text/vnd.wap.wml.
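
The behaviour observed in that test can be written down as a small sketch in
C. This is only a model of what was reported for Apache 1.3.26, not the
Apache source: the function name apply_default_charset is hypothetical, and
the media type comparison is a simplified exact, case-sensitive match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the observed AddDefaultCharset behaviour:
 * the charset parameter is appended only to a bare "text/html" or
 * "text/plain". Any other value (image/gif, text/vnd.wap.wml, or a
 * type that already has parameters) is copied through unchanged.
 * Returns 1 if a charset was appended, 0 otherwise. */
int apply_default_charset(const char *ctype, const char *charset,
                          char *out, size_t outlen)
{
    assert(ctype != NULL && charset != NULL && out != NULL);

    if (strcmp(ctype, "text/html") == 0 || strcmp(ctype, "text/plain") == 0) {
        snprintf(out, outlen, "%s; charset=%s", ctype, charset);
        return 1;
    }

    snprintf(out, outlen, "%s", ctype);   /* leave the type as it is */
    return 0;
}
```

Under this model image/gif is never served with a charset parameter, which
matches the test result above rather than the reading of the documentation.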

2 - About the default HTML code provided for a 404:
(Apache developers) should change  to .   is for
XHTML/XML only, but they've specified HTML 2.0.

3 - Some of the W3C people think having an option 'validate error messages' in
the validator form is a good idea, because they want to be able to validate
all HTML.

-- 
Carlo Perassi - http://www.linux.it/~carlo/
Do only what only you can do (Edsger Wybe Dijkstra: 1930-2002)



Re: Why can't ap_send_error_response() count on charset?

2002-08-13 Thread Carlo Perassi

On Tue, Aug 13, 2002 at 11:06:57AM -0400, Greg Ames sent those random bytes:
> Can you try it again with current cvs HEAD?  I'm not familiar with the W3C
> Validator test, but I would hope that if it saw a good http Content-Type header,
> it wouldn't need the stuff in the html meta line.

Me too, but I found a problem/feature in the validator, so I just wrote the
following email to the W3C Validator team:

/*

Hi all
the default "404 Not Found" page generated by the latest version of Apache HTTP
Server (and the other similar error pages) does not pass the W3C Validator test
(
it is HTML 2.0 code shipped without a meta tag giving the charset; try this
nonexistent page to see it:
http://www.apache.org/doesntexist.html
)

As I explained to the Apache developers
(
see
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=102918549709592&w=2
and
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=102925143132691&w=2
)
it is trivial to change the Apache C code to generate pages that pass the W3C
Validator, but they have technical reasons that do not permit emitting a meta
tag with a charset definition. A few minutes ago a fix for a header problem
appeared in the Apache CVS tree, and as Greg Ames <[EMAIL PROTECTED]> said
"I would hope that if (the Validator) saw a good http Content-Type header,
it wouldn't need the stuff in the html meta line."

Before trying the new Apache CVS code, I found a "problem": when your
Validator finds a 404 status in the server's response, it no longer parses
the HTML provided.

See this session and, trust me, the Validator doesn't parse the code below:

#
# BEGIN
#

carlo@voyager:~$ telnet www.apache.org 80
Trying 63.251.56.142...
Connected to daedalus.apache.org.
Escape character is '^]'.
GET http://www.apache.org/doesntexist.html HTTP/1.0

HTTP/1.1 404 Not Found
Date: Tue, 13 Aug 2002 15:41:38 GMT
Server: Apache/2.0.40 (Unix)
Content-Length: 287
Connection: close
Content-Type: text/html; charset=iso-8859-1



404 Not Found

Not Found
The requested URL /doesntexist.html was not found on this server.

Apache/2.0.40 Server at www.apache.org Port 80

Connection closed by foreign host.

#
# END
#

My question is: why not have the Validator parse the HTML code even when the
return code is something other than 200?
If it did, the Apache team would be able to check whether the fix to the code
that produces the response header is enough to pass the test.

Thank you.

*/

So I (we) should wait for their answer.
Thanks.

-- 
Carlo Perassi - http://www.linux.it/~carlo/
Do only what only you can do (Edsger Wybe Dijkstra: 1930-2002)



Why can't ap_send_error_response() count on charset?

2002-08-12 Thread Carlo Perassi

Hi all.
In modules/http/http_protocol.c
the comment says
ap_send_error_response is used for any response that can be generated by the
server from the request record. This includes all [snip] messages that have
not been redirected to another handler via the ErrorDocument feature.
On line 2331 I read:
/* can't count on a charset filter being in place here,
 * so do ebcdic->ascii translation explicitly (if needed)
 */

It's trivial to add on line 2336 to ap_rvputs_proto_in_ascii() a string like

or so... but the comment above says "can't count on a charset".

Anyway... with the current code, the HTML generated by ap_send_error_response
can't pass the W3C Validator test (with the missing meta line added it would
be OK).

I'd like the HTML generated by ap_send_error_response to pass the W3C
Validator test in the default configuration (that is, without using external
HTML files for 404 and so on).

The patch is trivial but I don't understand why (we) "can't count on a charset
filter being in place here".

Thank you.

-- 
Carlo Perassi - http://www.linux.it/~carlo/
Do only what only you can do (Edsger Wybe Dijkstra: 1930-2002)