On May 5, 11:13 am, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:
>
> Google's servers returning a 400 for URLs containing an unescaped |
> character is not a bug in the server - they're correctly obeying the
> relevant RFCs.


By this definition, you are claiming that the main google.com search
server is wrong, because it does not refuse to accept raw pipe
characters.

Certainly, one set of Google servers is wrong, because there are two
different behaviours.

Here's a schizophrenic Google server for you, this very site:

http://groups.google.com/group/google-appengine/browse_thread/thread/4ced9847935e7b34

Type:

  this|that

into the search box in the upper right hand corner, and click "Search
this group".  It panics and redirects you to the main page at
http://groups.google.com.  Your search query is lost.

Try it again, but this time click on "Search groups".  This works --
results are shown. The pipe was not encoded by my Firefox 3.0.x
browser.


> The RFC is clear as to what characters are acceptable in the query
> string part of a HTTP URL.  | is not one of these characters,


It is clear, but it also gives reasons why:

   Unsafe:

   Characters can be unsafe for a number of reasons.  The space
   character is unsafe because significant spaces may disappear and
   insignificant spaces may be introduced when URLs are transcribed or
   typeset or subjected to the treatment of word-processing programs.
   The characters "<" and ">" are unsafe because they are used as the
   delimiters around URLs in free text; the quote mark (""") is used to
   delimit URLs in some systems.  The character "#" is unsafe and should
   always be encoded because it is used in World Wide Web and in other
   systems to delimit a URL from a fragment/anchor identifier that might
   follow it.  The character "%" is unsafe because it is used for
   encodings of other characters.  Other characters are unsafe because
   gateways and other transport agents are known to sometimes modify
   such characters.


NONE of this applies once the URL has reached the web server.  Why
insist that the browser send you a %7c instead of | when you're just
going to immediately decode it to | anyway?  google.com doesn't.
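
To illustrate the point that both forms decode to the same thing, here is
a minimal sketch using Python's standard urllib.parse. It says nothing
about how Google's servers are implemented; the variable names are only
for illustration.

```python
from urllib.parse import parse_qs, quote

raw = "q=this|that"        # what the browser sends unencoded
encoded = "q=this%7cthat"  # the strictly RFC-conformant form

# A lenient parser yields the same value for either query string.
raw_params = parse_qs(raw)
encoded_params = parse_qs(encoded)
print(raw_params == encoded_params)

# A client that wants to stay within the RFC can encode before sending.
strict = quote("this|that", safe="")
print(strict)
```

Once decoded, the server-side value is identical, so rejecting the raw
form buys nothing beyond strictness.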

The Google Groups server is borked whatever you do. Here's the search
with an explicit %7c:

http://groups.google.com/group/google-appengine/search?group=google-appengine&q=this%7cthat&qt_g=Search+this+group
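
For anyone who wants to reproduce that URL from a script rather than a
browser, a rough sketch follows. The parameter names are copied from the
URL above; the helper name build_search_url is my own invention, and
urlencode emits %7C in upper case, which is equivalent to %7c.

```python
from urllib.parse import urlencode

BASE = "http://groups.google.com/group/google-appengine/search"

def build_search_url(query: str) -> str:
    """Build the 'Search this group' URL with the query percent-encoded."""
    params = {
        "group": "google-appengine",
        "q": query,                     # '|' becomes %7C, spaces become '+'
        "qt_g": "Search this group",
    }
    return BASE + "?" + urlencode(params)

print(build_search_url("this|that"))
```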


>  so if your browser is sending one, it's an issue either with your browser or
> with your URL encoding.


When millions of people browse with Firefox and Google Chrome, which
do not encode '|' when an ordinary user types it into a search box,
and your server software fails inelegantly to handle it, the fault is
with the server software.

That's why the main google.com software gets it right, and the
groups.google.com software, among other Google services, gets it
wrong...
