L A Walsh <w...@tlinx.org> writes:
> Dale R. Worley wrote:
>> But of course, no [RFC3986-conforming] URL contains an embedded
>> space because that's what it says in RFC 3986, which is "what
>> *defines* what a URL *is*"
> [sic; should read "is one definition of a URL"]
> ---
> Right, just like speed limit signs define what the maximum speed is.
>
> There is the "model" and there is reality.  To believe that the
> model replaces and/or dictates reality is not realistic and is
> bordering on some mental pathology.
>
> I understand what you are saying, Dale.  My dad was a lawyer, and
> life would be so much easier if specs, RFCs, or other models of
> reality were the only thing we had to pay attention to.  But... to
> do so generally creates various levels of discomfort and/or
> headaches.
There's a reason why the Internet has advanced on the back of thousands of anal-retentive standards documents. There really are situations where DWIM (Do What I Mean) design makes life worse.

It's plausible that in a web browser it's reasonable to allow users to type in purported URLs that are invalid, and for the browser to make its best guess as to what the user meant. This is because getting the guess wrong rarely causes trouble beyond showing the user a page that they aren't interested in; the user can just retype the right URL and get what they wanted.

But every such slackness introduces uncertainty. If the user types "http://www.example.com/ " (that is, with a trailing space), should it be handled as "http://www.example.com/%20" (assuming the user wanted to access a file whose name is a single space, and producing the URL that does that) or as "http://www.example.com/" (assuming that the space is a cut-and-paste error and should be ignored)?

As long as this is being directly monitored by the user, this works reasonably well. But once the DWIM program starts being used as a *part* of a system, things get hazardous. People start building other parts of the system assuming that the DWIM program doesn't hold them to the rules. And since the DWIM program's behavior in those outside-the-box cases isn't clearly defined, there's no protection from the situation where its guesses change, but the rest of the system depends on the *particular* guesses that it used to make.

In the particular case of wget, consider that portions of the URL that the user enters are extracted and used in the HTTP request. Again, there's a strict specification of what constitutes a valid HTTP request. If the user includes an invalid character in the URL, should wget simply pass it through into the HTTP request, assuming that a well-built web server will Do What the User (probably) Meant?
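To make the trailing-space ambiguity concrete, here is a minimal sketch in Python using the standard library's urllib.parse.quote. It shows the two incompatible guesses a DWIM layer could make about the same input; the example URL is the one from the discussion above.

```python
from urllib.parse import quote

raw = "http://www.example.com/ "  # user input with a trailing space

# Guess 1: the space is part of the path, so percent-encode it to
# produce an RFC 3986-valid URL naming a file called " ".
strict = quote(raw, safe=":/")

# Guess 2: the space is a cut-and-paste artifact; strip it.
lenient = raw.strip()

print(strict)   # http://www.example.com/%20
print(lenient)  # http://www.example.com/
```

Both results are valid URLs, but they name different resources; a system built on top of whichever guess the tool happens to make will break silently if that guess ever changes.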
And it should be remembered that there's a design principle of Unix that's rarely mentioned: people write a lot of shell scripts for Unix, and the external interface of Unix commands is optimized for use within shell scripts, not for being directly executed by users. That's why most of them produce no output whatever if their execution is successful, and why most of those that do generate output provide no "headers" -- that would get in the way of handing the output to another program as input. I've even seen an exercise in a Unix training book asking the student to explain why the single header line in the output of the "ps" command is undesirable.

Within that context, the point of wget is to fetch the contents of a URL that is provided by something else that *should* know what a properly formed URL is.

Dale