Adam M. Costello writes:
> "D. J. Bernstein" <[EMAIL PROTECTED]> wrote:
> > Yes or no: Should ``dig'' convert its results from your 7-bit
> > encoding to the local character set? (Assume the LANG=en_US.UTF-8
> > locale, so that the conversion is at least theoretically doable.)
>
> Yes.
That will cause interoperability failures. It will break other programs, even though those programs are fully compliant with today's standards. Mail will bounce!

Costello persists in claiming that IDNA will allow IDNs to be deployed right now _without_ any failures other than incorrect displays. That claim is simply not true. Mail will bounce. Web links will fail.

If IDNA's special-purpose 7-bit encoding is supported by _no_ networking programs, or (in a fully Unicode world) by _all_ networking programs, then mail won't bounce. In the intermediate stage, when the encoding is supported by _some_ networking programs, mail will bounce.

> Any user clever enough to use dig and to use programs that
> parse its output is probably also clever enough to prepend "env
> LANG=en_US.US-ASCII" to the command line if necessary,

Costello is weaseling. He is saying that mail won't bounce _if_ the program invoking dig (in this case, a shell script) is changed _before_ the dig program is changed. But the IDNA specification does not impose any such ordering. Costello wants the dig authors to change their code right now.

More importantly, there _does not exist_ an ordering compatible with all of today's cross-program data transfers. Addresses are copied from mail programs to browsers; addresses are also copied from browsers to mail programs.

> the man page does not specify the output format of dig

Actually, the dig manual page clearly states that dig prints answers from name servers. There is only one printed DNS record format in BIND, namely BIND's zone-file format, which is documented in detail and meant to be machine-parsed. The format is very badly engineered---it requires far too much effort to parse completely---but that's a side issue. The point is that programs _do_ read it. In fact, the current dig man page explicitly mentions at one point that, to simplify machine parsing, dig avoids one feature of the format by default.
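To make the intermediate-stage failure concrete, here is a minimal sketch. It is an illustration, not anything from dig or BIND: the producer `dig_like_line`, the consumer `consumer`, the name "bücher.example.", and the 3600-second TTL are all hypothetical, and Python's built-in "idna" codec (IDNA 2003) stands in for the 7-bit encoding. The consumer plays the role of a script written to today's rules, accepting only ASCII letter-digit-hyphen labels. It works while the producer emits the 7-bit form and breaks as soon as the producer alone switches to Unicode output.

```python
import re

# Hypothetical downstream consumer written to today's rules: it accepts an
# owner name only if every label is ASCII letters, digits, and hyphens.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def legacy_accepts(name: str) -> bool:
    """True if every label of `name` (trailing root dot ignored) is ASCII LDH."""
    labels = name.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)


def dig_like_line(ace_owner: str, address: str, convert: bool) -> str:
    """Hypothetical dig-style answer line in zone-file layout.

    With convert=True the owner name is decoded from the 7-bit (ACE) form
    to Unicode, using Python's built-in IDNA 2003 codec as a stand-in.
    """
    owner = ace_owner.encode("ascii").decode("idna") if convert else ace_owner
    return f"{owner}\t3600\tIN\tA\t{address}"


def consumer(line: str) -> bool:
    """Hypothetical script: reads the owner field and checks its syntax."""
    owner = line.split()[0]
    return legacy_accepts(owner)


# "bücher.example." is a made-up name; 192.0.2.1 is a documentation address.
ace = "bücher.example.".encode("idna").decode("ascii")  # "xn--bcher-kva.example."

for convert in (False, True):
    line = dig_like_line(ace, "192.0.2.1", convert)
    print(f"convert={convert}: accepted={consumer(line)}")
# convert=False: accepted=True
# convert=True: accepted=False
```

Neither program is broken on its own terms; the failure comes only from upgrading one side of the pipe before the other.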
Costello is simply wrong when he claims that the dig output is meant only for display and that programs shouldn't be looking at it.

> Some programs are intended to be
> redirected and have human-unfriendly machine-friendly output. Some are
> intended to be viewed and have human-friendly machine-unfriendly output.
> Some are somewhere in between, which motivates the question here.

The line that Costello is attempting to draw is directly contrary to the UNIX philosophy. UNIX programs are _designed_ to be ``in between.'' This is explained in detail in Gancarz's book. Here, for example, is an excerpt from the ``Make Every Program A Filter'' section:

   When you assume that the receptacle of a program's data flow might
   be another program instead of a human being, you eliminate those
   biases we all have in trying to make an application user friendly.
   You stop thinking in terms of menu choices and start looking at the
   possible places your data may eventually wind up. Try not to focus
   inward on what your program can do. Look instead at where your
   program may go. You'll then begin to see the much larger picture of
   which your program is a part.

These ideas are most fully developed in UNIX, but they can also be seen in other operating systems, in tools ranging from copy-and-paste to object-linking frameworks. Some of these tools have a side channel for character-set information, but most of them are designed for a world with a unified character encoding---ASCII yesterday, UTF-8 tomorrow.

The bottom line is that the ``dig'' problem is shared by _thousands_ of UNIX programs. They deliberately provide output in a format that can be viewed by the user, or sent to another program, or both.

---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago
