Hello Paul,

At 15:29 02/02/11 -0800, Paul Hoffman / IMC wrote:
>I am replying to all minus the IDN mailing list. I have added Patrik to
>the Cc list.
I have added the IDN mailing list back in. Please keep it. I have
also included the other relevant mailing lists in the cc, to bring
things to a quick conclusion.

>At 8:45 PM +0900 2/11/02, Martin Duerst wrote:
>>Currently, neither draft-ietf-idn-nameprep-07.txt nor
>>draft-hoffman-stringprep-00.txt deal with bidirectionality
>>(mixing right-to-left (Arabic/Hebrew) and left-to-right
>>writing directions) issues. This should be changed as soon
>>as possible.
>
>You didn't say why this should be done as part of nameprep or stringprep.
>What about bidi makes the issue a preparation issue, particularly for
>prohibition?

See below.

>>If a label can contain both right-to-left and left-to-right
>>characters, how it will be displayed, and how displayed
>>labels will be entered and looked up in the DNS, is highly
>>context-dependent. This is obviously very undesirable.
>
>This is a display issue; it has absolutely nothing to do with how names
>are entered or looked up in the DNS.

It is not a display issue, it is an issue of conversion between
display and backing store. Assume there were two labels inside the
DNS, one reading ABCdef and the other reading defABC, and both
would be displayed CBAdef. Who would consider that usable for the DNS?

With the current nameprep, this and similar situations will happen
if, e.g., the upper-case letters stand for characters of a
right-to-left script and the lower-case letters for characters of a
left-to-right script.

Many similar issues have been discussed on this list. But please note
that this is not an issue of different characters in Unicode looking
the same or similar. It's exactly the same characters!

Also, please note that for the similar issues discussed on this list,
the problem was that there was no really workable solution. That is
not the case here. Two main solutions have been proposed.

>>The following is a proposal written up by Mark Davis,
>>based on input from others:
>
>This has barely been discussed in the BIDI community; there has been
>almost no review of it.
>Further, there was disagreement on it when Mark presented it.

There is wide agreement among all the BIDI experts that have considered
this problem that a restriction of the combination of allowed characters
in each label is unavoidable, independent of any other aspects of the
solution.

And you are right that there were two different main solutions, and
nobody was really sure which one to choose. But there was wide agreement
that a solution is needed.

The two main solutions differ in how they handle sequences of labels.
The solution described in draft-duerst-iri-bidi-00.txt tries to make
sure that the sequence of labels always goes the same way, left to
right. So something that is logically FTP.HEBREW.COM will show up as
PTF.WERBEH.MOC. In order for this to happen, special characters called
LRM have to be inserted around the '.'. It would be no problem to check
for their presence and strip them in nameprep, but it is difficult to
have them inserted when typing a domain name in or when getting a
domain name back from a protocol.

The other solution, with which Mati Allouche has come up after reading
my draft in detail, is to give the sequences of labels their 'natural'
order. The above example would turn into MOC.WERBEH.PTF. This may look
very strange, but is actually quite natural for native Arabic or Hebrew
users, because that's the way they read text. On the various occasions
on which I have seen examples of Arabic domain names, they were always
displayed that way. It leads to a few strange effects, such as an
inversion of components of different nature in URIs (example:
http://ftp.HEBREW.COM/PATH/file.html turns into
http://ftp.HTAP/MOC.WERBEH/file.html), but these can be read naturally
as well. The main advantage is that it requires less intervention/magic
for input and for taking domain names from a protocol and putting them
into a textual context. This is a big advantage.

Incidentally, it's also how at least some OSes handle directory paths.
The only case I was able to confirm is (Japanese) Windows 2000, where
a folder ABCD (shows as DCBA) with a folder EFGH (shows as HGFE) inside
is shown as follows in the top bar of the explorer:

D:\
D:\temp
D:\temp\DCBA
D:\temp\HGFE\DCBA

Looks really weird when you see it the first time, but bidirectional
writing comes with quite a few surprises.

Mark's proposal (quoted just below) worked out the details of the
restrictions necessary for individual labels in order to work with
Mati's proposal. It's the best I know, and it's an enormous improvement
over just doing nothing at all and regretting it later.

>>B. In any field that contains any RTL characters:
>>B0. no LTR characters can occur.
>>C1. a sequence of characters of type DIG can only occur at the end.
>>C2. a sequence of characters of type OTHER can occur only between
>>characters of type RTL.
>>
>>I propose that this be added as an additional step after the current
>>'prohibition' step.
>
>Why would this be considered part of nameprep? That is, why are you
>prohibiting the strings on the user side instead of on the server side?

As far as I understand, nameprep is for both sides. There is a single
difference, namely the treatment of unassigned codepoints, which go
through on the client side, but are checked on the server side (at the
time of registration). Except for this difference, both sides are the
same. The reason for this, as far as I understand, is to make doubly
sure that no illegal things get into the DNS: if illegal names aren't
reachable from the client, there is no incentive for the registrars
to cheat.

Also, if this is handled as a 'policy' issue, different registrars
may choose to use different restriction policies (if they become aware
of the problem at all). This will not work, because it is then
impossible to provide appropriate support on the client side
(browsers, mailers,...).

>This is the first time that the proposal has been seen by the WG.
>There is no Internet Draft

There is http://www.ietf.org/internet-drafts/draft-duerst-iri-bidi-00.txt,
for the more general problem of bidi in URIs. The WG has been told
about it at http://www.imc.org/idn/mail-archive/msg03271.html.
And there is no need for a separate Internet Draft for DNS.
It should go into nameprep.

>and no examples of its use.

What kinds of examples are you looking for? More examples to
explain the issues? A list of examples that could be used for
testing? Examples for the draft itself? How many other examples
are there in the nameprep draft?

Regards,    Martin.
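
As a rough illustration of restrictions B0, C1 and C2 quoted above, a
per-label check could be sketched in Python as follows. This is only a
sketch under stated assumptions, not the wording of any draft: the names
bidi_kind and label_allowed are hypothetical, and the mapping from
Unicode bidi classes to the four types (R/AL as RTL, L as LTR, EN/AN as
DIG, everything else as OTHER) is an assumption, since the definitions
of the types are not part of the quoted text.

```python
import unicodedata


def bidi_kind(ch):
    """Map one character to RTL, LTR, DIG or OTHER (assumed mapping)."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "RTL"
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("EN", "AN"):
        return "DIG"
    return "OTHER"


def label_allowed(label):
    """Return True if the label passes the sketched B0/C1/C2 checks."""
    kinds = [bidi_kind(ch) for ch in label]

    # B: the restrictions only apply to fields containing RTL characters.
    if "RTL" not in kinds:
        return True

    # B0: no LTR characters may occur.
    if "LTR" in kinds:
        return False

    # C1: a DIG sequence may only occur at the end, so once a digit
    # appears, every later character must be a digit as well.
    if "DIG" in kinds:
        first_digit = kinds.index("DIG")
        if any(kind != "DIG" for kind in kinds[first_digit:]):
            return False

    # C2: every run of OTHER characters must have an RTL character
    # directly before it and directly after it.
    i = 0
    while i < len(kinds):
        if kinds[i] != "OTHER":
            i += 1
            continue
        j = i
        while j < len(kinds) and kinds[j] == "OTHER":
            j += 1
        if i == 0 or j == len(kinds):
            return False
        if kinds[i - 1] != "RTL" or kinds[j] != "RTL":
            return False
        i = j
    return True
```

With this sketch, a label of Hebrew letters followed by a digit is
accepted, while a digit followed by Hebrew letters (C1), a Hebrew label
ending in '-' (C2), or a label mixing Hebrew and Latin letters (B0) is
rejected.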

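The ABCdef/defABC ambiguity and the 'natural' label order described
above can be reproduced with a toy reordering model. The sketch below
uses the same convention as the examples in this mail (upper-case ASCII
stands in for right-to-left characters, lower-case for left-to-right)
and treats everything else as neutral. It is far simpler than the real
Unicode bidirectional algorithm (no digits, embeddings or mirroring),
and the name toy_display is hypothetical.

```python
def toy_display(logical, base="L"):
    """Return the left-to-right visual order of a logical string.

    Toy model only: upper-case letters count as RTL, lower-case
    letters as LTR, all other characters as neutral. base is the
    direction of the surrounding context, "L" or "R".
    """
    def kind(ch):
        if ch.isalpha():
            return "R" if ch.isupper() else "L"
        return "N"

    kinds = [kind(ch) for ch in logical]
    resolved = list(kinds)

    # Resolve each run of neutrals: it takes the direction of its
    # neighbours if they agree, otherwise the base direction.
    i = 0
    while i < len(kinds):
        if kinds[i] != "N":
            i += 1
            continue
        j = i
        while j < len(kinds) and kinds[j] == "N":
            j += 1
        before = kinds[i - 1] if i > 0 else base
        after = kinds[j] if j < len(kinds) else base
        direction = before if before == after else base
        for k in range(i, j):
            resolved[k] = direction
        i = j

    # Group characters into runs of equal direction.
    runs = []
    for ch, direction in zip(logical, resolved):
        if runs and runs[-1][0] == direction:
            runs[-1][1].append(ch)
        else:
            runs.append((direction, [ch]))

    # RTL runs are reversed internally; in an RTL context the order
    # of the runs themselves is reversed as well.
    pieces = [
        "".join(reversed(chars)) if direction == "R" else "".join(chars)
        for direction, chars in runs
    ]
    if base == "R":
        pieces.reverse()
    return "".join(pieces)
```

In this model toy_display("ABCdef", base="L") and
toy_display("defABC", base="R") both give CBAdef, which is the
context dependence at issue, and toy_display("FTP.HEBREW.COM") gives
MOC.WERBEH.PTF, the 'natural' order of the second solution.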