On 27 Mar 2014 20:54:57 +0100, Chris Young wrote:

> OK, second attempt at international domain name support.
>
> Branch: chris/idna2008
>
> I've had to import some unrestricted code from elsewhere, due to the
> necessity of Unicode normalisation and other things. It is working
> and conforming to the spec, as far as I read it.
>
> A couple of minor issues/todos:
> 1. If an invalid URL is encountered during page layout/box conversion,
> NetSurf gives a BoxConvert warning and the page is never displayed.
> This is caused by my new code making nsurl_create return
> NSERROR_BAD_URL when an IDN fails the compliance checks.
> I've not been able to work out where in the core this error code is
> terminating page layout.
> Page showing this problem:
> http://blogs.msdn.com/b/shawnste/archive/2006/09/14/idn-test-urls.aspx
>
> 2. If a frontend wants to display the UTF-8 version of an IDN then
> currently the URL needs stripping into component parts, the host run
> through idna_decode() and the whole thing put back together again.
> This should probably be handled by nsurl but I'm not sure of the best
> way to implement it.
>
> 3. There are some to-dos noted in code comments for further compliance
> checking. They are optional in the spec, and I don't see any need to
> implement them - anything invalid will be rejected by DNS. Most of
> the mandatory checks seem overkill anyway, given that there is
> stricter checking at DNS registration time.
> I have included the optional decode-reencode check for already encoded
> addresses to weed out any undecodeable nonsense the user might have
> typed in, but it doesn't bother to do normalisation or validity
> checking of the decoded address before re-encoding it (maybe it
> should, I'm not sure, the spec was vague on this point).
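To make point 2 concrete, here is a rough sketch of the split/decode/reassemble round trip a frontend currently has to do by hand. It is only an illustration, not code from the branch: url_with_decoded_host, host_decode_fn and demo_decode are names made up for this example, and the callback type is an assumption rather than the real idna_decode() prototype. The stand-in decoder only knows the one textbook label (xn--bcher-kva), and the splitting deliberately ignores userinfo and IPv6 literals.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: given an ACE host of length len, return a
 * newly malloc'd, NUL-terminated UTF-8 host, or NULL on failure.
 * This is NOT the real idna_decode() prototype. */
typedef char *(*host_decode_fn)(const char *host, size_t len);

/* Stand-in decoder for the demo: it only recognises a leading
 * "xn--bcher-kva" label (Punycode for "bücher") and copies anything
 * else through unchanged. A real frontend would call the IDNA code. */
static char *demo_decode(const char *host, size_t len)
{
	static const char ace[] = "xn--bcher-kva";
	static const char utf8[] = "b\xc3\xbc" "cher";
	const size_t ace_len = sizeof(ace) - 1;
	const size_t utf8_len = sizeof(utf8) - 1;
	char *out;

	if (len >= ace_len && strncmp(host, ace, ace_len) == 0) {
		/* Swap the encoded label, keep the remainder of the host */
		out = malloc(utf8_len + (len - ace_len) + 1);
		if (out == NULL)
			return NULL;
		memcpy(out, utf8, utf8_len);
		memcpy(out + utf8_len, host + ace_len, len - ace_len);
		out[utf8_len + (len - ace_len)] = '\0';
	} else {
		out = malloc(len + 1);
		if (out == NULL)
			return NULL;
		memcpy(out, host, len);
		out[len] = '\0';
	}
	return out;
}

/* Split url at "://", decode the host, and glue the pieces back together.
 * Simplification: assumes no userinfo ("user@") and no IPv6 literal.
 * Returns a malloc'd string the caller must free, or NULL on error. */
char *url_with_decoded_host(const char *url, host_decode_fn decode)
{
	const char *sep, *host, *rest;
	size_t prefix_len, host_len, dec_len, rest_len;
	char *decoded, *out;

	assert(url != NULL && decode != NULL);

	sep = strstr(url, "://");
	if (sep == NULL)
		return NULL;

	host = sep + 3;                        /* first byte of the host */
	host_len = strcspn(host, ":/?#");      /* host ends at port/path/etc. */
	rest = host + host_len;                /* port, path, query, fragment */
	prefix_len = (size_t)(host - url);     /* scheme plus "://" */

	decoded = decode(host, host_len);
	if (decoded == NULL)
		return NULL;

	dec_len = strlen(decoded);
	rest_len = strlen(rest);
	out = malloc(prefix_len + dec_len + rest_len + 1);
	if (out != NULL) {
		memcpy(out, url, prefix_len);
		memcpy(out + prefix_len, decoded, dec_len);
		memcpy(out + prefix_len + dec_len, rest, rest_len + 1);
	}
	free(decoded);
	return out;
}
```

Every frontend that wants a display URL would have to repeat something like this, which is the argument for nsurl offering it directly.
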
Is there any interest in reviewing/merging this now 3.1 is out of the
way?

I'm thinking point 2 above might also tie in with Vince's proposed
changes to extract escaped path elements from an nsurl, as it is a
similar challenge.

Chris
