On 19 September 2014 14:48, Kévin Dunglas <dung...@gmail.com> wrote: > Support of IDN in streams is a must have. > But there is a lot of other use cases for URL with IDN validation. The most > common is probably form validation (test if an user submitted URL has a > valid format and can be used to create an HTML link...). > > I'm ok making IDN validation optional and not used by default until PHP > natively support IDN in other features such as streams. > But IDN are used more and more in the wild, and from a user point of view it > is disappointing that a valid URL, working in browsers and even displayed by > Google Search is not considered as a valid URL by a PHP-based website using > filter_var() without a specific flag. > > Even some TLD are using non-ASCII characters, exemple: http://旅游气象.中国 > (popular Chinese weather site). > > About the library, I've not preference between libidn and icu. If the > licence is libidn fit better with the PHP one, libidn is probably the better > choice. Having a PHP specific implementation of STRINGPREP and Punnycode > sounds not like a good idea (reinventing the wheel, more code to maintain). > > Chris, is there a chance to have your work on streams merged in PHP 7?
It's very hacky and PoC at the moment. I've got a bunch of time-consuming personal things going on right now, but within the next couple of weeks I will try and polish it up into something serviceable, maintainable and tested/less likely to explode with edge-cases and then I'll put it up for discussion. I'm also fine if someone else wants to have a crack in the meantime, I can push my work so far to github early next week when I get access to the machine. I'd certainly like the functionality to be in 7 if it's viable from a licensing and dependency PoV - I had been holding off bringing it up to see what happened with the more general unicode support discussion (which I somewhat lost track of and seems to have died out) as there was talk of introducing a hard dependency on ICU-or-similar at one point, which would have made this a no-brainer. > What do you thing about the following planning: > - 5.7 (if exists): add IDN support in filter disabled by default. Use libidn > if selected to be used for streams too. > - 7 (if IDN support for streams is completed): validate IDN by default (what > the user expect), add a flag to disable IDN validation. Of course we'll > update the doc explaining the new behavior. > > 2014-09-19 12:28 GMT+02:00 Chris Wright <c...@daverandom.com>: >> >> On 19 September 2014 10:58, Pierre Joye <pierre....@gmail.com> wrote: >> > Hi, >> > >> > On Sep 19, 2014 4:03 PM, "Chris Wright" <c...@daverandom.com> wrote: >> >> >> >> Kévin >> >> >> >> On 18 September 2014 21:26, Kévin Dunglas <dung...@gmail.com> wrote: >> >> > Hello, >> >> > >> >> > I'm working on enhancing the FILTER_VALIDATE_URL filter ( >> >> > https://github.com/php/php-src/pull/826). >> >> > The current implementation does not support validation of >> >> > internationalized >> >> > domain names (i.e: http://www.académie-française.fr/ >> >> > <http://www.xn--acadmie-franaise-npb1a.fr/>). >> >> > >> >> > Support of IDN validation can be easily added using ICU's >> >> > uidna_toASCII() >> >> > function. >> >> > >> >> > Is it acceptable to add a dependency to ICU for ext/filter? >> >> > Another option is to add a HAVE_ICU constant in main/php_config.h and >> >> > to >> >> > validate IDN only if ICU is present. >> >> > >> >> > What strategy is preferred? >> >> >> >> I've done some work around this area previously, and all I will say >> >> is: be careful with what you do with this from a userland PoV. >> >> >> >> PHP does not natively support IDN in stream open routines or SSL >> >> verification routines. It will never support these things without at >> >> least one of: >> >> - a core dependency on ICU, libidn or similar >> >> - moving streams into an extension so a dependency can be introduced >> >> there (probably not sanely possible) >> >> - an in-house NAMEPREP implementation (this is the hard part of IDN, >> >> punycode itself is pretty trivial to implement once you have a >> >> canonical set of codepoints) >> >> >> >> These things can be implemented with *a lot* of boilerplate in >> >> userland when you have ext/intl, but it's not pretty. libcurl *can* >> >> support IDN if it was built against libidn, I'm not sure if this is >> >> currently the case in common distributions or not. Since one almost >> >> never just validates a URL string, it's usually a precursor to >> >> attempting to open it, this could lead to some pretty hefty wtfs. >> >> >> >> In short, while I'm generally for ext/filter being able to handle IDN, >> >> I *do not* believe it should do it implicitly, it should require an >> >> explicit flag, because it will break *a lot* of code if IDN is >> >> suddenly treated as valid where it previously wasn't. >> > >> > I am really not sure about that especially the enabling by default part. >> > >> > The doc is pretty clear about what this filter supports and allowing idn >> > may >> > break a lot of codes out there. >> > >> > From an implementation point of view we may not need ICU to support IDN. >> > Windows does not use it and there are license friendly decoder >> > implementations too. >> >> If we can agree on adding a core dependency on <some IDN support lib>, >> I already have an experimental local branch that adds full IDN support >> to streams. It's based on libidn but it would be easy enough to swap >> it out for something else that provides the same functionality. >> >> In my (biased) opinion, streams are a far more important element of >> IDN support. Filter validation is just polish/a nicety on top. > > > > > -- > Kévin Dunglas > > http://dunglas.fr -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php