On 19 September 2014 14:48, Kévin Dunglas <dung...@gmail.com> wrote:
> Support of IDN in streams is a must have.
> But there is a lot of other use cases for URL with IDN validation. The most
> common is probably form validation (test if an user submitted URL has a
> valid format and can be used to create an HTML link...).
>
> I'm ok making IDN validation optional and not used by default until PHP
> natively support IDN in other features such as streams.
> But IDN are used more and more in the wild, and from a user point of view it
> is disappointing that a valid URL, working in browsers and even displayed by
> Google Search is not considered as a valid URL by a PHP-based website using
> filter_var() without a specific flag.
>
> Even some TLD are using non-ASCII characters, exemple: http://旅游气象.中国
> (popular Chinese weather site).
>
> About the library, I've not preference between libidn and icu. If the
> licence is libidn fit better with the PHP one, libidn is probably the better
> choice. Having a PHP specific implementation of STRINGPREP and Punnycode
> sounds not like a good idea (reinventing the wheel, more code to maintain).
>
> Chris, is there a chance to have your work on streams merged in PHP 7?

It's very hacky and PoC at the moment. I've got a bunch of
time-consuming personal things going on right now, but within the next
couple of weeks I will try and polish it up into something
serviceable, maintainable and tested/less likely to explode with
edge-cases and then I'll put it up for discussion.

I'm also fine if someone else wants to have a crack in the meantime, I
can push my work so far to github early next week when I get access to
the machine.

I'd certainly like the functionality to be in 7 if it's viable from a
licensing and dependency PoV - I had been holding off bringing it up
to see what happened with the more general unicode support discussion
(which I somewhat lost track of and seems to have died out) as there
was talk of introducing a hard dependency on ICU-or-similar at one
point, which would have made this a no-brainer.

> What do you thing about the following planning:
> - 5.7 (if exists): add IDN support in filter disabled by default. Use libidn
> if selected to be used for streams too.
> - 7 (if IDN support for streams is completed): validate IDN by default (what
> the user expect), add a flag to disable IDN validation. Of course we'll
> update the doc explaining the new behavior.
>
> 2014-09-19 12:28 GMT+02:00 Chris Wright <c...@daverandom.com>:
>>
>> On 19 September 2014 10:58, Pierre Joye <pierre....@gmail.com> wrote:
>> > Hi,
>> >
>> > On Sep 19, 2014 4:03 PM, "Chris Wright" <c...@daverandom.com> wrote:
>> >>
>> >> Kévin
>> >>
>> >> On 18 September 2014 21:26, Kévin Dunglas <dung...@gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm working on enhancing the FILTER_VALIDATE_URL filter (
>> >> > https://github.com/php/php-src/pull/826).
>> >> > The current implementation does not support validation of
>> >> > internationalized
>> >> > domain names (i.e: http://www.académie-française.fr/
>> >> > <http://www.xn--acadmie-franaise-npb1a.fr/>).
>> >> >
>> >> > Support of IDN validation can be easily added using ICU's
>> >> > uidna_toASCII()
>> >> > function.
>> >> >
>> >> > Is it acceptable to add a dependency to ICU for ext/filter?
>> >> > Another option is to add a HAVE_ICU constant in main/php_config.h and
>> >> > to
>> >> > validate IDN only if ICU is present.
>> >> >
>> >> > What strategy is preferred?
>> >>
>> >> I've done some work around this area previously, and all I will say
>> >> is: be careful with what you do with this from a userland PoV.
>> >>
>> >> PHP does not natively support IDN in stream open routines or SSL
>> >> verification routines. It will never support these things without at
>> >> least one of:
>> >> - a core dependency on ICU, libidn or similar
>> >> - moving streams into an extension so a dependency can be introduced
>> >> there (probably not sanely possible)
>> >> - an in-house NAMEPREP implementation (this is the hard part of IDN,
>> >> punycode itself is pretty trivial to implement once you have a
>> >> canonical set of codepoints)
>> >>
>> >> These things can be implemented with *a lot* of boilerplate in
>> >> userland when you have ext/intl, but it's not pretty. libcurl *can*
>> >> support IDN if it was built against libidn, I'm not sure if this is
>> >> currently the case in common distributions or not. Since one almost
>> >> never just validates a URL string, it's usually a precursor to
>> >> attempting to open it, this could lead to some pretty hefty wtfs.
>> >>
>> >> In short, while I'm generally for ext/filter being able to handle IDN,
>> >> I *do not* believe it should do it implicitly, it should require an
>> >> explicit flag, because it will break *a lot* of code if IDN is
>> >> suddenly treated as valid where it previously wasn't.
>> >
>> > I am really not sure about that especially the enabling by default part.
>> >
>> > The doc is pretty clear about what this filter supports and allowing idn
>> > may
>> > break a lot of codes out there.
>> >
>> > From an implementation point of view we may not need ICU to support IDN.
>> > Windows does not use it and there are license friendly decoder
>> > implementations too.
>>
>> If we can agree on adding a core dependency on <some IDN support lib>,
>> I already have an experimental local branch that adds full IDN support
>> to streams. It's based on libidn but it would be easy enough to swap
>> it out for something else that provides the same functionality.
>>
>> In my (biased) opinion, streams are a far more important element of
>> IDN support. Filter validation is just polish/a nicety on top.
>
>
>
>
> --
> Kévin Dunglas
>
> http://dunglas.fr

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to