I'll implement optional (and not default) support of IDN in filter_var().

Does anyone known if it's better to use libIDN (LGPL) or ICU (custom
license deviated from the X license) from a license point of view?

2014-09-19 16:18 GMT+02:00 Chris Wright <c...@daverandom.com>:

> On 19 September 2014 14:48, Kévin Dunglas <dung...@gmail.com> wrote:
> > Support of IDN in streams is a must have.
> > But there is a lot of other use cases for URL with IDN validation. The
> most
> > common is probably form validation (test if an user submitted URL has a
> > valid format and can be used to create an HTML link...).
> >
> > I'm ok making IDN validation optional and not used by default until PHP
> > natively support IDN in other features such as streams.
> > But IDN are used more and more in the wild, and from a user point of
> view it
> > is disappointing that a valid URL, working in browsers and even
> displayed by
> > Google Search is not considered as a valid URL by a PHP-based website
> using
> > filter_var() without a specific flag.
> >
> > Even some TLD are using non-ASCII characters, exemple: http://旅游气象.中国
> <http://xn--zfv73l7xbp87c.xn--fiqs8s>
> > (popular Chinese weather site).
> >
> > About the library, I've not preference between libidn and icu. If the
> > licence is libidn fit better with the PHP one, libidn is probably the
> better
> > choice. Having a PHP specific implementation of STRINGPREP and Punnycode
> > sounds not like a good idea (reinventing the wheel, more code to
> maintain).
> >
> > Chris, is there a chance to have your work on streams merged in PHP 7?
>
> It's very hacky and PoC at the moment. I've got a bunch of
> time-consuming personal things going on right now, but within the next
> couple of weeks I will try and polish it up into something
> serviceable, maintainable and tested/less likely to explode with
> edge-cases and then I'll put it up for discussion.
>
> I'm also fine if someone else wants to have a crack in the meantime, I
> can push my work so far to github early next week when I get access to
> the machine.
>
> I'd certainly like the functionality to be in 7 if it's viable from a
> licensing and dependency PoV - I had been holding off bringing it up
> to see what happened with the more general unicode support discussion
> (which I somewhat lost track of and seems to have died out) as there
> was talk of introducing a hard dependency on ICU-or-similar at one
> point, which would have made this a no-brainer.
>
> > What do you thing about the following planning:
> > - 5.7 (if exists): add IDN support in filter disabled by default. Use
> libidn
> > if selected to be used for streams too.
> > - 7 (if IDN support for streams is completed): validate IDN by default
> (what
> > the user expect), add a flag to disable IDN validation. Of course we'll
> > update the doc explaining the new behavior.
> >
> > 2014-09-19 12:28 GMT+02:00 Chris Wright <c...@daverandom.com>:
> >>
> >> On 19 September 2014 10:58, Pierre Joye <pierre....@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > On Sep 19, 2014 4:03 PM, "Chris Wright" <c...@daverandom.com> wrote:
> >> >>
> >> >> Kévin
> >> >>
> >> >> On 18 September 2014 21:26, Kévin Dunglas <dung...@gmail.com> wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I'm working on enhancing the FILTER_VALIDATE_URL filter (
> >> >> > https://github.com/php/php-src/pull/826).
> >> >> > The current implementation does not support validation of
> >> >> > internationalized
> >> >> > domain names (i.e: http://www.académie-française.fr/
> <http://www.xn--acadmie-franaise-npb1a.fr/>
> >> >> > <http://www.xn--acadmie-franaise-npb1a.fr/>).
> >> >> >
> >> >> > Support of IDN validation can be easily added using ICU's
> >> >> > uidna_toASCII()
> >> >> > function.
> >> >> >
> >> >> > Is it acceptable to add a dependency to ICU for ext/filter?
> >> >> > Another option is to add a HAVE_ICU constant in main/php_config.h
> and
> >> >> > to
> >> >> > validate IDN only if ICU is present.
> >> >> >
> >> >> > What strategy is preferred?
> >> >>
> >> >> I've done some work around this area previously, and all I will say
> >> >> is: be careful with what you do with this from a userland PoV.
> >> >>
> >> >> PHP does not natively support IDN in stream open routines or SSL
> >> >> verification routines. It will never support these things without at
> >> >> least one of:
> >> >> - a core dependency on ICU, libidn or similar
> >> >> - moving streams into an extension so a dependency can be introduced
> >> >> there (probably not sanely possible)
> >> >> - an in-house NAMEPREP implementation (this is the hard part of IDN,
> >> >> punycode itself is pretty trivial to implement once you have a
> >> >> canonical set of codepoints)
> >> >>
> >> >> These things can be implemented with *a lot* of boilerplate in
> >> >> userland when you have ext/intl, but it's not pretty. libcurl *can*
> >> >> support IDN if it was built against libidn, I'm not sure if this is
> >> >> currently the case in common distributions or not. Since one almost
> >> >> never just validates a URL string, it's usually a precursor to
> >> >> attempting to open it, this could lead to some pretty hefty wtfs.
> >> >>
> >> >> In short, while I'm generally for ext/filter being able to handle
> IDN,
> >> >> I *do not* believe it should do it implicitly, it should require an
> >> >> explicit flag, because it will break *a lot* of code if IDN is
> >> >> suddenly treated as valid where it previously wasn't.
> >> >
> >> > I am really not sure about that especially the enabling by default
> part.
> >> >
> >> > The doc is pretty clear about what this filter supports and allowing
> idn
> >> > may
> >> > break a lot of codes out there.
> >> >
> >> > From an implementation point of view we may not need ICU to support
> IDN.
> >> > Windows does not use it and there are license friendly decoder
> >> > implementations too.
> >>
> >> If we can agree on adding a core dependency on <some IDN support lib>,
> >> I already have an experimental local branch that adds full IDN support
> >> to streams. It's based on libidn but it would be easy enough to swap
> >> it out for something else that provides the same functionality.
> >>
> >> In my (biased) opinion, streams are a far more important element of
> >> IDN support. Filter validation is just polish/a nicety on top.
> >
> >
> >
> >
> > --
> > Kévin Dunglas
> >
> > http://dunglas.fr
>



-- 
Kévin Dunglas
Consultant et développeur freelance

http://dunglas.fr
Tél. : 06 60 91 20 20

Reply via email to