Lester Caine wrote on 09/12/2014 16:00:
On 09/12/14 15:30, Rowan Collins wrote:
Lester Caine wrote on 09/12/2014 15:07:
On 09/12/14 14:07, Andrea Faulds wrote:
On 9 Dec 2014, at 13:35, Lester Caine <les...@lsces.co.uk> wrote:

On 09/12/14 13:07, Andrea Faulds wrote:

On 9 Dec 2014, at 08:15, Lester Caine <les...@lsces.co.uk> wrote:

If ICU is to be adopted as the base for unicode support, then surely
everything else should follow those rules?
\uhhhh and \Uhhhhhhhh are defined along with \x{hhhhhh}, so does it make
sense to add something which is not part of ICU?
Er, where does ICU define \uXXXX and \UXXXXXX? I don't understand.
http://userguide.icu-project.org/strings/regexp
We aren't using ICU regular expressions, and ICU is merely an
implementation detail anyway.
Has THAT been agreed on? Surely if we are using ICU fully in PHP7 in place of
the patchwork of current fixes for unicode, then we don't want to be
breaking things again by odd differences from the core code for unicode?
I thought the agreement was that there was no resource to create an
alternative from scratch?
I think what Andrea's getting at is that the fact that ICU is in use
under the hood shouldn't be particularly visible to users. If PHP gets
"Unicode support" (whatever that turns out to mean), what the user
should see is *PHP's Unicode facilities*; only core devs and package
maintainers will need to know that those are implemented using ICU. As
such, there's no automatic need for PHP to do everything the same way as
ICU.
That was the reason for asking ...
What is the point of all these piecemeal patches when the underlying
base has not yet been agreed on? That we are using ICU in things like
the database interfaces for unicode support would point to it being
somewhat useful if those processes produced the same code as the same
actions in PHP. ICU is well established and its API is already in use on
the same platform as PHP is running on ... so can we please treat all of
these 'patches' in the light of a proper debate on the bigger picture.
Forcing something like this through now simply does not make sense, and
while there may be no 'automatic need' for the database interface to
work the same as other parts, it would perhaps be worth a little
consideration?


I see what you mean, but I think in this case, it would make very little difference what other Unicode pieces are added, since the Unicode escape syntax will only ever be interpreted by the compiler, and no other functions will ever see what it looks like. The only exception would be things like PCRE (not ICU) regexes, where - in a single-quoted string - a visually similar syntax might exist, but there are already lots of differences between what backslash-something means in a regex and what it means in a double-quoted string literal.
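
To make that distinction concrete, here is a minimal sketch. It assumes PHP 7.0 or later, where the escape is spelled \u{...} in double-quoted strings, and a PCRE build with UTF-8 support; the variable names are purely illustrative. The first string is resolved by the compiler, the second is left untouched, and the third is only ever interpreted by PCRE.

```php
<?php
// Double-quoted: the compiler turns the escape into UTF-8 bytes
// before any function sees the string.
$compiled = "\u{00e9}";

// Single-quoted: no escape processing, the backslash sequence stays literal.
$literal = '\u{00e9}';

// Single-quoted regex: PCRE (not the PHP compiler) interprets \x{...},
// and the /u modifier makes it treat pattern and subject as UTF-8.
$pattern = '/^\x{00e9}$/u';

var_dump(bin2hex($compiled));               // hex of the UTF-8 encoding of U+00E9
var_dump(strlen($literal));                 // length of the untouched literal text
var_dump(preg_match($pattern, $compiled));  // does PCRE's escape match the compiled character?
```

Nothing downstream of the compiler ever sees the \u{...} spelling in the first case, so the choice of syntax there should not constrain, or be constrained by, whatever a regex engine or ICU accepts.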

--
Rowan Collins
[IMSoP]

