php-i18n Digest 17 Mar 2006 15:49:06 -0000 Issue 320
Topics (messages 979 through 986):
Re: is_string()
979 by: Marcus Boerger
980 by: Dmitry Stogov
981 by: Derick Rethans
982 by: Dmitry Stogov
983 by: Andrei Zmievski
986 by: Pierre
Re: Hash api change
984 by: Andrei Zmievski
985 by: Marcus Boerger
----------------------------------------------------------------------
--- Begin Message ---
Hello Derick,
Wednesday, March 15, 2006, 10:24:37 AM, you wrote:
> Hello!
> The past few days I've been trying to get some of our code to run on PHP
> 6 with unicode semantics turned on to be able to provide some
> benchmarking. Of course I found some BC breaking behavior, where the
> following one is bringing the largest WTF factor:
> <?php
> $str = "hautamaekki";
> var_dump( is_string( $str ) );
> ?>
> [EMAIL PROTECTED]:/tmp$ php-6.0dev /tmp/string-test.php
> bool(false)
> (The reason for this is that is_string() now returns true for *binary*
> strings only). I prepared a patch [1] which reverts is_string() back to
> the "expected" behavior. This patch is also in line with the notes from
> the PDM [2]. Andrei argued that is_string() and (string) should work on
> binary strings and is_unicode() and (unicode) for "real" strings.
> However I think it is a much better idea to use is_binary()/(binary) and
> is_string()/(string) instead as this will not break so much code. To
> illustrate this with another example that I encountered in PHPUnit2 when
> trying to get our code running:
> In Framework/TestSuite.php there is the following code (shortened):
> public function __construct($theClass = '', $name = '') {
>     $argumentsValid = FALSE;
>     if (is_object($theClass) &&
>         $theClass instanceof ReflectionClass) {
>         $argumentsValid = TRUE;
>     }
>     else if (is_string($theClass) && $theClass !== '' &&
>              class_exists($theClass)) {
> Which is, for example, called with (ezcTestSuite inherits the PHPUnit2
> TestSuite class):
> public static function suite()
> {
>     return new ezcTestSuite( "ezcConsoleToolsInputTest" );
> }
> This no longer works with PHP 6, as the string is suddenly not a
> string anymore... highly confusing, I would say.
> Can we please use is_binary()/(binary) for binary string types and
> is_string()/(string) for real strings as per PDM notes?
> regards,
> Derick
> [1] http://files.derickrethans.nl/patches/uc-is_string-2006-03-15.diff.txt
> [2] http://www.php.net/~derick/meeting-notes.html#different-string-types
I would prefer a slightly different approach and make is_string() return
true for both native and unicode strings. This would lead to the following
layout:
  is_string():  Z_TYPE_P() == IS_STRING || Z_TYPE_P() == IS_UNICODE
  (string):     bypass native + unicode, convert according to unicode mode;
                make_printable_zval
  is_binary():  Z_TYPE_P() == IS_STRING
  (binary):     bypass native; make_string_zval
  is_unicode(): Z_TYPE_P() == IS_UNICODE
  (unicode):    bypass unicode; make_unicode_string
which makes good sense to me. The point is that we still have automatic
conversion almost everywhere, or will get it nearly everywhere sooner or
later. So when dealing with a string variable it doesn't really matter
what kind it is, and if it does, one has the ability to check for the
exact type.
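To make the layout concrete, here is a minimal C sketch of the three
checks. The sk_ names and the reduced zval are stand-ins invented for
illustration; they are not the engine's real definitions, and the casts
are left out entirely:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the engine's type tags and zval;
   the real definitions live in the Zend headers and differ. */
typedef enum { SK_NULL, SK_STRING, SK_UNICODE } sk_type;
typedef struct { sk_type type; } sk_zval;

/* is_binary(): native (binary) strings only */
static bool sk_is_binary(const sk_zval *z)
{
    return z->type == SK_STRING;
}

/* is_unicode(): unicode strings only */
static bool sk_is_unicode(const sk_zval *z)
{
    return z->type == SK_UNICODE;
}

/* is_string() as laid out above: either kind of string */
static bool sk_is_string(const sk_zval *z)
{
    return z->type == SK_STRING || z->type == SK_UNICODE;
}
```

With this shape, code that only asks "is it a string at all?" keeps
working in both modes, and code that cares about the exact kind uses
one of the two narrower checks.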
Best regards,
marcus
--- End Message ---
--- Begin Message ---
In case of "is_string(): Z_TYPE_P() == IS_STRING || Z_TYPE_P() ==
IS_UNICODE",
is_string() will return TRUE for binary data in unicode mode.
It should be:
is_string(): Z_TYPE_P() == (UG(unicode) ? IS_UNICODE : IS_STRING)
is_binary() and is_unicode() are OK.
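
The difference between the two variants can be sketched in C as follows.
This is only an illustration: the value struct, the T_ tags and the
unicode_mode flag (standing in for UG(unicode)) are assumptions made up
for the example, not the real engine code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not the real Zend definitions. */
typedef enum { T_OTHER, T_STRING, T_UNICODE } val_type;
typedef struct { val_type type; } value;

/* Stand-in for the UG(unicode) runtime switch. */
static bool unicode_mode = false;

/* Union variant: any string kind passes, whatever the mode. */
static bool is_string_either(const value *v)
{
    return v->type == T_STRING || v->type == T_UNICODE;
}

/* Mode-dependent variant: only the mode's own string type passes. */
static bool is_string_by_mode(const value *v)
{
    return v->type == (unicode_mode ? T_UNICODE : T_STRING);
}
```

With unicode_mode set to true, a T_STRING (binary) value passes
is_string_either() but not is_string_by_mode(), which is exactly the
case described above.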
Thanks. Dmitry.
--- End Message ---
--- Begin Message ---
On Wed, 15 Mar 2006, Dmitry Stogov wrote:
> In case of "is_string(): Z_TYPE_P() == IS_STRING || Z_TYPE_P() ==
> IS_UNICODE",
> is_string() will return TRUE for binary data in unicode mode.
>
> It should be:
> is_string(): Z_TYPE_P() == (UG(unicode) ? IS_UNICODE : IS_STRING)
>
> is_binary() and is_unicode() are OK.
That's what my patch does...
Derick
--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org
--- End Message ---
--- Begin Message ---
So I agree with your patch. :)
Probably gettype() and settype() must be changed in the same way.
Thanks. Dmitry.
--- End Message ---
--- Begin Message ---
I guess I can live with this. It's just a little strange that we have
only 2 string types, but 3 casts.
I would also say we need to allow automatic upconversion on
binary/unicode concatenation and comparison using runtime encoding. We
should also restrict binary string literals to contain ASCII chars
only, to avoid encoding problems. If non-ASCII chars are required, they
can be written with \u, \U, or \C escape sequences.
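As a rough idea of what the ASCII-only restriction on binary literals
amounts to, here is a small C sketch. The helper name literal_is_ascii
is hypothetical and does not exist in any PHP source tree; where such a
check would live (scanner, compiler) is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of any PHP source tree: returns true
   when every byte of the literal is 7-bit ASCII (0x00..0x7F). */
static bool literal_is_ascii(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)s[i] > 0x7F) {
            return false;
        }
    }
    return true;
}
```

A literal such as "hautamaekki" passes, while one containing the two
UTF-8 bytes of "ä" does not and would have to be written with an escape
sequence instead.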
-Andrei
On Mar 15, 2006, at 2:48 AM, Marcus Boerger wrote:
[snip]
--- End Message ---
--- Begin Message ---
On Wed, 15 Mar 2006 12:26:53 -0800
[EMAIL PROTECTED] (Andrei Zmievski) wrote:
> I guess I can live with this. It's just a little strange that we have
> only 2 string types, but 3 casts..
I find that confusing too. Given the commit log, it is even more
confusing:
- Updated is_string():
If Unicode semantics is turned on, return "true" for Unicode strings
only. If Unicode semantics is turned off, return "true" for native
strings only.
This makes is_string() basically useless. I can imagine two solutions:
- keep it and always return true (unicode and binary strings are
strings anyway)
- deprecate it in PHP 6.0, where it will still return true. At least
people will know that they should use the new functions
There is maybe a third solution, though certainly too drastic: remove
is_string() :)
--Pierre
--- End Message ---
--- Begin Message ---
On Mar 11, 2006, at 4:22 AM, Marcus Boerger wrote:
[snip]
So now HashKey matches zend_hash_key just by pure reordering.
Why do we need both HashKey and zend_hash_key?
-Andrei
--- End Message ---
--- Begin Message ---
Hello Andrei,
we need it right now because of the current layout. And imo that's a
design flaw, which is why I brought it up here with an option on how to
change it after discussing it. Having two different layouts lets us
avoid a pointer to the key string and possibly a second memory
allocation. But it also forces us to copy the key struct in the
apply func that gives access to the key for every element.
If we had the layout I proposed, we could also provide an apply
version that easily gives access to the element, the key and a parameter.
Having to use the following to access the key has made most callers do
the iteration in place, simply because it is pretty slow for two
reasons. First, va_args handling is potentially slow on some machines,
and second, having to allocate a tsrm key is especially slow on some
machines.
typedef int (*apply_func_args_t)(void *pDest, int num_args, va_list args,
                                 zend_hash_key *hash_key);
ZEND_API void zend_hash_apply_with_arguments(HashTable *ht,
        apply_func_args_t apply_func, int, ...);
Since a struct can normally be used easily and efficiently where
multiple arguments would otherwise be needed, the next layout would
imo be a nice addition:
typedef int (*apply_func_key_t)(void *pDest, zend_hash_key *hash_key,
                                void *argument TSRMLS_DC);
ZEND_API void zend_hash_apply_with_key(HashTable *ht,
        apply_func_key_t apply_func, void *argument TSRMLS_DC);
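
A self-contained sketch of how such an apply-with-key call could be
used. Everything prefixed sk_ is a simplified stand-in invented for the
example: the real HashTable and zend_hash_key are richer, TSRMLS_DC is
omitted, and the return codes are the sketch's own, not the engine's.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for zend_hash_key and HashTable. */
typedef struct {
    const char   *key;     /* string key, or NULL for a numeric key */
    size_t        key_len;
    unsigned long h;       /* numeric index */
} sk_hash_key;

typedef struct {
    sk_hash_key key;
    void       *data;
} sk_bucket;

typedef struct {
    sk_bucket *buckets;
    size_t     count;
} sk_table;

/* Return codes used by this sketch only. */
#define SK_APPLY_KEEP 0
#define SK_APPLY_STOP 1

/* Callback shape from the proposal: element, key, one argument. */
typedef int (*sk_apply_key_t)(void *pDest, sk_hash_key *hash_key,
                              void *argument);

/* Walk the table, passing each element's key by pointer (no copy)
   and forwarding a single caller-supplied argument. */
static void sk_apply_with_key(sk_table *ht, sk_apply_key_t fn,
                              void *argument)
{
    for (size_t i = 0; i < ht->count; i++) {
        sk_bucket *b = &ht->buckets[i];
        if (fn(b->data, &b->key, argument) == SK_APPLY_STOP) {
            break;
        }
    }
}

/* Example callback: several "arguments" travel in one struct. */
typedef struct {
    size_t string_keys;
    size_t total_key_len;
} sk_stats;

static int sk_count_string_keys(void *pDest, sk_hash_key *hash_key,
                                void *argument)
{
    sk_stats *stats = argument;
    (void)pDest;  /* the element itself is not needed here */
    if (hash_key->key != NULL) {
        stats->string_keys++;
        stats->total_key_len += hash_key->key_len;
    }
    return SK_APPLY_KEEP;
}
```

The callback gets the key without any va_list handling, and whatever
would otherwise be passed as multiple variadic arguments is packed into
one struct behind the void pointer.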
This design change suggestion of course goes especially to Andi/Zeev.
marcus
--- End Message ---