php-i18n Digest 15 Mar 2006 09:24:53 -0000 Issue 319

php-i18n-digest-help Wed, 15 Mar 2006 01:25:15 -0800

php-i18n Digest 15 Mar 2006 09:24:53 -0000 Issue 319

Topics (messages 976 through 978):


Re: Hash api change
        976 by: Marcus Boerger
        977 by: Dmitry Stogov

is_string()
        978 by: Derick Rethans

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [email protected]


----------------------------------------------------------------------

--- Begin Message ---

Hello Dmitry,

Monday, March 13, 2006, 7:43:05 AM, you wrote:

> Hi Marcus,

> I agree that we can allocate Bucket and string in one block.
> But we will need to initialize pointer in Bucket.key.zstr and then
> dereference it every time we access string part of key.
> This will a slowdown.

If is a simple add and store operation. This usually tracks down to three
assembler ops. That is much faster than having to create a new struct for
each apply function's callback of which we have a few.

> Of course we can access it directly as arKey, but for what reason we need
> this change?

I didn't say that hash internal operations should use it. But the apply
stuff can simply bypass the adress of the zend_hash_key if we go with that
layout.

> Also each bucket will eat more RAM.

wow 4 bytes.

> One more thing: having "h" as first field of Bucket allows faster access to
> "h" or more compact code as C compiler doesn't need add offset.

On an X86 this is only a single byte more in the assembler opcodes.

> I didn't understand what you mean in "not alignable memory block in the
> integer index case".

In the current struct we force the string container to hold at least one
byte even if we don't use it. that means that three bytes are unused.
If the compiler doesn't automatically align the data the next block is
unaligned. Also if not forcing to use any byte you get back the 4 bytes
i added in the above in case of interger indices.

best regards
marcus

>> -----Original Message-----
>> From: Marcus Boerger [mailto:[EMAIL PROTECTED] 
>> Sent: Sunday, March 12, 2006 1:59 PM
>> To: Dmitry Stogov
>> Cc: [email protected]
>> Subject: Re: [PHP-I18N] Hash api change
>> 
>> 
>> Hello Dmitry,
>> 
>>   you are right about the strign allocation, we have this in 
>> 4 and 5 as well. But a minor change would even captutre that too:
>> 
>> typedef struct bucket {
>>         ulong h;        /* Used for numeric indexing */
>>         uint nKeyLength;
>>         void *pData;
>>         void *pDataPtr;
>>         struct bucket *pListNext;
>>         struct bucket *pListLast;
>>         struct bucket *pNext;
>>         struct bucket *pLast;
>>         HashKey key;
>>         union {
>>                 char  s[];   /* Must be last element */
>>                 UChar u[];   /* Must be last element */
>>         } arKey;
>> }
>> 
>> Also you had that member being [1] though the language allows 
>> to have it be an incomplete type ([]). I prefer the latter 
>> becuase that does not force to allocate a not alignable 
>> memory block in the integer index case. If we preceed this 
>> way there is no need to allocate two blocks and instead we 
>> can easily fill in the pointer for the string key by a simple 
>> addition:
>> 
>> // for the integer index case
>> bucket p = pemalloc(sizeof(bucket), persistent);
>> 
>> // for the string index case
>> bucket p = pemalloc(sizeof(bucket)+key_len, persistent);
>> p->key.arKey.s = p + offsetof(p, arKey);
>> memmove(p->key.s, key_value, key_len);
>> 
>> My opinion here is anyway that having to deal with only one 
>> struct is much easier and also faster. Being able to change 
>> the malloc to a single pointer add operation is a nice addon.
>> 
>> best regards
>> marcus
>> 
>> Sunday, March 12, 2006, 11:11:32 AM, you wrote:
>> 
>> > Hi Marcus,
>> 
>> > The idea make sense, but note that this modification will need two 
>> > malloc() for each bucket with string index. First for bucket itself 
>> > and the second for "arKey". (now they are done in one call)
>> > Also we will lose some performance in dereferencing pointer 
>> to arKey in each
>> > ZendHash operation.
>> 
>> > Thanks. Dmitry.
>> 
>> 
>> >> -----Original Message-----
>> >> From: Marcus Boerger [mailto:[EMAIL PROTECTED]
>> >> Sent: Saturday, March 11, 2006 3:22 PM
>> >> To: [email protected]
>> >> Subject: [PHP-I18N] Hash api change
>> >> 
>> >> 
>> >> Hello php-i18n,
>> >> 
>> >>   i think we should change the api a bit:
>> >> 
>> >> From:
>> >> 
>> >> typedef struct _key {
>> >>         zend_uchar type;
>> >>         union {
>> >>                 char  s[1];   /* Must be last element */
>> >>                 UChar u[1];   /* Must be last element */
>> >>         } arKey;
>> >> } HashKey;
>> >> 
>> >> typedef struct bucket {
>> >>         ulong h;                                              
>> >>   /* Used for numeric indexing */
>> >>         uint nKeyLength;
>> >>         void *pData;
>> >>         void *pDataPtr;
>> >>         struct bucket *pListNext;
>> >>         struct bucket *pListLast;
>> >>         struct bucket *pNext;
>> >>         struct bucket *pLast;
>> >>         HashKey key; /* Must be last element */
>> >> } Bucket;
>> >> 
>> >> To:
>> >> 
>> >> typedef struct _key {
>> >>         ulong h;     /* Used for numeric indexing */
>> >>         uint nKeyLength;
>> >>         zend_uchar type;
>> >>         zstr arKey;   /* Must be last element */
>> >> } HashKey;
>> >> 
>> >> typedef struct bucket {
>> >>         void *pData;
>> >>         void *pDataPtr;
>> >>         struct bucket *pListNext;
>> >>         struct bucket *pListLast;
>> >>         struct bucket *pNext;
>> >>         struct bucket *pLast;
>> >>         HashKey key; /* Must be last element */
>> >> } Bucket;
>> >> 
>> >> So now HashKey matches zend_hash_key just by pure reordering.
>> >> 
>> >> Also we should probably change the apply_with_arguments stuff
>> >> to pass the tsrm key. That is we'd change from:
>> >> 
>> >> typedef int (*apply_func_args_t)(void *pDest, int num_args,
>> >> va_list args, zend_hash_key *hash_key); ZEND_API void 
>> >> zend_hash_apply_with_arguments(HashTable *ht, 
>> >> apply_func_args_t apply_func, int, ...);
>> >> 
>> >> 
>> >> To:
>> >> 
>> >> typedef int (*apply_func_args_t)(void *pDest TSRMLS_DC, int
>> >> num_args, va_list args, zend_hash_key *hash_key); ZEND_API 
>> >> void zend_hash_apply_with_arguments(HashTable *ht TSRMLS_DC, 
>> >> apply_func_args_t apply_func, int, ...);
>> >> 
>> >> Further more our add_assoc functions currently only allow
>> >> handling of native string indexes. This should be changed to 
>> >> allow unicode indexes as well:
>> >> 
>> >> ZEND_API int add_assoc_long_ex(zval *arg, char *key, uint
>> >> key_len, long n);
>> >> 
>> >> TO:
>> >> 
>> >> ZEND_API int add_assoc_long_ex(zval *arg, zend_uchar type,
>> >> zst *key, uint key_len, long n);
>> >> 
>> >> Though maybe it is considerable to just add more functions i
>> >> don't think flooding the api is a good idea and prefer to 
>> >> only have one version that can easily deal with both.

--- End Message ---

--- Begin Message ---

Hi Marcus,

May be it will work.
I think we should make a test and measure the speed difference.

My suggestion is test bucket with first field "key" (having "h" as first)
and, first field "pData".

The only real disadvantage is 4-byte difference. :)

Alignment is not a real problem. Buckets are always aligned by C compilers
or (e)malloc().

Thanks. Dmitry.

> -----Original Message-----
> From: Marcus Boerger [mailto:[EMAIL PROTECTED] 
> Sent: Monday, March 13, 2006 12:02 PM
> To: Dmitry Stogov
> Cc: [email protected]
> Subject: Re: [PHP-I18N] Hash api change
> 
> 
> Hello Dmitry,
> 
> Monday, March 13, 2006, 7:43:05 AM, you wrote:
> 
> > Hi Marcus,
> 
> > I agree that we can allocate Bucket and string in one block. But we 
> > will need to initialize pointer in Bucket.key.zstr and then 
> > dereference it every time we access string part of key. This will a 
> > slowdown.
> 
> If is a simple add and store operation. This usually tracks 
> down to three assembler ops. That is much faster than having 
> to create a new struct for each apply function's callback of 
> which we have a few.
> 
> > Of course we can access it directly as arKey, but for what 
> reason we 
> > need this change?
> 
> I didn't say that hash internal operations should use it. But 
> the apply stuff can simply bypass the adress of the 
> zend_hash_key if we go with that layout.
> 
> > Also each bucket will eat more RAM.
> 
> wow 4 bytes.
> 
> > One more thing: having "h" as first field of Bucket allows faster 
> > access to "h" or more compact code as C compiler doesn't need add 
> > offset.
> 
> On an X86 this is only a single byte more in the assembler opcodes.
> 
> > I didn't understand what you mean in "not alignable memory block in 
> > the integer index case".
> 
> In the current struct we force the string container to hold 
> at least one byte even if we don't use it. that means that 
> three bytes are unused. If the compiler doesn't automatically 
> align the data the next block is unaligned. Also if not 
> forcing to use any byte you get back the 4 bytes i added in 
> the above in case of interger indices.
> 
> best regards
> marcus
> 
> >> -----Original Message-----
> >> From: Marcus Boerger [mailto:[EMAIL PROTECTED]
> >> Sent: Sunday, March 12, 2006 1:59 PM
> >> To: Dmitry Stogov
> >> Cc: [email protected]
> >> Subject: Re: [PHP-I18N] Hash api change
> >> 
> >> 
> >> Hello Dmitry,
> >> 
> >>   you are right about the strign allocation, we have this in
> >> 4 and 5 as well. But a minor change would even captutre that too:
> >> 
> >> typedef struct bucket {
> >>         ulong h;        /* Used for numeric indexing */
> >>         uint nKeyLength;
> >>         void *pData;
> >>         void *pDataPtr;
> >>         struct bucket *pListNext;
> >>         struct bucket *pListLast;
> >>         struct bucket *pNext;
> >>         struct bucket *pLast;
> >>         HashKey key;
> >>         union {
> >>                 char  s[];   /* Must be last element */
> >>                 UChar u[];   /* Must be last element */
> >>         } arKey;
> >> }
> >> 
> >> Also you had that member being [1] though the language allows
> >> to have it be an incomplete type ([]). I prefer the latter 
> >> becuase that does not force to allocate a not alignable 
> >> memory block in the integer index case. If we preceed this 
> >> way there is no need to allocate two blocks and instead we 
> >> can easily fill in the pointer for the string key by a simple 
> >> addition:
> >> 
> >> // for the integer index case
> >> bucket p = pemalloc(sizeof(bucket), persistent);
> >> 
> >> // for the string index case
> >> bucket p = pemalloc(sizeof(bucket)+key_len, persistent);
> >> p->key.arKey.s = p + offsetof(p, arKey);
> >> memmove(p->key.s, key_value, key_len);
> >> 
> >> My opinion here is anyway that having to deal with only one
> >> struct is much easier and also faster. Being able to change 
> >> the malloc to a single pointer add operation is a nice addon.
> >> 
> >> best regards
> >> marcus
> >> 
> >> Sunday, March 12, 2006, 11:11:32 AM, you wrote:
> >> 
> >> > Hi Marcus,
> >> 
> >> > The idea make sense, but note that this modification 
> will need two
> >> > malloc() for each bucket with string index. First for 
> bucket itself 
> >> > and the second for "arKey". (now they are done in one call)
> >> > Also we will lose some performance in dereferencing pointer 
> >> to arKey in each
> >> > ZendHash operation.
> >> 
> >> > Thanks. Dmitry.
> >> 
> >> 
> >> >> -----Original Message-----
> >> >> From: Marcus Boerger [mailto:[EMAIL PROTECTED]
> >> >> Sent: Saturday, March 11, 2006 3:22 PM
> >> >> To: [email protected]
> >> >> Subject: [PHP-I18N] Hash api change
> >> >> 
> >> >> 
> >> >> Hello php-i18n,
> >> >> 
> >> >>   i think we should change the api a bit:
> >> >> 
> >> >> From:
> >> >> 
> >> >> typedef struct _key {
> >> >>         zend_uchar type;
> >> >>         union {
> >> >>                 char  s[1];   /* Must be last element */
> >> >>                 UChar u[1];   /* Must be last element */
> >> >>         } arKey;
> >> >> } HashKey;
> >> >> 
> >> >> typedef struct bucket {
> >> >>         ulong h;                                              
> >> >>   /* Used for numeric indexing */
> >> >>         uint nKeyLength;
> >> >>         void *pData;
> >> >>         void *pDataPtr;
> >> >>         struct bucket *pListNext;
> >> >>         struct bucket *pListLast;
> >> >>         struct bucket *pNext;
> >> >>         struct bucket *pLast;
> >> >>         HashKey key; /* Must be last element */
> >> >> } Bucket;
> >> >> 
> >> >> To:
> >> >> 
> >> >> typedef struct _key {
> >> >>         ulong h;     /* Used for numeric indexing */
> >> >>         uint nKeyLength;
> >> >>         zend_uchar type;
> >> >>         zstr arKey;   /* Must be last element */
> >> >> } HashKey;
> >> >> 
> >> >> typedef struct bucket {
> >> >>         void *pData;
> >> >>         void *pDataPtr;
> >> >>         struct bucket *pListNext;
> >> >>         struct bucket *pListLast;
> >> >>         struct bucket *pNext;
> >> >>         struct bucket *pLast;
> >> >>         HashKey key; /* Must be last element */
> >> >> } Bucket;
> >> >> 
> >> >> So now HashKey matches zend_hash_key just by pure reordering.
> >> >> 
> >> >> Also we should probably change the apply_with_arguments 
> stuff to 
> >> >> pass the tsrm key. That is we'd change from:
> >> >> 
> >> >> typedef int (*apply_func_args_t)(void *pDest, int num_args, 
> >> >> va_list args, zend_hash_key *hash_key); ZEND_API void 
> >> >> zend_hash_apply_with_arguments(HashTable *ht, apply_func_args_t 
> >> >> apply_func, int, ...);
> >> >> 
> >> >> 
> >> >> To:
> >> >> 
> >> >> typedef int (*apply_func_args_t)(void *pDest TSRMLS_DC, int 
> >> >> num_args, va_list args, zend_hash_key *hash_key); ZEND_API void 
> >> >> zend_hash_apply_with_arguments(HashTable *ht TSRMLS_DC, 
> >> >> apply_func_args_t apply_func, int, ...);
> >> >> 
> >> >> Further more our add_assoc functions currently only 
> allow handling 
> >> >> of native string indexes. This should be changed to 
> allow unicode 
> >> >> indexes as well:
> >> >> 
> >> >> ZEND_API int add_assoc_long_ex(zval *arg, char *key, 
> uint key_len, 
> >> >> long n);
> >> >> 
> >> >> TO:
> >> >> 
> >> >> ZEND_API int add_assoc_long_ex(zval *arg, zend_uchar type, zst 
> >> >> *key, uint key_len, long n);
> >> >> 
> >> >> Though maybe it is considerable to just add more 
> functions i don't 
> >> >> think flooding the api is a good idea and prefer to 
> only have one 
> >> >> version that can easily deal with both.
> 
> 
> 
>

--- End Message ---

--- Begin Message ---

Hello!

The past few days I've been trying to get some of our code to run on PHP 
6 with unicode semantics turned on to be able to provide some 
benchmarking. Ofcourse I found some BC breaking behavior, where the 
following one is bringing the largest WTF factor:

        <?php
        $str = "hautamaekki";
        var_dump( is_string( $str ) );
        ?>

        [EMAIL PROTECTED]:/tmp$ php-6.0dev /tmp/string-test.php
        bool(false)

(The reason for this is that is_string() now returns true for *binary* 
strings only). I prepared a patch [1] which reverts is_string() back to 
the "expected" behavior. This patch is also in line with the notes from 
the PDM [2]. Andrei argued that is_string() and (string) should work on 
binary strings and is_unicode() and (unicode) for "real" strings. 
However I think it is a much better idea to use is_binary()/(binary) and 
is_string()/(string) instead as this will not break so much code. To 
illustrate this with another example that I encountered in PHPUnit2 when 
trying to get our code running:

In Framework/TestSuite.php there is the following code (shortened):

    public function __construct($theClass = '', $name = '') {
        $argumentsValid = FALSE;

        if (is_object($theClass) &&
            $theClass instanceof ReflectionClass) {
            $argumentsValid = TRUE;
        }

        else if (is_string($theClass) && $theClass !== '' && 
class_exists($theClass)) {

Which is f.e. called with (ezcTestSuite inherits the PHPUnit2 TestSuit 
class):

    public static function suite()
    {
        return new ezcTestSuite( "ezcConsoleToolsInputTest" );
    }

This does not work anymore with PHP 6 as the string is now suddenly no 
string anymore... highlight confusing I would say. 

Can we please use is_binary()/(binary)  for binary string types and 
is_string()/(string) for real strings as per PDM notes?

regards,
Derick


[1] http://files.derickrethans.nl/patches/uc-is_string-2006-03-15.diff.txt
[2] http://www.php.net/~derick/meeting-notes.html#different-string-types

-- 
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

--- End Message ---

php-i18n Digest 15 Mar 2006 09:24:53 -0000 Issue 319

Reply via email to