andrei          Fri Sep 23 17:24:31 2005 EDT

  Modified files:
    /php-src    README.UNICODE-UPGRADES
  Log:
  substr() sample case

http://cvs.php.net/diff.php/php-src/README.UNICODE-UPGRADES?r1=1.4&r2=1.5&ty=u
Index: php-src/README.UNICODE-UPGRADES
diff -u php-src/README.UNICODE-UPGRADES:1.4 php-src/README.UNICODE-UPGRADES:1.5
--- php-src/README.UNICODE-UPGRADES:1.4 Wed Sep 14 14:01:41 2005
+++ php-src/README.UNICODE-UPGRADES     Fri Sep 23 17:24:31 2005
@@ -262,6 +262,66 @@
+Upgrading Functions
+===================
+
+Let's take a look at a couple of functions that have been upgraded to
+support new string types.
+
+substr()
+--------
+
+This function returns part of a string based on offset and length
+parameters.
+
+    void *str;
+    int32_t str_len, cp_len;
+    zend_uchar str_type;
+
+    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "tl|l", &str, &str_len, &str_type, &f, &l) == FAILURE) {
+        return;
+    }
+
+The first thing we notice is that the incoming string specifier is 't',
+which means that we can accept all 3 string types. The 'str' variable is
+declared as void*, because it can point to either UChar* or char*.
+The actual type of the incoming string is stored in the 'str_type' variable.
+The offset and length arrive in 'f' and 'l' (their declarations are not
+shown here).
+
+    if (str_type == IS_UNICODE) {
+        cp_len = u_countChar32(str, str_len);
+    } else {
+        cp_len = str_len;
+    }
+
+If the string is a Unicode one, we cannot rely on the str_len value to tell
+us the number of characters in it. Instead, we call u_countChar32() to
+obtain it.
+
+The next several lines normalize start and length parameters to fit within the
+string. Nothing new here. Then we locate the appropriate segment.
+
+    if (str_type == IS_UNICODE) {
+        int32_t start = 0, end = 0;
+        U16_FWD_N((UChar*)str, end, str_len, f);
+        start = end;
+        U16_FWD_N((UChar*)str, end, str_len, l);
+        RETURN_UNICODEL((UChar*)str + start, end-start, 1);
+
+Since codepoint (character) #n is not necessarily at offset #n in Unicode
+strings, we start at the beginning and iterate forward until we have gone
+through the required number of codepoints to reach the start of the segment.
+Then we save the location in 'start' and continue iterating through the number
+of codepoints specified by the length. Once that's done, we can return the
+segment as a Unicode string.
+
+    } else {
+        RETURN_STRINGL((char*)str + f, l, 1);
+    }
+
+For native and binary types, we can return the segment directly.
+
+
+
 References
 ==========
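
For readers without ICU at hand, the walk that the README describes can be sketched in plain C. This is only an illustration under stated assumptions: utf16_fwd_n(), utf16_count_cp() and utf16_segment() are hypothetical helpers written for this note, not ICU or PHP functions. They are meant to behave roughly like U16_FWD_N and u_countChar32() on a UTF-16 buffer, treating an unpaired surrogate as a single code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of ICU or PHP. */

static int is_lead(uint16_t c)  { return (c & 0xFC00) == 0xD800; }
static int is_trail(uint16_t c) { return (c & 0xFC00) == 0xDC00; }

/* Advance code-unit offset i by up to n code points without passing len.
 * A lead surrogate followed by a trail surrogate counts as one code point. */
static int32_t utf16_fwd_n(const uint16_t *s, int32_t i, int32_t len, int32_t n)
{
    assert(s != NULL && i >= 0 && i <= len);
    while (n > 0 && i < len) {
        if (is_lead(s[i]) && i + 1 < len && is_trail(s[i + 1])) {
            i += 2;   /* surrogate pair: two code units, one code point */
        } else {
            i += 1;   /* BMP character or unpaired surrogate */
        }
        n--;
    }
    return i;
}

/* Count the code points in the first len code units of s. */
static int32_t utf16_count_cp(const uint16_t *s, int32_t len)
{
    int32_t i = 0, count = 0;
    while (i < len) {
        i = utf16_fwd_n(s, i, len, 1);
        count++;
    }
    return count;
}

/* Translate a code point offset f and length l into the code-unit
 * range [*start, *end), following the same two-step walk as the README. */
static void utf16_segment(const uint16_t *s, int32_t len,
                          int32_t f, int32_t l,
                          int32_t *start, int32_t *end)
{
    *start = utf16_fwd_n(s, 0, len, f);      /* skip f code points */
    *end   = utf16_fwd_n(s, *start, len, l); /* then take l more   */
}
```

For the five code units 0x0061 0xD834 0xDD1E 0x0062 0x0063 ("a", U+1D11E, "b", "c"), utf16_count_cp() gives 4, and utf16_segment() with f=1, l=2 gives start=1 and end=4, so the segment spans three code units but only two code points.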