On 2003/12/13, at 5:09, Ilia Alshanetsky wrote:

> I'm mentioning this now because we are considering changes to the
> function in the development branch, which is a fine time to resolve any
> deficiencies.

Okay, fine :)


> The added functionality, which if I understand correctly is support for
> multibyte delimiters and enclosures, is great. But it hardly explains a

The change was not for multibyte delimiters and enclosures. The current
implementation still allows only single-byte characters for the delimiter
and the enclosure. I could have added such a capability as well, but I
didn't, because it appeared to slow things down considerably.


Several multibyte encodings such as CP932, CP936, CP949, CP950 and
Shift_JIS may map a value in the range 0x40-0xfe to the second byte of a
character, which has been a problem: a plain byte-wise search can mistake
such a trail byte for a delimiter, enclosure or escape character. So we
need to check whether an octet at a given position belongs to a multibyte
character or not, and that is what motivated me to bring a scanner-like
finite-state machine implementation into fgetcsv() (and basename()).
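
To make the failure mode concrete, here is a minimal sketch of such a
scanner-like state machine. This is my illustration, not the actual
fgetcsv() code, and it assumes only the two-byte CP932 case with lead
bytes in 0x81-0x9f and 0xe0-0xfc; a naive byte search for a backslash
(0x5c) would otherwise match inside a character such as katakana "so",
which CP932 encodes as 0x83 0x5c:

#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not PHP's code: nonzero if c can start a
 * two-byte CP932 (Shift_JIS) sequence. */
static int sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

int main(void)
{
    /* 0x83 0x5c is katakana "so"; the bare 0x5c is a real backslash. */
    const unsigned char buf[] = { 0x83, 0x5c, ',', 'a', 0x5c, '"', 0 };
    enum { LEAD, TRAIL } state = LEAD;
    size_t i;

    for (i = 0; buf[i] != 0; i++) {
        if (state == TRAIL) {
            /* Second byte of a multibyte character: must not be
             * treated as a delimiter, enclosure or escape. */
            printf("offset %zu: 0x%02x is a trail byte, skipped\n",
                   i, buf[i]);
            state = LEAD;
        } else if (sjis_lead(buf[i])) {
            state = TRAIL;  /* the next octet belongs to this char */
        } else {
            printf("offset %zu: 0x%02x is single-byte, safe to match\n",
                   i, buf[i]);
        }
    }
    return 0;
}

Because of the per-octet state, the scan can no longer be one library
call over the whole buffer, which is presumably where much of the
slowdown comes from.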


See http://www.microsoft.com/globaldev/reference/WinCP.mspx for details.

> significant performance disparity I am seeing. I believe much of the
> problem can be solved by moving from manual string iteration to scanning
> with C library functions such as memchr(). When parsing non-multibyte
> text there shouldn't be more than a 10-15% performance loss.
> I should mention that the benchmarks were made using the time utility,
> so the advantages offered by PHP 5's speedups were discounted. Had they
> been taken into account, the speed loss would have been 300% or more.
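
For illustration, here is a sketch of the kind of memchr()-based scan
suggested above. It is a hypothetical simplification, not the actual
patch: it splits on a single-byte delimiter and ignores enclosures,
escapes and multibyte text entirely:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char line[] = "foo,bar,baz,qux";
    const char *p = line;
    const char *end = line + sizeof(line) - 1;  /* points at the '\0' */
    const char *hit;

    /* memchr() is typically a tuned, word-at-a-time library routine,
     * so on single-byte text it can beat a per-character loop. */
    while ((hit = memchr(p, ',', (size_t)(end - p))) != NULL) {
        printf("field: %.*s\n", (int)(hit - p), p);
        p = hit + 1;  /* resume after the delimiter */
    }
    printf("field: %s\n", p);  /* last field, no trailing delimiter */
    return 0;
}

A real CSV scanner would still need stateful handling once an enclosure
character appears, which is exactly where it collides with the multibyte
problem described above.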

If we limited the support to UTF-8 or EUC encodings only, we'd be able
to gain drastically better performance, since in those encodings every
byte of a multibyte character has the high bit set and can never collide
with an ASCII delimiter. But that wouldn't solve the practical problems
in the environments where the function is actually used.
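
For example (again my illustration, not part of the change), a single
memchr() over UTF-8 text is already character-safe, because an ASCII
delimiter byte can never occur inside a multibyte character:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* "日本," in UTF-8 is e6 97 a5 e6 9c ac 2c: every byte of the two
     * multibyte characters is >= 0x80, so the 0x2c that memchr() finds
     * is guaranteed to be a genuine comma, not a trail byte. */
    const unsigned char text[] = { 0xe6, 0x97, 0xa5,
                                   0xe6, 0x9c, 0xac, ',', 0 };
    const unsigned char *hit = memchr(text, ',', sizeof(text) - 1);

    printf("delimiter at offset %d\n", (int)(hit - text));  /* 6 */
    return 0;
}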

Moriyoshi



