ID:               48507
 Comment by:       phofstetter at sensational dot ch
 Reported By:      krynble at yahoo dot com dot br
 Status:           Verified
 Bug Type:         Filesystem function related
 Operating System: Unix
 PHP Version:      5.2.9
 New Comment:

Below you'll find a small script which shows how to implement a user
filter that UTF-8-encodes the data on the fly, so that fgetcsv() is
happy and returns correct output even if the first character in a field
has its high bit set and is not valid UTF-8.

Remember: This is a workaround and impacts performance. This is not a
valid fix for the bug.

I haven't yet had time to look deeply into the C implementation of
fgetcsv(), but all these calls to php_mblen() feel suspicious to me.

I'll try and have a look into this later today, but for now, I'm just
glad I have this workaround (quickly hacked together - keep that in
mind):


<?php
class utf8encode_filter extends php_user_filter {
  /* Returns 1 if $string contains at least one valid multi-byte UTF-8
     sequence, 0 otherwise (pure ASCII therefore also yields 0). */
  function is_utf8($string){
      return preg_match('%(?:
          [\xC2-\xDF][\x80-\xBF]        # non-overlong 2-byte
          |\xE0[\xA0-\xBF][\x80-\xBF]               # excluding overlongs
          |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}      # straight 3-byte
          |\xED[\x80-\x9F][\x80-\xBF]               # excluding surrogates
          |\xF0[\x90-\xBF][\x80-\xBF]{2}    # planes 1-3
          |[\xF1-\xF3][\x80-\xBF]{3}                  # planes 4-15
          |\xF4[\x80-\x8F][\x80-\xBF]{2}    # plane 16
      )+%xs', $string);
  }

  function filter($in, $out, &$consumed, $closing)
  {
    while ($bucket = stream_bucket_make_writeable($in)) {
      /* Only encode buckets that do not already look like UTF-8 */
      if (!$this->is_utf8($bucket->data))
          $bucket->data = utf8_encode($bucket->data);
      $consumed += $bucket->datalen;
      stream_bucket_append($out, $bucket);
    }
    return PSFS_PASS_ON;
  }
}

/* Register our filter with PHP */
stream_filter_register("utf8encode", "utf8encode_filter")
    or die("Failed to register filter");

$fp = fopen($_SERVER['argv'][1], "r");

/* Attach the registered filter to the stream just opened */
stream_filter_prepend($fp, "utf8encode");

while($data = fgetcsv($fp, 0, ';', '"'))
    print_r($data);  /* assumed loop body: dump each parsed row */


Previous Comments:

[2009-09-22 14:45:22] phofstetter at sensational dot ch

I was looking into this (after having been bitten by it) and I can add
another tidbit that might help track this down:

The bug doesn't happen if the file fgetcsv() is reading is in UTF-8.

I have created a test file in ISO-8859-1 and then used
file_put_contents(utf8_encode(file_get_contents())) to create the
UTF-8 version of it (explaining this here because I'm not sure whether
this would write a BOM or not - probably not though).

That version could be read correctly.
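
As a rough sketch of that one-off conversion (not the exact commands
used for the test): the helper name convert_latin1_file_to_utf8() and
the file names are made up for illustration, and it assumes the input
really is ISO-8859-1. utf8_encode() emits no BOM. Newer PHP versions
deprecate it, and mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1') is
the usual replacement if mbstring is available.

```php
<?php
/* Hypothetical helper, not part of the original report. */
function convert_latin1_file_to_utf8($source, $target)
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        return false;
    }
    /* utf8_encode() treats every input byte as ISO-8859-1 and writes no BOM */
    return file_put_contents($target, utf8_encode($raw)) !== false;
}

/* Usage (file names are placeholders):
 *   convert_latin1_file_to_utf8('input-latin1.csv', 'input-utf8.csv');
 *   $fp = fopen('input-utf8.csv', 'r');
 *   while (($row = fgetcsv($fp, 0, ';', '"')) !== false) { print_r($row); }
 */
```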

I'm now writing a stream filter that does the UTF-8 conversion on the
fly to hook that in between the file and fgetcsv() - while I would lose
a bit of performance, in my case, this is the cleanest workaround.


[2009-09-21 18:11:47] dmulryan at calendarwiz dot com

Note: The previous comment has an error where URL is shown in an array
element. This is not a bug but my error in the example. The bug is in
special characters.


[2009-09-21 18:07:42] dmulryan at calendarwiz dot com

Similar problem when parsing the following line:


which produces empty array elements for fields with special characters:

Array ( [0] => 0909211132 [1] => 1 [2] => [3] => [4] => URL [5] => Y
[6] => 1 [7] => 1 [8] => 1 [9] => [10] => 2530 )


[2009-06-26 19:35:22] sjoerd-php at linuxonly dot nl

Could reproduce with PHP 5.2.10, PHP 5.2.11-dev (200906261830) and PHP
5.3rc4. Example code:

$fp = tmpfile();
fwrite($fp, $str);
fseek($fp, 0);
$arr = fgetcsv($fp, 100, '#');

Expected: string(5) "?TICA"
Actual: string(4) "TICA"
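
The assignment of $str is not shown in the snippet above. A
self-contained variant might look like the following. The input value
is purely an assumption (a '#'-separated line whose second field starts
with the ISO-8859-1 byte 0xC9), chosen so that the field is five bytes
long like the expected output, and the function name first_csv_row() is
hypothetical. Whether the leading byte is dropped depends on the PHP
version and locale.

```php
<?php
/* Hypothetical wrapper around the steps quoted above. */
function first_csv_row($str)
{
    $fp = tmpfile();
    fwrite($fp, $str);
    fseek($fp, 0);
    /* enclosure and escape are passed explicitly; they match the historical defaults */
    $arr = fgetcsv($fp, 100, '#', '"', '\\');
    fclose($fp);
    return $arr;
}

/* ASSUMED input: the original value of $str is not shown in the report */
$arr = first_csv_row("X#\xC9TICA");
var_dump(strlen($arr[1]));  /* 5 if the leading high-bit byte is kept, 4 if it is dropped */
```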


[2009-06-13 18:10:03] krynble at yahoo dot com dot br

Unfortunately I'm unable to test it because the server is running in a


If someone can give feedback about it, I would appreciate it.

Still, thanks for the help!


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online.
