Edit report at http://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         gjorgjioski at gmail dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

This bug also occurs when the file is in UTF-8 (a tab-delimited file
using the characters š and č). I can provide an example.


Previous Comments:
------------------------------------------------------------------------
[2010-05-19 13:39:52] pahan at hubbitus dot spb dot su

> Quote from the docs:
>
> Note: Locale setting is taken into account by this function. If LANG
> is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this
> function.

OK, having the bug documented as "are read wrong by this function" is
better than nothing.

But do you plan to fix this wrong behaviour?

------------------------------------------------------------------------
[2010-05-18 11:03:42] m...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Quote from the docs:

Note: Locale setting is taken into account by this function. If LANG is
e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this
function.
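
Because the note says the locale is taken into account, one way to make
that dependency explicit is to pin LC_CTYPE around the parse and restore
it afterwards. The following is only a sketch: with_ctype_locale() is a
hypothetical helper, not a PHP function, and which locale suits a given
file depends on that file's encoding. The sample data is ASCII only.

```php
<?php
// Hypothetical helper (not part of PHP): run $fn with LC_CTYPE set to the
// first locale in $candidates that the system accepts, then restore the
// previous setting.
function with_ctype_locale(array $candidates, callable $fn)
{
    $previous = setlocale(LC_CTYPE, '0');         // "0" only queries the current value
    $applied  = setlocale(LC_CTYPE, $candidates); // false if none could be set
    try {
        return $fn($applied);
    } finally {
        setlocale(LC_CTYPE, $previous);
    }
}

// Usage: parse an in-memory CSV while a known locale is active.
$rows = with_ctype_locale(array('C'), function ($applied) {
    $fp = fopen('php://memory', 'w+');
    fwrite($fp, "id;name\n1;alpha\n");
    rewind($fp);
    $rows = array();
    while (($row = fgetcsv($fp, 0, ';', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($fp);
    return $rows;
});
print_r($rows);
```

The helper passes the locale that was actually applied to the callback,
so the caller can bail out when none of the candidates is installed.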

------------------------------------------------------------------------
[2009-12-12 11:40:29] pahan at hubbitus dot spb dot su

Sorry for the duplicate (#50456 is mine), but in it, in addition to the
fgetcsv problem described here, I also suggest fixing fputcsv to allow
[forcing] the enclosure of single words in a field.

Of course, this does *not* solve the problem of incorrect fgetcsv
parsing, because the RFC allows unquoted values (
http://www.faqs.org/rfcs/rfc4180.html , section 2.5 ), but it would make
the fputcsv/fgetcsv pair at least minimally compatible within the PHP
implementation.
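
To illustrate the quoting behaviour under discussion, here is a small
round trip through fputcsv and fgetcsv. It is a sketch using ASCII-only
sample values chosen for this example. The delimiter, enclosure and
escape arguments are passed explicitly.

```php
<?php
// Write one record to an in-memory stream, then read it back.
$fp = fopen('php://memory', 'w+');
fputcsv($fp, array('word', 'two words', 'say "hi"'), ',', '"', '\\');

// Raw line as written by fputcsv.
rewind($fp);
$line = fgets($fp);

// The same line parsed back by fgetcsv.
rewind($fp);
$fields = fgetcsv($fp, 0, ',', '"', '\\');
fclose($fp);

echo $line;
print_r($fields);
```

The single word is written without an enclosure, while the fields
containing a space or a quote are enclosed.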

------------------------------------------------------------------------
[2009-12-12 01:33:51] j...@php.net

See also bug #50456

------------------------------------------------------------------------
[2009-09-22 15:09:20] phofstetter at sensational dot ch

Below you'll find a small script that shows how to implement a user
filter that UTF-8-encodes the data on the fly, so that fgetcsv is happy
and returns correct output even if the first character in a field has
its high bit set and is not valid UTF-8.

Remember: this is a workaround and it impacts performance. It is not a
valid fix for the bug.

I haven't yet had time to look deeply into the C implementation of
fgetcsv, but all these calls to php_mblen() feel suspicious to me.

I'll try to have a look into this later today, but for now I'm just
glad I have this workaround (quickly hacked together, keep that in
mind):

<?php

class utf8encode_filter extends php_user_filter {

  // Heuristic: returns 1 if $string contains at least one well-formed
  // multi-byte UTF-8 sequence, 0 otherwise.
  function is_utf8($string){
      return preg_match('%(?:
          [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
          |\xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
          |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
          |\xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
          |\xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
          |[\xF1-\xF3][\x80-\xBF]{3}         # planes 4-15
          |\xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
      )+%xs', $string);
  }

  function filter($in, $out, &$consumed, $closing)
  {
    while ($bucket = stream_bucket_make_writeable($in)) {
      // Re-encode only buckets that do not already look like UTF-8
      if (!$this->is_utf8($bucket->data))
          $bucket->data = utf8_encode($bucket->data);
      $consumed += $bucket->datalen;
      stream_bucket_append($out, $bucket);
    }
    return PSFS_PASS_ON;
  }
}

/* Register our filter with PHP */
stream_filter_register("utf8encode", "utf8encode_filter")
    or die("Failed to register filter");

$fp = fopen($_SERVER['argv'][1], "r");

/* Attach the registered filter to the stream just opened */
stream_filter_prepend($fp, "utf8encode");

/* fgetcsv returns false at end of file */
while (($data = fgetcsv($fp, 0, ';', '"')) !== false)
    print_r($data);

fclose($fp);
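
For comparison, when the input is known to be entirely ISO-8859-1, the
same on-the-fly re-encoding can be sketched with PHP's built-in
convert.iconv stream filter instead of a user filter. This variant was
not proposed in the thread. It assumes the iconv extension is loaded,
open_latin1_as_utf8() is a hypothetical helper name, and the sample data
is made up for the example. Unlike the user filter above, it does not
detect data that is already UTF-8.

```php
<?php
// Hypothetical helper: open $path for reading and convert it from
// ISO-8859-1 to UTF-8 on the fly with the built-in iconv stream filter.
// Assumes the iconv extension is loaded and the whole file is ISO-8859-1.
function open_latin1_as_utf8($path)
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return false;
    }
    stream_filter_append($fp, 'convert.iconv.ISO-8859-1/UTF-8', STREAM_FILTER_READ);
    return $fp;
}

// Sample data: "café;naïve" stored as ISO-8859-1 bytes.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "caf\xE9;na\xEFve\n");

// Inspect the converted byte stream directly.
$fp = open_latin1_as_utf8($path);
$converted = stream_get_contents($fp);
fclose($fp);

// Parse the same file through the filter with fgetcsv.
$fp = open_latin1_as_utf8($path);
$rows = array();
while (($row = fgetcsv($fp, 0, ';', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($fp);
unlink($path);

print_r($rows);
```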

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    http://bugs.php.net/bug.php?id=48507

