Edit report at http://bugs.php.net/bug.php?id=48507&edit=1
ID: 48507 Comment by: gjorgjioski at gmail dot com Reported by: krynble at yahoo dot com dot br Summary: fgetcsv() ignoring special characters Status: Bogus Type: Bug Package: Filesystem function related Operating System: Unix PHP Version: 5.* Block user comment: N Private report: N New Comment: This bug occurs also when file is in UTF8 (tab delimited file using Å¡,Ä characters). I can provide an example. Previous Comments: ------------------------------------------------------------------------ [2010-05-19 13:39:52] pahan at hubbitus dot spb dot su > Quote from the docs: > Note: Locale setting is taken into account by this function. If LANG is e.g. > en_US.UTF-8, files in one-byte encoding are read wrong by this function. Ok, bug documented as "are read wrong by this function" is better then nothing. But do you plan fix this wrong behaviour? ------------------------------------------------------------------------ [2010-05-18 11:03:42] m...@php.net Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.php.net/manual/ and the instructions on how to report a bug at http://bugs.php.net/how-to-report.php Quote from the docs: Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function. ------------------------------------------------------------------------ [2009-12-12 11:40:29] pahan at hubbitus dot spb dot su Sorry for duplicate (#50456 is my), but in it, additionally to there described problem in fgetcsv I also suggest fix fputcvs to allow [force] enclosing single words in field. Off course it does *not* solve this problem of incorrect fgetcsv parsing, because RFC allow not quoted values ( http://www.faqs.org/rfcs/rfc4180.html , section 2.5 ), but, it is make pair fputcsv/fgetcsv as minimum compatible in PHP implementation. ------------------------------------------------------------------------ [2009-12-12 01:33:51] j...@php.net See also bug #50456 ------------------------------------------------------------------------ [2009-09-22 15:09:20] phofstetter at sensational dot ch below you'll find a small script which shows how to implement a user filter that can be used to on-the-fly utf8-encode the data so that fgetcsv is happy and returns correct output even if the first character in a field has its high-bit set and is not valid utf-8: Remember: This is a workaround and impacts performance. This is not a valid fix for the bug. I didn't yet have time to deeply look into the C implementation for fgetcsv, but all these calls to php_mblen() feel suspicious to me. I'll try and have a look into this later today, but for now, I'm just glad I have this workaround (quickly hacked together - keep that in mind): <?php class utf8encode_filter extends php_user_filter { function is_utf8($string){ return preg_match('%(?: [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte |\xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte |\xED[\x80-\x9F][\x80-\xBF] # excluding surrogates |\xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3 |[\xF1-\xF3][\x80-\xBF]{3} # planes 4-15 |\xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16 )+%xs', $string); } function filter($in, $out, &$consumed, $closing) { while ($bucket = stream_bucket_make_writeable($in)) { if (!$this->is_utf8($bucket->data)) $bucket->data = utf8_encode($bucket->data); $consumed += $bucket->datalen; stream_bucket_append($out, $bucket); } return PSFS_PASS_ON; } } /* Register our filter with PHP */ stream_filter_register("utf8encode", "utf8encode_filter") or die("Failed to register filter"); $fp = fopen($_SERVER['argv'][1], "r"); /* Attach the registered filter to the stream just opened */ stream_filter_prepend($fp, "utf8encode"); while($data = fgetcsv($fp, 0, ';', '"')) print_r($data); fclose($fp); ------------------------------------------------------------------------ The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/bug.php?id=48507 -- Edit this bug report at http://bugs.php.net/bug.php?id=48507&edit=1