 ID:               46217
 Updated by:       [EMAIL PROTECTED]
-Summary:          "fgetcsv" parses a csv file in the greek encoding incorrectly.
 Reported By:      brook73 at gmail dot com
-Status:           Open
+Status:           Feedback
 Bug Type:         Filesystem function related
 Operating System: Ubuntu 8.04
-PHP Version:      5.2.6
+PHP Version:      5.2.5
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz

For Windows:

  http://windows.php.net/snapshots/


Previous Comments:
------------------------------------------------------------------------

[2008-10-20 17:24:08] mike at regexia dot com

(This is my first attempt at fixing a bug, so please bear with me. :))

Patch available here:
http://www.regexia.com/php/bug46217/bug46217.diff

Test case as well:
http://www.regexia.com/php/bug46217/bug46217.phpt

Explanation:
The initial pass on a field tries to skip whitespace. If php_mblen()
returns -2 or -1, that character is skipped (as if it were whitespace).
Regardless of locale, non-ASCII characters were returning -1 (invalid).
My patch treats those characters as regular non-whitespace characters.
This behavior seems to be consistent with non-ASCII handling in the
middle of a CSV field.

Enclosing the CSV field data in quotes or the like works around the
issue.

Hope this is clear and the correct protocol for submitting this patch.
:)

Mike

------------------------------------------------------------------------

[2008-10-02 13:23:21] brook73 at gmail dot com

Please use this file
http://dev.cs-cart.com/~brook/test.csv

------------------------------------------------------------------------

[2008-10-02 12:34:56] brook73 at gmail dot com

Re: Example of the line in csv file:
ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ

Expected result:
Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)

Actual result:
Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] =>
)

------------------------------------------------------------------------

[2008-10-02 12:20:44] brook73 at gmail dot com

Description:
------------
The "fgetcsv" function parses a file in the
Greek encoding (ISO-8859-7) incorrectly: a lot of symbols are ignored.
The "setlocale" function has not helped either (we tried
setlocale(LC_ALL, 'gr_GR') and setlocale(LC_ALL, 'gr_GR.ISO-8895-7')).
Can anyone help us and explain why this happens?
The PHP version is 5.2.5.

Reproduce code:
---------------
<?php
$max_line_size = 16384;
$delimiter = ";";
$f = fopen('somefile.csv', 'rb');
while (($data = fgetcsv($f, $max_line_size, $delimiter)) !== false) {
    print_r($data);
}
?>

Example of the line in csv file:
ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ;

Expected result:
----------------
Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)

Actual result:
--------------
Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] =>
)

------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=46217&edit=1
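
For anyone following the explanation in the 2008-10-20 comment, here is a rough C sketch of the leading-whitespace pass it describes. It is not the code from the patch or from PHP's source. demo_mblen() is a hypothetical stand-in for php_mblen() that simply reports every non-ASCII byte as invalid (-1), which is the symptom described, and skip_leading_ws() and its skip_invalid flag are names made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for php_mblen(), NOT PHP's implementation.
 * It mimics the reported symptom: every non-ASCII byte is "invalid" (-1). */
static int demo_mblen(const char *p, size_t remaining)
{
    if (remaining == 0) {
        return 0;
    }
    return ((unsigned char)*p >= 0x80) ? -1 : 1;
}

/* Returns a pointer to the first byte that is kept as field data.
 * skip_invalid != 0: behaviour as reported, invalid bytes are stepped
 *                    over just like whitespace and are lost.
 * skip_invalid == 0: behaviour as the patch is described, an invalid
 *                    byte counts as ordinary data and ends the skip. */
static const char *skip_leading_ws(const char *p, const char *limit,
                                   char delimiter, int skip_invalid)
{
    while (p < limit) {
        int len = demo_mblen(p, (size_t)(limit - p));
        if (len < 0) {
            if (!skip_invalid) {
                break;
            }
            p++;
            continue;
        }
        if (!isspace((unsigned char)*p) || *p == delimiter) {
            break;
        }
        p++;
    }
    return p;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* First field of the sample line as ISO-8859-7 bytes (three Greek
     * capitals, then ASCII digits). The literal is split so the hex
     * escape does not swallow the digits that follow it. */
    const char *field = "\xC3\xCF\xCC" "000112";
    const char *end = field + strlen(field);

    printf("reported: %zu bytes kept\n",
           strlen(skip_leading_ws(field, end, ';', 1)));
    printf("patched:  %zu bytes kept\n",
           strlen(skip_leading_ws(field, end, ';', 0)));
    assert(skip_leading_ws(field, end, ';', 0) == field);
    return 0;
}
#endif
```

Build with -DDEMO_MAIN to run the small demo. With skip_invalid set, the three leading Greek bytes are dropped and only "000112" survives, which matches element [0] of the actual result; the same rule would explain why "Είδη Γραφής - " loses everything before the "-" in element [1].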