ID: 46217
Updated by: [EMAIL PROTECTED]
-Summary: "fgetcsv" parses a CSV file in the Greek encoding incorrectly.
Reported By: brook73 at gmail dot com
-Status: Open
+Status: Feedback
Bug Type: Filesystem function related
Operating System: Ubuntu 8.04
-PHP Version: 5.2.6
+PHP Version: 5.2.5
New Comment:
Please try using this CVS snapshot:
http://snaps.php.net/php5.2-latest.tar.gz
For Windows:
http://windows.php.net/snapshots/
Previous Comments:
------------------------------------------------------------------------
[2008-10-20 17:24:08] mike at regexia dot com
(This is my first attempt at fixing a bug, so please bear with me. :))
Patch available here:
http://www.regexia.com/php/bug46217/bug46217.diff
Test case as well: http://www.regexia.com/php/bug46217/bug46217.phpt
Explanation:
The initial pass on a field tries to skip whitespace. If php_mblen()
returns -2 or -1 that character is skipped (as if it's whitespace).
Regardless of locale, non-ASCII characters were returning -1 (invalid).
My patch treats those characters as regular non-WS characters. This
behavior seems to be consistent with non-ASCII handling in the middle of
a CSV field.
Enclosing the CSV field data in quotes (or another enclosure character)
works around the issue.
Hope this is clear and the correct protocol for submitting this patch.
:)
Mike
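The quoting workaround mentioned above can be sketched roughly as follows. This is only an illustration, not part of the patch: the sample values and the temporary file are made up, and it assumes the iconv extension is available to produce ISO-8859-7 bytes.

```php
<?php
// Hypothetical sample values (UTF-8 in this source file), not taken from test.csv.
$fields = array('ΓΟΜ000112', 'Γόμες', '1.30');

// Write one row as ISO-8859-7 with every field enclosed in double quotes.
$path = tempnam(sys_get_temp_dir(), 'csv');
$out = fopen($path, 'wb');
$quoted = array();
foreach ($fields as $field) {
    $bytes = iconv('UTF-8', 'ISO-8859-7', $field);
    $quoted[] = '"' . str_replace('"', '""', $bytes) . '"';
}
fwrite($out, implode(';', $quoted) . "\n");
fclose($out);

// Read the row back. Each field now starts with an ASCII quote, so per the
// explanation above the initial whitespace skip has nothing to drop.
// The fifth argument (escape character) needs PHP 5.3+; omit it on 5.2.
$in = fopen($path, 'rb');
$row = fgetcsv($in, 16384, ';', '"', '\\');
fclose($in);
unlink($path);

// Convert back to UTF-8 for display.
$decoded = array();
foreach ($row as $value) {
    $decoded[] = iconv('ISO-8859-7', 'UTF-8', $value);
}
print_r($decoded);
```

If the file is produced by another program, this only helps when that program can be told to quote every field on export.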
------------------------------------------------------------------------
[2008-10-02 13:23:21] brook73 at gmail dot com
Please use this file
http://dev.cs-cart.com/~brook/test.csv
------------------------------------------------------------------------
[2008-10-02 12:34:56] brook73 at gmail dot com
Re:
Example of the line in csv file:
ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
Expected result:
Debug [0/0]:Array
(
[0] => ΓΟΜ000112
[1] => Είδη Γραφής - Διόρθωσης///Γόμες
[2] => 1.30
[3] => 1.30
[4] => 30 Sep 2008 00:00:00
[5] => N
[6] => ΘΡΥΛΟΣ3
[7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)
Actual result:
Debug [0/0]:Array
(
[0] => 000112
[1] => - Διόρθωσης///Γόμες
[2] => 1.30
[3] => 1.30
[4] => 30 Sep 2008 00:00:00
[5] => N
[6] => 3
[7] =>
)
------------------------------------------------------------------------
[2008-10-02 12:20:44] brook73 at gmail dot com
Description:
------------
The "fgetcsv" function parses a file in the Greek encoding
(ISO-8859-7) incorrectly: many characters are dropped.
Calling "setlocale" did not help either (we tried
setlocale(LC_ALL, 'gr_GR') and setlocale(LC_ALL, 'gr_GR.ISO-8895-7')).
Can anyone explain why this happens?
The PHP version is 5.2.5.
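One thing worth checking is whether the setlocale() calls took effect at all, since setlocale() returns false when the requested locale is not installed. On glibc systems the Greek locale is normally named el_GR rather than gr_GR. A minimal sketch of such a check, assuming the candidate names below (they vary by system and are not taken from this report):

```php
<?php
// Assumed candidate names; run `locale -a` to see what is actually installed.
$candidates = array('el_GR.ISO-8859-7', 'el_GR.iso88597', 'el_GR');

// setlocale() accepts an array and tries each name in order.
// It returns the name it applied, or false if none is available.
$applied = setlocale(LC_CTYPE, $candidates);

if ($applied === false) {
    echo "No Greek locale available; LC_CTYPE left unchanged\n";
} else {
    echo "LC_CTYPE set to $applied\n";
}
```

Only LC_CTYPE is set here because that is the category governing character classification; whether a correct locale alone avoids the problem is not confirmed by this report.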
Reproduce code:
---------------
<?php
$max_line_size = 16384;
$delimiter = ";";
$f = fopen('somefile.csv', 'rb');
while (($data = fgetcsv($f, $max_line_size, $delimiter)) !== false) {
    print_r($data);
}
?>
Example of the line in csv file:
ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ;
Expected result:
----------------
Debug [0/0]:Array
(
[0] => ΓΟΜ000112
[1] => Είδη Γραφής - Διόρθωσης///Γόμες
[2] => 1.30
[3] => 1.30
[4] => 30 Sep 2008 00:00:00
[5] => N
[6] => ΘΡΥΛΟΣ3
[7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)
Actual result:
--------------
Debug [0/0]:Array
(
[0] => 000112
[1] => - Διόρθωσης///Γόμες
[2] => 1.30
[3] => 1.30
[4] => 30 Sep 2008 00:00:00
[5] => N
[6] => 3
[7] =>
)
------------------------------------------------------------------------