ID:               46217
 Updated by:       [EMAIL PROTECTED]
-Summary:          "fgetcsv" parses a CSV file in the Greek encoding
                   incorrectly.
 Reported By:      brook73 at gmail dot com
-Status:           Open
+Status:           Feedback
 Bug Type:         Filesystem function related
 Operating System: Ubuntu 8.04
-PHP Version:      5.2.6
+PHP Version:      5.2.5
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/




Previous Comments:
------------------------------------------------------------------------

[2008-10-20 17:24:08] mike at regexia dot com

(This is my first attempt at fixing a bug, so please bear with me. :))

Patch available here:
http://www.regexia.com/php/bug46217/bug46217.diff
Test case as well: http://www.regexia.com/php/bug46217/bug46217.phpt

Explanation:
The initial pass over a field tries to skip leading whitespace. If
php_mblen() returns -2 (incomplete sequence) or -1 (invalid sequence),
that character is skipped as if it were whitespace. Regardless of
locale, non-ASCII characters were returning -1 (invalid). My patch
treats those characters as regular non-whitespace characters. This
behavior seems to be consistent with how non-ASCII characters are
handled in the middle of a CSV field.
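To make the mechanism concrete, here is a simplified PHP model of that first pass. It is an illustration only: `fake_mblen()` and `skip_leading_ws()` are hypothetical helpers, the model assumes a locale (such as "C") in which every byte >= 0x80 is reported as invalid, and the real loop in php_fgetcsv() is C code that differs in detail.

```php
<?php
// Hypothetical stand-in for php_mblen(): assumes a locale in which every
// byte >= 0x80 is reported as invalid (-1). NOT the real C implementation.
function fake_mblen($byte)
{
    return ord($byte) >= 0x80 ? -1 : 1;
}

// Simplified model of the leading-whitespace pass (hypothetical helper).
// $patched = false: bytes with a negative length are skipped (the bug).
// $patched = true:  such bytes end the pass, as the proposed patch does.
function skip_leading_ws($field, $patched)
{
    $i = 0;
    $n = strlen($field);
    while ($i < $n) {
        if (fake_mblen($field[$i]) < 0) {
            if ($patched) {
                break;      // keep the byte as ordinary field data
            }
            $i++;           // old behaviour: drop it as if it were whitespace
            continue;
        }
        if (strpos(" \t\r\n", $field[$i]) === false) {
            break;          // first non-whitespace ASCII character
        }
        $i++;               // real whitespace is skipped in both versions
    }
    return (string) substr($field, $i);
}

// "ΓΟΜ000112" encoded in ISO-8859-7
$field = "\xC3\xCF\xCC000112";
echo skip_leading_ws($field, false), "\n";          // 000112
echo bin2hex(skip_leading_ws($field, true)), "\n";  // c3cfcc303030313132
```

With the old behaviour the model drops "ΓΟΜ" and returns 000112, which matches field [0] in the reported actual result; with the patched behaviour the Greek bytes are kept.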

Enclosing the CSV field data in quotes (or another enclosure
character) works around the issue.
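A minimal sketch of that workaround, assuming PHP 5.3 or later (the fifth `escape` argument of fgetcsv() was added in 5.3.0, so drop it on 5.2). It reads one record from an in-memory `php://temp` stream in which the Greek field is enclosed in double quotes; the record itself is made up for illustration.

```php
<?php
// "ΓΟΜ000112" in ISO-8859-7, wrapped in the default enclosure character (")
$greek = "\xC3\xCF\xCC000112";
$record = '"' . $greek . '";1.30' . "\n";

$f = fopen('php://temp', 'w+b');
fwrite($f, $record);
rewind($f);

// length 0 = no line-length limit; delimiter ';', enclosure '"', escape '\'
$fields = fgetcsv($f, 0, ';', '"', '\\');
fclose($f);

// The enclosed field should come back with its leading Greek bytes intact
var_dump($fields[0] === $greek);
```

Because the first byte of the field is then the ASCII quote character, the leading-whitespace pass stops before it ever reaches the Greek bytes.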

I hope this is clear and that this is the correct protocol for
submitting a patch. :)

Mike

------------------------------------------------------------------------

[2008-10-02 13:23:21] brook73 at gmail dot com

Please use this file:

http://dev.cs-cart.com/~brook/test.csv

------------------------------------------------------------------------

[2008-10-02 12:34:56] brook73 at gmail dot com

Re:

Example of a line in the CSV file:

ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ

Expected result:

Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)

Actual result:

Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] => 
)

------------------------------------------------------------------------

[2008-10-02 12:20:44] brook73 at gmail dot com

Description:
------------
The "fgetcsv" function parses a file in the Greek encoding
(ISO-8859-7) incorrectly: the leading Greek characters of a field are
silently dropped.

The "setlocale" function has not helped either (we tried
setlocale(LC_ALL, 'gr_GR'), setlocale(LC_ALL, 'gr_GR.ISO-8895-7')).
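Greek locales are named with the language code `el`, not `gr`, and setlocale() returns false rather than raising an error when a name is unknown, so the calls above may simply have failed. A sketch that checks the return value follows; the candidate names are assumptions (run `locale -a` to see what is installed), and the idea that a valid single-byte Greek LC_CTYPE makes php_mblen() accept these bytes is likewise an assumption, not something confirmed in this report.

```php
<?php
// Candidate Greek locale names (assumed; which ones exist depends on the
// system, see `locale -a`). setlocale() tries each in turn and returns the
// name it set, or false if none of them is available.
$candidates = array('el_GR.ISO-8859-7', 'el_GR.iso88597', 'el_GR');
$chosen = setlocale(LC_CTYPE, $candidates);

if ($chosen === false) {
    echo "No Greek locale found; LC_CTYPE is unchanged\n";
} else {
    echo "LC_CTYPE is now ", $chosen, "\n";
}
```

LC_CTYPE is used instead of LC_ALL because multibyte length checks such as mbrlen() depend on the character-type category, and leaving LC_NUMERIC alone avoids side effects on number formatting.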

Can anyone help us and explain why this happens?

The PHP version is 5.2.5.

Reproduce code:
---------------
<?php

$max_line_size = 16384;
$delimiter = ";";

$f = fopen('somefile.csv', 'rb');
if ($f === false) {
  die("Cannot open somefile.csv\n");
}

// Print every parsed record
while (($data = fgetcsv($f, $max_line_size, $delimiter)) !== false) {
  print_r($data);
}

fclose($f);

?>
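A self-contained variant of the script above that does not need somefile.csv: it writes a shortened, made-up record derived from the sample line, encoded in ISO-8859-7, to a `php://temp` stream and prints every field as hex, so dropped bytes are visible regardless of the terminal encoding. It passes the enclosure and escape arguments explicitly, which assumes PHP 5.3 or later.

```php
<?php
// Shortened record in ISO-8859-7: "ΓΟΜ000112;1.30;N;ΘΡΥΛΟΣ3"
// ("\xC3\xCF\xCC" = "ΓΟΜ", "\xC8\xD1\xD5\xCB\xCF\xD3" = "ΘΡΥΛΟΣ")
$record = "\xC3\xCF\xCC000112;1.30;N;\xC8\xD1\xD5\xCB\xCF\xD3" . "3\n";

$f = fopen('php://temp', 'w+b');
fwrite($f, $record);
rewind($f);

// Same call as the reproduce script, plus explicit enclosure/escape (PHP >= 5.3)
$data = fgetcsv($f, 16384, ';', '"', '\\');
fclose($f);

// Hex output: an intact field 0 starts with c3cfcc, an affected one with 3030
foreach ($data as $i => $field) {
    echo $i, ' => ', bin2hex($field), "\n";
}
```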

Example of a line in the CSV file:

ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ;

Expected result:
----------------
Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)


Actual result:
--------------
Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] => 
)



------------------------------------------------------------------------

