Bug #48507 [Com]: fgetcsv() ignoring special characters
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

ID: 48507
Comment by: jamie dot kahgee at gmail dot com
Reported by: krynble at yahoo dot com dot br
Summary: fgetcsv() ignoring special characters
Status: Not a bug
Type: Bug
Package: Filesystem function related
Operating System: Unix
PHP Version: 5.*
Block user comment: N
Private report: N

New Comment:

rasmus@php, eswald@middil

I had the same problem. Running fgetcsv() from the CLI showed no error, and everything worked and was output as expected. It was only when I ran it through Apache that I couldn't get my output to display (same script, same file). (Ã) was the specific character I was dealing with, at the start of a string, that was not showing.

After I set the locale in the script, everything worked as expected through Apache, and my strings started parsing and displaying correctly.

    setlocale(LC_ALL, 'en_US.UTF-8');

Hopefully this can help you.

Previous Comments:

[2012-02-13 05:16:35] ras...@php.net

eswald@middil, I am not able to reproduce your results with either en_US.UTF-8 or C with a UTF-8 input file:

    ~ echo $LANG
    en_US.UTF-8
    ~ file utf8.txt
    utf8.txt: UTF-8 Unicode text
    ~ cat utf8.txt
    a,a,é,"é",óú,"óú",óú,"óú"
    ~ php -r "print_r(fgetcsv(fopen('./utf8.txt','r')));"
    Array
    (
        [0] => a
        [1] => a
        [2] => é
        [3] => é
        [4] => óú
        [5] => óú
        [6] => óú
        [7] => óú
    )

I don't see any corruption. I can understand problems with charsets that are not low-ASCII compatible with a low-ASCII delimiter, but I don't see why this UTF-8 case would break.

[2012-02-13 01:46:59] figura at hotbox dot ru

setlocale() might solve the issue, but I do not see any reason for fgetcsv() to depend on locale settings. The format is straightforward and clear. This behaviour is especially confusing when the string is read in UTF-8 format.

[2012-01-26 19:55:01] eswald at middil dot com

Tested with LANG=C, input file encoding of UTF-8. Also tested with LANG=C, input file encoding of cp1252, with identical results, except that the output characters (what was left of them) were also cp1252.

[2012-01-26 19:50:26] eswald at middil dot com

Confirmed with php5 (5.3.6-13ubuntu3.2 on Oneiric Ocelot); can be worked around by quoting the value with quotation marks. For example, the line

    a,a,é,"é",óú,"óú",óú,"óú"

yields

    array (
      0 => 'a',
      1 => 'a',
      2 => '',
      3 => 'é',
      4 => '',
      5 => 'óú',
      6 => 'ú',
      7 => 'óú',
    )

Note the corruption in elements 2, 4, and 6, but not in their quoted counterparts 3, 5, and 7.

[2012-01-18 11:53:48] tero dot tasanen at gmail dot com

I can also confirm that this is an actual bug. The file encoding is UTF-8, the locale settings are set correctly, and characters like äöå are dropped from the beginning of the CSV column. Tested with PHP versions 5.2.6, 5.2.10 and 5.3.6.

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at https://bugs.php.net/bug.php?id=48507

--
Edit this bug report at https://bugs.php.net/bug.php?id=48507&edit=1
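The setlocale() workaround from the newest comment can be sketched as a small wrapper. This is only an illustration: `read_csv_in_locale()` is a hypothetical helper, not part of PHP or of the report, and it sets LC_CTYPE on the assumption that character classification is the category that matters here (the comment itself used the broader LC_ALL). Locale names differ between systems, so a list of candidates is passed.

```php
<?php
// Hypothetical helper (not from the report): parse a CSV file under a
// chosen locale and restore the previous locale afterwards.
function read_csv_in_locale(string $path, array $locales): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // '0' only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    // setlocale() tries each name in turn; it returns false if none is
    // installed, in which case the locale is simply left unchanged.
    setlocale(LC_CTYPE, $locales);

    $rows = [];
    try {
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            $rows[] = $row;
        }
    } finally {
        fclose($handle);
        setlocale(LC_CTYPE, $previous);
    }
    return $rows;
}
```

A call might look like `read_csv_in_locale('utf8.txt', ['en_US.UTF-8', 'en_US.utf8', 'C.UTF-8'])`. Whether this changes the parsed result depends on the PHP version and on the locales installed, as the conflicting reports in this thread show.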
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: figura at hotbox dot ru
Status: Not a bug

New Comment:

setlocale() might solve the issue, but I do not see any reason for fgetcsv() to depend on locale settings. The format is straightforward and clear. This behaviour is especially confusing when the string is read in UTF-8 format.

Previous Comments:

[2011-10-28 08:33:25] peter dot e dot lind at gmail dot com

This is definitely still a bug. My locale is set to da_DK.utf8 and the file I'm trying to read is in UTF-8 (confirmed with a hex editor, though in fact it does not matter: the behaviour is the same for UTF-8 or ISO-8859-1), yet special characters are still thrown away when they are first in a field.

[2011-10-18 13:59:30] me at monicag dot it

Quoting my fellows above: how come this is not a bug?
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: eswald at middil dot com
Status: Bogus

New Comment:

Confirmed with php5 (5.3.6-13ubuntu3.2 on Oneiric Ocelot); can be worked around by quoting the value with quotation marks. For example, the line

    a,a,é,"é",óú,"óú",óú,"óú"

yields

    array (
      0 => 'a',
      1 => 'a',
      2 => '',
      3 => 'é',
      4 => '',
      5 => 'óú',
      6 => 'ú',
      7 => 'óú',
    )

Note the corruption in elements 2, 4, and 6, but not in their quoted counterparts 3, 5, and 7.

Previous Comments:

[2011-10-10 10:03:58] ghosh at q-one dot com

Sorry, I don't understand why this isn't a bug either. Could someone please elaborate? I tried setting all different kinds of locale to no avail. The first letter of a string starting with a UTF-8 character is always missing.

IMHO, fgetcsv should work as a simple string operation (or, whatever weird things it does right now, at least have a parameter to do so; count this as a feature request if you wish). I think the current behavior is totally confusing. For instance, I don't understand why only the first character is missing, while the problem doesn't appear if the character is in the middle of a string.

[2011-07-17 16:19:28] max dot wildgrube at web dot de

The problem also appears if the special char is preceded by a blank. This blank also disappears. I use this ugly workaround:

1. First, read the complete CSV file into a variable: $import
2. $import = preg_replace('{(^|\t)([\x80-\xFF ])}m', '$1~~$2', $import);
3. After fgetcsv(), for each $field of the row array: $field = str_replace('~~', '', $field);

This means: before using fgetcsv(), insert a magic sequence (e.g. ~~) at the beginning of every field that begins with a blank or a special char; after parsing with fgetcsv(), remove it from each field.

Max.
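Max's three steps can be put together roughly as follows. This is a sketch under assumptions, not Max's actual script: `read_csv_with_marker()` is a hypothetical name, the data is assumed to be tab-delimited as in his pattern, "special char" is taken to mean any byte from 0x80 to 0xFF, and the protected text is parsed through an in-memory stream because fgetcsv() needs a file handle.

```php
<?php
// Hypothetical helper illustrating the marker workaround.
function read_csv_with_marker(string $raw, string $delimiter = "\t", string $marker = '~~'): array
{
    // Step 2: prefix every field that starts with a blank or a high-bit byte.
    $pattern = '{(^|' . preg_quote($delimiter) . ')([\x80-\xFF ])}m';
    $protected = preg_replace($pattern, '$1' . $marker . '$2', $raw);

    // fgetcsv() reads from a stream, so put the protected text into one.
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $protected);
    rewind($stream);

    $rows = [];
    while (($row = fgetcsv($stream, 0, $delimiter, '"', '\\')) !== false) {
        // Step 3: remove the marker again from every parsed field.
        $rows[] = array_map(
            fn($field) => str_replace($marker, '', (string) $field),
            $row
        );
    }
    fclose($stream);
    return $rows;
}
```

As in the original workaround, step 3 removes the marker wherever it occurs, so data that legitimately contains `~~` would be altered; a marker that cannot appear in the input is the safer choice.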
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: eswald at middil dot com
Status: Bogus

New Comment:

Tested with LANG=C, input file encoding of UTF-8. Also tested with LANG=C, input file encoding of cp1252, with identical results, except that the output characters (what was left of them) were also cp1252.
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: tero dot tasanen at gmail dot com
Status: Bogus

New Comment:

I can also confirm that this is an actual bug. The file encoding is UTF-8, the locale settings are set correctly, and characters like äöå are dropped from the beginning of the CSV column. Tested with PHP versions 5.2.6, 5.2.10 and 5.3.6.

Previous Comments:

[2011-07-08 08:39:50] php-bug-48507 at bsrealm dot net

This IS a bug. Whatever the locale is, I expect this function to read everything between delimiter characters without stripping the contents. Besides, the docs say that files in one-byte encoding would be read wrong, and this is a different case.

This bug causes a serious portability issue. In my case, this function was used to read a custom database that stored descriptions entered by users. Some descriptions were in UTF-8 encoding. The function just had to read whatever was between delimiter characters; it worked like that on Windows hosting and stopped working after moving to Unix hosting.

Note that the file itself is not UTF-8 encoded, and it should not be. It is not related to locale. The function must read the data between delimiters, even if it's binary.
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: peter dot e dot lind at gmail dot com
Status: Bogus

New Comment:

This is definitely still a bug. My locale is set to da_DK.utf8 and the file I'm trying to read is in UTF-8 (confirmed with a hex editor, though in fact it does not matter: the behaviour is the same for UTF-8 or ISO-8859-1), yet special characters are still thrown away when they are first in a field.

Previous Comments:

[2011-02-26 02:46:32] gjorgjioski at gmail dot com

This is a short example:

    kategorija širina platišč število

read:

    kategorija irina platišč tevilo

expected:

    kategorija širina platišč število
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: me at monicag dot it
Status: Bogus

New Comment:

Quoting my fellows above: how come this is not a bug?

Previous Comments:

[2011-02-26 02:36:32] gjorgjioski at gmail dot com

This bug also occurs when the file is in UTF-8 (a tab-delimited file using š,č characters). I can provide an example.
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: ghosh at q-one dot com
Status: Bogus

New Comment:

Sorry, I don't understand why this isn't a bug either. Could someone please elaborate? I tried setting all different kinds of locale to no avail. The first letter of a string starting with a UTF-8 character is always missing.

IMHO, fgetcsv should work as a simple string operation (or, whatever weird things it does right now, at least have a parameter to do so; count this as a feature request if you wish). I think the current behavior is totally confusing. For instance, I don't understand why only the first character is missing, while the problem doesn't appear if the character is in the middle of a string.

Previous Comments:

[2010-05-19 13:39:52] pahan at hubbitus dot spb dot su

Quote from the docs:

    Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

OK, a bug documented as "are read wrong by this function" is better than nothing. But do you plan to fix this wrong behaviour?
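For comparison, the "simple string operation" asked for above could look like the following for plain, unquoted data. `split_plain_line()` is a hypothetical helper, not an existing PHP function. It treats the line purely as bytes, so neither locale nor encoding can affect it, but it deliberately ignores quoting, escaped delimiters and embedded newlines, which means it is not a replacement for fgetcsv() on general CSV.

```php
<?php
// Hypothetical helper: byte-oriented split of one unquoted line.
function split_plain_line(string $line, string $delimiter = ','): array
{
    // Drop the trailing line ending, then cut at every delimiter byte.
    return explode($delimiter, rtrim($line, "\r\n"));
}
```

Leading blanks and high-bit bytes at the start of a field are returned untouched, which is the behaviour several commenters in this thread expected.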
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: max dot wildgrube at web dot de
Status: Bogus

New Comment:

The problem also appears if the special char is preceded by a blank. This blank also disappears. I use this ugly workaround:

1. First, read the complete CSV file into a variable: $import
2. $import = preg_replace('{(^|\t)([\x80-\xFF ])}m', '$1~~$2', $import);
3. After fgetcsv(), for each $field of the row array: $field = str_replace('~~', '', $field);

This means: before using fgetcsv(), insert a magic sequence (e.g. ~~) at the beginning of every field that begins with a blank or a special char; after parsing with fgetcsv(), remove it from each field.

Max.

Previous Comments:

[2010-05-18 11:03:42] m...@php.net

Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.php.net/manual/ and the instructions on how to report a bug at http://bugs.php.net/how-to-report.php

Quote from the docs:

    Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
Bug #48507 [Com]: fgetcsv() ignoring special characters
Comment by: php-bug-48507 at bsrealm dot net
Status: Bogus

New Comment:

This IS a bug. Whatever the locale is, I expect this function to read everything between delimiter characters without stripping the contents. Besides, the docs say that files in one-byte encoding would be read wrong, and this is a different case.

This bug causes a serious portability issue. In my case, this function was used to read a custom database that stored descriptions entered by users. Some descriptions were in UTF-8 encoding. The function just had to read whatever was between delimiter characters; it worked like that on Windows hosting and stopped working after moving to Unix hosting.

Note that the file itself is not UTF-8 encoded, and it should not be. It is not related to locale. The function must read the data between delimiters, even if it's binary.

Previous Comments:

[2009-12-12 11:40:29] pahan at hubbitus dot spb dot su

Sorry for the duplicate (#50456 is mine), but in it, in addition to the fgetcsv problem described here, I also suggest fixing fputcsv to allow [forcing] the enclosure of single words in a field.

Of course, it does *not* solve this problem of incorrect fgetcsv parsing, because the RFC allows unquoted values ( http://www.faqs.org/rfcs/rfc4180.html , section 2.5 ), but it would make the fputcsv/fgetcsv pair at least compatible within the PHP implementation.
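The forced enclosure suggested in the 2009 comment can be approximated in user code. The sketch below is an assumption-laden illustration: `csv_line_quote_all()` is a hypothetical helper, not an fputcsv() option, and it follows the RFC 4180 convention of doubling the enclosure character inside a field. It combines with the earlier observation in this thread that quoted fields were parsed without corruption.

```php
<?php
// Hypothetical helper: build one CSV line with every field enclosed.
function csv_line_quote_all(array $fields, string $delimiter = ',', string $enclosure = '"'): string
{
    $quoted = array_map(
        // Double any enclosure character inside the field, then wrap it.
        fn($field) => $enclosure
            . str_replace($enclosure, $enclosure . $enclosure, (string) $field)
            . $enclosure,
        $fields
    );
    return implode($delimiter, $quoted) . "\n";
}
```

The resulting string can be written with fwrite() in place of an fputcsv() call. It only addresses files produced by one's own code; unquoted input from other sources is unaffected.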
Bug #48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: gjorgjioski at gmail dot com
Reported by: krynble at yahoo dot com dot br
Summary: fgetcsv() ignoring special characters
Status: Bogus
Type: Bug
Package: Filesystem function related
Operating System: Unix
PHP Version: 5.*
Block user comment: N
Private report: N

New Comment:

This bug also occurs when the file is in UTF-8 (a tab-delimited file using the characters š and č). I can provide an example.

Previous Comments:

[2010-05-18 11:03:42] m...@php.net

Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.php.net/manual/ and the instructions on how to report a bug at http://bugs.php.net/how-to-report.php

Quote from the docs: "Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."

[2009-12-12 01:33:51] j...@php.net

See also bug #50456
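The documentation note quoted above says that fgetcsv() takes the locale into account. A minimal sketch of selecting a character-type locale before parsing follows. The helper name read_csv_rows() and the candidate locale names are assumptions for illustration, the locales must actually be installed on the system, and this is not presented as a fix for the bug itself.

```php
<?php
// Hypothetical helper (not from the bug report): select a character-type
// locale, then read every row of a delimited file with fgetcsv().
function read_csv_rows($path, $delimiter = ',', array $locales = array('en_US.UTF-8', 'C.UTF-8'))
{
    // setlocale() accepts an array of candidates and returns the name it
    // selected, or false when none of them is installed. The locale names
    // here are assumptions; `locale -a` lists what a system provides.
    setlocale(LC_CTYPE, $locales);

    $rows = array();
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return $rows;
    }
    // Enclosure and escape are passed explicitly so the call does not rely
    // on version-specific defaults.
    while (($row = fgetcsv($fp, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($fp);
    return $rows;
}
```

Setting only LC_CTYPE, rather than LC_ALL, leaves number and date formatting in the rest of the script unchanged.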
Bug #48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: gjorgjioski at gmail dot com
Reported by: krynble at yahoo dot com dot br
Summary: fgetcsv() ignoring special characters
Status: Bogus
Type: Bug
Package: Filesystem function related
Operating System: Unix
PHP Version: 5.*
Block user comment: N
Private report: N

New Comment:

This is a short example:

kategorija širina platišč število

read: kategorija irina platišč tevilo
expected: kategorija širina platišč število
Bug #48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: pahan at hubbitus dot spb dot su
Reported by: krynble at yahoo dot com dot br
Summary: fgetcsv() ignoring special characters
Status: Bogus
Type: Bug
Package: Filesystem function related
Operating System: Unix
PHP Version: 5.*

New Comment:

Quote from the docs: "Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."

OK, a bug documented as "are read wrong by this function" is better than nothing. But do you plan to fix this wrong behaviour?
#48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: pahan at hubbitus dot spb dot su
Reported By: krynble at yahoo dot com dot br
Status: Verified
Bug Type: Filesystem function related
Operating System: Unix
PHP Version: 5.*

New Comment:

Sorry for the duplicate (#50456 is mine), but in it, in addition to the fgetcsv problem described here, I also suggest fixing fputcsv to allow [forcing] the enclosure of single-word fields. Of course this does *not* solve the problem of incorrect fgetcsv parsing, because the RFC allows unquoted values ( http://www.faqs.org/rfcs/rfc4180.html , section 2.5 ), but it would make the fputcsv/fgetcsv pair at least minimally compatible within the PHP implementation.
#48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: phofstetter at sensational dot ch
Reported By: krynble at yahoo dot com dot br
Status: Verified
Bug Type: Filesystem function related
Operating System: Unix
PHP Version: 5.2.9

New Comment:

I was looking into this (after having been bitten by it) and I can add another tidbit that might help tracking this down:

The bug doesn't happen if the file fgetcsv() is reading is in UTF-8 format.

I have created a test file in ISO-8859-1 and then used file_put_contents(utf8encode(file_get_contents())) to create the UTF-8 version of it (explaining this here because I'm not sure whether this would write a BOM or not - probably not though).

That version could be read correctly.

I'm now writing a stream filter that does the UTF-8 conversion on the fly to hook that in between the file and fgetcsv() - while I would lose a bit of performance, in my case, this is the cleanest workaround.

Previous Comments:

[2009-06-13 18:10:03] krynble at yahoo dot com dot br

Unfortunately I'm unable to test it because the server is running in a datacenter. If someone can give feedback about it, I would appreciate it. Still, thanks for the help!

[2009-06-10 12:47:52] j...@php.net

Please try using this CVS snapshot:

http://snaps.php.net/php5.2-latest.tar.gz

For Windows:

http://windows.php.net/snapshots/
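The on-the-fly conversion described above can also be sketched with PHP's built-in convert.iconv.* stream filter instead of a hand-written one. This assumes the iconv extension is available and that the source encoding is known in advance to be ISO-8859-1; the helper name open_as_utf8() is hypothetical.

```php
<?php
// Hypothetical helper: open a file so that reads return UTF-8, converting
// from a known single-byte source encoding while the data is being read.
function open_as_utf8($path, $fromEncoding = 'ISO-8859-1')
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return false;
    }
    // The convert.iconv.<from>/<to> filter is provided by the iconv extension.
    $filter = stream_filter_append(
        $fp,
        "convert.iconv.$fromEncoding/UTF-8",
        STREAM_FILTER_READ
    );
    if ($filter === false) {
        fclose($fp);
        return false;
    }
    return $fp;
}
```

The returned handle can be passed to fgetcsv() like any other stream. Unlike a filter that guesses the encoding, this one converts unconditionally, so it should only be attached to files that really are in the stated source encoding.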
#48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: phofstetter at sensational dot ch
Reported By: krynble at yahoo dot com dot br
Status: Verified
Bug Type: Filesystem function related
Operating System: Unix
PHP Version: 5.2.9

New Comment:

Below you'll find a small script which shows how to implement a user filter that can be used to utf8-encode the data on the fly, so that fgetcsv is happy and returns correct output even if the first character in a field has its high bit set and is not valid UTF-8.

Remember: This is a workaround and impacts performance. This is not a valid fix for the bug.

I didn't yet have time to look deeply into the C implementation of fgetcsv, but all these calls to php_mblen() feel suspicious to me.

I'll try and have a look into this later today, but for now, I'm just glad I have this workaround (quickly hacked together - keep that in mind):

<?php
class utf8encode_filter extends php_user_filter
{
    function is_utf8($string)
    {
        return preg_match('%(?:
              [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
            | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
            | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
            | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
            | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
            | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
            | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
            )+%xs', $string);
    }

    function filter($in, $out, &$consumed, $closing)
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            if (!$this->is_utf8($bucket->data))
                $bucket->data = utf8_encode($bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

/* Register our filter with PHP */
stream_filter_register("utf8encode", "utf8encode_filter")
    or die("Failed to register filter");

$fp = fopen($_SERVER['argv'][1], "r");

/* Attach the registered filter to the stream just opened */
stream_filter_prepend($fp, "utf8encode");

while ($data = fgetcsv($fp, 0, ';', '"'))
    print_r($data);

fclose($fp);
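The is_utf8() check in the script above succeeds as soon as the data contains any valid multi-byte sequence, even when other bytes are malformed. A stricter whole-string check can be sketched with PCRE's u modifier, which makes preg_match() fail on malformed UTF-8. The helper names are hypothetical, the fallback encoding ISO-8859-1 is an assumption about the input, and iconv() requires the iconv extension.

```php
<?php
// Hypothetical helper: true only when the whole string is well-formed UTF-8.
function is_valid_utf8($string)
{
    // With the u modifier, preg_match() returns false (not 0) when the
    // subject is not valid UTF-8; otherwise the empty pattern matches.
    return preg_match('//u', $string) === 1;
}

// Hypothetical helper: leave valid UTF-8 untouched, otherwise treat the
// bytes as $fallback (an assumption about the data) and convert them.
function to_utf8($string, $fallback = 'ISO-8859-1')
{
    if (is_valid_utf8($string)) {
        return $string;
    }
    return iconv($fallback, 'UTF-8', $string);
}
```

When a check like this runs per bucket inside a stream filter, a multi-byte character split across two buckets would be judged invalid, so it is safer to apply it to complete lines.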
#48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: dmulryan at calendarwiz dot com
Reported By: krynble at yahoo dot com dot br
Status: Verified
Bug Type: Filesystem function related
Operating System: Unix
PHP Version: 5.2.9

New Comment:

Similar problem when parsing the following line:

0909211132,1,ØÊááàÑ,äÆæç,CForm,Y,1,1,1,97.95.176.240,2530

which produces empty array elements for fields with special characters:

Array
(
    [0] => 0909211132
    [1] => 1
    [2] =>
    [3] =>
    [4] => URL
    [5] => Y
    [6] => 1
    [7] => 1
    [8] => 1
    [9] => 97.95.176.240
    [10] => 2530
)

Previous Comments:

[2009-06-09 14:18:39] krynble at yahoo dot com dot br

Description:
Problem using fgetcsv: it ignores special characters at the beginning of a string. The example I had was the word ÓTICA, with the # character as separator.

Reproduce code:
---
Consider a file with the following contents:

WEIRD#ÓTICA#BEHAVIOR

When using fgetcsv to parse this file, I get an output like this:

Array( [0] => WEIRD, [1] => TICA, [2] => BEHAVIOR )

Expected result:
Array( [0] => WEIRD, [1] => ÓTICA, [2] => BEHAVIOR )

Actual result:
--
Array( [0] => WEIRD, [1] => TICA, [2] => BEHAVIOR )
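For input as simple as the WEIRD#ÓTICA#BEHAVIOR line above, where fields are never quoted and never contain the delimiter, a byte-oriented split avoids locale-dependent parsing altogether. This is a sketch of a workaround under those assumptions, not a replacement for fgetcsv(); the helper name split_simple_line() is hypothetical.

```php
<?php
// Hypothetical helper: split one unquoted, single-line record on a delimiter.
// explode() works on raw bytes, so high-bit characters are left untouched.
function split_simple_line($line, $delimiter = '#')
{
    // Drop only the trailing line terminator, not other whitespace.
    return explode($delimiter, rtrim($line, "\r\n"));
}
```

Lines can be fed to it from fgets(); as soon as the data may contain quoted fields or embedded delimiters, a real CSV parser is needed again.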
#48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: dmulryan at calendarwiz dot com
Reported By: krynble at yahoo dot com dot br
Status: Verified
Bug Type: Filesystem function related
Operating System: Unix
PHP Version: 5.2.9

New Comment:

Note: Previous comment has error where URL is shown in array element. This is not a bug but my error in the example. Bug is in special characters.
#48507 [Com]: fgetcsv() ignoring special characters
ID: 48507
Comment by: sjoerd-php at linuxonly dot nl
Reported By: krynble at yahoo dot com dot br
Status: Open
Bug Type: Filesystem function related
Operating System: Unix
PHP Version: 5.2.9

New Comment:

Could reproduce with php 5.2.10, php 5.2.11-dev (200906261830) and php 5.3rc4.

Example code:

<?php
$fp = tmpfile();
$str = "WEIRD#\xD3TICA#BEHAVIOR";
fwrite($fp, $str);
fseek($fp, 0);
$arr = fgetcsv($fp, 100, '#');
var_dump($arr[1]);
fclose($fp);
?>

Expected: string(5) "?TICA"
Actual: string(4) "TICA"