Bug #48507 [Com]: fgetcsv() ignoring special characters

2012-02-27 Thread jamie dot kahgee at gmail dot com
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         jamie dot kahgee at gmail dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Not a bug
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

rasmus@php, eswald@middil

I had the same problem. Running fgetcsv from the CLI showed no error, and
everything worked and output as expected. It was only when I ran the script
through Apache that I couldn't get my output to display (same script, same
file).

(Ü) was the specific character I was dealing with; it was at the start of a
string and was not showing.

After I set the locale in the script, everything worked as expected through
Apache, and my strings started parsing and displaying correctly.

setlocale(LC_ALL, 'en_US.UTF-8');

Hopefully this can help you.
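
The workaround described above can be sketched roughly as follows. This is only an illustration, not the commenter's script: the helper name read_first_csv_row(), the sample row and the temporary file are assumptions made for the example, and it assumes the en_US.UTF-8 locale is installed on the host (setlocale() returns false otherwise).

```php
<?php
// Hypothetical helper: set the locale, then read the first row of a CSV file.
function read_first_csv_row(string $path, string $locale = 'en_US.UTF-8')
{
    // setlocale() returns false if the requested locale is not installed.
    if (setlocale(LC_ALL, $locale) === false) {
        error_log("Locale $locale is not available; using the current locale");
    }

    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    // Delimiter, enclosure and escape character are passed explicitly.
    $row = fgetcsv($handle, 0, ',', '"', '\\');
    fclose($handle);

    return $row;
}

// Sample data (an assumption for this sketch): a field starting with "Ü".
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "Übung,a\n");
print_r(read_first_csv_row($path));
unlink($path);
```

Calling setlocale() inside the script makes the behaviour independent of whatever locale the web server process happened to inherit, which is the difference between the CLI and Apache runs described above.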


Previous Comments:

[2012-02-13 05:16:35] ras...@php.net

eswald@middil, I am not able to reproduce your results with either en_US.UTF-8 
or C with a UTF8 input file:

~ echo $LANG
en_US.UTF-8
~ file utf8.txt
utf8.txt: UTF-8 Unicode text
~ cat utf8.txt 
a,a,é,é,óú,óú,óú,óú
~ php -r "print_r(fgetcsv(fopen('./utf8.txt','r')));"
Array
(
    [0] => a
    [1] => a
    [2] => é
    [3] => é
    [4] => óú
    [5] => óú
    [6] => óú
    [7] => óú
)

I don't see any corruption. I can understand problems with charsets that are 
not low-ascii compatible with a low-ascii delimiter, but I don't see why this 
UTF8 case would break.


[2012-02-13 01:46:59] figura at hotbox dot ru

setlocale() might solve the issue but I do not see any reason for fgetcsv to 
depend on locale settings. The format is straightforward and clear. 

This behaviour is especially confusing when the string is read in UTF-8 format.


[2012-01-26 19:55:01] eswald at middil dot com

Tested with LANG=C, input file encoding of UTF-8.
Also tested with LANG=C, input file encoding of cp1252, with identical results, 
except that the output characters (what was left of them) were also cp1252.


[2012-01-26 19:50:26] eswald at middil dot com

Confirmed with php5 (5.3.6-13ubuntu3.2 on Oneiric Ocelot); can be worked around 
by quoting the value with quotation marks.  For example, the line

a,a,é,é,óú,óú,óú,óú

yields

array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => 'ú',
  7 => 'óú',
)

Note the corruption in elements 2, 4, and 6, but not in their quoted 
counterparts 3, 5, and 7.


[2012-01-18 11:53:48] tero dot tasanen at gmail dot com

I can also confirm that this is an actual bug. File encoding UTF-8, locale 
settings are set correctly and characters like äöå are dropped from the 
beginning of the csv column.

Tested with php versions 5.2.6, 5.2.10, 5.3.6




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

https://bugs.php.net/bug.php?id=48507


-- 
Edit this bug report at https://bugs.php.net/bug.php?id=48507&edit=1


Bug #48507 [Com]: fgetcsv() ignoring special characters

2012-02-12 Thread figura at hotbox dot ru
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         figura at hotbox dot ru
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Not a bug
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

setlocale() might solve the issue but I do not see any reason for fgetcsv to 
depend on locale settings. The format is straightforward and clear. 

This behaviour is especially confusing when the string is read in UTF-8 format.


Previous Comments:

[2012-01-26 19:55:01] eswald at middil dot com

Tested with LANG=C, input file encoding of UTF-8.
Also tested with LANG=C, input file encoding of cp1252, with identical results, 
except that the output characters (what was left of them) were also cp1252.


[2012-01-26 19:50:26] eswald at middil dot com

Confirmed with php5 (5.3.6-13ubuntu3.2 on Oneiric Ocelot); can be worked around 
by quoting the value with quotation marks.  For example, the line

a,a,é,é,óú,óú,óú,óú

yields

array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => 'ú',
  7 => 'óú',
)

Note the corruption in elements 2, 4, and 6, but not in their quoted 
counterparts 3, 5, and 7.


[2012-01-18 11:53:48] tero dot tasanen at gmail dot com

I can also confirm that this is an actual bug. File encoding UTF-8, locale 
settings are set correctly and characters like äöå are dropped from the 
beginning of the csv column.

Tested with php versions 5.2.6, 5.2.10, 5.3.6


[2011-10-28 08:33:25] peter dot e dot lind at gmail dot com

This is definitely still a bug - my locale is set to da_DK.utf8, the file I'm 
trying to read is in UTF8 (confirmed with a hex-editor but in fact does not 
matter - the behaviour is the same, UTF8 or ISO-8859-1) yet special characters 
are still thrown away when they are first in a field


[2011-10-18 13:59:30] me at monicag dot it

Quoting my fellows above: how comes this is not a bug?






Bug #48507 [Com]: fgetcsv() ignoring special characters

2012-01-26 Thread eswald at middil dot com
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         eswald at middil dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

Confirmed with php5 (5.3.6-13ubuntu3.2 on Oneiric Ocelot); can be worked around 
by quoting the value with quotation marks.  For example, the line

a,a,é,é,óú,óú,óú,óú

yields

array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => 'ú',
  7 => 'óú',
)

Note the corruption in elements 2, 4, and 6, but not in their quoted 
counterparts 3, 5, and 7.
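
For anyone who controls how the file is written, the quoting workaround can be sketched like this. The helper csv_line_quoted() is hypothetical (it is not a PHP built-in); it wraps every field in quotation marks and doubles any embedded ones, so that no field reaches fgetcsv() unquoted.

```php
<?php
// Hypothetical helper: build one CSV line with every field quoted.
function csv_line_quoted(array $fields): string
{
    $quoted = array_map(function ($field) {
        // Double embedded quotation marks, then enclose the whole field.
        return '"' . str_replace('"', '""', (string) $field) . '"';
    }, $fields);

    return implode(',', $quoted) . "\n";
}

echo csv_line_quoted(['a', 'é', 'óú']);
```

This only helps when the producer of the file can be changed; it does nothing for unquoted files that already exist.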


Previous Comments:

[2012-01-18 11:53:48] tero dot tasanen at gmail dot com

I can also confirm that this is an actual bug. File encoding UTF-8, locale 
settings are set correctly and characters like äöå are dropped from the 
beginning of the csv column.

Tested with php versions 5.2.6, 5.2.10, 5.3.6


[2011-10-28 08:33:25] peter dot e dot lind at gmail dot com

This is definitely still a bug - my locale is set to da_DK.utf8, the file I'm 
trying to read is in UTF8 (confirmed with a hex-editor but in fact does not 
matter - the behaviour is the same, UTF8 or ISO-8859-1) yet special characters 
are still thrown away when they are first in a field


[2011-10-18 13:59:30] me at monicag dot it

Quoting my fellows above: how comes this is not a bug?


[2011-10-10 10:03:58] ghosh at q-one dot com

Sorry. I don't understand why this isn't a bug either. Could someone please 
elaborate? I tried setting all different kinds of locale to no avail. The first 
letter of a string starting with a UTF-8 character is always missing. IMHO, 
fgetcsv should work as a simple string operation (or - whatever weird things it 
does right now - at least have a parameter to do so - count this as a feature 
request if you wish). I think, the current behavior is totally confusing. For 
instance, I don't understand why only the first character is missing but the 
problem doesn't appear if a character is in the middle of a string.


[2011-07-17 16:19:28] max dot wildgrube at web dot de

The problem also appears if the special char is preceded by a blank. This 
blank also disappears.

I use this ugly workaround:
1. first read the complete csv file into a variable: $import
2. $import = preg_replace ('{(^|\t)([€-ÿ ])}m', '$1~~$2', $import);
3. after fgetcsv, for each $field of the row array:
   $field = str_replace ('~~', '', $field);

This means: before using fgetcsv, insert a magic sequence (e.g. ~~) at the 
beginning of each field which begins with a blank or a special char; after 
parsing with fgetcsv, remove it from each field.

Max.






Bug #48507 [Com]: fgetcsv() ignoring special characters

2012-01-26 Thread eswald at middil dot com
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         eswald at middil dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

Tested with LANG=C, input file encoding of UTF-8.
Also tested with LANG=C, input file encoding of cp1252, with identical results, 
except that the output characters (what was left of them) were also cp1252.


Previous Comments:

[2012-01-26 19:50:26] eswald at middil dot com

Confirmed with php5 (5.3.6-13ubuntu3.2 on Oneiric Ocelot); can be worked around 
by quoting the value with quotation marks.  For example, the line

a,a,é,é,óú,óú,óú,óú

yields

array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => 'ú',
  7 => 'óú',
)

Note the corruption in elements 2, 4, and 6, but not in their quoted 
counterparts 3, 5, and 7.


[2012-01-18 11:53:48] tero dot tasanen at gmail dot com

I can also confirm that this is an actual bug. File encoding UTF-8, locale 
settings are set correctly and characters like äöå are dropped from the 
beginning of the csv column.

Tested with php versions 5.2.6, 5.2.10, 5.3.6


[2011-10-28 08:33:25] peter dot e dot lind at gmail dot com

This is definitely still a bug - my locale is set to da_DK.utf8, the file I'm 
trying to read is in UTF8 (confirmed with a hex-editor but in fact does not 
matter - the behaviour is the same, UTF8 or ISO-8859-1) yet special characters 
are still thrown away when they are first in a field


[2011-10-18 13:59:30] me at monicag dot it

Quoting my fellows above: how comes this is not a bug?


[2011-10-10 10:03:58] ghosh at q-one dot com

Sorry. I don't understand why this isn't a bug either. Could someone please 
elaborate? I tried setting all different kinds of locale to no avail. The first 
letter of a string starting with a UTF-8 character is always missing. IMHO, 
fgetcsv should work as a simple string operation (or - whatever weird things it 
does right now - at least have a parameter to do so - count this as a feature 
request if you wish). I think, the current behavior is totally confusing. For 
instance, I don't understand why only the first character is missing but the 
problem doesn't appear if a character is in the middle of a string.






Bug #48507 [Com]: fgetcsv() ignoring special characters

2012-01-18 Thread tero dot tasanen at gmail dot com
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         tero dot tasanen at gmail dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

I can also confirm that this is an actual bug. File encoding UTF-8, locale 
settings are set correctly and characters like äöå are dropped from the 
beginning of the csv column.

Tested with php versions 5.2.6, 5.2.10, 5.3.6


Previous Comments:

[2011-10-28 08:33:25] peter dot e dot lind at gmail dot com

This is definitely still a bug - my locale is set to da_DK.utf8, the file I'm 
trying to read is in UTF8 (confirmed with a hex-editor but in fact does not 
matter - the behaviour is the same, UTF8 or ISO-8859-1) yet special characters 
are still thrown away when they are first in a field


[2011-10-18 13:59:30] me at monicag dot it

Quoting my fellows above: how comes this is not a bug?


[2011-10-10 10:03:58] ghosh at q-one dot com

Sorry. I don't understand why this isn't a bug either. Could someone please 
elaborate? I tried setting all different kinds of locale to no avail. The first 
letter of a string starting with a UTF-8 character is always missing. IMHO, 
fgetcsv should work as a simple string operation (or - whatever weird things it 
does right now - at least have a parameter to do so - count this as a feature 
request if you wish). I think, the current behavior is totally confusing. For 
instance, I don't understand why only the first character is missing but the 
problem doesn't appear if a character is in the middle of a string.


[2011-07-17 16:19:28] max dot wildgrube at web dot de

The problem also appears if the special char is preceded by a blank. This 
blank also disappears.

I use this ugly workaround:
1. first read the complete csv file into a variable: $import
2. $import = preg_replace ('{(^|\t)([€-ÿ ])}m', '$1~~$2', $import);
3. after fgetcsv, for each $field of the row array:
   $field = str_replace ('~~', '', $field);

This means: before using fgetcsv, insert a magic sequence (e.g. ~~) at the 
beginning of each field which begins with a blank or a special char; after 
parsing with fgetcsv, remove it from each field.

Max.


[2011-07-08 08:39:50] php-bug-48507 at bsrealm dot net

This IS a bug. Whatever the locale is, I expect this function to read everything 
between delimiter characters without stripping the contents. Besides, the docs 
say that files in a one-byte encoding would be read wrong, and this is a 
different case. This bug causes a serious portability issue. In my case, this 
function was used to read a custom database that stored descriptions entered by 
users. Some descriptions were in UTF-8 encoding. The function just had to read 
whatever was between delimiter characters; it worked like that on Windows 
hosting and stopped working after moving to Unix hosting. Note, the file itself 
is not UTF-8 encoded and it should not be. It is not related to locale. It must 
read data, even if it's binary, between delimiters.






Bug #48507 [Com]: fgetcsv() ignoring special characters

2011-10-28 Thread peter dot e dot lind at gmail dot com
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         peter dot e dot lind at gmail dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

This is definitely still a bug - my locale is set to da_DK.utf8, the file I'm 
trying to read is in UTF8 (confirmed with a hex-editor but in fact does not 
matter - the behaviour is the same, UTF8 or ISO-8859-1) yet special characters 
are still thrown away when they are first in a field


Previous Comments:

[2011-10-18 13:59:30] me at monicag dot it

Quoting my fellows above: how comes this is not a bug?


[2011-10-10 10:03:58] ghosh at q-one dot com

Sorry. I don't understand why this isn't a bug either. Could someone please 
elaborate? I tried setting all different kinds of locale to no avail. The first 
letter of a string starting with a UTF-8 character is always missing. IMHO, 
fgetcsv should work as a simple string operation (or - whatever weird things it 
does right now - at least have a parameter to do so - count this as a feature 
request if you wish). I think, the current behavior is totally confusing. For 
instance, I don't understand why only the first character is missing but the 
problem doesn't appear if a character is in the middle of a string.


[2011-07-17 16:19:28] max dot wildgrube at web dot de

The problem also appears if the special char is preceded by a blank. This 
blank also disappears.

I use this ugly workaround:
1. first read the complete csv file into a variable: $import
2. $import = preg_replace ('{(^|\t)([€-ÿ ])}m', '$1~~$2', $import);
3. after fgetcsv, for each $field of the row array:
   $field = str_replace ('~~', '', $field);

This means: before using fgetcsv, insert a magic sequence (e.g. ~~) at the 
beginning of each field which begins with a blank or a special char; after 
parsing with fgetcsv, remove it from each field.

Max.


[2011-07-08 08:39:50] php-bug-48507 at bsrealm dot net

This IS a bug. Whatever the locale is, I expect this function to read everything 
between delimiter characters without stripping the contents. Besides, the docs 
say that files in a one-byte encoding would be read wrong, and this is a 
different case. This bug causes a serious portability issue. In my case, this 
function was used to read a custom database that stored descriptions entered by 
users. Some descriptions were in UTF-8 encoding. The function just had to read 
whatever was between delimiter characters; it worked like that on Windows 
hosting and stopped working after moving to Unix hosting. Note, the file itself 
is not UTF-8 encoded and it should not be. It is not related to locale. It must 
read data, even if it's binary, between delimiters.


[2011-02-26 02:46:32] gjorgjioski at gmail dot com

This is short example:

kategorija  širina platišč   število

read:
kategorija
irina platišč
tevilo

expected:
kategorija
širina platišč
število






Bug #48507 [Com]: fgetcsv() ignoring special characters

2011-10-18 Thread me at monicag dot it
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         me at monicag dot it
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

Quoting my fellows above: how comes this is not a bug?


Previous Comments:

[2011-10-10 10:03:58] ghosh at q-one dot com

Sorry. I don't understand why this isn't a bug either. Could someone please 
elaborate? I tried setting all different kinds of locale to no avail. The first 
letter of a string starting with a UTF-8 character is always missing. IMHO, 
fgetcsv should work as a simple string operation (or - whatever weird things it 
does right now - at least have a parameter to do so - count this as a feature 
request if you wish). I think, the current behavior is totally confusing. For 
instance, I don't understand why only the first character is missing but the 
problem doesn't appear if a character is in the middle of a string.


[2011-07-17 16:19:28] max dot wildgrube at web dot de

The problem also appears if the special char is preceded by a blank. This 
blank also disappears.

I use this ugly workaround:
1. first read the complete csv file into a variable: $import
2. $import = preg_replace ('{(^|\t)([€-ÿ ])}m', '$1~~$2', $import);
3. after fgetcsv, for each $field of the row array:
   $field = str_replace ('~~', '', $field);

This means: before using fgetcsv, insert a magic sequence (e.g. ~~) at the 
beginning of each field which begins with a blank or a special char; after 
parsing with fgetcsv, remove it from each field.

Max.


[2011-07-08 08:39:50] php-bug-48507 at bsrealm dot net

This IS a bug. Whatever the locale is, I expect this function to read everything 
between delimiter characters without stripping the contents. Besides, the docs 
say that files in a one-byte encoding would be read wrong, and this is a 
different case. This bug causes a serious portability issue. In my case, this 
function was used to read a custom database that stored descriptions entered by 
users. Some descriptions were in UTF-8 encoding. The function just had to read 
whatever was between delimiter characters; it worked like that on Windows 
hosting and stopped working after moving to Unix hosting. Note, the file itself 
is not UTF-8 encoded and it should not be. It is not related to locale. It must 
read data, even if it's binary, between delimiters.


[2011-02-26 02:46:32] gjorgjioski at gmail dot com

This is short example:

kategorija  širina platišč   število

read:
kategorija
irina platišč
tevilo

expected:
kategorija
širina platišč
število


[2011-02-26 02:36:32] gjorgjioski at gmail dot com

This bug occurs also when file is in UTF8 (tab delimited file using š,č 
characters). I can provide an example.






Bug #48507 [Com]: fgetcsv() ignoring special characters

2011-10-10 Thread ghosh at q-one dot com
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         ghosh at q-one dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

Sorry. I don't understand why this isn't a bug either. Could someone please 
elaborate? I tried setting all different kinds of locale to no avail. The first 
letter of a string starting with a UTF-8 character is always missing. IMHO, 
fgetcsv should work as a simple string operation (or - whatever weird things it 
does right now - at least have a parameter to do so - count this as a feature 
request if you wish). I think, the current behavior is totally confusing. For 
instance, I don't understand why only the first character is missing but the 
problem doesn't appear if a character is in the middle of a string.


Previous Comments:

[2011-07-17 16:19:28] max dot wildgrube at web dot de

The problem also appears if the special char is preceded by a blank. This 
blank also disappears.

I use this ugly workaround:
1. first read the complete csv file into a variable: $import
2. $import = preg_replace ('{(^|\t)([€-ÿ ])}m', '$1~~$2', $import);
3. after fgetcsv, for each $field of the row array:
   $field = str_replace ('~~', '', $field);

This means: before using fgetcsv, insert a magic sequence (e.g. ~~) at the 
beginning of each field which begins with a blank or a special char; after 
parsing with fgetcsv, remove it from each field.

Max.


[2011-07-08 08:39:50] php-bug-48507 at bsrealm dot net

This IS a bug. Whatever the locale is, I expect this function to read everything 
between delimiter characters without stripping the contents. Besides, the docs 
say that files in a one-byte encoding would be read wrong, and this is a 
different case. This bug causes a serious portability issue. In my case, this 
function was used to read a custom database that stored descriptions entered by 
users. Some descriptions were in UTF-8 encoding. The function just had to read 
whatever was between delimiter characters; it worked like that on Windows 
hosting and stopped working after moving to Unix hosting. Note, the file itself 
is not UTF-8 encoded and it should not be. It is not related to locale. It must 
read data, even if it's binary, between delimiters.


[2011-02-26 02:46:32] gjorgjioski at gmail dot com

This is short example:

kategorija  širina platišč   število

read:
kategorija
irina platišč
tevilo

expected:
kategorija
širina platišč
število


[2011-02-26 02:36:32] gjorgjioski at gmail dot com

This bug occurs also when file is in UTF8 (tab delimited file using š,č 
characters). I can provide an example.


[2010-05-19 13:39:52] pahan at hubbitus dot spb dot su

 Quote from the docs:
 Note: Locale setting is taken into account by this function. If LANG is e.g.
 en_US.UTF-8, files in one-byte encoding are read wrong by this function.
Ok, a bug documented as "are read wrong by this function" is better than 
nothing. But do you plan to fix this wrong behaviour?






Bug #48507 [Com]: fgetcsv() ignoring special characters

2011-07-17 Thread max dot wildgrube at web dot de
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         max dot wildgrube at web dot de
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

The problem also appears if the special char is preceded by a blank. This 
blank also disappears.

I use this ugly workaround:
1. first read the complete csv file into a variable: $import
2. $import = preg_replace ('{(^|\t)([€-ÿ ])}m', '$1~~$2', $import);
3. after fgetcsv, for each $field of the row array:
   $field = str_replace ('~~', '', $field);

This means: before using fgetcsv, insert a magic sequence (e.g. ~~) at the 
beginning of each field which begins with a blank or a special char; after 
parsing with fgetcsv, remove it from each field.

Max.
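
The marker workaround above could be sketched as the following self-contained script. It is an illustration under stated assumptions rather than the original code: the function names protect_fields() and unprotect_field() are hypothetical, the input is assumed to be tab-delimited, the byte range \x80-\xFF stands in for "special char", and '~~' is assumed never to start a genuine field. Unlike the str_replace() call in step 3, the sketch strips the marker only at the start of a field.

```php
<?php
// Hypothetical helper: insert '~~' before every field that starts with a
// blank or with a byte in the range 0x80-0xFF (tab-delimited input assumed).
function protect_fields(string $csv): string
{
    return preg_replace('{(^|\t)([\x80-\xFF ])}m', '$1~~$2', $csv);
}

// Hypothetical helper: strip the marker again, only at the start of a field.
function unprotect_field(string $field): string
{
    return strncmp($field, '~~', 2) === 0 ? substr($field, 2) : $field;
}

// Sample line (an assumption for this sketch), tab-delimited.
$line = "kategorija\tširina\t število\n";

$protected = rtrim(protect_fields($line), "\n");
$fields = str_getcsv($protected, "\t", '"', '\\');
$fields = array_map('unprotect_field', $fields);

print_r($fields);
```

Because every protected field now begins with the ASCII character '~', the parser never sees a high-bit byte or a blank in the first position of a field.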


Previous Comments:

[2011-07-08 08:39:50] php-bug-48507 at bsrealm dot net

This IS a bug. Whatever the locale is, I expect this function to read everything 
between delimiter characters without stripping the contents. Besides, the docs 
say that files in a one-byte encoding would be read wrong, and this is a 
different case. This bug causes a serious portability issue. In my case, this 
function was used to read a custom database that stored descriptions entered by 
users. Some descriptions were in UTF-8 encoding. The function just had to read 
whatever was between delimiter characters; it worked like that on Windows 
hosting and stopped working after moving to Unix hosting. Note, the file itself 
is not UTF-8 encoded and it should not be. It is not related to locale. It must 
read data, even if it's binary, between delimiters.


[2011-02-26 02:46:32] gjorgjioski at gmail dot com

This is short example:

kategorija  širina platišč   število

read:
kategorija
irina platišč
tevilo

expected:
kategorija
širina platišč
število


[2011-02-26 02:36:32] gjorgjioski at gmail dot com

This bug occurs also when file is in UTF8 (tab delimited file using š,č 
characters). I can provide an example.


[2010-05-19 13:39:52] pahan at hubbitus dot spb dot su

 Quote from the docs:
 Note: Locale setting is taken into account by this function. If LANG is e.g.
 en_US.UTF-8, files in one-byte encoding are read wrong by this function.
Ok, a bug documented as "are read wrong by this function" is better than 
nothing. But do you plan to fix this wrong behaviour?


[2010-05-18 11:03:42] m...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Quote from the docs:

Note: Locale setting is taken into account by this function. If LANG is e.g. 
en_US.UTF-8, files in one-byte encoding are read wrong by this function.






Bug #48507 [Com]: fgetcsv() ignoring special characters

2011-07-08 Thread php-bug-48507 at bsrealm dot net
Edit report at https://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         php-bug-48507 at bsrealm dot net
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

This IS a bug. Whatever the locale is, I expect this function to read everything 
between delimiter characters without stripping the contents. Besides, the docs 
say that files in a one-byte encoding would be read wrong, and this is a 
different case. This bug causes a serious portability issue. In my case, this 
function was used to read a custom database that stored descriptions entered by 
users. Some descriptions were in UTF-8 encoding. The function just had to read 
whatever was between delimiter characters; it worked like that on Windows 
hosting and stopped working after moving to Unix hosting. Note, the file itself 
is not UTF-8 encoded and it should not be. It is not related to locale. It must 
read data, even if it's binary, between delimiters.


Previous Comments:

[2011-02-26 02:46:32] gjorgjioski at gmail dot com

This is short example:

kategorija  širina platišč   število

read:
kategorija
irina platišč
tevilo

expected:
kategorija
širina platišč
število


[2011-02-26 02:36:32] gjorgjioski at gmail dot com

This bug occurs also when file is in UTF8 (tab delimited file using š,č 
characters). I can provide an example.


[2010-05-19 13:39:52] pahan at hubbitus dot spb dot su

 Quote from the docs:
 Note: Locale setting is taken into account by this function. If LANG is e.g.
 en_US.UTF-8, files in one-byte encoding are read wrong by this function.
Ok, a bug documented as "are read wrong by this function" is better than 
nothing. But do you plan to fix this wrong behaviour?


[2010-05-18 11:03:42] m...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Quote from the docs:

Note: Locale setting is taken into account by this function. If LANG is e.g. 
en_US.UTF-8, files in one-byte encoding are read wrong by this function.


[2009-12-12 11:40:29] pahan at hubbitus dot spb dot su

Sorry for the duplicate (#50456 is mine), but in it, in addition to the fgetcsv 
problem described there, I also suggest fixing fputcsv to allow [forced] 
enclosing of single words in a field.

Of course it does *not* solve this problem of incorrect fgetcsv parsing, 
because the RFC allows unquoted values ( http://www.faqs.org/rfcs/rfc4180.html , 
section 2.5 ), but it would make the fputcsv/fgetcsv pair at least compatible 
within the PHP implementation.






Bug #48507 [Com]: fgetcsv() ignoring special characters

2011-02-25 Thread gjorgjioski at gmail dot com
Edit report at http://bugs.php.net/bug.php?id=48507&edit=1

 ID:                 48507
 Comment by:         gjorgjioski at gmail dot com
 Reported by:        krynble at yahoo dot com dot br
 Summary:            fgetcsv() ignoring special characters
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   Unix
 PHP Version:        5.*
 Block user comment: N
 Private report:     N

 New Comment:

This bug also occurs when the file is in UTF-8 (a tab-delimited file using
the characters š and č). I can provide an example.


Previous Comments:

[2009-12-12 01:33:51] j...@php.net

See also bug #50456


[2009-09-22 15:09:20] phofstetter at sensational dot ch

Below you'll find a small script which shows how to implement a user
filter that can be used to utf8-encode the data on the fly so that
fgetcsv is happy and returns correct output even if the first character
in a field has its high bit set and is not valid UTF-8:

Remember: This is a workaround and impacts performance. This is not a
valid fix for the bug.

I haven't yet had time to look deeply into the C implementation of
fgetcsv, but all these calls to php_mblen() feel suspicious to me.

I'll try and have a look into this later today, but for now, I'm just
glad I have this workaround (quickly hacked together - keep that in
mind):

<?php

class utf8encode_filter extends php_user_filter {
  /* Returns 1 if the string contains at least one well-formed
     multi-byte UTF-8 sequence, 0 otherwise. */
  function is_utf8($string) {
    return preg_match('%(?:
      [\xC2-\xDF][\x80-\xBF]              # non-overlong 2-byte
      |\xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
      |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      |\xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
      |\xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
      |[\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      |\xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
      )+%xs', $string);
  }

  /* $consumed is passed by reference so PHP sees the updated count. */
  function filter($in, $out, &$consumed, $closing)
  {
    while ($bucket = stream_bucket_make_writeable($in)) {
      /* Only re-encode buckets that do not already look like UTF-8 */
      if (!$this->is_utf8($bucket->data))
        $bucket->data = utf8_encode($bucket->data);
      $consumed += $bucket->datalen;
      stream_bucket_append($out, $bucket);
    }
    return PSFS_PASS_ON;
  }
}

/* Register our filter with PHP */
stream_filter_register("utf8encode", "utf8encode_filter")
  or die("Failed to register filter");

$fp = fopen($_SERVER['argv'][1], "r");

/* Attach the registered filter to the stream just opened */
stream_filter_prepend($fp, "utf8encode");

while ($data = fgetcsv($fp, 0, ';', '"'))
  print_r($data);

fclose($fp);






Bug #48507 [Com]: fgetcsv() ignoring special characters

2011-02-25 Thread gjorgjioski at gmail dot com

 ID: 48507
 Comment by: gjorgjioski at gmail dot com
 Reported by:krynble at yahoo dot com dot br
 Summary:fgetcsv() ignoring special characters
 Status: Bogus
 Type:   Bug
 Package:Filesystem function related
 Operating System:   Unix
 PHP Version:5.*
 Block user comment: N
 Private report: N

 New Comment:

Here is a short example (tab-delimited input):

kategorija  širina platišč   število

read:
kategorija
irina platišč
tevilo

expected:
kategorija
širina platišč
število




Bug #48507 [Com]: fgetcsv() ignoring special characters

2010-05-19 Thread pahan at hubbitus dot spb dot su

 ID:   48507
 Comment by:   pahan at hubbitus dot spb dot su
 Reported by:  krynble at yahoo dot com dot br
 Summary:  fgetcsv() ignoring special characters
 Status:   Bogus
 Type: Bug
 Package:  Filesystem function related
 Operating System: Unix
 PHP Version:  5.*

 New Comment:

 Quote from the docs:
 Note: Locale setting is taken into account by this function. If LANG
 is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this
 function.

OK, having the bug documented as "are read wrong by this function" is
better than nothing. But do you plan to fix this wrong behaviour?


Previous Comments:

[2009-09-22 14:45:22] phofstetter at sensational dot ch

I was looking into this (after having been bitten by it) and I can add
another tidbit that might help tracking this down:

The bug doesn't happen if the file fgetcsv() is reading is in
UTF-8 format.

I created a test file in ISO-8859-1 and then used
file_put_contents(utf8_encode(file_get_contents())) to create the
UTF-8 version of it (explaining this here because I'm not sure whether
this would write a BOM or not - probably not though).

That version could be read correctly.

I'm now writing a stream filter that does the UTF-8 conversion on the
fly to hook that in between the file and fgetcsv() - while I would lose
a bit of performance, in my case, this is the cleanest workaround.





#48507 [Com]: fgetcsv() ignoring special characters

2009-12-12 Thread pahan at hubbitus dot spb dot su
 ID:   48507
 Comment by:   pahan at hubbitus dot spb dot su
 Reported By:  krynble at yahoo dot com dot br
 Status:   Verified
 Bug Type: Filesystem function related
 Operating System: Unix
 PHP Version:  5.*
 New Comment:

Sorry for the duplicate (#50456 is mine), but in it, in addition to the
fgetcsv problem described here, I also suggest fixing fputcsv to allow
[forcing] the enclosure of single-word fields.

Of course, that does *not* solve this problem of incorrect fgetcsv
parsing, because the RFC allows unquoted values (
http://www.faqs.org/rfcs/rfc4180.html , section 2.5 ), but it would make
the fputcsv/fgetcsv pair at least minimally compatible within the PHP
implementation.
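
To illustrate the suggestion about forcing enclosure, here is a rough sketch of a userland helper that encloses every field. The name csv_line_quoted is hypothetical and not part of PHP; it only shows the idea and does not change how fgetcsv parses its input.

```php
<?php
// Hypothetical helper (not part of PHP): build one CSV line in which
// every field is enclosed, whether or not it strictly needs to be.
function csv_line_quoted(array $fields, string $delimiter = ',', string $enclosure = '"'): string
{
    $quoted = array();
    foreach ($fields as $field) {
        // Double any embedded enclosure character, as RFC 4180 describes.
        $escaped = str_replace($enclosure, $enclosure . $enclosure, (string) $field);
        $quoted[] = $enclosure . $escaped . $enclosure;
    }
    return implode($delimiter, $quoted) . "\n";
}

// The second field is "ÓTICA" written as UTF-8 bytes.
echo csv_line_quoted(array('WEIRD', "\xC3\x93TICA", 'say "hi"'), '#');
```

Because every field then starts with the enclosure character, the special character is never the first byte of the raw field.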


Previous Comments:


[2009-09-21 18:11:47] dmulryan at calendarwiz dot com

Note: Previous comment has error where URL is shown in array element. 
This is not a bug but my error in the example.  Bug is in special
characters.



[2009-09-21 18:07:42] dmulryan at calendarwiz dot com

Similar problem when parsing the following line:

0909211132,1,ØÊááàÑ,äÆæç,CForm,Y,1,1,1,97.95.176.240,2530

which produces empty array elements for fields with special
characters:

Array ( [0] => 0909211132 [1] => 1 [2] => [3] => [4] => URL [5] => Y
[6] => 1 [7] => 1 [8] => 1 [9] => 97.95.176.240 [10] => 2530 )






#48507 [Com]: fgetcsv() ignoring special characters

2009-09-22 Thread phofstetter at sensational dot ch
 ID:   48507
 Comment by:   phofstetter at sensational dot ch
 Reported By:  krynble at yahoo dot com dot br
 Status:   Verified
 Bug Type: Filesystem function related
 Operating System: Unix
 PHP Version:  5.2.9
 New Comment:

I was looking into this (after having been bitten by it) and I can add
another tidbit that might help tracking this down:

The bug doesn't happen if the file fgetcsv() is reading is in
UTF-8-format.

I created a test file in ISO-8859-1 and then used
file_put_contents(utf8_encode(file_get_contents())) to create the
UTF-8 version of it (explaining this here because I'm not sure whether
this would write a BOM or not - probably not though).

That version could be read correctly.

I'm now writing a stream filter that does the UTF-8 conversion on the
fly to hook that in between the file and fgetcsv() - while I would lose
a bit of performance, in my case, this is the cleanest workaround.
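
The one-off conversion described above could look roughly like this. It is a sketch under assumptions: the helper name latin1_file_to_utf8 is hypothetical, the mbstring extension is assumed to be loaded, and the input is assumed to really be ISO-8859-1.

```php
<?php
// Hypothetical helper: rewrite an ISO-8859-1 file as UTF-8 in one pass.
// Assumes the mbstring extension; no BOM is added to the output.
function latin1_file_to_utf8(string $source, string $target): bool
{
    $bytes = file_get_contents($source);
    if ($bytes === false) {
        return false;
    }
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
    return file_put_contents($target, $utf8) !== false;
}

// Demo with temporary files; 0xD3 is "Ó" in ISO-8859-1.
$src = tempnam(sys_get_temp_dir(), 'l1');
$dst = tempnam(sys_get_temp_dir(), 'u8');
file_put_contents($src, "WEIRD#\xD3TICA#BEHAVIOR");
var_dump(latin1_file_to_utf8($src, $dst));
echo bin2hex(file_get_contents($dst)), PHP_EOL;
```

This reads the whole file into memory, so the streaming filter is likely the better fit for large inputs.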


Previous Comments:


[2009-06-26 19:35:22] sjoerd-php at linuxonly dot nl

Could reproduce with php 5.2.10, php 5.2.11-dev (200906261830) and php
5.3rc4. Example code:

<?php
$fp = tmpfile();
$str = "WEIRD#\xD3TICA#BEHAVIOR";
fwrite($fp, $str);
fseek($fp, 0);
$arr = fgetcsv($fp, 100, '#');
var_dump($arr[1]);
fclose($fp);
?>

Expected: string(5) "?TICA"
Actual: string(4) "TICA"



[2009-06-13 18:10:03] krynble at yahoo dot com dot br

Unfortunately I'm unable to test it because the server is running in a
datacenter.

If someone could give feedback about it, I would appreciate it.

Still, thanks for the help!



[2009-06-10 12:47:52] j...@php.net

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/








#48507 [Com]: fgetcsv() ignoring special characters

2009-09-22 Thread phofstetter at sensational dot ch
 ID:   48507
 Comment by:   phofstetter at sensational dot ch
 Reported By:  krynble at yahoo dot com dot br
 Status:   Verified
 Bug Type: Filesystem function related
 Operating System: Unix
 PHP Version:  5.2.9
 New Comment:

Below you'll find a small script which shows how to implement a user
filter that can be used to utf8-encode the data on the fly so that
fgetcsv is happy and returns correct output even if the first character
in a field has its high bit set and is not valid UTF-8:

Remember: This is a workaround and impacts performance. This is not a
valid fix for the bug.

I haven't yet had time to look deeply into the C implementation of
fgetcsv, but all these calls to php_mblen() feel suspicious to me.

I'll try and have a look into this later today, but for now, I'm just
glad I have this workaround (quickly hacked together - keep that in
mind):

<?php

class utf8encode_filter extends php_user_filter {
  /* Returns 1 if the string contains at least one well-formed
     multi-byte UTF-8 sequence, 0 otherwise. */
  function is_utf8($string) {
    return preg_match('%(?:
      [\xC2-\xDF][\x80-\xBF]              # non-overlong 2-byte
      |\xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
      |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      |\xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
      |\xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
      |[\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      |\xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
      )+%xs', $string);
  }

  /* $consumed is passed by reference so PHP sees the updated count. */
  function filter($in, $out, &$consumed, $closing)
  {
    while ($bucket = stream_bucket_make_writeable($in)) {
      /* Only re-encode buckets that do not already look like UTF-8 */
      if (!$this->is_utf8($bucket->data))
        $bucket->data = utf8_encode($bucket->data);
      $consumed += $bucket->datalen;
      stream_bucket_append($out, $bucket);
    }
    return PSFS_PASS_ON;
  }
}

/* Register our filter with PHP */
stream_filter_register("utf8encode", "utf8encode_filter")
  or die("Failed to register filter");

$fp = fopen($_SERVER['argv'][1], "r");

/* Attach the registered filter to the stream just opened */
stream_filter_prepend($fp, "utf8encode");

while ($data = fgetcsv($fp, 0, ';', '"'))
  print_r($data);

fclose($fp);
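
As an alternative sketch (not from the original comment), PHP's built-in convert.iconv.* stream filter can do the same on-the-fly conversion without a custom class. This assumes the iconv extension is available and that the input really is ISO-8859-1; unlike the filter above, it does not check whether the data is already UTF-8. The sample line is made up for the demonstration.

```php
<?php
// Assumes the iconv extension and that the input really is ISO-8859-1.
$fp = tmpfile();
fwrite($fp, "WEIRD;\xD3TICA;BEHAVIOR\n");
rewind($fp);

// Built-in conversion filter: bytes are transcoded as they are read.
stream_filter_append($fp, 'convert.iconv.ISO-8859-1/UTF-8', STREAM_FILTER_READ);

// Collect every parsed row; the escape character is passed explicitly.
$rows = array();
while (($row = fgetcsv($fp, 0, ';', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($fp);

print_r($rows);
```

If the file might already be UTF-8, this filter would double-encode it, so the detection step in the custom filter remains useful for mixed inputs.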





#48507 [Com]: fgetcsv() ignoring special characters

2009-09-21 Thread dmulryan at calendarwiz dot com
 ID:   48507
 Comment by:   dmulryan at calendarwiz dot com
 Reported By:  krynble at yahoo dot com dot br
 Status:   Verified
 Bug Type: Filesystem function related
 Operating System: Unix
 PHP Version:  5.2.9
 New Comment:

Similar problem when parsing the following line:

0909211132,1,ØÊááàÑ,äÆæç,CForm,Y,1,1,1,97.95.176.240,2530

which produces empty array elements for fields with special
characters:

Array ( [0] => 0909211132 [1] => 1 [2] => [3] => [4] => URL [5] => Y
[6] => 1 [7] => 1 [8] => 1 [9] => 97.95.176.240 [10] => 2530 )
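
When fields come back empty like this, it may help to first check whether the input bytes are valid UTF-8 at all. The sketch below assumes the mbstring extension. The byte strings are assumptions made for illustration (the ISO-8859-1 and UTF-8 encodings of "ØÊ"), since the report does not say how the original line was encoded.

```php
<?php
// Assumes the mbstring extension. The byte values are illustrative:
// "ØÊ" encoded as ISO-8859-1 and as UTF-8, not taken from the report.
$latin1 = "1,\xD8\xCA,CForm";
$utf8   = "1,\xC3\x98\xC3\x8A,CForm";

// Single-byte Latin-1 data is not well-formed UTF-8.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// The same characters encoded as UTF-8 pass the check.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```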


Previous Comments:


[2009-06-09 14:18:39] krynble at yahoo dot com dot br

Description:

Problem using fgetcsv: it ignores special characters at the beginning of a
string.

The example I had was using the word ÓTICA with the # character as
separator.

Reproduce code:
---
Consider a file with the following contents: WEIRD#ÓTICA#BEHAVIOR

When using fgetcsv to parse this file, I get an output like this:

Array(
   [0] => WEIRD,
   [1] => TICA,
   [2] => BEHAVIOR
)

Expected result:

Array(
   [0] => WEIRD,
   [1] => ÓTICA,
   [2] => BEHAVIOR
)

Actual result:
--
Array(
   [0] => WEIRD,
   [1] => TICA,
   [2] => BEHAVIOR
)








#48507 [Com]: fgetcsv() ignoring special characters

2009-09-21 Thread dmulryan at calendarwiz dot com
 ID:   48507
 Comment by:   dmulryan at calendarwiz dot com
 Reported By:  krynble at yahoo dot com dot br
 Status:   Verified
 Bug Type: Filesystem function related
 Operating System: Unix
 PHP Version:  5.2.9
 New Comment:

Note: Previous comment has error where URL is shown in array element. 
This is not a bug but my error in the example.  Bug is in special
characters.





#48507 [Com]: fgetcsv() ignoring special characters

2009-06-26 Thread sjoerd-php at linuxonly dot nl
 ID:   48507
 Comment by:   sjoerd-php at linuxonly dot nl
 Reported By:  krynble at yahoo dot com dot br
 Status:   Open
 Bug Type: Filesystem function related
 Operating System: Unix
 PHP Version:  5.2.9
 New Comment:

Could reproduce with php 5.2.10, php 5.2.11-dev (200906261830) and php
5.3rc4. Example code:

<?php
$fp = tmpfile();
$str = "WEIRD#\xD3TICA#BEHAVIOR";
fwrite($fp, $str);
fseek($fp, 0);
$arr = fgetcsv($fp, 100, '#');
var_dump($arr[1]);
fclose($fp);
?>

Expected: string(5) "?TICA"
Actual: string(4) "TICA"
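
For comparison, a byte-level split that does not consult the locale can serve as a cross-check for lines known to contain no quoted fields. The helper name split_unquoted_line is hypothetical, and this is only a sketch for diagnosing the problem, not a replacement for a CSV parser.

```php
<?php
// Hypothetical helper: split a line that is known to contain no quoted
// fields. explode() works on raw bytes, so the locale plays no part.
function split_unquoted_line(string $line, string $delimiter): array
{
    return explode($delimiter, rtrim($line, "\r\n"));
}

$fields = split_unquoted_line("WEIRD#\xD3TICA#BEHAVIOR\n", '#');

// Hex-dump the second field so the leading byte is visible on any terminal.
echo bin2hex($fields[1]), PHP_EOL; // prints d354494341
```

The leading d3 in the dump is the byte that the fgetcsv() call in the report drops.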

