Bug #65045 [Fbk->Csd]: mb_convert_encoding breaks well-formed character

2013-06-29 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=65045&edit=1

 ID: 65045
 Updated by: hirok...@php.net
 Reported by:masakielastic at gmail dot com
 Summary:mb_convert_encoding breaks well-formed character
-Status: Feedback
+Status: Closed
 Type:   Bug
 Package:mbstring related
 Operating System:   Mac OSX
 PHP Version:5.5.0RC3
 Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

Automatic comment on behalf of hirokawa
Revision: 
http://git.php.net/?p=php-src.git;a=commit;h=c6a7549efcca62346687b0fda5b408b963f5ab2d
Log: fixed #65045: mb_convert_encoding breaks well-formed character.


Previous Comments:

[2013-06-30 02:49:42] hirok...@php.net

This problem is caused by a bug in libmbfl's handling of ill-formed UTF-8.
libmbfl is maintained at https://github.com/moriyoshi/libmbfl.
Please try the newest version of libmbfl from GitHub.


[2013-06-22 14:02:28] a...@php.net

Related To: Bug #65081


[2013-06-17 12:30:10] a...@php.net

I can reproduce this on Windows too, so the issue is probably not specific to 
OS X. Here's a slightly modified snippet:



And the output (pipes added manually as UTF-8 character separators):

0xf0 0xa4 0xad | 0xf0 0xa4 0xad 0xa2 | 0xf0 0xa4 0xad 0xa2

0xef 0xbf 0xbd | 0xef 0xbf 0xbd | 0xef 0xbf 0xbd | 0xef 0xbf 0xbd | 0xf0 0xa4 0xad 0xa2

As one can see, the first (invalid, 3-byte) sequence and the second (valid, 
4-byte) sequence are replaced with "0xef 0xbf 0xbd"; only the last one 
remains. However, looking at the code, only libmbfl is involved 
there: http://lxr.php.net/xref/PHP_5_5/ext/mbstring/mbstring.c#3011 . I'm not 
sure yet whether I have overlooked something; I'll have to write a C snippet.
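
The modified snippet itself is missing from this archive. As a stand-in, here 
is a minimal sketch of how such a byte dump might be produced. hex_bytes() is 
a hypothetical helper name, not an mbstring function, and the pipes in the 
output above were added by hand.

```php
<?php
// Hypothetical helper (not from the original report): render every byte
// of $s as 0xNN, separated by single spaces.
function hex_bytes(string $s): string
{
    if ($s === '') {
        return '';
    }
    $out = [];
    foreach (str_split($s) as $byte) {
        $out[] = sprintf('0x%02x', ord($byte));
    }
    return implode(' ', $out);
}

// Ill-formed 3-byte prefix followed by U+24B62 twice, as in the report.
$str = "\xF0\xA4\xAD" . "\xF0\xA4\xAD\xA2" . "\xF0\xA4\xAD\xA2";

echo hex_bytes($str), "\n";

// The converted bytes depend on the PHP/libmbfl version in use.
if (function_exists('mb_convert_encoding')) {
    mb_substitute_character(0xFFFD);
    echo hex_bytes(mb_convert_encoding($str, 'UTF-8', 'UTF-8')), "\n";
}
```

Comparing the two printed lines byte by byte shows which sequences were 
substituted and which were passed through.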


[2013-06-16 23:17:01] masakielastic at gmail dot com

Description:

When converting a string from UTF-8 to UTF-8 with mb_convert_encoding in 
order to replace ill-formed byte sequences with the substitute character 
(U+FFFD), mb_convert_encoding also replaces the character following an 
ill-formed byte sequence with the substitute character. In addition, 
mb_convert_encoding deletes a trailing ill-formed byte sequence instead of 
replacing it with the substitute character.

A comprehensive test case for 2-4 byte characters is here: 
https://gist.github.com/masakielastic/5793665 .

Test script:
---
// U+24B62: "\xF0\xA4\xAD\xA2"
// ill-formed: "\xF0\xA4\xAD"
// U+FFFD: "\xEF\xBF\xBD"

$str = "\xF0\xA4\xAD".  "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2";
$expected = "\xEF\xBF\xBD"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2";

$str2 = "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD";
$expected2 = "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xEF\xBF\xBD";

mb_substitute_character(0xFFFD);
var_dump(
    $expected === htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8')),
    $expected2 === htmlspecialchars_decode(htmlspecialchars($str2, ENT_SUBSTITUTE, 'UTF-8')),
    $expected === mb_convert_encoding($str, 'UTF-8', 'UTF-8'),
    $expected2 === mb_convert_encoding($str2, 'UTF-8', 'UTF-8')
);

Expected result:

bool(true)
bool(true)
bool(true)
bool(true)

Actual result:
--
bool(true)
bool(true)
bool(false)
bool(false)
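
Since the htmlspecialchars() round trip in the test script gives the expected 
result, it might serve as a stopgap for scrubbing ill-formed UTF-8 on affected 
versions. A minimal sketch follows; scrub_utf8() is a hypothetical wrapper 
name, and how many U+FFFD characters are emitted per ill-formed sequence may 
vary between PHP versions.

```php
<?php
// Hypothetical wrapper around the round trip used in the test script.
// ENT_SUBSTITUTE makes htmlspecialchars() replace ill-formed UTF-8 with
// U+FFFD instead of returning an empty string; decoding afterwards undoes
// the entity escaping of &, < and >. ENT_NOQUOTES on both sides keeps
// quotes untouched so that well-formed input comes back unchanged.
function scrub_utf8(string $s): string
{
    return htmlspecialchars_decode(
        htmlspecialchars($s, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        ENT_NOQUOTES
    );
}

// Ill-formed 3-byte prefix followed by a well-formed U+24B62.
$dirty = "\xF0\xA4\xAD" . "\xF0\xA4\xAD\xA2";
$clean = scrub_utf8($dirty);

// preg_match() with the /u modifier only succeeds on valid UTF-8.
var_dump(preg_match('//u', $clean) === 1);
```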






-- 
Edit this bug report at https://bugs.php.net/bug.php?id=65045&edit=1


Bug #65045 [Ver->Fbk]: mb_convert_encoding breaks well-formed character

2013-06-29 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=65045&edit=1

 ID: 65045
 Updated by: hirok...@php.net
 Reported by:masakielastic at gmail dot com
 Summary:mb_convert_encoding breaks well-formed character
-Status: Verified
+Status: Feedback
 Type:   Bug
 Package:mbstring related
 Operating System:   Mac OSX
 PHP Version:5.5.0RC3
-Assigned To:
+Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

This problem is caused by a bug in libmbfl's handling of ill-formed UTF-8.
libmbfl is maintained at https://github.com/moriyoshi/libmbfl.
Please try the newest version of libmbfl from GitHub.








-- 
Edit this bug report at https://bugs.php.net/bug.php?id=65045&edit=1


Bug #60116 [Bgs->Csd]: escapeshellcmd() cannot escape the chars which causes shell injection.

2011-11-11 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=60116&edit=1

 ID: 60116
 Updated by: hirok...@php.net
 Reported by:hirok...@php.net
 Summary:escapeshellcmd() cannot escape the chars which
 causes shell injection.
-Status: Bogus
+Status: Closed
 Type:   Bug
 Package:Filter related
 Operating System:   Ubuntu Linux
 PHP Version:trunk-SVN-2011-10-23 (SVN)
 Assigned To:hirokawa
 Block user comment: N
 Private report: N



Previous Comments:

[2011-11-11 15:06:10] hirok...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php




[2011-11-11 14:52:48] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=319057
Log: revert changes to fix bug #60116.


[2011-11-11 09:53:49] lbarn...@php.net

> The default behavior, which does not escape paired quotes, is still 
> dangerous even if single quotes are used.

Yes, I was speaking of both single and double quotes: don't enclose the 
escaped string in quotes at all :)

This is a bit puzzling, because other escaping functions such as 
mysql_escape_string expect the user to enclose the string in quotes, but 
escapeshellcmd and escapeshellarg don't.

It's like htmlspecialchars: it just removes the special meaning of special 
characters.

> But, generally, escapeshellcmd() is used to escape the user input

That shouldn't be the case. escapeshellcmd escapes all shell metacharacters in 
a string, which prevents command injection, redirection, etc., but it doesn't 
prevent argument injection (it doesn't escape spaces).
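
The difference between command injection and argument injection can be seen in 
a short sketch. Both inputs are made up for illustration, and the backslash 
mentioned in the comment assumes a Unix-like platform (on Windows 
escapeshellcmd() uses ^ instead).

```php
<?php
// Made-up inputs for illustration only.
$withMetachar = 'foo; rm -rf /tmp/x';
$withSpaces   = 'foo --bar /etc/hosts';

// The ';' is neutralised (prefixed with a backslash on Unix-like systems),
// so a second command cannot be chained onto the first.
echo escapeshellcmd($withMetachar), "\n";

// Spaces and '-' are left alone, so the shell still sees three separate
// words: extra arguments can be injected.
echo escapeshellcmd($withSpaces), "\n";
```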


[2011-11-10 22:49:42] hirok...@php.net

The default behavior, which does not escape paired quotes, is still dangerous 
even if single quotes are used.

$_GET['key'] = ":' '/etc/hosts";
$key = escapeshellcmd($_GET['key']);
$cmd = "grep '$key' /var/data/*"; // <- single quote
system($cmd);  // output: grep ':' '/etc/hosts' /var/data/*

You are right, escapeshellarg() is better than escapeshellcmd() in this case.
But, generally, escapeshellcmd() is used to escape user input 
(GET/POST/Cookie), so the default behavior (paired quotes are not escaped) is 
not recommended.
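
For comparison, here is a sketch of the same payload passed through 
escapeshellarg(). The command string is only built, not executed. The "--" 
separator is an addition of this sketch (it keeps a key that starts with "-" 
from being read as a grep option), and the quoting described in the comment 
assumes a Unix-like platform.

```php
<?php
// Same payload as in the comment above, held in a local variable.
$key = ":' '/etc/hosts";

// escapeshellarg() wraps the value in single quotes and, on Unix-like
// systems, rewrites every embedded ' as '\'' so that the shell sees
// exactly one argument.
$cmd = sprintf('grep -- %s /var/data/*', escapeshellarg($key));

echo $cmd, "\n";
```

With this form the injected quotes end up inside the search pattern rather 
than splitting it into a pattern and an extra file name.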


[2011-11-10 15:18:49] lbarn...@php.net

The example at http://docs.php.net/manual/en/function.escapeshellcmd.php is 
wrong. It encloses an escaped argument in double quotes, but the 
escapeshellcmd function doesn't expect this.

As a result, the second command in the example is unsafe.

IMO the second command in the example should be removed and replaced by a 
warning telling users to use escapeshellarg instead (because escapeshellcmd 
doesn't escape spaces, and an argument escaped by escapeshellcmd may be 
interpreted as multiple arguments by the shell).




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

https://bugs.php.net/bug.php?id=60116


-- 
Edit this bug report at https://bugs.php.net/bug.php?id=60116&edit=1


Bug #60116 [Asn->Bgs]: escapeshellcmd() cannot escape the chars which causes shell injection.

2011-11-11 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=60116&edit=1

 ID: 60116
 Updated by: hirok...@php.net
 Reported by:hirok...@php.net
 Summary:escapeshellcmd() cannot escape the chars which
 causes shell injection.
-Status: Assigned
+Status: Bogus
 Type:   Bug
 Package:Filter related
 Operating System:   Ubuntu Linux
 PHP Version:trunk-SVN-2011-10-23 (SVN)
 Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php




Previous Comments:

[2011-11-10 15:09:03] lbarn...@php.net

Hi,

It seems that you are not using escapeshellcmd() correctly, and that's why it 
is unsafe in the way you are using it.

You are enclosing escapeshellcmd's output in double quotes.

However, escapeshellcmd() and escapeshellarg() do not work like 
mysql_real_escape_string(), for example, and you must *not* enclose the string 
in quotes yourself. (The example in the documentation is wrong.)

When you don't, it's perfectly fine:

echo escapeshellcmd('foo" "bar');

Result: foo" "bar // the quotes don't allow injecting a command.

echo escapeshellcmd('foo"bar');

Result: foo\"bar // This time the quote is escaped since it's not paired. 
Again, injecting a command is not possible.

Also, I believe that escapeshell*arg*() should be used instead of 
escapeshell*cmd*() when escaping an argument:

$cmd = sprintf('grep %s /var/data/*', escapeshellarg($_GET['key']));

(escapeshellcmd() won't escape spaces and would allow injecting an additional 
argument; escapeshellarg() encloses the whole argument in single quotes and 
ensures that it's treated as a single argument)




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

https://bugs.php.net/bug.php?id=60116


-- 
Edit this bug report at https://bugs.php.net/bug.php?id=60116&edit=1


Bug #60116 [Asn]: escapeshellcmd() cannot escape the chars which causes shell injection.

2011-11-10 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=60116&edit=1

 ID: 60116
 Updated by: hirok...@php.net
 Reported by:hirok...@php.net
 Summary:escapeshellcmd() cannot escape the chars which
 causes shell injection.
 Status: Assigned
 Type:   Bug
 Package:Filter related
 Operating System:   Ubuntu Linux
 PHP Version:trunk-SVN-2011-10-23 (SVN)
 Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

The default behavior, which does not escape paired quotes, is still dangerous 
even if single quotes are used.

$_GET['key'] = ":' '/etc/hosts";
$key = escapeshellcmd($_GET['key']);
$cmd = "grep '$key' /var/data/*"; // <- single quote
system($cmd);  // output: grep ':' '/etc/hosts' /var/data/*

You are right, escapeshellarg() is better than escapeshellcmd() in this case.
But, generally, escapeshellcmd() is used to escape user input 
(GET/POST/Cookie), so the default behavior (paired quotes are not escaped) is 
not recommended.


Previous Comments:

[2011-11-10 14:19:09] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=318996
Log: MFH: fixed bug #60116 (escapeshellcmd() cannot escape the characters which 
cause shell command injection).


[2011-10-30 05:57:22] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=318568
Log: added a test script for bug60116 and fixed behabior of ESCAPE_CMD_END.


[2011-10-24 14:13:27] hirok...@php.net

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.






The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

https://bugs.php.net/bug.php?id=60116


-- 
Edit this bug report at https://bugs.php.net/bug.php?id=60116&edit=1


Bug #60227 [Opn->Csd]: header() cannot detect the multi-line header with CR(0x0D).

2011-11-06 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=60227&edit=1

 ID: 60227
 Updated by: hirok...@php.net
 Reported by:rui_hirokawa at yahoo dot co dot jp
 Summary:header() cannot detect the multi-line header with
 CR(0x0D).
-Status: Open
+Status: Closed
 Type:   Bug
 Package:HTTP related
 Operating System:   Ubuntu Linux 11.10
 PHP Version:trunk-SVN-2011-11-06 (SVN)
-Assigned To:
+Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:

[2011-11-06 11:07:07] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=318820
Log: fixed bug #60227: header() cannot detect the multi-line header with CR.


[2011-11-06 07:04:50] rui_hirokawa at yahoo dot co dot jp

Description:

As of PHP 5.1.2, header() can no longer be used to send multiple response 
headers in a single call, in order to prevent HTTP response splitting attacks.
However, header() only checks for the linefeed (LF, 0x0A) as a line-end 
marker; it doesn't check for the carriage return (CR, 0x0D).

Some browsers, including Google Chrome and IE, also recognize CR as a line 
end (as reported by Mr. Tokumaru).

The current behavior of header() is therefore still vulnerable to the HTTP 
header splitting attack.
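
Independently of any fix in header() itself, an application can refuse such 
values before they reach header(). A minimal sketch, assuming a hypothetical 
helper name assert_single_line_header(); it rejects CR, LF and NUL rather than 
trying to repair the value.

```php
<?php
// Hypothetical guard (not part of PHP): throw if a header value contains
// CR, LF or NUL, any of which could be used to start a new header line.
function assert_single_line_header(string $value): string
{
    if (strpbrk($value, "\r\n\0") !== false) {
        throw new InvalidArgumentException('Header value contains a line break');
    }
    return $value;
}

// The "url" parameter from the report, decoded the way PHP fills $_GET.
$url = urldecode('http://example.com/head1.php%0DSet-Cookie:+NAME=foo');

try {
    // The guard throws before header() is ever called for this input.
    header('Location: ' . assert_single_line_header($url));
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}
```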




Test script:
---


accessed from the url like:
http://example.com/head1.php?url=http://example.com/head1.php%0DSet-Cookie:+NAME=foo

It should be executed with Google Chrome or IE.


Expected result:

Warning: Header may not contain more than a single header, new line detected. 
in //head1.php on line 2
Array ( )

Actual result:
--
Array (NAME=>'foo')







-- 
Edit this bug report at https://bugs.php.net/bug.php?id=60227&edit=1


Bug #60116 [Asn->Tbd]: escapeshellcmd() cannot escape the chars which causes shell injection.

2011-10-24 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=60116&edit=1

 ID: 60116
 Updated by: hirok...@php.net
 Reported by:hirok...@php.net
 Summary:escapeshellcmd() cannot escape the chars which
 causes shell injection.
-Status: Assigned
+Status: To be documented
 Type:   Bug
 Package:Filter related
 Operating System:   Ubuntu Linux
 PHP Version:trunk-SVN-2011-10-23 (SVN)
 Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:

[2011-10-23 15:08:28] tyr...@php.net

Judging from http://svn.php.net/viewvc?view=revision&revision=318342, this can 
be closed, right?


[2011-10-23 13:49:52] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=318342
Log: fixed bug #60116 escapeshellcmd() cannot escape the dangerous quotes.


[2011-10-23 11:17:25] hirok...@php.net

The following patch has been added/updated:

Patch Name: php-escape.patch
Revision:   1319368645
URL:
https://bugs.php.net/patch-display.php?bug=60116&patch=php-escape.patch&revision=1319368645


[2011-10-23 11:16:45] hirok...@php.net

Description:

escapeshellcmd() escapes " and ' only if they are not paired (this is 
documented in the PHP manual). 

In the test script, which searches for a keyword in the files of a specified 
directory (/var/data/), the double quotes in the user input shown below are 
not escaped because they are paired (this was found by Mr. Tokumaru).

$_GET['key'] = ':" "/etc/passwd';

The command line will be:

grep ":" "/etc/passwd" /var/data/*

The content of an arbitrary file such as /etc/passwd will be shown.
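
The paired-quote behaviour can be checked without running any command. This is 
a sketch that assumes a Unix-like platform (on Windows quotes are always 
escaped); the command is only built and printed, and wrapping the value in 
double quotes follows the command line shown above.

```php
<?php
// Payload from the description, held in a local variable instead of $_GET.
$key = ':" "/etc/passwd';

// On Unix-like systems the two double quotes count as a pair, so
// escapeshellcmd() returns the input unchanged.
$escaped = escapeshellcmd($key);

// Build (but do not execute) the command with the escaped value placed
// inside double quotes.
$cmd = 'grep "' . $escaped . '" /var/data/*';
echo $cmd, "\n";
```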

The attached patch (made by Mr. Ohgaki, slightly modified by me) adds an 
option flag to escapeshellcmd().
The recommended code to escape the quotes in this case will be:

$key = escapeshellcmd($_GET['key'], ESCAPE_CMD_ALL);

output: grep ":\" \"/etc/passwd" /var/data/*

The option flag has three possible values.
There is no backward incompatibility because the default behavior 
(ESCAPE_CMD_PAIR) is unchanged.

ESCAPE_CMD_PAIR : escape if it is not paired (default)
ESCAPE_CMD_END  : escape except for end/beginning of the string
ESCAPE_CMD_ALL  : escape without exception (recommended)

On Windows, quotes are always escaped, but in other environments 
they are only escaped if they are not paired.

It is highly recommended to apply this patch to prevent possible shell 
command injection attacks.









Test script:
---


Expected result:

grep ":\" \"/etc/passwd" /var/data/*


Actual result:
--
grep ":" "/etc/passwd" /var/data/*
[content of /etc/passwd]








-- 
Edit this bug report at https://bugs.php.net/bug.php?id=60116&edit=1


Bug #42290 [Asn->Csd]: mb_eregi_replace() is not case-insensitive with multibyte pattern

2011-10-15 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=42290&edit=1

 ID: 42290
 Updated by: hirok...@php.net
 Reported by:arysin at gmail dot com
 Summary:mb_eregi_replace() is not case-insensitive with
 multibyte pattern
-Status: Assigned
+Status: Closed
 Type:   Bug
 Package:mbstring related
 Operating System:   *
 PHP Version:5.2CVS-2007-08-14
 Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.

Prior to PHP 5.4.0, the bundled multibyte regex library (Oniguruma 4.7.2) did 
not support case-insensitive matching of Unicode characters outside the 
Latin-1 range. The Oniguruma library has been updated to the newest version 
(5.9.2), which fully supports Unicode properties.
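
A small check of the behaviour described above, as a sketch. It assumes PHP 
5.4 or later with mbstring regex support, uses a Cyrillic sample word, and the 
square brackets are arbitrary markers chosen for this example.

```php
<?php
// Assumes the mbstring extension with regex (mbregex) support is loaded.
if (function_exists('mb_eregi_replace')) {
    mb_internal_encoding('UTF-8');
    mb_regex_encoding('UTF-8');

    // The same word twice, once capitalised and once in lower case.
    $text = 'Общи общи';

    // '\\0' refers to the whole match; with the updated Oniguruma both
    // spellings should end up wrapped in brackets.
    echo mb_eregi_replace('общи', '[\\0]', $text), "\n";
}
```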


Previous Comments:

[2011-10-15 08:55:55] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=318132
Log: updated bundled oniguruma regex library to 5.9.2. fixed bug #42290.


[2010-10-13 01:29:36] gevorg dot ha at gmail dot com

Hi, 

please find a code snippet which shows that it doesn't work:

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// Text contains three words with the same letters, differing only in case.
$hText = 'ՀԱՅԱՍՏԱՆԸ Հայաստան հայաստան';

// Neither of these two works; only the last word is replaced.
echo mb_eregi_replace ('հայաստան', '\\0', $hText).''; 
echo mb_ereg_replace ('հայաստան', '\\0', $hText, 'msri').''; 

Best,
Gevorg


[2010-08-28 03:20:07] hirok...@php.net

Could you show me the detailed information such as, 

- code snippet which can reproduce the problem.
- setting information of mbstring.* in php.ini
- character encoding which you are using.
- version/locale of your OS.


[2010-08-27 16:36:18] bubalula at gmail dot com

I tried also on another server with php version 5.2.11 and it does not work 
either.


[2010-08-27 16:22:05] bubalula at gmail dot com

I have the same problem in version 5.2.12.
I don't know why this bug isn't taken seriously as it creates big problems for 
us working with non latin languages.




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

https://bugs.php.net/bug.php?id=42290


-- 
Edit this bug report at https://bugs.php.net/bug.php?id=42290&edit=1


Bug #40685 [Bgs->Csd]: '&&&' => '&&' at mb_decode_numericentity

2011-09-23 Thread hirokawa
Edit report at https://bugs.php.net/bug.php?id=40685&edit=1

 ID: 40685
 Updated by: hirok...@php.net
 Reported by:dq2 at compass dot jp
 Summary:'&&&' => '&&' at mb_decode_numericentity
-Status: Bogus
+Status: Closed
 Type:   Bug
 Package:mbstring related
 Operating System:   winXP
 PHP Version:4.4.6
-Assigned To:
+Assigned To:hirokawa
 Block user comment: N
 Private report: N

 New Comment:

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:

[2011-09-24 02:20:19] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=317232
Log: MFH: fixed #40685: removed '&' in mb_decode_numericentity().


[2011-09-24 02:11:58] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=317231
Log: MFH: fixed #40685: removed '&' in mb_decode_numericentity().

[2011-09-24 02:11:30] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=317230
Log: added tests for #40685.

[2011-09-24 02:10:59] hirok...@php.net

Automatic comment from SVN on behalf of hirokawa
Revision: http://svn.php.net/viewvc/?view=revision&revision=317229
Log: fixed #40685: removed '&' in mb_decode_numericentity().


[2007-03-01 20:59:31] dq2 at compass dot jp

Thank you. I know '&&&' is not a numeric entity,
but I find it strange that an '&' is dropped.
And sorry, the expected and actual results were swapped.
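
A quick way to check the behaviour after the fix, as a sketch. The conversion 
map below covers U+0000 to U+FFFF and is an assumption of this example, not 
something taken from the report.

```php
<?php
// Conversion map: [start, end, offset, mask].
$convmap = [0x0, 0xffff, 0, 0xffff];

if (function_exists('mb_decode_numericentity')) {
    // A real numeric entity is decoded to the corresponding character...
    echo mb_decode_numericentity('&#8364;', $convmap, 'UTF-8'), "\n";

    // ...while a run of bare ampersands should come back unchanged once
    // the fix for this report is in place.
    echo mb_decode_numericentity('&&&', $convmap, 'UTF-8'), "\n";
}
```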




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

https://bugs.php.net/bug.php?id=40685


-- 
Edit this bug report at https://bugs.php.net/bug.php?id=40685&edit=1


Bug #42290 [NoF->Asn]: mb_eregi_replace() is not case-insensitive with multibyte pattern

2010-08-27 Thread hirokawa
Edit report at http://bugs.php.net/bug.php?id=42290&edit=1

 ID: 42290
 Updated by: hirok...@php.net
 Reported by:arysin at gmail dot com
 Summary:mb_eregi_replace() is not case-insensitive with
 multibyte pattern
-Status: No Feedback
+Status: Assigned
 Type:   Bug
 Package:mbstring related
 Operating System:   *
 PHP Version:5.2CVS-2007-08-14
 Assigned To:hirokawa
 Block user comment: N

 New Comment:

Could you show me the detailed information such as, 

- code snippet which can reproduce the problem.
- setting information of mbstring.* in php.ini
- character encoding which you are using.
- version/locale of your OS.


Previous Comments:

[2010-08-27 16:36:18] bubalula at gmail dot com

I tried also on another server with php version 5.2.11 and it does not
work either.


[2010-08-27 16:22:05] bubalula at gmail dot com

I have the same problem in version 5.2.12.

I don't know why this bug isn't taken seriously as it creates big
problems for us working with non latin languages.


[2009-09-30 13:12:10] babson at gmail dot com

I am using PHP version 5.2.9 and have the same problem.

I tried the sample by arysin and got the same result as he did.

What can be done?


[2009-04-15 16:04:55] rvorojbit at gmail dot com

I am also having the exact same problem now as was described in the
previous post last year!!! Is there any workaround for this bug? I
didn't find any in google...


[2008-05-03 07:38:12] admin at bg-history dot info

I got the same problem with UTF-8 encoding, using Cyrillic.

While trying to implement "search highlight", neither the "eregi_replace" nor
the "str_ireplace" function actually matched the capital letter.

For example:

$str="общи";
$newstr="Общи";

$bodytext = str_ireplace($str, "".$str."", $bodytext);
$bodytext2 = str_ireplace($newstr, "".$newstr."", $bodytext);

$bodytext contains the word "Общи". Although I used a case-insensitive
replace, the word is highlighted only in $bodytext2.

I've searched a lot for a solution to this problem and found none.

P.S. Sorry for my English, hope it's understandable.
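
str_ireplace() works on bytes and cannot be relied on to fold the case of
Cyrillic text in UTF-8. One commonly used alternative, sketched here as an
assumption rather than as the fix adopted for this report, is preg_replace()
with the "i" and "u" modifiers. The <b> tags and the sample text are made up
for the example.

```php
<?php
$needle   = 'общи';
$bodytext = 'Общи условия и общи положения';

// preg_quote() protects any regex metacharacters in the search term.
// The u modifier treats pattern and subject as UTF-8, so the i modifier
// can fold case per character rather than per byte.
$pattern = '/' . preg_quote($needle, '/') . '/iu';

// $0 is the text that actually matched, so its original case is kept.
$highlighted = preg_replace($pattern, '<b>$0</b>', $bodytext);

echo $highlighted, "\n";
```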




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

http://bugs.php.net/bug.php?id=42290


-- 
Edit this bug report at http://bugs.php.net/bug.php?id=42290&edit=1


#46131 [Bgs->Fbk]: mb_check_encoding returns wrong result when using iso-2022-jp character set

2008-11-07 Thread hirokawa
 ID:   46131
 Updated by:   [EMAIL PROTECTED]
 Reported By:  areid at lumerical dot com
-Status:   Bogus
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: RHEL5
 PHP Version:  5.2.6
 Assigned To:  hirokawa
 New Comment:

ISO-2022-JP doesn't include vendor-specific characters.
Please use ISO-2022-JP-MS instead of ISO-2022-JP.

Also,
$txt = "\x1b\x24\x42\x2d\x6a" 
is not a correctly encoded ISO-2022-JP string.
It should be 
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42".

Please try:


result: good encoding
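
The snippet after "Please try:" is missing from this archive. The following
sketch combines the two suggestions made in the comment; it is an
illustration, not the original code, and the result may differ between PHP
versions.

```php
<?php
// Same character as in the report, followed by ESC ( B to switch back to
// ASCII at the end of the string.
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42";

// Assumes mbstring is loaded. ISO-2022-JP-MS is used because plain
// ISO-2022-JP does not include the vendor-specific characters.
if (function_exists('mb_check_encoding')) {
    echo mb_check_encoding($txt, 'ISO-2022-JP-MS')
        ? 'good encoding'
        : 'bad encoding';
    echo "\n";
}
```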


Previous Comments:



[2008-10-30 10:42:26] [EMAIL PROTECTED]

Assigned to the maintainer.



[2008-09-19 20:16:36] areid at lumerical dot com

Description:

The mb_check_encoding function returns false when a particular Japanese
character is used with the iso-2022-jp character set. The offending
character has hex code 2d6a. This is a special character representing
"incorporated". The character itself does not seem to be in the JIS X
0208-1983 character table, but most windows applications seem to
recognize it (Outlook, Firefox, Explorer, etc). In this particular case,
the original text was composed in Outlook.

Reproduce code:
---
//This is valid iso-2022-jp code for
//this single Japanese character representing incorporated
$txt = "\x1b\x24\x42\x2d\x6a";

//The output of the below code will be "bad encoding"
if(mb_check_encoding($txt,'ISO-2022-JP')){
    echo 'good encoding';
}else{
    echo 'bad encoding';
}

Expected result:

"good encoding" should be printed

Actual result:
--
"bad encoding" is printed





-- 
Edit this bug report at http://bugs.php.net/?id=46131&edit=1



#46131 [Asn->Bgs]: mb_check_encoding returns wrong result when using iso-2022-jp character set

2008-11-07 Thread hirokawa
 ID:   46131
 Updated by:   [EMAIL PROTECTED]
 Reported By:  areid at lumerical dot com
-Status:   Assigned
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: RHEL5
 PHP Version:  5.2.6
 Assigned To:  hirokawa
 New Comment:

$txt = "\x1b\x24\x42\x2d\x6a"
is not a correctly encoded ISO-2022-JP string.
It should be
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42".

Please try:


result: good encoding


Previous Comments:


[2008-10-30 10:42:26] [EMAIL PROTECTED]

Assigned to the maintainer.



[2008-09-19 20:16:36] areid at lumerical dot com

Description:

The mb_check_encoding function returns false when a particular Japanese
character is used with the iso-2022-jp character set. The offending
character has hex code 2d6a. This is a special character representing
"incorporated". The character itself does not seem to be in the JIS X
0208-1983 character table, but most Windows applications seem to
recognize it (Outlook, Firefox, Explorer, etc). In this particular case,
the original text was composed in Outlook.

Reproduce code:
---
//This is valid iso-2022-jp code for
//this single Japanese character representing incorporated
$txt = "\x1b\x24\x42\x2d\x6a";

//The output of the below code will be "bad encoding"
if (mb_check_encoding($txt, 'ISO-2022-JP')) {
    echo 'good encoding';
} else {
    echo 'bad encoding';
}


Expected result:

"good encoding" should be printed

Actual result:
--
"bad encoding" is printed








#45993 [Asn->Tbd]: mb_detect_encoding and mb_check_encoding results are dissonant

2008-11-07 Thread hirokawa
 ID:   45993
 Updated by:   [EMAIL PROTECTED]
 Reported By:  mtrojan at transline dot de
-Status:   Assigned
+Status:   To be documented
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2.6
 Assigned To:  hirokawa
 New Comment:

mb_detect_encoding does not support detection of UTF-16/UTF-16BE.
Because UTF-16 is not a byte-stream encoding like UTF-8, it cannot be
detected the same way as the byte-stream encodings.

A file encoded in UTF-16 can easily be detected from its BOM,
for example:

// FF FE is the little-endian BOM, FE FF the big-endian one.
if ($content[0]==chr(0xff) && $content[1]==chr(0xfe)) {
  echo 'UTF-16LE';
} else if ($content[0]==chr(0xfe) && $content[1]==chr(0xff)) {
  echo 'UTF-16BE';
}
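
The BOM check above can be wrapped in a small function that also copes
with inputs shorter than two bytes. This is only a sketch: the name
utf16_from_bom() is made up for illustration and is not an mbstring
function. It relies on the bytes FF FE marking little-endian UTF-16 and
FE FF marking big-endian UTF-16.

```php
<?php
// Hypothetical helper (not an mbstring function): guess the UTF-16 byte
// order from a leading byte order mark, or return null when there is none.
function utf16_from_bom(string $content): ?string
{
    $bom = substr($content, 0, 2);
    if ($bom === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    if ($bom === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    return null;
}

var_dump(utf16_from_bom("\xFF\xFEM\x00o\x00")); // string(8) "UTF-16LE"
var_dump(utf16_from_bom("\xFE\xFF\x00M\x00o")); // string(8) "UTF-16BE"
var_dump(utf16_from_bom("Mo"));                 // NULL
```

A file without a BOM yields null here, in which case the encoding has to
be known from some other source.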








Previous Comments:


[2008-10-26 23:01:49] [EMAIL PROTECTED]

Assigned to the mbstring maintainer.



[2008-09-04 11:47:39] mtrojan at transline dot de

Description:

mb_detect_encoding does not seem to recognize UTF-16 encoded files
properly. Even if it is assured by using mb_check_encoding that a file
is truly UTF-16LE, mb_detect_encoding does not detect the same file as
UTF-16 and is returning ISO-8859-1 instead. Activating/deactivating
strict mode has no influence on the result.

Reproduce code:
---
$content = file_get_contents($src_path);

$encodings = array('UTF-16', 'UTF-16LE', 'UTF-16BE', 'UTF-8',
'UNICODE', 'ISO-8859-1');

$enc = mb_detect_encoding($content, $encodings);
print "encoding: $enc\n";

print 'checked: ' . intval(mb_check_encoding($content, 'UTF-16LE'));

Expected result:

encoding: UTF-16LE
checked: 1

Actual result:
--
encoding: ISO-8859-1
checked: 1





-- 
Edit this bug report at http://bugs.php.net/?id=45993&edit=1



#27421 [NoF->Csd]: mbstring.func_overload set in .htaccess becomes global

2008-10-03 Thread hirokawa
 ID:   27421
 Updated by:   [EMAIL PROTECTED]
 Reported By:  php at strategma dot bg
-Status:   No Feedback
+Status:   Closed
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5.2.5
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

The fix is already applied in PHP 5.2 CVS and PHP 5.3 CVS.
The snapshot is also available from http://snaps.php.net




Previous Comments:


[2008-09-21 19:39:42] dollar80 at freemail dot hu

I need reloed some of moves.



[2008-09-19 14:37:56] torkel at eonbit dot com

I applied the patch from 'david at dfoerster dot de' and it solved the
issue.

In our case the mbstring.func_overload was set inside an apache virtual
host, and the setting became global. I.e. leaked into other virtual
hosts.

Thank you very much for providing this. It has been a real headache for
one of our customers.

Can you please include it in the next release? I'll be happy to provide
more information.



[2008-09-10 05:12:17] awad3 at hotmail dot com

I downloaded a file from the Internet and couldn't open it. Can you
please help me?
The file:
Attachment.PhP



[2008-08-08 10:47:47] david at dfoerster dot de

Thank you for applying the patch. Is it also in the 5.2 branch?

Now that this is fixed, you might want to remove the note about the
per-directory context from the documentation, or state from which
version it is supposed to work.



[2008-08-05 01:00:01] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/27421

-- 
Edit this bug report at http://bugs.php.net/?id=27421&edit=1



#27421 [Asn->Fbk]: mbstring.func_overload set in .htaccess becomes global

2008-07-28 Thread hirokawa
 ID:   27421
 Updated by:   [EMAIL PROTECTED]
 Reported By:  php at strategma dot bg
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5.2.5
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.3-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.3-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.3-win32-installer-latest.msi




Previous Comments:


[2008-07-13 15:15:26] [EMAIL PROTECTED]

Rui, didn't you just apply a patch that "fixes" this since it can't be
set per-directory anymore?



[2008-06-10 08:39:42] future at shiny dot co dot il

Oops, my mistake. David's patch DOES solve the issue. (I just forgot to
rebuild the module :)



[2008-06-10 02:10:56] future at shiny dot co dot il

David, unfortunately your patch doesn't seem to solve the problem.

Furthermore, on my system, strlen never seems to be overridden
(mb_orig_strlen never exists) while substr always remains overridden
(mb_orig_substr always exists).

Are you sure this shutdown sequence is even being run?



[2008-03-19 18:28:25] david at dfoerster dot de

> It is not recommended to use the function overloading option in 
> the per-directory context, because it's not confirmed yet to be 
> stable enough in a production environment and may lead to 
> undefined behaviour. 

Once the patch is applied this notice can probably be removed from 
the documentation.



[2008-03-19 18:08:46] david at dfoerster dot de

Hi,

this patch fixes the problem (didn't find a way to attach a patch 
here):
http://www.dfoerster.de/misc/php-27421.diff

The problem was that the while loop in PHP_RSHUTDOWN_FUNCTION would
terminate on the first function that was not overloaded. With a
setting of 2, the str* functions would never be restored, because the
mail function was not overloaded.

The patch changes the behaviour to be similar to the loop in 
PHP_RINIT_FUNCTION.
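
The loop problem described above can be illustrated with a small model.
This is not the mbstring C source, only a sketch in PHP: the table
layout and the function names restore_buggy() and restore_fixed() are
invented for illustration. The flag values (1 = mail, 2 = str*,
4 = ereg*) follow the func_overload settings.

```php
<?php
// Hypothetical overload table: [flag, original function name].
$table = [
    [1, 'mail'],
    [2, 'strlen'],
    [2, 'substr'],
    [4, 'ereg'],
];

// Models the reported bug: the loop ends at the first entry whose
// flag is not part of the setting, so later entries are never reached.
function restore_buggy(array $table, int $setting): array
{
    $restored = [];
    foreach ($table as [$flag, $name]) {
        if (!($setting & $flag)) {
            break;
        }
        $restored[] = $name;
    }
    return $restored;
}

// Models the fix: visit every entry and restore the ones that match.
function restore_fixed(array $table, int $setting): array
{
    $restored = [];
    foreach ($table as [$flag, $name]) {
        if ($setting & $flag) {
            $restored[] = $name;
        }
    }
    return $restored;
}

echo 'buggy: ', implode(',', restore_buggy($table, 2)), "\n"; // buggy:
echo 'fixed: ', implode(',', restore_fixed($table, 2)), "\n"; // fixed: strlen,substr
```

With a setting of 2 the first entry (mail) does not match, so the buggy
variant restores nothing while the fixed variant restores the str*
entries.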



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/27421




#43227 [Asn->Fbk]: eregi() mbregex compile err: premature end of regular expression in

2008-07-12 Thread hirokawa
 ID:   43227
 Updated by:   [EMAIL PROTECTED]
 Reported By:  baco at infomaniak dot ch
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Linux Debian
 PHP Version:  5.2.5
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.2-win32-installer-latest.msi




Previous Comments:


[2008-07-11 21:32:22] [EMAIL PROTECTED]

Assigned to mbstring maintainer.



[2008-02-25 13:31:02] baco at infomaniak dot ch

As a workaround, try forcing mbstring.func_overload = 0 in your php.ini
and applying this patch.

PHP5

unix_mbstring_func_overload.patch
--- ext/mbstring/mbstring.c 2007-09-24 13:51:36.0 +0200
+++ ext/mbstring/mbstring.c 2007-12-04 18:00:10.023564681 +0100
@@ -765,8 +765,8 @@
 PHP_INI_ENTRY("mbstring.script_encoding", NULL, PHP_INI_ALL, OnUpdate_mbstring_script_encoding)
 #endif /* ZEND_MULTIBYTE */
 PHP_INI_ENTRY("mbstring.substitute_character", NULL, PHP_INI_ALL, OnUpdate_mbstring_substitute_character)
-STD_PHP_INI_ENTRY("mbstring.func_overload", "0", PHP_INI_SYSTEM |
-PHP_INI_PERDIR, OnUpdateLong, func_overload, zend_mbstring_globals, mbstring_globals)
+STD_PHP_INI_ENTRY("mbstring.func_overload", "0",
+PHP_INI_SYSTEM, OnUpdateLong, func_overload, zend_mbstring_globals, mbstring_globals)

 STD_PHP_INI_BOOLEAN("mbstring.encoding_translation", "0",
 PHP_INI_SYSTEM | PHP_INI_PERDIR, OnUpdate_mbstring_encoding_translation,

PHP4

--- ext/mbstring/mbstring.c 2007-04-04 17:28:18.0 +0200
+++ ext/mbstring/mbstring.c 2007-12-04 18:05:29.363559316 +0100
@@ -815,8 +815,8 @@
 PHP_INI_ENTRY("mbstring.script_encoding", NULL, PHP_INI_ALL, OnUpdate_mbstring_script_encoding)
 #endif /* ZEND_MULTIBYTE */
 PHP_INI_ENTRY("mbstring.substitute_character", NULL, PHP_INI_ALL, OnUpdate_mbstring_substitute_character)
-STD_PHP_INI_ENTRY("mbstring.func_overload", "0", PHP_INI_SYSTEM |
-PHP_INI_PERDIR, OnUpdateInt, func_overload, zend_mbstring_globals, mbstring_globals)
+STD_PHP_INI_ENTRY("mbstring.func_overload", "0",
+PHP_INI_SYSTEM, OnUpdateInt, func_overload, zend_mbstring_globals, mbstring_globals)

 STD_PHP_INI_BOOLEAN("mbstring.encoding_translation", "0",
 PHP_INI_SYSTEM | PHP_INI_PERDIR, OnUpdate_mbstring_encoding_translation,



[2008-02-25 13:18:00] lip at lip dot net dot ua

I think these bugs are similar.
http://bugs.php.net/bug.php?id=44237



[2007-11-09 16:03:14] baco at infomaniak dot ch

Description:

eregi() produces random errors like "function.mb-eregi: mbregex compile
err: premature end of regular expression in" when used with special
characters such as accented letters.

N.B. Many reports of this issue can be found on the web. Some posts
suggest forcing mbstring.func_overload = 0, but that doesn't work for me.

If Apache 1 is restarted, the error does not come back until some time
has passed and a number of requests have been served.

$ GET http://localhost/test.php
ok

$ GET http://localhost/test.php
ok

$ GET http://localhost/test.php

Warning:  mb_eregi() [function.mb-eregi]: mbregex compile err:
premature end of regular expression in
/home/www/ca8b72beb934995c1afb34e1a3ceb893/web/test.php on line
2

$ GET http://localhost/test.php

Warning:  mb_eregi() [function.mb-eregi]: mbregex compile err:
premature end of regular expression in
/home/www/ca8b72beb934995c1afb34e1a3ceb893/web/test.php on line
2

$ GET http://localhost/test.php

Warning:  mb_eregi() [function.mb-eregi]: mbregex compile err:
premature end of regular expression in
/home/www/ca8b72beb934995c1afb34e1a3ceb893/web/test.php on line
2

$ GET http://localhost/test.php
ok

$ GET http://localhost/test.php
ok

...


Reproduce code:
---


Expected result:

OK

Actual result:
--

Warning:  mb_eregi() [function.mb-eregi]: mbregex compile err:
premature end of regular expression in
/home/www/ca8b72beb934995c1afb34e1a3ceb893/web/test.php on line
2





-- 
Edit this bug report at http://bugs.php.net/?id=43227&edit=1



#42101 [Asn->NoF]: mb_substr() misbehaves when length = PHP_INT_MAX (64bit issue)

2008-02-27 Thread hirokawa
 ID:   42101
 Updated by:   [EMAIL PROTECTED]
 Reported By:  mcorne at yahoo dot com
-Status:   Assigned
+Status:   No Feedback
 Bug Type: mbstring related
 Operating System: Linux x86-64
 PHP Version:  5.2.4RC2-dev
 Assigned To:  hirokawa
 New Comment:

No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.




Previous Comments:


[2007-09-22 01:32:00] [EMAIL PROTECTED]

I reproduced the same issue with mb_substr() on my Athlon 64/x2
machine.

I believe that substr() also has the same 64-bit issue.

Here is a sample script (tested on my x86/64 Ubuntu Linux, Athlon 64x2)


I think PHP itself is not 64-bit compatible.
Why didn't you submit a bug report for substr()?





[2007-08-17 13:49:09] [EMAIL PROTECTED]

Assigned to the maintainer of mbstring extension.



[2007-08-15 06:45:07] mcorne at yahoo dot com

Same issue on the latest release.
Test done on:
PHP Version => 5.2.4RC2-dev
System => Linux durbatuluk 2.6.20-16-generic #2 SMP Thu Jun 7 19:00:28
UTC 2007 x86_64
Build Date => Aug 13 2007 21:59:11



[2007-07-25 12:10:28] mcorne at yahoo dot com

Description:

mb_substr("\x44\xCC\x87", 0, PHP_INT_MAX, 'UTF-8') only captures the
first character on Linux 64-bit instead of returning the whole string.
Note that this works fine on Windows XP and Linux 32-bit.
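
One way to avoid passing a huge length at all, sketched under the
assumption that the goal is simply "everything from offset 0", is to
pass the real character count from mb_strlen(). Whether this sidesteps
the reported problem on the affected 64-bit builds has not been
verified here; the sketch only shows the call pattern.

```php
<?php
$string = "\x44\xCC\x87"; // "D" followed by U+0307 COMBINING DOT ABOVE

// Pass the real character count instead of PHP_INT_MAX as the length.
$length = mb_strlen($string, 'UTF-8');
$substr = mb_substr($string, 0, $length, 'UTF-8');

echo $length, "\n";          // 2
echo bin2hex($substr), "\n"; // 44cc87
```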

Reproduce code:
---
function substring($string, $length)
{
    $substr = mb_substr($string, 0, $length , 'UTF-8');
    $length = strlen($substr);
    $chars = $length? unpack("C{$length}chars", $substr) : array();
    $decs = array_map('dechex', $chars);
    return array($substr, $decs);
}

$test['string'] = "\x44\xCC\x87";
$test['utf8'] = '\x44\xCC\x87';
$test['unicode'] = '\u0044\u0307';
$test['PHP_INT_MAX'] = PHP_INT_MAX;
$test['php_int_max'] = substring($test['string'], PHP_INT_MAX);
$test[''] = substring($test['string'], );

print_r($test);


Expected result:

Array
(
[string] => Ḋ
[utf8] => \x44\xCC\x87
[unicode] => \u0044\u0307
[PHP_INT_MAX] => 2147483647
[php_int_max] => Array
(
[0] => Ḋ
[1] => Array
(
[chars1] => 44
[chars2] => cc
[chars3] => 87
)

)

[] => Array
(
[0] => Ḋ
[1] => Array
(
[chars1] => 44
[chars2] => cc
[chars3] => 87
)

)

)

Actual result:
--
Array
(
[string] => Ḋ
[utf8] => \x44\xCC\x87
[unicode] => \u0044\u0307
[PHP_INT_MAX] => 2147483647
[php_int_max] => Array
(
[0] => D
[1] => Array
(
[chars1] => 44
)

)

[] => Array
(
[0] => Ḋ
[1] => Array
(
[chars1] => 44
[chars2] => cc
[chars3] => 87
)

)

)





-- 
Edit this bug report at http://bugs.php.net/?id=42101&edit=1


#43840 [NoF->Csd]: mb_strpos bounds check is byte count rather than a character count

2008-02-27 Thread hirokawa
 ID:   43840
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   No Feedback
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2CVS-2008-01-14 (snap)
 Assigned To:  hirokawa


Previous Comments:


[2008-02-26 01:00:00] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".



[2008-02-18 14:15:14] [EMAIL PROTECTED]

I've run the above example on the latest 5.2 and 5.3 snapshots and it's
behaving as I expected now. Thanks for making the change!



[2008-02-16 08:58:15] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5.3-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.3-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.3-win32-installer-latest.msi





[2008-02-12 09:50:04] [EMAIL PROTECTED]

Here is the entire mbstring section of my php.ini file, I haven't
changed it from the default that comes when you download PHP.
[mbstring]
; language for internal character representation.
;mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP

; http input encoding.
;mbstring.http_input = auto

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;   portable libs/applications.
;mbstring.encoding_translation = Off

; automatic encoding detection order.
; auto means
;mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
;mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
;mbstring.func_overload = 0

Thanks



[2008-02-10 00:30:07] [EMAIL PROTECTED]

Could you show me the mbstring related setting (mbstring.*)
in your php.ini ?




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/43840

-- 
Edit this bug report at http://bugs.php.net/?id=43840&edit=1


#44014 [Asn->Fbk]: mb_convert_encoding 'destroys' first character (UTF16->UTF8)

2008-02-16 Thread hirokawa
 ID:   44014
 Updated by:   [EMAIL PROTECTED]
 Reported By:  michael202 at gmx dot de
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Win XP
 PHP Version:  5.2.5
 Assigned To:  hirokawa
 New Comment:

The encoding conversion functions in mbstring do not support the
Unicode BOM.

Also, big endian is the default for UTF-16. Please specify 'UTF-16LE'
if you need the little-endian format.

Try,

 Mo
?>

or

 Mo
?>
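
The advice above can be sketched as follows, using the byte string from
the report. This is an illustration, not necessarily the snippet from
the original comment: it removes a leading little-endian BOM by hand
and then names 'UTF-16LE' explicitly as the source encoding.

```php
<?php
$utf16 = chr(0xFF) . chr(0xFE) . chr(0x4d) . chr(0) . chr(0x6f) . chr(0); // BOM + 'Mo'

// Drop a leading little-endian BOM, if present, before converting.
if (substr($utf16, 0, 2) === "\xFF\xFE") {
    $utf16 = substr($utf16, 2);
}

$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');
echo $utf8, "\n"; // Mo
```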



Previous Comments:


[2008-02-05 05:10:37] [EMAIL PROTECTED]

Assigned to the mbstring maintainer.



[2008-02-01 12:08:07] michael202 at gmx dot de

Description:

mb_convert_encoding 'destroys' first character when
converting from UTF16 to UTF8

(iconv works).

Reproduce code:
---
$utf16 = chr(0xFF).chr(0xFE).chr(0x4d).chr(0).chr(0x6f).chr(0); //'Mo'

$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');  

echo($utf8 . "\n"); // -> ´++´¢ìo

$utf8 = iconv('UTF-16', 'UTF-8', $utf16);  

echo($utf8 . "\n"); // -> Mo 


Expected result:

mb:(BOM8)Mo
iconv: Mo

(BOM8) is a placeholder

Actual result:
--
mb:(BOM8)´¢ìo  (copied from cmd shell)
iconv: Mo

(BOM8) is a placeholder







-- 
Edit this bug report at http://bugs.php.net/?id=44014&edit=1


#43998 [Asn->Fbk]: Two error messages returned for incorrect encoding for mb_strto[upper|lower]

2008-02-16 Thread hirokawa
 ID:   43998
 Updated by:   [EMAIL PROTECTED]
 Reported By:  josmessa at uk dot ibm dot com
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Windows XP SP2
 PHP Version:  5.2CVS-2008-01-31 (snap)
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.3-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.3-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.3-win32-installer-latest.msi




Previous Comments:


[2008-01-31 17:04:13] [EMAIL PROTECTED]

Assign to extension maintainer



[2008-01-31 16:09:24] josmessa at uk dot ibm dot com

Description:

When an incorrect or unknown encoding is passed to
mb_strto[upper|lower], two error messages are returned, both of which
warn about the same thing.
In some cases, one error message is returned along with an
upper/lowercased string, but this behaviour is not documented.

Reproduce code:
---


Expected result:

Only one error message should be returned for iterations 1-3

Actual result:
--
-- Iteration 1 --

Warning: mb_strtolower(): Illegal character encoding specified in
...\mb_strtolower.php on line 8

Warning: mb_strtolower(): Unknown encoding "12345" in
...\mb_strtolower.php on line 8
bool(false)

Warning: mb_strtoupper(): Illegal character encoding specified in
...\mb_strtolower.php on line 9

Warning: mb_strtoupper(): Unknown encoding "12345" in
...\mb_strtolower.php on line 9
bool(false)

-- Iteration 2 --

Warning: mb_strtolower(): Illegal character encoding specified in
...\mb_strtolower.php on line 8

Warning: mb_strtolower(): Unknown encoding "1.23456789E-9" in
...\mb_strtolower.php on line 8
bool(false)

Warning: mb_strtoupper(): Illegal character encoding specified in
...\mb_strtolower.php on line 9

Warning: mb_strtoupper(): Unknown encoding "1.23456789E-9" in
...\mb_strtolower.php on line 9
bool(false)

-- Iteration 3 --

Warning: mb_strtolower(): Illegal character encoding specified in
...\mb_strtolower.php on line 8

Warning: mb_strtolower(): Unknown encoding "1" in ...\mb_strtolower.php
on line 8
bool(false)

Warning: mb_strtoupper(): Illegal character encoding specified in
...\mb_strtolower.php on line 9

Warning: mb_strtoupper(): Unknown encoding "1" in ...\mb_strtolower.php
on line 9
bool(false)

-- Iteration 4 --

Warning: mb_strtolower(): Illegal character encoding specified in
...\mb_strtolower.php on line 8
string(12) "hello, world"

Warning: mb_strtoupper(): Illegal character encoding specified in
...\mb_strtolower.php on line 9
string(12) "HELLO, WORLD"

-- Iteration 5 --

Warning: mb_strtolower(): Illegal character encoding specified in
...\mb_strtolower.php on line 8
string(12) "hello, world"

Warning: mb_strtoupper(): Illegal character encoding specified in
...\mb_strtolower.php on line 9
string(12) "HELLO, WORLD"






-- 
Edit this bug report at http://bugs.php.net/?id=43998&edit=1


#43995 [Asn->Bgs]: mb_ereg returns byte length of string instead of character length

2008-02-16 Thread hirokawa
 ID:   43995
 Updated by:   [EMAIL PROTECTED]
 Reported By:  josmessa at uk dot ibm dot com
-Status:   Assigned
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: Windows XP SP2
 PHP Version:  5.2CVS-2008-01-31 (snap)
 Assigned To:  hirokawa
 New Comment:

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

According to the PHP manual at http://jp.php.net/manual/en/function.mb-ereg.php,
the function returns the byte length of the matched part.
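
The difference between the two lengths can be seen without mb_ereg
itself, whose return value has varied between PHP versions. A small
sketch using strlen() and mb_strlen() on a string of three characters
that each take three bytes in UTF-8 (the sample text is chosen only for
illustration):

```php
<?php
$text = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"; // three kanji, 3 bytes each in UTF-8

echo strlen($text), "\n";             // 9 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n"; // 3 (characters)
```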




Previous Comments:


[2008-01-31 17:05:20] [EMAIL PROTECTED]

assign to extension maintainer



[2008-01-31 15:20:09] josmessa at uk dot ibm dot com

Description:

When the $regs argument is provided, mb_ereg will return the length of
the matched string. The integer returned, though, is the byte length of
the string instead of the character length, which seems illogical for a
multibyte string function.

Reproduce code:
---


Expected result:

Multibyte String without $regs arg: int(1)
Multubyte String with $regs arg:int(21)
Character length of matched string: int(21)

Actual result:
--
Multibyte String without $regs arg: int(1)
Multubyte String with $regs arg:int(53)
Character length of matched string: int(21)





-- 
Edit this bug report at http://bugs.php.net/?id=43995&edit=1


#43994 [Asn->Fbk]: mb_ereg 'successfully' matching incorrectly

2008-02-16 Thread hirokawa
 ID:   43994
 Updated by:   [EMAIL PROTECTED]
 Reported By:  josmessa at uk dot ibm dot com
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Windows XP SP2
 PHP Version:  5.2CVS-2008-01-31 (snap)
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.3-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.3-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.3-win32-installer-latest.msi




Previous Comments:


[2008-01-31 17:06:05] [EMAIL PROTECTED]

Assign to extension maintainer



[2008-01-31 15:04:31] josmessa at uk dot ibm dot com

Description:

When mb_ereg is passed certain data types as the $pattern argument it
will return int(1), i.e. a successful match, when in fact it has not
matched. This is shown by setting the $regs argument and looking at the
returned array which only contains one element which is bool(false). 

Reproduce code:
---


Expected result:

http://pastebin.com/f5f5c20ff

Actual result:
--
http://pastebin.com/f3f966bd0





-- 
Edit this bug report at http://bugs.php.net/?id=43994&edit=1


#43993 [Asn->Ana]: mb_substr_count behaves differently to substr_count with overlapping needles

2008-02-16 Thread hirokawa
 ID:   43993
 Updated by:   [EMAIL PROTECTED]
 Reported By:  josmessa at uk dot ibm dot com
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Windows XP SP2
 PHP Version:  5.2CVS-2008-01-31 (snap)
 Assigned To:  hirokawa


Previous Comments:


[2008-02-10 00:52:45] [EMAIL PROTECTED]

mb_substr_count supports overlapping needles.
Is this a problem with substr_count?






[2008-01-31 17:08:52] [EMAIL PROTECTED]

Assigning to extension maintainer



[2008-01-31 14:27:02] josmessa at uk dot ibm dot com

Description:

In the documentation for substr_count there is a note that says: "Note:
This function doesn't count overlapped substrings. See the example
below! ". mb_substr_count does not replicate this behaviour, and as
mb_substr_count can overload substr_count I think it should.
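
To make the two counting rules concrete, here is a sketch that compares
substr_count() with a hand-written overlapping count. The helper
count_overlapping() is a made-up name for illustration; it works on
bytes and makes no claim about how mb_substr_count is implemented.

```php
<?php
// Hypothetical helper: count occurrences of $needle, allowing overlaps,
// by restarting the search one byte after each match.
function count_overlapping(string $haystack, string $needle): int
{
    if ($needle === '') {
        return 0;
    }
    $count = 0;
    $offset = 0;
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $count++;
        $offset = $pos + 1;
    }
    return $count;
}

echo substr_count('aaa', 'aa'), "\n";      // 1
echo count_overlapping('aaa', 'aa'), "\n"; // 2
```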

Reproduce code:
---


Expected result:

mb_substr_count: int(1)
substr_count:int(1)


Actual result:
--
mb_substr_count: int(2)
substr_count:int(1)






-- 
Edit this bug report at http://bugs.php.net/?id=43993&edit=1


#43841 [Asn->Fbk]: mb_strrpos offset is byte count for negative values

2008-02-16 Thread hirokawa
 ID:   43841
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2CVS-2008-01-14 (snap)
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.3-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.3-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.3-win32-installer-latest.msi




Previous Comments:


[2008-02-12 09:53:06] [EMAIL PROTECTED]

I also thought I'd say now that I've committed a load of mbstring tests
to CVS if you haven't seen them already. Let me know if you'd like
anything changing in them.
Thanks!



[2008-02-12 09:51:05] [EMAIL PROTECTED]

Here is the entire mbstring section of my php.ini file, I haven't
changed it from the default that comes when you download PHP.
[mbstring]
; language for internal character representation.
;mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP

; http input encoding.
;mbstring.http_input = auto

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;   portable libs/applications.
;mbstring.encoding_translation = Off

; automatic encoding detection order.
; auto means
;mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
;mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
;mbstring.func_overload = 0

Thanks



[2008-02-10 00:31:11] [EMAIL PROTECTED]

Could you show me the mbstring related setting (mbstring.*)
in your php.ini ?




[2008-01-30 15:58:56] [EMAIL PROTECTED]

assigning to maintainer



[2008-01-14 16:38:46] [EMAIL PROTECTED]

Description:

The offset argument appears to do a byte count for negative values of
offset. 
In the example below, $string_ascii is 21 characters long and
$string_mb is 21 characters (53 bytes) long. In both cases the needle
appears twice, first at position 9 and secondly at position 20. 
When the offset is -24, beyond the character length of the string, it
finds $needle at position 9, when $needle would be expected to be found
when offset is -12 (i.e. behave the same as the ASCII example).

It's also worth noting that strrpos returns a notice when the offset is
outside the boundary of the string whereas mb_strrpos does not.

This may be linked to this bug: http://bugs.php.net/43840.

Reproduce code:
---


Expected result:

-- Offset is -25 --
Multibyte String:   
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 9
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 14
bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string
in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -24 --
Multibyte String:   
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 9
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 14
bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string
in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -13 --
Multibyte String:   bool(false)
ASCII String:
mb_strrpos: bool(false)
strrpos:bool(false)

-- Offset is -12 --
Multibyte String:   int(9)
ASCII String:
mb_strrpos: int(9)
strrpos:int(9)


Actual result:
--
-- Offset is -25 --
Multibyte String:   bool(false)
ASCII String:
mb_strrpos: bool(false)
strrpos:   

#43840 [Asn->Fbk]: mb_strpos bounds check is byte count rather than a character count

2008-02-16 Thread hirokawa
 ID:   43840
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2CVS-2008-01-14 (snap)
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.3-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.3-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.3-win32-installer-latest.msi




Previous Comments:


[2008-02-12 09:50:04] [EMAIL PROTECTED]

Here is the entire mbstring section of my php.ini file, I haven't
changed it from the default that comes when you download PHP.
[mbstring]
; language for internal character representation.
;mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP

; http input encoding.
;mbstring.http_input = auto

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;   portable libs/applications.
;mbstring.encoding_translation = Off

; automatic encoding detection order.
; auto means
;mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
;mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
;mbstring.func_overload = 0

Thanks
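
The comments on mbstring.func_overload above describe a bit mask (1, 2, 4 or a combination). A minimal sketch of how such a value could be decoded follows; decode_func_overload() is a hypothetical helper written for illustration, not part of mbstring.

```php
<?php
// Hypothetical helper (not an mbstring function): list the function groups
// selected by an mbstring.func_overload value, following the ini comments.
function decode_func_overload(int $value): array
{
    $groups = [
        1 => 'mail()',
        2 => 'str*()',
        4 => 'ereg*()',
    ];
    $enabled = [];
    foreach ($groups as $bit => $name) {
        if (($value & $bit) !== 0) {
            $enabled[] = $name;
        }
    }
    return $enabled;
}

print_r(decode_func_overload(0)); // nothing overloaded
print_r(decode_func_overload(7)); // all three groups
```

With the default value of 0 shown in the ini section, no functions are overloaded.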



[2008-02-10 00:30:07] [EMAIL PROTECTED]

Could you show me the mbstring related setting (mbstring.*)
in your php.ini ?




[2008-01-30 15:57:54] [EMAIL PROTECTED]

assigning to maintainer



[2008-01-14 16:36:52] [EMAIL PROTECTED]

Description:

The bounds check for the offset argument in mb_strpos appears to be a
byte count rather than a character count.
In the example below, $string_ascii is 21 characters long and
$string_mb is 21 characters (53 bytes) long. In both cases the needle
appears twice, first at position 9 and then at position 20.
With the multibyte string example, a warning would be expected as soon as
the offset is past the character count of the string, but instead the
warning is only raised when the offset is past the byte count.
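
To illustrate the difference between the two bounds, here is a minimal sketch. The haystack is a hypothetical string chosen only to have the same shape as the one described (21 characters, 53 bytes); it is not the reporter's test string, and offset_in_bounds() is an illustrative helper, not the check mbstring actually performs.

```php
<?php
// Hypothetical haystack: 16 three-byte characters plus 5 ASCII characters
// gives 21 characters in 53 bytes.
$haystack = str_repeat("\u{3042}", 16) . 'abcde';

$chars = mb_strlen($haystack, 'UTF-8'); // character count
$bytes = strlen($haystack);             // byte count

// Illustrative bounds check: an offset is accepted up to and including the limit.
function offset_in_bounds(int $offset, int $limit): bool
{
    return $offset >= 0 && $offset <= $limit;
}

foreach ([20, 21, 22, 53, 54] as $offset) {
    printf(
        "offset %2d: by characters=%s, by bytes=%s\n",
        $offset,
        var_export(offset_in_bounds($offset, $chars), true),
        var_export(offset_in_bounds($offset, $bytes), true)
    );
}
```

Checked against the character count, offsets 22 and above are rejected; checked against the byte count, only 54 is rejected, which matches the pattern of warnings in the actual result.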

Reproduce code:
---


Expected result:

-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)


Actual result:
--
-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)






-- 
Edit this bug repo

#43993 [Asn]: mb_substr_count behaves differently to substr_count with overlapping needles

2008-02-09 Thread hirokawa
 ID:   43993
 Updated by:   [EMAIL PROTECTED]
 Reported By:  josmessa at uk dot ibm dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: Windows XP SP2
 PHP Version:  5.2CVS-2008-01-31 (snap)
 Assigned To:  hirokawa
 New Comment:

mb_substr_count supports overlapping needles.
Is this a problem with substr_count?





Previous Comments:


[2008-01-31 17:08:52] [EMAIL PROTECTED]

Assigning to extension maintainer



[2008-01-31 14:27:02] josmessa at uk dot ibm dot com

Description:

In the documentation for substr_count there is a note that says: "Note:
This function doesn't count overlapped substrings. See the example
below! ". mb_substr_count does not replicate this behaviour, and as
mb_substr_count can overload substr_count I think it should.
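
The difference between the two counting strategies can be sketched as follows. count_overlapping() is a hypothetical helper written for this illustration; it is not how mb_substr_count is implemented, and the sample strings are not the reporter's.

```php
<?php
// Hypothetical helper: count occurrences of $needle, allowing overlaps,
// by advancing one character (not one needle length) after each match.
function count_overlapping(string $haystack, string $needle, string $encoding = 'UTF-8'): int
{
    if ($needle === '') {
        return 0; // this sketch treats an empty needle as "no matches"
    }
    $count = 0;
    $offset = 0;
    while (($pos = mb_strpos($haystack, $needle, $offset, $encoding)) !== false) {
        $count++;
        $offset = $pos + 1;
    }
    return $count;
}

var_dump(substr_count('aaa', 'aa'));     // non-overlapping: int(1)
var_dump(count_overlapping('aaa', 'aa')); // overlapping: int(2)
```

substr_count() resumes searching after the end of each match, so the second "aa" starting at position 1 is never seen; the overlapping variant resumes one character after the start of each match.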

Reproduce code:
---


Expected result:

mb_substr_count: int(1)
substr_count:int(1)


Actual result:
--
mb_substr_count: int(2)
substr_count:int(1)






-- 
Edit this bug report at http://bugs.php.net/?id=43993&edit=1


#43841 [Asn]: mb_strrpos offset is byte count for negative values

2008-02-09 Thread hirokawa
 ID:   43841
 Updated by:   [EMAIL PROTECTED]
 Reported By:  josmessa at uk dot ibm dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2CVS-2008-01-14 (snap)
 Assigned To:  hirokawa
 New Comment:

Could you show me the mbstring related setting (mbstring.*)
in your php.ini ?



Previous Comments:


[2008-01-30 15:58:56] [EMAIL PROTECTED]

assigning to maintainer



[2008-01-14 16:38:46] josmessa at uk dot ibm dot com

Description:

The offset argument appears to do a byte count for negative values of
offset. 
In the example below, $string_ascii is 21 characters long and
$string_mb is 21 characters (53 bytes) long. In both cases the needle
appears twice, first at position 9 and secondly at position 20. 
When the offset is -24, beyond the character length of the string, it
finds $needle at position 9, although $needle would only be expected to be
found once the offset is -12 (i.e. behave the same as the ASCII example).

It's also worth noting that strrpos returns a notice when the offset is
outside the boundary of the string whereas mb_strrpos does not.

This may be linked to this bug: http://bugs.php.net/43840.
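
A minimal sketch of why a negative offset behaves differently when measured in bytes: negative_offset_to_index() is a hypothetical helper, and the haystack is an invented string with the same shape as the one described (21 characters, 53 bytes), not the reporter's data.

```php
<?php
// Hypothetical helper: turn a negative offset into a non-negative index
// counted from the start, or null when it reaches past the beginning.
function negative_offset_to_index(int $length, int $offset): ?int
{
    $index = $length + $offset;
    return $index >= 0 ? $index : null;
}

// Invented haystack: 16 three-byte characters plus 5 ASCII characters.
$haystack = str_repeat("\u{3042}", 16) . 'abcde';

// Measured in characters, -24 reaches past the start of the string.
var_dump(negative_offset_to_index(mb_strlen($haystack, 'UTF-8'), -24)); // NULL

// Measured in bytes, -24 is still inside the string.
var_dump(negative_offset_to_index(strlen($haystack), -24)); // int(29)
```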

Reproduce code:
---


Expected result:

-- Offset is -25 --
Multibyte String:   
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 9
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 14
bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string
in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -24 --
Multibyte String:   
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 9
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack
string in ...\mb_strrpos.php on line 14
bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string
in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -13 --
Multibyte String:   bool(false)
ASCII String:
mb_strrpos: bool(false)
strrpos:bool(false)

-- Offset is -12 --
Multibyte String:   int(9)
ASCII String:
mb_strrpos: int(9)
strrpos:int(9)


Actual result:
--
-- Offset is -25 --
Multibyte String:   bool(false)
ASCII String:
mb_strrpos: bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string
in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -24 --
Multibyte String:   int(9)
ASCII String:
mb_strrpos: bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string
in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -13 --
Multibyte String:   int(9)
ASCII String:
mb_strrpos: bool(false)
strrpos:bool(false)

-- Offset is -12 --
Multibyte String:   int(9)
ASCII String:
mb_strrpos: int(9)
strrpos:int(9)





-- 
Edit this bug report at http://bugs.php.net/?id=43841&edit=1


#43840 [Asn]: mb_strpos bounds check is byte count rather than a character count

2008-02-09 Thread hirokawa
 ID:   43840
 Updated by:   [EMAIL PROTECTED]
 Reported By:  josmessa at uk dot ibm dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2CVS-2008-01-14 (snap)
 Assigned To:  hirokawa
 New Comment:

Could you show me the mbstring related setting (mbstring.*)
in your php.ini ?



Previous Comments:


[2008-01-30 15:57:54] [EMAIL PROTECTED]

assigning to maintainer



[2008-01-14 16:36:52] josmessa at uk dot ibm dot com

Description:

The bounds check for the offset argument in mb_strpos appears to be a
byte count rather than a character count.
In the example below, $string_ascii is 21 characters long and
$string_mb is 21 characters (53 bytes) long. In both cases the needle
appears twice, first at position 9 and then at position 20.
With the multibyte string example, a warning would be expected as soon as
the offset is past the character count of the string, but instead the
warning is only raised when the offset is past the byte count.

Reproduce code:
---


Expected result:

-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)


Actual result:
--
-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in
...\mb_strpos.php on line 11
bool(false)






-- 
Edit this bug report at http://bugs.php.net/?id=43840&edit=1


#39404 [Asn->Csd]: Support "entity" as substitute_character setting

2007-09-24 Thread hirokawa
 ID:  39404
 Updated by:  [EMAIL PROTECTED]
 Reported By: martin dot t dot kutschker at blackbox dot net
-Status:  Assigned
+Status:  Closed
 Bug Type:Feature/Change Request
 PHP Version: 5.2.0
 Assigned To: hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2006-11-07 14:27:41] [EMAIL PROTECTED]

Reclassified as feature request & assigned to maintainer.



[2006-11-06 17:09:37] martin dot t dot kutschker at blackbox dot net

Fix spelling of "entity" in the summary.



[2006-11-06 16:56:19] martin dot t dot kutschker at blackbox dot net

Description:

It would be great if the charset conversion could also output SGML/HTML
entities for characters missing from the output charset. The option "long"
is not very HTML-friendly. But with "entity" any Unicode-aware browser
could deal with the missing character.

eg
mbstring.substitute_character=long => U+3000
mbstring.substitute_character=entity => &#x3000;
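
A minimal sketch of the two modes, using the runtime function mb_substitute_character() rather than the ini setting. The target charset (ISO-8859-1) is an assumption picked for the example because it cannot represent U+3000.

```php
<?php
// U+3000 (ideographic space) has no mapping in ISO-8859-1, so the substitute
// character setting decides what is written in its place.
$text = "\u{3000}";

mb_substitute_character('long');
$long = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

mb_substitute_character('entity');
$entity = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

var_dump($long, $entity);
```

The "entity" form can be embedded in HTML output as-is, which is the use case the request describes.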







-- 
Edit this bug report at http://bugs.php.net/?id=39404&edit=1


#41147 [Asn->Csd]: mb_check_encoding fails to check invalid string

2007-09-24 Thread hirokawa
 ID:   41147
 Updated by:   [EMAIL PROTECTED]
 Reported By:  teracci2002 at yahoo dot co dot jp
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5.2.1
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

It is a documentation problem, and it is already fixed in CVS.



Previous Comments:


[2007-09-19 20:52:48] mike at silverorange dot com

0x00, 0xe3 is a valid byte sequence in UTF-8 but by itself is not a
valid UTF-8 string (it's missing two bytes).

The function is documented as checking the validity of a string so it
should return false for this case. If the function is only supposed to
validate byte-streams then the documentation should be fixed.
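
The distinction between "valid so far" and "a complete string" can be sketched as below. Both functions are hypothetical helpers written for this illustration; they only look at the tail of the input and are not a full UTF-8 validator, nor the logic mb_check_encoding uses.

```php
<?php
// Hypothetical helper: number of bytes a UTF-8 lead byte announces,
// or 0 if the byte cannot start a sequence.
function utf8_sequence_length(int $byte): int
{
    if ($byte <= 0x7F) {
        return 1; // ASCII
    }
    if ($byte >= 0xC2 && $byte <= 0xDF) {
        return 2;
    }
    if ($byte >= 0xE0 && $byte <= 0xEF) {
        return 3;
    }
    if ($byte >= 0xF0 && $byte <= 0xF4) {
        return 4;
    }
    return 0; // continuation byte or invalid lead byte
}

// Hypothetical helper: true when the input stops in the middle of a
// multibyte sequence, i.e. the last lead byte promises more bytes than follow.
function utf8_is_truncated(string $bytes): bool
{
    $i = strlen($bytes) - 1;
    $trailing = 0;
    // Walk back over at most three trailing continuation bytes (10xxxxxx).
    while ($i >= 0 && $trailing < 3 && (ord($bytes[$i]) & 0xC0) === 0x80) {
        $i--;
        $trailing++;
    }
    if ($i < 0) {
        return false; // empty input, or nothing but continuation bytes
    }
    return utf8_sequence_length(ord($bytes[$i])) > $trailing + 1;
}

var_dump(utf8_is_truncated("\x00\xe3"));         // bool(true): 0xe3 needs two more bytes
var_dump(utf8_is_truncated("\x00\xe3\x81\x82")); // bool(false): sequence is complete
```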



[2007-09-16 08:56:57] [EMAIL PROTECTED]


Sorry for the delayed response.

0x00,0x81 is also a valid byte sequence in Shift_JIS
because 0x81 is a valid first byte of a double-byte
JIS X 0208 character.

See: http://en.wikipedia.org/wiki/Shift_jis

We cannot decide whether the byte stream is valid or
invalid, because the last byte of the byte stream (0x81)
is a valid first byte of a double-byte character.
In this case, true (valid) will be returned.

A byte stream that includes a valid first byte followed by
an invalid second byte returns false.

For example,

var_dump(mb_check_encoding("\x81\x00", "Shift_JIS"));

returns false (invalid).

This is because 0x81 is a valid first byte of a double-byte
JIS X 0208 character, but 0x00 is an invalid second byte of
a double-byte JIS X 0208 character.

Likewise, 0x00, 0xe3 in UTF-8 is also a
valid byte sequence (a null byte + the first byte of
a three-byte UTF-8 character).

See: http://en.wikipedia.org/wiki/UTF-8












[2007-09-04 22:38:26] [EMAIL PROTECTED]

Did you read it Rui? (why do your reports end up as 'Analyzed' all the
time? :)



[2007-09-04 14:55:58] teracci2002 at yahoo dot co dot jp

> 0x00+0xa1 is valid byte sequence in Shift_JIS sequence.

I know it.
But 0x00+0x81 is an invalid sequence in Shift_JIS.
Then why does the statement below return "bool(true)"?

var_dump(mb_check_encoding("\x00\x81", "Shift_JIS"));

Read bug report again, please.



[2007-09-04 14:30:06] [EMAIL PROTECTED]


> No one says 0x00,0xa1 is invalid character in ShiftJIS.
I didn't say that.

0x00+0xa1 is a valid byte sequence in Shift_JIS.
A character in Shift_JIS encoding is encoded as either a single byte
or a double byte.
In this case, the byte stream is recognized as two characters:
a null byte and a comma character in Katakana (0xa1).
 
see: http://hp.vector.co.jp/authors/VA013241/misc/shiftjis.html





The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/41147

-- 
Edit this bug report at http://bugs.php.net/?id=41147&edit=1


#42101 [Asn]: mb_substr() misbehaves when length = PHP_INT_MAX (64bit issue)

2007-09-21 Thread hirokawa
 ID:   42101
 Updated by:   [EMAIL PROTECTED]
 Reported By:  mcorne at yahoo dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: Linux x86-64
 PHP Version:  5.2.4RC2-dev
 Assigned To:  hirokawa
 New Comment:

I reproduced the same issue with mb_substr() on my Athlon 64 X2
machine.

I believe that substr() also has the same 64-bit issue.

Here is a sample script (tested on my x86-64 Ubuntu Linux, Athlon 64 X2):


I think PHP itself is not 64-bit compatible.
Why didn't you submit a bug report for substr()?




Previous Comments:


[2007-08-17 13:49:09] [EMAIL PROTECTED]

Assigned to the maintainer of mbstring extension.



[2007-08-15 06:45:07] mcorne at yahoo dot com

Same issue on the latest release.
Test done on:
PHP Version => 5.2.4RC2-dev
System => Linux durbatuluk 2.6.20-16-generic #2 SMP Thu Jun 7 19:00:28
UTC 2007 x86_64
Build Date => Aug 13 2007 21:59:11



[2007-07-25 12:10:28] mcorne at yahoo dot com

Description:

mb_substr("\x44\xCC\x87", 0, PHP_INT_MAX, 'UTF-8') only captures the
first character on linux 64-bit instead of returning the whole string.
Note that this works fine on Windows XP and Linux 32-bit.
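
One possible explanation, not confirmed anywhere in this report, is that the 64-bit length is narrowed to a 32-bit signed integer somewhere along the way. The sketch below shows that such a narrowing would turn PHP_INT_MAX into -1, and that a length of -1 yields exactly the single "D" seen in the actual result. to_int32() is a hypothetical helper and assumes a 64-bit PHP build.

```php
<?php
// Hypothetical helper: reinterpret the low 32 bits of $n as a signed 32-bit
// integer, mimicking a C cast from a 64-bit long to int.
// Assumes a 64-bit PHP build (PHP_INT_SIZE === 8).
function to_int32(int $n): int
{
    $low = $n & 0xFFFFFFFF;
    return $low >= 0x80000000 ? $low - 0x100000000 : $low;
}

// "D" followed by U+0307 (combining dot above): 2 characters, 3 bytes.
$string = "\x44\xCC\x87";

var_dump(to_int32(PHP_INT_MAX)); // int(-1) on a 64-bit build

// A length of -1 drops the last character, leaving only "D" (hex 44).
var_dump(bin2hex(mb_substr($string, 0, -1, 'UTF-8')));

// Passing the real character count avoids relying on PHP_INT_MAX.
$whole = mb_substr($string, 0, mb_strlen($string, 'UTF-8'), 'UTF-8');
var_dump(bin2hex($whole));
```

Using mb_strlen() as the length is a portable way to ask for "everything from the start position" regardless of integer width.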

Reproduce code:
---
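// Returns the first $length characters of $string together with
// the hex value of each byte in the result.
function substring($string, $length)
{
    $substr = mb_substr($string, 0, $length, 'UTF-8');
    $length = strlen($substr); // byte length of the result
    // Unpack each byte as an unsigned char: keys chars1, chars2, ...
    $chars = $length ? unpack("C{$length}chars", $substr) : array();
    $decs = array_map('dechex', $chars);
    return array($substr, $decs);
}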

$test['string'] = "\x44\xCC\x87";
$test['utf8'] = '\x44\xCC\x87';
$test['unicode'] = '\u0044\u0307';
$test['PHP_INT_MAX'] = PHP_INT_MAX;
$test['php_int_max'] = substring($test['string'], PHP_INT_MAX);
$test[''] = substring($test['string'], );

print_r($test);


Expected result:

Array
(
[string] => Ḋ
[utf8] => \x44\xCC\x87
[unicode] => \u0044\u0307
[PHP_INT_MAX] => 2147483647
[php_int_max] => Array
(
[0] => Ḋ
[1] => Array
(
[chars1] => 44
[chars2] => cc
[chars3] => 87
)

)

[] => Array
(
[0] => Ḋ
[1] => Array
(
[chars1] => 44
[chars2] => cc
[chars3] => 87
)

)

)

Actual result:
--
Array
(
[string] => Ḋ
[utf8] => \x44\xCC\x87
[unicode] => \u0044\u0307
[PHP_INT_MAX] => 2147483647
[php_int_max] => Array
(
[0] => D
[1] => Array
(
[chars1] => 44
)

)

[] => Array
(
[0] => Ḋ
[1] => Array
(
[chars1] => 44
[chars2] => cc
[chars3] => 87
)

)

)





-- 
Edit this bug report at http://bugs.php.net/?id=42101&edit=1


#42502 [Asn->Csd]: GCC no longer implements <varargs.h>

2007-09-18 Thread hirokawa
 ID:   42502
 Updated by:   [EMAIL PROTECTED]
 Reported By:  supportnew at byethost dot com
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: debian linux 4
 PHP Version:  5.2.4
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

Detection routine for stdarg.h in ext/mbstring/config.m4 is
modified/simplified.





Previous Comments:


[2007-09-16 15:59:38] supportnew at byethost dot com

I did the following:

web3:~# cd /root
web3:~# vi sample.c
web3:~# gcc sample.c -o sample
web3:~# ./sample
web3:~#

web3:~# cat sample.c
#include <stdarg.h>
int foo(int x, ...) {
va_list va;
va_start(va, x);
va_arg(va, int);
va_arg(va, char *);
va_arg(va, double);
return 0;
}
int main() {
  return foo(10, "", 3.14);
}

Running the compiled sample prints nothing.

Just a note: PHP 5.2.3 still compiles fine on the same servers with the
same gcc, etc.



[2007-09-16 09:56:26] [EMAIL PROTECTED]

The variable-length argument support declared in stdarg.h is not properly
detected on your system.

A possible workaround is to force-define HAVE_STDARG_PROTOTYPES
in your main/php_config.h:

#define HAVE_STDARG_PROTOTYPES 1

Could you show me the return code of
the small program shown below?

sample.c

#include <stdarg.h>
int foo(int x, ...) {
va_list va;
va_start(va, x);
va_arg(va, int);
va_arg(va, char *);
va_arg(va, double);
return 0;
}
int main() { 
  return foo(10, "", 3.14); 
}

> gcc sample.c -o sample






[2007-09-16 09:52:26] [EMAIL PROTECTED]

#include <stdarg.h>
int foo(int x, ...) {
va_list va;
va_start(va, x);
va_arg(va, int);
va_arg(va, char *);
va_arg(va, double);
return 0;
}
int main() { 
return foo(10, "", 3.14); 
}



[2007-09-13 05:31:01] chris at acu dot edu

This problem is also reproducible on Solaris 10.

./configure --prefix=/opt/php-5.2.4
--with-apxs2=/usr/local/httpd/bin/apxs --with-mysql=/usr/local/mysql
--with-libxml-dir=/usr --enable-calendar --with-gd=/usr/local
--with-ttf=/usr --with-freetype-dir=/usr --enable-exif
--with-jpeg-dir=/usr --with-png-dir=/usr --with-zlib-dir=/usr --with-xsl
--with-pdo-sqlite --with-pdo-mysql=/usr/local/mysql --with-pear
--with-iconv=/usr/local --enable-ftp --with-curl=/opt/php-5.2.4
--enable-mbstring --enable-embedded-mysqli --with-gettext

using

$ gcc -v
Reading specs from /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/specs
Configured with:
/builds2/sfwnv-gate/usr/src/cmd/gcc/gcc-3.4.3/configure
--prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as
--with-ld=/usr/ccs/bin/ld --without-gnu-ld
--enable-languages=c,c++,f77,objc --enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-20050802)

produces the same error as supportnew at byethost dot com



[2007-09-12 11:58:15] [EMAIL PROTECTED]

Rui, feedback given.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/42502

-- 
Edit this bug report at http://bugs.php.net/?id=42502&edit=1


#42502 [Asn]: GCC no longer implements <varargs.h>

2007-09-16 Thread hirokawa
 ID:   42502
 Updated by:   [EMAIL PROTECTED]
 Reported By:  supportnew at byethost dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: debian linux 4
 PHP Version:  5.2.4
 Assigned To:  hirokawa
 New Comment:

The variable-length argument support declared in stdarg.h is not properly
detected on your system.

A possible workaround is to force-define HAVE_STDARG_PROTOTYPES
in your main/php_config.h:

#define HAVE_STDARG_PROTOTYPES 1

Could you show me the return code of
the small program shown below?

sample.c

#include <stdarg.h>
int foo(int x, ...) {
va_list va;
va_start(va, x);
va_arg(va, int);
va_arg(va, char *);
va_arg(va, double);
return 0;
}
int main() { 
  return foo(10, "", 3.14); 
}

> gcc sample.c -o sample





Previous Comments:


[2007-09-16 09:52:26] [EMAIL PROTECTED]

#include <stdarg.h>
int foo(int x, ...) {
va_list va;
va_start(va, x);
va_arg(va, int);
va_arg(va, char *);
va_arg(va, double);
return 0;
}
int main() { 
return foo(10, "", 3.14); 
}



[2007-09-13 05:31:01] chris at acu dot edu

This problem is also reproducible on Solaris 10.

./configure --prefix=/opt/php-5.2.4
--with-apxs2=/usr/local/httpd/bin/apxs --with-mysql=/usr/local/mysql
--with-libxml-dir=/usr --enable-calendar --with-gd=/usr/local
--with-ttf=/usr --with-freetype-dir=/usr --enable-exif
--with-jpeg-dir=/usr --with-png-dir=/usr --with-zlib-dir=/usr --with-xsl
--with-pdo-sqlite --with-pdo-mysql=/usr/local/mysql --with-pear
--with-iconv=/usr/local --enable-ftp --with-curl=/opt/php-5.2.4
--enable-mbstring --enable-embedded-mysqli --with-gettext

using

$ gcc -v
Reading specs from /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/specs
Configured with:
/builds2/sfwnv-gate/usr/src/cmd/gcc/gcc-3.4.3/configure
--prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as
--with-ld=/usr/ccs/bin/ld --without-gnu-ld
--enable-languages=c,c++,f77,objc --enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-20050802)

produces the same error as supportnew at byethost dot com



[2007-09-12 11:58:15] [EMAIL PROTECTED]

Rui, feedback given.



[2007-09-11 20:44:55] supportnew at byethost dot com

ahh , the file 

main/php_config.h exists , and

the following values are present

/* Define if stdarg.h is available */
/* #undef HAVE_STDARG_PROTOTYPES */


/* Define if you have the <stdarg.h> header file.  */
#define HAVE_STDARG_H 1



[2007-09-11 20:39:39] supportnew at byethost dot com

Hi ,

 find -name config.h
./ext/pcre/pcrelib/config.h
./ext/pdo_sqlite/sqlite/src/config.h
./ext/bcmath/libbcmath/src/config.h
./ext/mbstring/libmbfl/config.h
./ext/mbstring/oniguruma/win32/config.h
./ext/mbstring/oniguruma/config.h
./ext/sqlite/libsqlite/src/config.h

I can't see a file called main/config.h. Could this be the cause?

This is a direct extract from php-5.2.4.tar.bz2 sources.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/42502

-- 
Edit this bug report at http://bugs.php.net/?id=42502&edit=1


#42502 [Asn]: GCC no longer implements <varargs.h>

2007-09-16 Thread hirokawa
 ID:   42502
 Updated by:   [EMAIL PROTECTED]
 Reported By:  supportnew at byethost dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: debian linux 4
 PHP Version:  5.2.4
 Assigned To:  hirokawa
 New Comment:

#include <stdarg.h>
int foo(int x, ...) {
va_list va;
va_start(va, x);
va_arg(va, int);
va_arg(va, char *);
va_arg(va, double);
return 0;
}
int main() { 
return foo(10, "", 3.14); 
}


Previous Comments:


[2007-09-13 05:31:01] chris at acu dot edu

This problem is also reproducible on Solaris 10.

./configure --prefix=/opt/php-5.2.4
--with-apxs2=/usr/local/httpd/bin/apxs --with-mysql=/usr/local/mysql
--with-libxml-dir=/usr --enable-calendar --with-gd=/usr/local
--with-ttf=/usr --with-freetype-dir=/usr --enable-exif
--with-jpeg-dir=/usr --with-png-dir=/usr --with-zlib-dir=/usr --with-xsl
--with-pdo-sqlite --with-pdo-mysql=/usr/local/mysql --with-pear
--with-iconv=/usr/local --enable-ftp --with-curl=/opt/php-5.2.4
--enable-mbstring --enable-embedded-mysqli --with-gettext

using

$ gcc -v
Reading specs from /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/specs
Configured with:
/builds2/sfwnv-gate/usr/src/cmd/gcc/gcc-3.4.3/configure
--prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as
--with-ld=/usr/ccs/bin/ld --without-gnu-ld
--enable-languages=c,c++,f77,objc --enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-20050802)

produces the same error as supportnew at byethost dot com



[2007-09-12 11:58:15] [EMAIL PROTECTED]

Rui, feedback given.



[2007-09-11 20:44:55] supportnew at byethost dot com

ahh , the file 

main/php_config.h exists , and

the following values are present

/* Define if stdarg.h is available */
/* #undef HAVE_STDARG_PROTOTYPES */


/* Define if you have the <stdarg.h> header file.  */
#define HAVE_STDARG_H 1



[2007-09-11 20:39:39] supportnew at byethost dot com

Hi ,

 find -name config.h
./ext/pcre/pcrelib/config.h
./ext/pdo_sqlite/sqlite/src/config.h
./ext/bcmath/libbcmath/src/config.h
./ext/mbstring/libmbfl/config.h
./ext/mbstring/oniguruma/win32/config.h
./ext/mbstring/oniguruma/config.h
./ext/sqlite/libsqlite/src/config.h

I can't see a file called main/config.h. Could this be the cause?

This is a direct extract from php-5.2.4.tar.bz2 sources.



[2007-09-04 14:06:54] [EMAIL PROTECTED]

Please show me whether HAVE_STDARG_PROTOTYPES and HAVE_STDARG_H are
 defined in your main/config.h.

I think that HAVE_STDARG_PROTOTYPES isn't properly defined.
If it is not defined, stdarg.h does not exist in your include path.




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/42502

-- 
Edit this bug report at http://bugs.php.net/?id=42502&edit=1


#41147 [Asn]: mb_check_encoding fails to check invalid string

2007-09-16 Thread hirokawa
 ID:   41147
 Updated by:   [EMAIL PROTECTED]
 Reported By:  teracci2002 at yahoo dot co dot jp
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5.2.1
 Assigned To:  hirokawa
 New Comment:


Sorry for the delayed response.

0x00,0x81 is also a valid byte sequence in Shift_JIS
because 0x81 is a valid first byte of a double-byte
JIS X 0208 character.

See: http://en.wikipedia.org/wiki/Shift_jis

We cannot decide whether the byte stream is valid or
invalid, because the last byte of the byte stream (0x81)
is a valid first byte of a double-byte character.
In this case, true (valid) will be returned.

A byte stream that includes a valid first byte followed by
an invalid second byte returns false.

For example,

var_dump(mb_check_encoding("\x81\x00", "Shift_JIS"));

returns false (invalid).

This is because 0x81 is a valid first byte of a double-byte
JIS X 0208 character, but 0x00 is an invalid second byte of
a double-byte JIS X 0208 character.

Likewise, 0x00, 0xe3 in UTF-8 is also a
valid byte sequence (a null byte + the first byte of
a three-byte UTF-8 character).

See: http://en.wikipedia.org/wiki/UTF-8
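
The three Shift_JIS cases discussed in this thread (two single-byte characters, a dangling first byte, and a first byte with a bad second byte) can be told apart with a small classifier. This is a sketch under stated assumptions: the byte ranges follow the common description of Shift_JIS and vendor variants such as CP932 differ in details, and all three functions are hypothetical helpers, not mbstring internals.

```php
<?php
// Assumed range for the first byte of a double-byte character.
function sjis_is_lead_byte(int $b): bool
{
    return ($b >= 0x81 && $b <= 0x9F) || ($b >= 0xE0 && $b <= 0xFC);
}

// Assumed range for the second byte of a double-byte character.
function sjis_is_trail_byte(int $b): bool
{
    return ($b >= 0x40 && $b <= 0x7E) || ($b >= 0x80 && $b <= 0xFC);
}

// Hypothetical classifier: 'valid', 'invalid' (bad byte or bad second byte),
// or 'incomplete' (input ends right after a first byte).
function sjis_classify(string $bytes): string
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b <= 0x7F || ($b >= 0xA1 && $b <= 0xDF)) {
            continue; // single-byte character (ASCII range or half-width katakana)
        }
        if (!sjis_is_lead_byte($b)) {
            return 'invalid';
        }
        if ($i + 1 >= $len) {
            return 'incomplete';
        }
        if (!sjis_is_trail_byte(ord($bytes[$i + 1]))) {
            return 'invalid';
        }
        $i++; // skip the second byte
    }
    return 'valid';
}

var_dump(sjis_classify("\x00\xa1")); // string(5) "valid"
var_dump(sjis_classify("\x00\x81")); // string(10) "incomplete"
var_dump(sjis_classify("\x81\x00")); // string(7) "invalid"
```

Whether the "incomplete" case should be reported as valid or invalid is exactly the point under dispute in this report; the sketch only makes the third state visible.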











Previous Comments:


[2007-09-04 22:38:26] [EMAIL PROTECTED]

Did you read it Rui? (why do your reports end up as 'Analyzed' all the
time? :)



[2007-09-04 14:55:58] teracci2002 at yahoo dot co dot jp

> 0x00+0xa1 is valid byte sequence in Shift_JIS sequence.

I know it.
But 0x00+0x81 is an invalid sequence in Shift_JIS.
Then why does the statement below return "bool(true)"?

var_dump(mb_check_encoding("\x00\x81", "Shift_JIS"));

Read bug report again, please.



[2007-09-04 14:30:06] [EMAIL PROTECTED]


> No one says 0x00,0xa1 is invalid character in ShiftJIS.
I didn't say that.

0x00+0xa1 is a valid byte sequence in Shift_JIS.
A character in Shift_JIS encoding is encoded as either a single byte
or a double byte.
In this case, the byte stream is recognized as two characters:
a null byte and a comma character in Katakana (0xa1).
 
see: http://hp.vector.co.jp/authors/VA013241/misc/shiftjis.html





[2007-08-19 20:10:06] [EMAIL PROTECTED]

Someone disagrees, Rui.. :)



[2007-08-18 16:00:06] teracci2002 at yahoo dot co dot jp

Just read bug report again.

No one says 0x00,0xa1 is invalid character in ShiftJIS.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/41147

-- 
Edit this bug report at http://bugs.php.net/?id=41147&edit=1



#42290 [Asn->Fbk]: mb_eregi_replace() is not case-insensitive with multibyte pattern

2007-09-04 Thread hirokawa
 ID:   42290
 Updated by:   [EMAIL PROTECTED]
 Reported By:  arysin at gmail dot com
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5.2CVS-2007-08-14
 Assigned To:  hirokawa


Previous Comments:


[2007-08-21 15:46:44] [EMAIL PROTECTED]

arysin,

What encoding are you using?

For UTF-8 and ISO-8859-1, 0x8a is assigned to Line Tab.

  c.f.: http://en.wikipedia.org/wiki/ISO_8859-1
   http://en.wikipedia.org/wiki/UTF-8

In my understanding, 0x8a shouldn't be interpreted as
the uppercase form of 0x9a for ISO-8859-1/UTF-8.

If you are using CP1252 (Windows-1252), it is understandable,
but, CP1252 is not supported yet in the Oniguruma library
(multibyte regex engine of mbstring).
http://en.wikipedia.org/wiki/Windows-1252
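
The byte values in question can be checked with a short sketch. It assumes the characters involved are U+0160 and U+0161 (the capital and small S with caron that appear in the quoted output) and that mbstring's conversion tables include Windows-1252; it says nothing about what the regex engine supports.

```php
<?php
$upper = "\u{0160}"; // LATIN CAPITAL LETTER S WITH CARON
$lower = "\u{0161}"; // LATIN SMALL LETTER S WITH CARON

// In UTF-8 each letter is a two-byte sequence.
var_dump(bin2hex($upper)); // string(4) "c5a0"
var_dump(bin2hex($lower)); // string(4) "c5a1"

// In Windows-1252 they are the single bytes 0x8a and 0x9a.
var_dump(bin2hex(mb_convert_encoding($upper, 'Windows-1252', 'UTF-8')));
var_dump(bin2hex(mb_convert_encoding($lower, 'Windows-1252', 'UTF-8')));

// Case mapping on the UTF-8 form.
var_dump(mb_strtolower($upper, 'UTF-8') === $lower);
```

So a pattern containing the raw byte 0x8a only denotes this letter if the data is treated as Windows-1252; in UTF-8 input the same letter never appears as a lone 0x8a byte.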





[2007-08-19 20:05:04] [EMAIL PROTECTED]

I'm using the bundled PCRE library. I don't remember what the version
is.



[2007-08-19 02:27:06] [EMAIL PROTECTED]

I got the same result as arysin,
(PHP_5_2CVS 20070819, PCRE 7.2 2007-06-19)

Šiltas, Xiltas
Šiltas, Xiltas

I think PCRE is also not working.

Jani, which version of PHP/PCRE are you using?




[2007-08-17 13:46:13] [EMAIL PROTECTED]

Assigned to the maintainer of mbstring extension.



[2007-08-14 09:11:37] [EMAIL PROTECTED]

I get this as output:

Šiltas, Xiltas
Xiltas, Xiltas

So I don't think there's anything wrong with PCRE, just mbstring stuff.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/42290

-- 
Edit this bug report at http://bugs.php.net/?id=42290&edit=1


#41147 [Asn->Ana]: mb_check_encoding fails to check invalid string

2007-09-04 Thread hirokawa
 ID:   41147
 Updated by:   [EMAIL PROTECTED]
 Reported By:  teracci2002 at yahoo dot co dot jp
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5.2.1
 Assigned To:  hirokawa
 New Comment:


> No one says 0x00,0xa1 is invalid character in ShiftJIS.
I didn't say that.

0x00+0xa1 is a valid byte sequence in Shift_JIS.
A character in Shift_JIS encoding is encoded as either a single byte
or a double byte.
In this case, the byte stream is recognized as two characters:
a null byte and a comma character in Katakana (0xa1).
 
see: http://hp.vector.co.jp/authors/VA013241/misc/shiftjis.html




Previous Comments:


[2007-08-19 20:10:06] [EMAIL PROTECTED]

Someone disagrees, Rui.. :)



[2007-08-18 16:00:06] teracci2002 at yahoo dot co dot jp

Just read bug report again.

No one says 0x00,0xa1 is invalid character in ShiftJIS.



[2007-08-18 14:45:13] [EMAIL PROTECTED]

It is expected behavior because 
0x00,0xa1 is null byte + valid ShiftJIS character.
mb_check_encoding should be used to detect invalid 
or corrupted multibyte characters.




[2007-04-20 10:11:03] teracci2002 at yahoo dot co dot jp

If the data is NOT valid, FALSE should be returned, I guess.
But actually it returns TRUE.
Am I wrong or missing your point?



[2007-04-20 09:27:44] [EMAIL PROTECTED]

Please explain why do you think it should succeed if the data is
invalid.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/41147



#29955 [Asn->Csd]: Support for Turkish/iso-8859-9

2007-09-04 Thread hirokawa
 ID:   29955
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jan at horde dot org
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5CVS, 4CVS (2004-09-02)
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2007-08-23 23:03:44] [EMAIL PROTECTED]

Feedback given.



[2007-08-23 16:33:44] jan at horde dot org

No. The conversion has to be done this way for iso-8859-9 always, not
only if the current locale is Turkish. Turkish is the only language that
uses this charset.



[2007-08-17 22:19:19] [EMAIL PROTECTED]

This change has already been backported to PHP 5.2.
In my understanding, it shouldn't always be applied to ISO-8859-9,
because the conversion result depends on the locale.
(Correct?)







[2007-01-05 14:33:51] jan at horde dot org

Oh, and by the way, this conversion should always happen for
iso-8859-9, not only if mbstring.language is set to Turkish, because
this is completely useless in real world applications.



[2007-01-05 14:31:12] jan at horde dot org

Any chance this is going to be backported to PHP 5.2? I guess mbstring
is going to be obsolete with the Unicode and ICU support in PHP 6.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/29955



#42502 [Asn->Ana]: GCC no longer implements <varargs.h>

2007-09-04 Thread hirokawa
 ID:   42502
 Updated by:   [EMAIL PROTECTED]
 Reported By:  supportnew at byethost dot com
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: debian linux 4
 PHP Version:  5.2.4
 Assigned To:  hirokawa
 New Comment:

Please show me whether HAVE_STDARG_PROTOTYPES and HAVE_STDARG_H are
defined in your main/config.h.

I think that HAVE_STDARG_PROTOTYPES isn't properly defined.
If it is not defined, stdarg.h does not exist in your include path.



Previous Comments:


[2007-09-03 08:17:14] [EMAIL PROTECTED]

Assigned to the mbstring maintainer.



[2007-08-31 18:07:51] supportnew at byethost dot com

Description:

When compiling the stable 5.2.4 branch of php the compile process dies
at the same point.


I have tried this on 3 separate servers (using different versions
of GCC).

Reproduce code:
---
download stable, configure with

./configure  --prefix=/usr/phpapache2
--with-apxs2=/usr/local/apache2/bin/apxs --disable-cgi
--with-config-file-path=/etc/php4/apache --enable-inline-optimization
--enable-memory-limit --disable-debug --disable-rpath --disable-static
--with-layout=GNU --with-pear=/usr/share/php --enable-calendar
--enable-track-vars --enable-trans-sid --enable-bcmath --without-bz2
--disable-ctype --with-iconv --enable-exif --disable-ftp --with-gettext
--enable-mbstring --disable-sockets --disable-wddx --with-xsl
--with-expat-dir=/usr --disable-yp --with-zlib --without-pgsql
--without-openssl --with-zip=/usr --disable-dbx
--with-exec-dir=/usr/lib/php4/libexec --with-mcrypt --without-sybase-ct
--with-mysql=/usr --with-zlib-dir=/usr --with-gd=/usr/local/gd
--with-jpeg-dir=/usr --with-png-dir=/usr --with-xpm-dir=/usr
--with-ttf=shared,/usr --with-t1lib --with-freetype-dir=/usr
--enable-gd-native-ttf --with-sqlite --with-mysqli --with-xsl
--enable-ctype --with-pdo-mysql --without-pdo-sqlite --with-pspell

using 

gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu
--enable-libstdcxx-debug --enable-mpfr --with-tune=i686
--enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)


config.log can be found here

http://byet.org/config.log

Expected result:

no errors.

Actual result:
--
In file included from
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:37:
/usr/lib/gcc/i486-linux-gnu/4.1.2/include/varargs.h:4:2: error: #error
"GCC no longer implements <varargs.h>."
/usr/lib/gcc/i486-linux-gnu/4.1.2/include/varargs.h:5:2: error: #error
"Revise your code to use <stdarg.h>."
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c: In function
'onig_error_code_to_str':
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:196: error: expected
declaration specifiers before 'va_dcl'
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:265: error: expected
'=', ',', ';', 'asm' or '__attribute__' before 'OnigUChar'
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:271: error: expected
declaration specifiers before 'va_dcl'
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:270: error:
declaration for parameter 'fmt' but no such parameter
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:269: error:
declaration for parameter 'pat_end' but no such parameter
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:268: error:
declaration for parameter 'pat' but no such parameter
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:267: error:
declaration for parameter 'enc' but no such parameter
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:266: error:
declaration for parameter 'bufsize' but no such parameter
/root/php-5.2.4/ext/mbstring/oniguruma/regerror.c:334: error: expected
'{' at end of input
make: *** [ext/mbstring/oniguruma/regerror.lo] Error 1








#42252 [Ana->Bgs]: Windows compile failure with mbstring and zend_multibyte

2007-08-29 Thread hirokawa
 ID:   42252
 Updated by:   [EMAIL PROTECTED]
 Reported By:  okabe at zend dot co dot jp
-Status:   Analyzed
+Status:   Bogus
 Bug Type: Compile Failure
 Operating System: Windows 2003
 PHP Version:  4.4.7
 Assigned To:  hirokawa
 New Comment:

The compile options used by the reporter are incorrect,
so this is not a bug.



Previous Comments:


[2007-08-29 13:11:24] [EMAIL PROTECTED]

You shouldn't define HAVE_STRCASECMP if you don't have
strcasecmp().
For Visual C++, you should use stricmp() instead of strcasecmp().
Please check your compile options.





[2007-08-20 06:05:47] okabe at zend dot co dot jp

Hirokawa-san,
Thank you a lot!

Building according to your advice made progress.
php4dllts is left with the following three link errors.

---
Linking...
   Creating library ..\Release_TS/php4ts.lib and object
..\Release_TS/php4ts.exp 
mbfl_encoding.obj : error LNK2001: unresolved external symbol
"_strcasecmp"
mbfl_language.obj : error LNK2001: unresolved external symbol
"_strcasecmp"
..\Release_TS\php4ts.dll : fatal error LNK1120: 1 unresolved externals
Error executing link.exe.

php.exe - 3 error(s), 89 warning(s)
---

I am looking at the following file. Would it help me?
php-x.x.x\ext\mbstring\libmbfl\config.h.vc6



[2007-08-17 23:05:58] [EMAIL PROTECTED]

Did you add COPYING as a build file?
You should add only *.c and *.h files.




[2007-08-14 11:04:23] okabe at zend dot co dot jp

Thank you for your reply!

I tried adding '/D "MBFL_DLL_EXPORT"' to php4dllts.
The build now gets further than before.

However, a link error is generated.

..\ext\mbstring\mbregex\COPYING.LIB : fatal error LNK1136: file is not
available or be destroied.
link.exe have execution error.


Then I excluded the following file from the project build:
..\ext\mbstring\mbregex\COPYING.LIB

but a lot of link errors were generated.

Do you have any idea?



[2007-08-11 03:23:30] [EMAIL PROTECTED]

I think that MBFL_DLL_EXPORT is not defined in the compile options.
Please try again with '/D "MBFL_DLL_EXPORT"'.





The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/42252



#42252 [Ana]: Windows compile failure with mbstring and zend_multibyte

2007-08-29 Thread hirokawa
 ID:   42252
 Updated by:   [EMAIL PROTECTED]
 Reported By:  okabe at zend dot co dot jp
 Status:   Analyzed
 Bug Type: Compile Failure
 Operating System: Windows 2003
 PHP Version:  4.4.7
 Assigned To:  hirokawa
 New Comment:

You shouldn't define HAVE_STRCASECMP if you don't have
strcasecmp().
For Visual C++, you should use stricmp() instead of strcasecmp().
Please check your compile options.




Previous Comments:


[2007-08-20 06:05:47] okabe at zend dot co dot jp

Hirokawa-san,
Thank you a lot!

Building according to your advice made progress.
php4dllts is left with the following three link errors.

---
Linking...
   Creating library ..\Release_TS/php4ts.lib and object
..\Release_TS/php4ts.exp 
mbfl_encoding.obj : error LNK2001: unresolved external symbol
"_strcasecmp"
mbfl_language.obj : error LNK2001: unresolved external symbol
"_strcasecmp"
..\Release_TS\php4ts.dll : fatal error LNK1120: 1 unresolved externals
Error executing link.exe.

php.exe - 3 error(s), 89 warning(s)
---

I am looking at the following file. Would it help me?
php-x.x.x\ext\mbstring\libmbfl\config.h.vc6



[2007-08-17 23:05:58] [EMAIL PROTECTED]

Did you add COPYING as a build file?
You should add only *.c and *.h files.




[2007-08-14 11:04:23] okabe at zend dot co dot jp

Thank you for your reply!

I tried adding '/D "MBFL_DLL_EXPORT"' to php4dllts.
The build now gets further than before.

However, a link error is generated.

..\ext\mbstring\mbregex\COPYING.LIB : fatal error LNK1136: file is not
available or be destroied.
link.exe have execution error.


Then I excluded the following file from the project build:
..\ext\mbstring\mbregex\COPYING.LIB

but a lot of link errors were generated.

Do you have any idea?



[2007-08-11 03:23:30] [EMAIL PROTECTED]

I think that MBFL_DLL_EXPORT is not defined in the compile options.
Please try again with '/D "MBFL_DLL_EXPORT"'.





[2007-08-09 03:11:13] okabe at zend dot co dot jp

Description:

I want to build PHP 4.4.7 with mbstring and zend multibyte in order to
use the Japanese character set called "Shift_JIS".

I am able to build the normal Win32 php.exe (without mbstring) using
php4ts.dsw by following the PHP manual.

Then, first, I added lines to config.w32.h as follows...
---
#define ZEND_MULTIBYTE 1
#define HAVE_MBSTRING 1
#define HAVE_MBREGEX  1
#define HAVE_MBSTR_CN 1
#define HAVE_MBSTR_JA 1
#define HAVE_MBSTR_KR 1
#define HAVE_MBSTR_RU 1
#define HAVE_MBSTR_TW 1
#define MBSTRING_EXPORTS 1
#undef COMPILE_DL_MBSTRING
---

and I added a line to zend_config.w32.h as follows...
---
#define ZEND_MULTIBYTE 1
---

next,

I added the source files under ext/mbstring to php4dllts.dsp
(making an mbstring folder in Function Modules and importing those files and
folders).

Next,

I built the project and watched for include errors.
I repeated this, correcting the include paths each time.

Finally,

four C2099 errors (initializer is not a constant) and many C4013 warnings
(undefined; assuming extern returning int) and C4273 warnings (inconsistent
dll linkage. dllexport assumed.) remained.
---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013:
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(110) : warning
C4273: 'mbfl_buffer_converter_new' :
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(214) : error
C2099: 
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(119) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(121) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(131) :
error C2099:
---

Then, what should I do?
I'm using VC6.

Actual result:
--
The build messages are as follows...

---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013: '_mbschr' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(784) :
warning C4013: '_mbscspn' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(785) :
warning C4013: '_mbsnbcat' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(786) :
warning C4013: '_mbsnbcpy' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(787) :
warning C4013: '_mbspbrk' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(788) :

#42393 [Asn->Bgs]: mb_strtoupper is replacing one of the cyrillic symbols with wrong one.

2007-08-24 Thread hirokawa
 ID:   42393
 Updated by:   [EMAIL PROTECTED]
 Reported By:  ivan dot delchev at softconsultgroup dot com
-Status:   Assigned
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2.3
 Assigned To:  hirokawa
 New Comment:

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

You must specify the character encoding explicitly because the conversion
table between lower/upper characters depends on the encoding.

Please try,

mb_strtoupper($main_string,"UTF-8")
or set mbstring.internal_encoding = UTF-8 in your php.ini.
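
A minimal sketch of the two suggestions above. It assumes the script file itself is saved as UTF-8, and the sample word is made up for illustration. mb_internal_encoding() is used here as the run-time counterpart of the php.ini setting.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$main_string = "тест";

// Passing the encoding explicitly avoids depending on php.ini.
var_dump(mb_strtoupper($main_string, 'UTF-8'));

// Alternative: set the internal encoding once for the whole script.
mb_internal_encoding('UTF-8');
var_dump(mb_strtoupper($main_string));
```

Passing the encoding per call is the safer choice for library code, since it does not rely on whatever the host configuration happens to set.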





Previous Comments:


[2007-08-23 14:40:23] ivan dot delchev at softconsultgroup dot com

[mbstring]
; language for internal character representation.
;mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP

; http input encoding.
;mbstring.http_input = auto

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;   portable libs/applications.
;mbstring.encoding_translation = Off

; automatic encoding detection order.
; auto means
;mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
;mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
;mbstring.func_overload = 0

; enable strict encoding detection.
;mbstring.strict_encoding = Off



[2007-08-23 14:30:54] [EMAIL PROTECTED]

Please show me mbstring.language setting in php.ini.




[2007-08-23 09:06:53] [EMAIL PROTECTED]

Assigned to mbstring maintainer.



[2007-08-23 08:06:01] ivan dot delchev at softconsultgroup dot com

Description:

mb_strtoupper is doing a wrong transformation for "å" in the Cyrillic
alphabet. The wrong transformation is "å"->"í".

Also, the function does not uppercase the string!

Reproduce code:
---
// Ensure that the web browser encoding is UTF-8 and the editing application is
// UTF-8 compatible!
$main_string = "Òîâà å òåñò. Îòíîâî Òåñò. Êàêâî áè ñå ïîëó÷èëî ñ òîçè
ÒÅÑÒ äà âèäèì!";
var_dump($main_string);
var_dump(mb_strtoupper($main_string));

Expected result:

The dumped results should be the same, and the second string should be uppercased!

Actual result:
--
string(120) "Òîâà å òåñò. Îòíîâî Òåñò. Êàêâî áè ñå ïîëó÷èëî ñ òîçè ÒÅÑÒ
äà âèäèì!"
string(120) "Òîâà ï òïñò. Îòíîâî Òïñò. Êàêâî áè ñï ïîëó÷èëî ñ òîçè ÒÅÑÒ
äà âèäèì!"








#42393 [Asn]: mb_strtoupper is replacing one of the cyrillic symbols with wrong one.

2007-08-23 Thread hirokawa
 ID:   42393
 Updated by:   [EMAIL PROTECTED]
 Reported By:  ivan dot delchev at softconsultgroup dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: Windows XP
 PHP Version:  5.2.3
 Assigned To:  hirokawa
 New Comment:

Please show me mbstring.language setting in php.ini.



Previous Comments:


[2007-08-23 09:06:53] [EMAIL PROTECTED]

Assigned to mbstring maintainer.



[2007-08-23 08:06:01] ivan dot delchev at softconsultgroup dot com

Description:

mb_strtoupper is doing a wrong transformation for "å" in the Cyrillic
alphabet. The wrong transformation is "å"->"í".

Also, the function does not uppercase the string!

Reproduce code:
---
// Ensure that the web browser encoding is UTF-8 and the editing application is
// UTF-8 compatible!
$main_string = "Òîâà å òåñò. Îòíîâî Òåñò. Êàêâî áè ñå ïîëó÷èëî ñ òîçè
ÒÅÑÒ äà âèäèì!";
var_dump($main_string);
var_dump(mb_strtoupper($main_string));

Expected result:

The dumped results should be the same, and the second string should be uppercased!

Actual result:
--
string(120) "Òîâà å òåñò. Îòíîâî Òåñò. Êàêâî áè ñå ïîëó÷èëî ñ òîçè ÒÅÑÒ
äà âèäèì!"
string(120) "Òîâà ï òïñò. Îòíîâî Òïñò. Êàêâî áè ñï ïîëó÷èëî ñ òîçè ÒÅÑÒ
äà âèäèì!"








#42290 [Asn]: mb_eregi_replace() is not case-insensitive with multibyte pattern

2007-08-21 Thread hirokawa
 ID:   42290
 Updated by:   [EMAIL PROTECTED]
 Reported By:  arysin at gmail dot com
 Status:   Assigned
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5.2CVS-2007-08-14
 Assigned To:  hirokawa
 New Comment:

arysin,

What encoding are you using?

In ISO-8859-1 and Unicode (U+008A), 0x8a is assigned to the Line Tabulation Set control character.

  c.f.: http://en.wikipedia.org/wiki/ISO_8859-1
   http://en.wikipedia.org/wiki/UTF-8

In my understanding, 0x8a shouldn't be interpreted as
the uppercase letter of 0x9a for ISO-8859-1/UTF-8.

If you are using CP1252 (Windows-1252), it is understandable,
but CP1252 is not yet supported in the Oniguruma library
(the multibyte regex engine of mbstring).
http://en.wikipedia.org/wiki/Windows-1252
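
To make the CP1252 reading concrete, here is a small sketch. It assumes the reporter's bytes are Windows-1252 and that the mbstring build knows the 'Windows-1252' encoding name. It sidesteps the regex engine by converting to UTF-8 first, and the variable names are illustrative only.

```php
<?php
// Assumption: the bytes in question are Windows-1252 (CP1252).
$upper = mb_convert_encoding("\x8A", 'UTF-8', 'Windows-1252'); // U+0160
$lower = mb_convert_encoding("\x9A", 'UTF-8', 'Windows-1252'); // U+0161

// Show the UTF-8 bytes of both letters.
var_dump(bin2hex($upper), bin2hex($lower));

// Once in UTF-8, the two letters are a proper upper/lower pair.
var_dump(mb_strtolower($upper, 'UTF-8') === $lower);
```

Converting input to UTF-8 before any case-insensitive matching is one possible workaround when the source encoding is not supported by the regex engine.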




Previous Comments:


[2007-08-19 20:05:04] [EMAIL PROTECTED]

I'm using the bundled PCRE library. I don't remember what the version
is.



[2007-08-19 02:27:06] [EMAIL PROTECTED]

I got the same result as arysin,
(PHP_5_2CVS 20070819, PCRE 7.2 2007-06-19)

Šiltas, Xiltas
Šiltas, Xiltas

I think PCRE is also not working.

Jani, which version of PHP/PCRE are you using?




[2007-08-17 13:46:13] [EMAIL PROTECTED]

Assigned to the maintainer of mbstring extension.



[2007-08-14 09:11:37] [EMAIL PROTECTED]

I get this as output:

Šiltas, Xiltas
Xiltas, Xiltas

So I don't think there's anything wrong with PCRE, just mbstring stuff.



[2007-08-14 01:28:44] arysin at gmail dot com

Description:

The function mb_eregi_replace() and/or the function mb_ereg_replace() with the
'i' option is not case-insensitive for multibyte characters.
The same problem occurs for preg_replace() with the /i option.

This bug was reported before twice:
1) (#3) and was marked as Bogus stating it's not php bug. 
2) (#25953) was marked as Won't fix with note "Probably the issue will
be resolved in php5."

As one cannot add anything to a closed bug, I'd like to reopen this bug
here for the reasons stated below:

This library is not linked dynamically; on the contrary, its source is
present in the PHP source repository. So there's no way for PHP users to
have this problem fixed without PHP itself being fixed.
More than that, the version of Oniguruma in the PHP repository is pretty
old, so at least importing a newer version of it would be a good first
attempt at fixing this bug.


Reproduce code:
---



Expected result:

Xiltas, Xiltas
Xiltas, Xiltas


Actual result:
--
Šiltas, Xiltas
Šiltas, Xiltas








#42290 [Asn->Ana]: mb_eregi_replace() is not case-insensitive with multibyte pattern

2007-08-18 Thread hirokawa
 ID:   42290
 Updated by:   [EMAIL PROTECTED]
 Reported By:  arysin at gmail dot com
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5.2CVS-2007-08-14
 Assigned To:  hirokawa
 New Comment:

I got the same result as arysin,
(PHP_5_2CVS 20070819, PCRE 7.2 2007-06-19)

Šiltas, Xiltas
Šiltas, Xiltas

I think PCRE is also not working.

Jani, which version of PHP/PCRE are you using?



Previous Comments:


[2007-08-17 13:46:13] [EMAIL PROTECTED]

Assigned to the maintainer of mbstring extension.



[2007-08-14 09:11:37] [EMAIL PROTECTED]

I get this as output:

Šiltas, Xiltas
Xiltas, Xiltas

So I don't think there's anything wrong with PCRE, just mbstring stuff.



[2007-08-14 01:28:44] arysin at gmail dot com

Description:

The function mb_eregi_replace() and/or the function mb_ereg_replace() with the
'i' option is not case-insensitive for multibyte characters.
The same problem occurs for preg_replace() with the /i option.

This bug was reported before twice:
1) (#3) and was marked as Bogus stating it's not php bug. 
2) (#25953) was marked as Won't fix with note "Probably the issue will
be resolved in php5."

As one cannot add anything to a closed bug, I'd like to reopen this bug
here for the reasons stated below:

This library is not linked dynamically; on the contrary, its source is
present in the PHP source repository. So there's no way for PHP users to
have this problem fixed without PHP itself being fixed.
More than that, the version of Oniguruma in the PHP repository is pretty
old, so at least importing a newer version of it would be a good first
attempt at fixing this bug.


Reproduce code:
---



Expected result:

Xiltas, Xiltas
Xiltas, Xiltas


Actual result:
--
Šiltas, Xiltas
Šiltas, Xiltas








#42085 [Asn->Csd]: mb_strrpos() inconsistent with strrpos() with negative offset

2007-08-18 Thread hirokawa
 ID:   42085
 Updated by:   [EMAIL PROTECTED]
 Reported By:  arjen at react dot nl
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5.2.3
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2007-08-17 13:48:34] [EMAIL PROTECTED]

Assigned to the maintainer of mbstring extension.



[2007-07-24 10:53:10] arjen at react dot nl

Description:

From the manual:
Note: As of PHP 5.2.0 offset may be specified to begin searching an
arbitrary number of characters into the string. Negative values will
stop searching at an arbitrary point prior to the end of the string. 

Negative offsets are not working.

Reproduce code:
---
var_dump(mb_strrpos('abcd', 'd', -2));
var_dump(mb_strrpos('abcdd', 'd', -2));

Expected result:

boolean false
int 3

Actual result:
--
int 3
int 4
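
For comparison, the following sketch prints the strrpos() and mb_strrpos() results for the report's two inputs side by side. It assumes a build that includes the fix; the explicit 'UTF-8' argument and the loop are additions for illustration only.

```php
<?php
// Run the byte-based and multibyte functions on the report's two inputs.
foreach (['abcd', 'abcdd'] as $haystack) {
    // A negative offset stops the search before the end of the string.
    var_dump(strrpos($haystack, 'd', -2));
    var_dump(mb_strrpos($haystack, 'd', -2, 'UTF-8'));
}
```

On a fixed build the two functions should agree for these ASCII-only strings, which is the consistency this report asks for.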







#41147 [Asn->Bgs]: mb_check_encoding fails to check invalid string

2007-08-18 Thread hirokawa
 ID:   41147
 Updated by:   [EMAIL PROTECTED]
 Reported By:  teracci2002 at yahoo dot co dot jp
-Status:   Assigned
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5.2.1
 Assigned To:  hirokawa
 New Comment:

It is expected behavior because 
0x00,0xa1 is null byte + valid ShiftJIS character.
mb_check_encoding should be used to detect invalid 
or corrupted multibyte characters.



Previous Comments:


[2007-04-20 10:11:03] teracci2002 at yahoo dot co dot jp

If the data is NOT valid, FALSE should be returned, I guess.
But actually it returns TRUE.
Am I wrong or missing your point?



[2007-04-20 09:27:44] [EMAIL PROTECTED]

Please explain why do you think it should succeed if the data is
invalid.



[2007-04-20 09:21:42] teracci2002 at yahoo dot co dot jp

Description:

mb_check_encoding returns true when specific invalid EUC-JP / Shift_JIS
/ UTF-8 char sequence supplied.


Reproduce code:
---
//(1)
var_dump(mb_check_encoding("\x00\xA1", "EUC-JP"));
//(2)
var_dump(mb_check_encoding("\x00\x81", "Shift_JIS"));
//(3)
var_dump(mb_check_encoding("\x00\xE3", "UTF-8"));

Expected result:

//(1)
bool(false)
//(2)
bool(false)
//(3)
bool(false)

Actual result:
--
//(1)
bool(true)
//(2)
bool(true)
//(3)
bool(true)








#37724 [Ver->Csd]: mb_detect_encoding returns wrong result when text contains a trailing accent

2007-08-17 Thread hirokawa
 ID:   37724
 Updated by:   [EMAIL PROTECTED]
 Reported By:  oylbqelmhfbxzg at mailinator dot com
-Status:   Verified
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.4.2
 Assigned To:  hirokawa
 New Comment:

Thank you for your bug report. This issue has already been fixed
in the latest released version of PHP, which you can download at 
http://www.php.net/downloads.php

If strict mode detection is used in PHP 5.2.3,
both results are ISO-8859-1.
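
A rough sketch of strict detection, for anyone who wants to try it. It assumes a candidate list with UTF-8 first, which is a different order from the one quoted in the earlier comment, and it writes the test bytes as escapes so the file's own encoding does not matter. Results on older builds may differ.

```php
<?php
// Assumption: candidate order is UTF-8 first, then ISO-8859-1.
$candidates = ['UTF-8', 'ISO-8859-1'];

$utf8   = "test\xC3\xB6"; // "testö" encoded as UTF-8
$latin1 = "test\xF6";     // "testö" encoded as ISO-8859-1

// The third argument turns on strict detection.
var_dump(mb_detect_encoding($utf8, $candidates, true));
var_dump(mb_detect_encoding($latin1, $candidates, true));
```

Candidate order matters here: every byte sequence is valid ISO-8859-1, so listing it first would tend to make it win regardless of the input.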





Previous Comments:


[2007-08-17 22:32:39] [EMAIL PROTECTED]

It also happens in PHP 5.2.



[2007-02-21 11:28:24] gabriel at unisolution dot de

I read that 

mb_detect_encoding($string,array('ISO-8859-1','UTF-8'));

always returns ISO-8859-1.

Try this;

(php 4.4.0)



[2006-09-14 22:48:09] [EMAIL PROTECTED]

Could you show me the mbstring part of your php.ini?
And please show me a simple script to verify your problem.
I executed this tiny script, and it works fine
(with Fedora Linux 5, PHP 5.1.5).





[2006-07-23 12:05:59] [EMAIL PROTECTED]

Rui, does this exist also in PHP 5.x branches/HEAD?
If so and this is real bug, update the version. :)



[2006-06-07 08:54:27] oylbqelmhfbxzg at mailinator dot com

Description:

Since bug 36994 was closed..

Both
 $string = "testö"
in a utf-8 text file, and 
 $string = "testö"
in an iso-8859-1 file (converted using iconv) return "UTF-8" with
mb_detect_encoding, even when strict is on.








#42252 [Opn->Ana]: Windows compile failure with mbstring and zend_multibyte

2007-08-17 Thread hirokawa
 ID:   42252
 Updated by:   [EMAIL PROTECTED]
 Reported By:  okabe at zend dot co dot jp
-Status:   Open
+Status:   Analyzed
 Bug Type: Compile Failure
 Operating System: Windows 2003
 PHP Version:  4.4.7
 Assigned To:  hirokawa
 New Comment:

Did you add COPYING as a build file?
You should add only *.c and *.h files.



Previous Comments:


[2007-08-14 11:04:23] okabe at zend dot co dot jp

Thank you for your reply!

I tried adding '/D "MBFL_DLL_EXPORT"' to php4dllts.
The build now gets further than before.

However, a link error is generated.

..\ext\mbstring\mbregex\COPYING.LIB : fatal error LNK1136: file is not
available or be destroied.
link.exe have execution error.


Then I excluded the following file from the project build:
..\ext\mbstring\mbregex\COPYING.LIB

but a lot of link errors were generated.

Do you have any idea?



[2007-08-11 03:23:30] [EMAIL PROTECTED]

I think that MBFL_DLL_EXPORT is not defined in the compile options.
Please try again with '/D "MBFL_DLL_EXPORT"'.





[2007-08-09 03:11:13] okabe at zend dot co dot jp

Description:

I want to build PHP 4.4.7 with mbstring and zend multibyte in order to
use the Japanese character set called "Shift_JIS".

I am able to build the normal Win32 php.exe (without mbstring) using
php4ts.dsw by following the PHP manual.

Then, first, I added lines to config.w32.h as follows...
---
#define ZEND_MULTIBYTE 1
#define HAVE_MBSTRING 1
#define HAVE_MBREGEX  1
#define HAVE_MBSTR_CN 1
#define HAVE_MBSTR_JA 1
#define HAVE_MBSTR_KR 1
#define HAVE_MBSTR_RU 1
#define HAVE_MBSTR_TW 1
#define MBSTRING_EXPORTS 1
#undef COMPILE_DL_MBSTRING
---

and I added a line to zend_config.w32.h as follows...
---
#define ZEND_MULTIBYTE 1
---

next,

I added the source files under ext/mbstring to php4dllts.dsp
(making an mbstring folder in Function Modules and importing those files and
folders).

Next,

I built the project and watched for include errors.
I repeated this, correcting the include paths each time.

Finally,

four C2099 errors (initializer is not a constant) and many C4013 warnings
(undefined; assuming extern returning int) and C4273 warnings (inconsistent
dll linkage. dllexport assumed.) remained.
---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013:
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(110) : warning
C4273: 'mbfl_buffer_converter_new' :
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(214) : error
C2099: 
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(119) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(121) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(131) :
error C2099:
---

Then, what should I do?
I'm using VC6.

Actual result:
--
The build messages are as follows...

---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013: '_mbschr' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(784) :
warning C4013: '_mbscspn' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(785) :
warning C4013: '_mbsnbcat' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(786) :
warning C4013: '_mbsnbcpy' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(787) :
warning C4013: '_mbspbrk' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(788) :
warning C4013: '_mbsrchr' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(789) :
warning C4013: '_mbsspn' undefined; assuming extern returning int
 :
omit
 :
configure: php4dllts - Win32
Release_TS
Generating ext/standard/parsedate.c
compiling resource...
compiling...
 :
omit
 :
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(110) : warning
C4273: 'mbfl_buffer_converter_new' : inconsistent dll linkage. dllexport
assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(154) : warning
C4273: 'mbfl_buffer_converter_delete' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(169) : warning
C4273: 'mbfl_buffer_converter_reset' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(175) : warning
C4273: 'mbfl_buffer_converter_illegal_mode' : inconsistent dll linkage.
dllexp

#37724 [Asn->Ver]: mb_detect_encoding returns wrong result when text contains a trailing accent

2007-08-17 Thread hirokawa
 ID:   37724
 Updated by:   [EMAIL PROTECTED]
 Reported By:  oylbqelmhfbxzg at mailinator dot com
-Status:   Assigned
+Status:   Verified
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.4.2
 Assigned To:  hirokawa
 New Comment:

It also happens in PHP 5.2.


Previous Comments:


[2007-02-21 11:28:24] gabriel at unisolution dot de

I read that 

mb_detect_encoding($string,array('ISO-8859-1','UTF-8'));

always returns ISO-8859-1.

Try this;

(php 4.4.0)



[2006-09-14 22:48:09] [EMAIL PROTECTED]

Could you show me the mbstring part of your php.ini?
And please show me a simple script to verify your problem.
I executed this tiny script, and it works fine
(with Fedora Linux 5, PHP 5.1.5).





[2006-07-23 12:05:59] [EMAIL PROTECTED]

Rui, does this exist also in PHP 5.x branches/HEAD?
If so and this is real bug, update the version. :)



[2006-06-07 08:54:27] oylbqelmhfbxzg at mailinator dot com

Description:

Since bug 36994 was closed..

Both
 $string = "testö"
in a utf-8 text file, and 
 $string = "testö"
in an iso-8859-1 file (converted using iconv) return "UTF-8" with
mb_detect_encoding, even when strict is on.






-- 
Edit this bug report at http://bugs.php.net/?id=37724&edit=1


#29955 [Asn->Fbk]: Support for Turkish/iso-8859-9

2007-08-17 Thread hirokawa
 ID:   29955
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jan at horde dot org
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5CVS, 4CVS (2004-09-02)
 Assigned To:  hirokawa
 New Comment:

This change has already been backported to PHP 5.2.
As I understand it, it should not always be applied to ISO-8859-9,
because the conversion result depends on the locale.
(Is that correct?)






Previous Comments:


[2007-01-05 14:33:51] jan at horde dot org

Oh, and by the way, this conversion should always happen for
iso-8859-9, not only if mbstring.language is set to Turkish, because
this is completely useless in real world applications.



[2007-01-05 14:31:12] jan at horde dot org

Any chance this is going to be backported to PHP 5.2? I guess mbstring
is going to be obsolete with the Unicode and ICU support in PHP 6.



[2005-12-23 14:56:27] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php6.0-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php6.0-win32-latest.zip

Turkish language support has been added in CVS HEAD.
When mbstring.language = Turkish,
Turkish case folding is performed in ISO-8859-9.
(upper: 0x69 -> 0xdd, lower: 0x49 -> 0xfd)
Otherwise, normal case folding is performed.
(upper: 0x69 -> 0x49, lower: 0x49 -> 0x69)
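The two mapping modes above can be sketched at the byte level. This is a minimal illustration in Python, not the libmbfl implementation: the function names are hypothetical, only ASCII letters and the dotted/dotless i pairs are handled, and the reverse mappings (0xfd -> 0x49 and 0xdd -> 0x69) are assumed from standard Turkish casing rather than taken from the commit.

```python
# ISO-8859-9 code points for the Turkish dotted/dotless i pairs.
DOTTED_CAPITAL_I = 0xDD  # U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = 0xFD   # U+0131 LATIN SMALL LETTER DOTLESS I


def to_upper(data: bytes, turkish: bool = False) -> bytes:
    """Upper-case ASCII letters in ISO-8859-9 bytes (hypothetical helper)."""
    out = bytearray()
    for byte in data:
        if turkish and byte == 0x69:            # 'i' -> dotted capital I
            out.append(DOTTED_CAPITAL_I)
        elif turkish and byte == DOTLESS_SMALL_I:  # dotless i -> 'I'
            out.append(0x49)
        elif 0x61 <= byte <= 0x7A:              # plain ASCII a-z
            out.append(byte - 0x20)
        else:
            out.append(byte)
    return bytes(out)


def to_lower(data: bytes, turkish: bool = False) -> bytes:
    """Lower-case ASCII letters in ISO-8859-9 bytes (hypothetical helper)."""
    out = bytearray()
    for byte in data:
        if turkish and byte == 0x49:            # 'I' -> dotless small i
            out.append(DOTLESS_SMALL_I)
        elif turkish and byte == DOTTED_CAPITAL_I:  # dotted capital I -> 'i'
            out.append(0x69)
        elif 0x41 <= byte <= 0x5A:              # plain ASCII A-Z
            out.append(byte + 0x20)
        else:
            out.append(byte)
    return bytes(out)
```

Decoding the results with Python's `iso8859_9` codec is one way to confirm that 0xdd and 0xfd are the dotted capital and dotless small i.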




[2005-12-23 14:28:29] [EMAIL PROTECTED]

"man iso-8859-9" will tell you.

"i" maps to "0xdd"
and
"0xfd" maps to "I"

See also:
http://www.eki.ee/letter/chardata.cgi?lang=tr+Turkish&script=latin



[2005-12-23 14:24:06] jan at horde dot org

See http://www.gymel.com/charsets/ISO8859-9.html#U0069 and
http://www.gymel.com/charsets/ISO8859-9.html#U0049 under "Bemerkungen:"
(remarks).



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/29955

-- 
Edit this bug report at http://bugs.php.net/?id=29955&edit=1


#42252 [Asn->Fbk]: Windows compile failure with mbstring and zend_multibyte

2007-08-10 Thread hirokawa
 ID:   42252
 Updated by:   [EMAIL PROTECTED]
 Reported By:  okabe at zend dot co dot jp
-Status:   Assigned
+Status:   Feedback
 Bug Type: Compile Failure
 Operating System: Windows 2003
 PHP Version:  4.4.7
 Assigned To:  hirokawa
 New Comment:

I think MBFL_DLL_EXPORT is not defined in the compile options.
Please try again with '/D "MBFL_DLL_EXPORT"'.




Previous Comments:


[2007-08-09 03:11:13] okabe at zend dot co dot jp

Description:

I want to build PHP 4.4.7 with mbstring and zend multibyte support so
that I can use the Japanese character set "Shift_JIS".

I am able to build the normal Win32 php.exe (without mbstring) using
php4ts.dsw by following the PHP manual.

First, I added the following lines to config.w32.h:
---
#define ZEND_MULTIBYTE 1
#define HAVE_MBSTRING 1
#define HAVE_MBREGEX  1
#define HAVE_MBSTR_CN 1
#define HAVE_MBSTR_JA 1
#define HAVE_MBSTR_KR 1
#define HAVE_MBSTR_RU 1
#define HAVE_MBSTR_TW 1
#define MBSTRING_EXPORTS 1
#undef COMPILE_DL_MBSTRING
---

and I added the following line to zend_config.w32.h:
---
#define ZEND_MULTIBYTE 1
---

Next, I added the source files under ext/mbstring to php4dllts.dsp
(by creating an mbstring folder in Function Modules and importing those
files and folders).

Then I built the project, watched for include errors, and corrected the
include paths, repeating this until the include errors were gone.

In the end, four C2099 errors (initializer is not a constant.) and many
C4013 (undefined; assuming extern returning int) and C4273 (inconsistent
dll linkage. dllexport assumed.) warnings remained.
---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013:
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(110) : warning
C4273: 'mbfl_buffer_converter_new' :
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(214) : error
C2099: 
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(119) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(121) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(131) :
error C2099:
---

What should I do?
I'm using VC6.

Actual result:
--
The build messages are as follows...

---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013: '_mbschr' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(784) :
warning C4013: '_mbscspn' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(785) :
warning C4013: '_mbsnbcat' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(786) :
warning C4013: '_mbsnbcpy' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(787) :
warning C4013: '_mbspbrk' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(788) :
warning C4013: '_mbsrchr' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(789) :
warning C4013: '_mbsspn' undefined; assuming extern returning int
 :
omit
 :
configure: php4dllts - Win32
Release_TS
Generating ext/standard/parsedate.c
compiling resource...
compiling...
 :
omit
 :
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(110) : warning
C4273: 'mbfl_buffer_converter_new' : inconsistent dll linkage. dllexport
assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(154) : warning
C4273: 'mbfl_buffer_converter_delete' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(169) : warning
C4273: 'mbfl_buffer_converter_reset' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(175) : warning
C4273: 'mbfl_buffer_converter_illegal_mode' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(191) : warning
C4273: 'mbfl_buffer_converter_illegal_substchar' : inconsistent dll
linkage. dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(207) : warning
C4273: 'mbfl_buffer_converter_strncat' : inconsistent dll linkage.
dllexport assumed.
 :
omit
 :
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(100) :
warning C4273: 'mbfl_convert_filter_list' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(214) : error
C2099: initializer is not a constant.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(225) :
warning C4273: 'mbfl_convert_filter_new' : inconsiste

#42252 [Opn->Asn]: Windows compile failure with mbstring and zend_multibyte

2007-08-10 Thread hirokawa
 ID:   42252
 Updated by:   [EMAIL PROTECTED]
 Reported By:  okabe at zend dot co dot jp
-Status:   Open
+Status:   Assigned
 Bug Type: Compile Failure
 Operating System: Windows 2003
 PHP Version:  4.4.7
-Assigned To:  
+Assigned To:  hirokawa


Previous Comments:


[2007-08-09 03:11:13] okabe at zend dot co dot jp

Description:

I want to build PHP 4.4.7 with mbstring and zend multibyte support so
that I can use the Japanese character set "Shift_JIS".

I am able to build the normal Win32 php.exe (without mbstring) using
php4ts.dsw by following the PHP manual.

First, I added the following lines to config.w32.h:
---
#define ZEND_MULTIBYTE 1
#define HAVE_MBSTRING 1
#define HAVE_MBREGEX  1
#define HAVE_MBSTR_CN 1
#define HAVE_MBSTR_JA 1
#define HAVE_MBSTR_KR 1
#define HAVE_MBSTR_RU 1
#define HAVE_MBSTR_TW 1
#define MBSTRING_EXPORTS 1
#undef COMPILE_DL_MBSTRING
---

and I added the following line to zend_config.w32.h:
---
#define ZEND_MULTIBYTE 1
---

Next, I added the source files under ext/mbstring to php4dllts.dsp
(by creating an mbstring folder in Function Modules and importing those
files and folders).

Then I built the project, watched for include errors, and corrected the
include paths, repeating this until the include errors were gone.

In the end, four C2099 errors (initializer is not a constant.) and many
C4013 (undefined; assuming extern returning int) and C4273 (inconsistent
dll linkage. dllexport assumed.) warnings remained.
---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013:
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(110) : warning
C4273: 'mbfl_buffer_converter_new' :
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(214) : error
C2099: 
:
:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(119) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(121) :
error C2099:
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(131) :
error C2099:
---

What should I do?
I'm using VC6.

Actual result:
--
The build messages are as follows...

---
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(783) :
warning C4013: '_mbschr' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(784) :
warning C4013: '_mbscspn' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(785) :
warning C4013: '_mbsnbcat' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(786) :
warning C4013: '_mbsnbcpy' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(787) :
warning C4013: '_mbspbrk' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(788) :
warning C4013: '_mbsrchr' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\INCLUDE\tchar.h(789) :
warning C4013: '_mbsspn' undefined; assuming extern returning int
 :
omit
 :
configure: php4dllts - Win32
Release_TS
Generating ext/standard/parsedate.c
compiling resource...
compiling...
 :
omit
 :
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(110) : warning
C4273: 'mbfl_buffer_converter_new' : inconsistent dll linkage. dllexport
assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(154) : warning
C4273: 'mbfl_buffer_converter_delete' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(169) : warning
C4273: 'mbfl_buffer_converter_reset' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(175) : warning
C4273: 'mbfl_buffer_converter_illegal_mode' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(191) : warning
C4273: 'mbfl_buffer_converter_illegal_substchar' : inconsistent dll
linkage. dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfilter.c(207) : warning
C4273: 'mbfl_buffer_converter_strncat' : inconsistent dll linkage.
dllexport assumed.
 :
omit
 :
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(100) :
warning C4273: 'mbfl_convert_filter_list' : inconsistent dll linkage.
dllexport assumed.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(214) : error
C2099: initializer is not a constant.
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_convert.c(225) :
warning C4273: 'mbfl_convert_filter_new' : inconsistent dll linkage.
dllexport assumed.
 :
omit
 :
C:\work\php-4.4.7\ext\mbstring\libmbfl\mbfl\mbfl_encoding.c(119) :
error C2099: initi

#37103 [Asn->Fbk]: libmbfl headers not installed

2006-10-01 Thread hirokawa
 ID:   37103
 Updated by:   [EMAIL PROTECTED]
 Reported By:  Fedora at FamilleCollet dot com
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Linux (Fedora)
 PHP Version:  5.1.6
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip




Previous Comments:


[2006-09-26 22:23:00] [EMAIL PROTECTED]

Assigned to the maintainer.



[2006-09-19 16:26:40] Fedora at FamilleCollet dot com

For php-5.2.0

I don't understand the permissions on ext/mbstring/libmbfl/mbfl/mbfl_defs.h
(rwxr-xr-x), while the other headers are rw-r--r--.

I think the '*' in the config.m4 file is the problem.
So I changed my previous patch to simply remove it, and the packaging is
now complete with all the headers (and the mailparse PECL extension builds
fine).

--- ext/mbstring/config.m4.orig 2006-09-18 17:46:08.0 +0200
+++ ext/mbstring/config.m4  2006-09-18 17:47:08.0 +0200
@@ -302,7 +302,7 @@
   dnl libmbfl is required
   PHP_MBSTRING_SETUP_LIBMBFL
   PHP_MBSTRING_EXTENSION
-  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl/config.h
libmbfl/mbfl/eaw_table.h libmbfl/mbfl/mbfilter.h
libmbfl/mbfl/mbfilter_8bit.h libmbfl/mbfl/mbfilter_pass.h
libmbfl/mbfl/mbfilter_wchar.h libmbfl/mbfl/mbfl_allocators.h
libmbfl/mbfl/mbfl_consts.h libmbfl/mbfl/mbfl_convert.h
libmbfl/mbfl/mbfl_defs.h* libmbfl/mbfl/mbfl_encoding.h
libmbfl/mbfl/mbfl_filter_output.h libmbfl/mbfl/mbfl_ident.h
libmbfl/mbfl/mbfl_language.h libmbfl/mbfl/mbfl_memory_device.h
libmbfl/mbfl/mbfl_string.h ])
+  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl/config.h
libmbfl/mbfl/eaw_table.h libmbfl/mbfl/mbfilter.h
libmbfl/mbfl/mbfilter_8bit.h libmbfl/mbfl/mbfilter_pass.h
libmbfl/mbfl/mbfilter_wchar.h libmbfl/mbfl/mbfl_allocators.h
libmbfl/mbfl/mbfl_consts.h libmbfl/mbfl/mbfl_convert.h
libmbfl/mbfl/mbfl_defs.h  libmbfl/mbfl/mbfl_encoding.h
libmbfl/mbfl/mbfl_filter_output.h libmbfl/mbfl/mbfl_ident.h
libmbfl/mbfl/mbfl_language.h libmbfl/mbfl/mbfl_memory_device.h
libmbfl/mbfl/mbfl_string.h ])
 fi

 # vim600: sts=2 sw=2 et



[2006-09-18 15:55:09] Fedora at FamilleCollet dot com

Still present in php-5.1.6 (only half corrected)

Patch :
--- ext/mbstring/config.m4.orig 2006-07-24 18:07:44.0 +0200
+++ ext/mbstring/config.m4  2006-07-24 18:08:03.0 +0200
@@ -293,7 +293,7 @@
   dnl libmbfl is required
   PHP_MBSTRING_SETUP_LIBMBFL
   PHP_MBSTRING_EXTENSION
-  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl/ libmbfl/mbfl])
+  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl/ libmbfl/mbfl/])
 fi
 
 # vim600: sts=2 sw=2 et


Even present in php-5.2.0RC5-dev (missing only one file : mbfl_defs.h)

Patch :
--- ext/mbstring/config.m4.orig 2006-09-18 17:46:08.0 +0200
+++ ext/mbstring/config.m4  2006-09-18 17:47:08.0 +0200
@@ -302,7 +302,7 @@
   dnl libmbfl is required
   PHP_MBSTRING_SETUP_LIBMBFL
   PHP_MBSTRING_EXTENSION
-  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl/config.h
libmbfl/mbfl/eaw_table.h libmbfl/mbfl/mbfilter.h
libmbfl/mbfl/mbfilter_8bit.h libmbfl/mbfl/mbfilter_pass.h
libmbfl/mbfl/mbfilter_wchar.h libmbfl/mbfl/mbfl_allocators.h
libmbfl/mbfl/mbfl_consts.h libmbfl/mbfl/mbfl_convert.h
libmbfl/mbfl/mbfl_defs.h* libmbfl/mbfl/mbfl_encoding.h
libmbfl/mbfl/mbfl_filter_output.h libmbfl/mbfl/mbfl_ident.h
libmbfl/mbfl/mbfl_language.h libmbfl/mbfl/mbfl_memory_device.h
libmbfl/mbfl/mbfl_string.h ])
+  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl/ libmbfl/mbfl/])
 fi
 
 # vim600: sts=2 sw=2 et



[2006-04-17 22:14:35] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.





[2006-04-17 12:07:35] Fedora at FamilleCollet dot com

Here is the little patch I use, which solves the problem.
Hope this helps.

--- ext/mbstring/config.m4.orig 2006-04-17 12:41:13.0 +0200
+++ ext/mbstring/config.m4  2006-04-17 12:42:55.0 +0200
@@ -293,7 +293,7 @@
   dnl libmbfl is required
   PHP_MBSTRING_SETUP_LIBMBFL
   PHP_MBSTRING_EXTENSION
-  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl libmbfl/mbfl])
+  PHP_INSTALL_HEADERS([ext/mbstring], [libmbfl/ libmbfl/mbfl/])
 fi

 # vim600: sts=2 sw=2 et



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug r

#38892 [Opn->Csd]: Bug 38778 and 38452 (--enable-mbstring) also with gcc-4.1.0/Linux

2006-09-21 Thread hirokawa
 ID:   38892
 Updated by:   [EMAIL PROTECTED]
 Reported By:  Maylein at uni-hd dot de
-Status:   Open
+Status:   Closed
 Bug Type: Compile Failure
 Operating System: Linux 2.6.17
 PHP Version:  5.1.6
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2006-09-21 08:53:04] Maylein at uni-hd dot de

With the latest snapshot (php5.2-200609210830)
the --enable-mbstring parameter works.



[2006-09-20 23:10:53] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip





[2006-09-20 07:33:35] Maylein at uni-hd dot de

Description:

The bug described in #38778 and
#38452 (OSX, gcc-4.0.1) also occurs with Linux and
gcc-4.1.0.






-- 
Edit this bug report at http://bugs.php.net/?id=38892&edit=1


#38892 [Asn->Fbk]: Bug 38778 and 38452 (--enable-mbstring) also with gcc-4.1.0/Linux

2006-09-20 Thread hirokawa
 ID:   38892
 Updated by:   [EMAIL PROTECTED]
 Reported By:  Maylein at uni-hd dot de
-Status:   Assigned
+Status:   Feedback
 Bug Type: Compile Failure
 Operating System: Linux 2.6.17
 PHP Version:  5.1.6
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip




Previous Comments:


[2006-09-20 07:33:35] Maylein at uni-hd dot de

Description:

The bug described in #38778 and
#38452 (OSX, gcc-4.0.1) also occurs with Linux and
gcc-4.1.0.






-- 
Edit this bug report at http://bugs.php.net/?id=38892&edit=1


#38452 [Ana->Bgs]: cvs build fails @ --enable-mbstring

2006-09-17 Thread hirokawa
 ID:   38452
 Updated by:   [EMAIL PROTECTED]
 Reported By:  openmacnews at gmail dot com
-Status:   Analyzed
+Status:   Bogus
 Bug Type: Compile Failure
 Operating System: OSX 10.4.7
 PHP Version:  5CVS-2006-08-14 (CVS)
 Assigned To:  hirokawa
 New Comment:

Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

This problem is not a PHP 5.x issue; it is caused by
a bug in the gcc 4.0.1 shipped with Xcode.

The possible workarounds are:
 1. try a newer/better version of Xcode.
 2. use gcc-3.3 instead of gcc-4.0.1.
 3. disable the compile optimization switch (-O2).
 4. rewrite
   #undef HAVE_STDARG_PROTOTYPES
   to
   #define HAVE_STDARG_PROTOTYPES  1
   in main/php_config.h after configure has been executed.
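Workaround 4 can be scripted instead of edited by hand. The following is a minimal sketch in Python, assuming the line appears in main/php_config.h exactly as quoted above (not wrapped in a comment); the function name is hypothetical and it works on the file's text, so reading and writing the file is left to the caller.

```python
def enable_stdarg_prototypes(config_text: str) -> str:
    """Replace the #undef line quoted in the workaround with a #define.

    Every other line is returned unchanged, including its line ending.
    """
    patched = []
    for line in config_text.splitlines(keepends=True):
        if line.strip() == "#undef HAVE_STDARG_PROTOTYPES":
            # Keep whatever line ending the original line used.
            ending = line[len(line.rstrip("\r\n")):]
            patched.append("#define HAVE_STDARG_PROTOTYPES 1" + ending)
        else:
            patched.append(line)
    return "".join(patched)
```

It has to be applied after every run of configure, because configure regenerates main/php_config.h.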




Previous Comments:


[2006-09-16 11:56:37] [EMAIL PROTECTED]

It seems to be caused by bugs in the gcc 4.0.1 of XCode 2.4
on Mac OS X PPC.

Can you try gcc-3.3 instead of gcc-4.0.1?

$ export CC=/usr/bin/gcc-3.3
$ ./configure [options]
$ make




[2006-09-14 22:54:00] [EMAIL PROTECTED]

Since PHP 5.2, a newer version of oniguruma (multibyte regex library for
Ruby) has been bundled.
This problem may be caused by that change.
I am asking the original author of oniguruma about
this problem.





[2006-09-06 20:13:02] ralph at cs dot cf dot ac dot uk

See Bug #34977.



[2006-09-05 21:28:54] ralph at cs dot cf dot ac dot uk

This seems like a rerun of a bug I reported about a year
ago, which has come back to rear its ugly head again.

In 5.2RC3, MacOS 10.4.7, Xcode 2.4, PPC Mac G4 I get:

In file included from /usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:37:
/usr/lib/gcc/powerpc-apple-darwin8/4.0.1/include/varargs.h:4:2: error: #error "GCC no longer implements <varargs.h>."
/usr/lib/gcc/powerpc-apple-darwin8/4.0.1/include/varargs.h:5:2: error: #error "Revise your code to use <stdarg.h>."
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c: In function 'onig_error_code_to_str':
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:196: error: parse error before 'va_dcl'
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:214: error: parse error before 'OnigErrorInfo'
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c: In function 'onig_snprintf_with_pattern':
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:271: error: parse error before 'va_dcl'
make: *** [ext/mbstring/oniguruma/regerror.lo] Error 1



[2006-09-02 19:54:52] seth at pricepages dot org

I just got this error with the latest CVS of PHP 5.2.x. I'm 
running XCode 2.4. Is this going to be fixed soon?



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/38452

-- 
Edit this bug report at http://bugs.php.net/?id=38452&edit=1


#38452 [Ana]: cvs build fails @ --enable-mbstring

2006-09-16 Thread hirokawa
 ID:   38452
 Updated by:   [EMAIL PROTECTED]
 Reported By:  openmacnews at gmail dot com
 Status:   Analyzed
 Bug Type: Compile Failure
 Operating System: OSX 10.4.7
 PHP Version:  5CVS-2006-08-14 (CVS)
 Assigned To:  hirokawa
 New Comment:

It seems to be caused by bugs in the gcc 4.0.1 of XCode 2.4
on Mac OS X PPC.

Can you try gcc-3.3 instead of gcc-4.0.1?

$ export CC=/usr/bin/gcc-3.3
$ ./configure [options]
$ make



Previous Comments:


[2006-09-14 22:54:00] [EMAIL PROTECTED]

Since PHP 5.2, a newer version of oniguruma (multibyte regex library for
Ruby) has been bundled.
This problem may be caused by that change.
I am asking the original author of oniguruma about
this problem.





[2006-09-06 20:13:02] ralph at cs dot cf dot ac dot uk

See Bug #34977.



[2006-09-05 21:28:54] ralph at cs dot cf dot ac dot uk

This seems like a rerun of a bug I reported about a year
ago, which has come back to rear its ugly head again.

In 5.2RC3, MacOS 10.4.7, Xcode 2.4, PPC Mac G4 I get:

In file included from /usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:37:
/usr/lib/gcc/powerpc-apple-darwin8/4.0.1/include/varargs.h:4:2: error: #error "GCC no longer implements <varargs.h>."
/usr/lib/gcc/powerpc-apple-darwin8/4.0.1/include/varargs.h:5:2: error: #error "Revise your code to use <stdarg.h>."
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c: In function 'onig_error_code_to_str':
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:196: error: parse error before 'va_dcl'
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:214: error: parse error before 'OnigErrorInfo'
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c: In function 'onig_snprintf_with_pattern':
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:271: error: parse error before 'va_dcl'
make: *** [ext/mbstring/oniguruma/regerror.lo] Error 1



[2006-09-02 19:54:52] seth at pricepages dot org

I just got this error with the latest CVS of PHP 5.2.x. I'm 
running XCode 2.4. Is this going to be fixed soon?



[2006-08-24 20:48:00] php-bug-38452 at ryandesign dot com

I successfully compiled PHP 5.1.4 on Mac OS X 10.4.6 PPC G4 with a
certain set of configure options on June 21, 2006, and today I wanted
to upgrade to 5.1.5 but got the aforementioned error. I went back and
tried to compile the exact same 5.1.4 that worked before, and got the same
error. So nothing changed in the PHP code, but something changed in
the system. According to my installer receipts, since June 21, I have
installed Mac OS X 10.4.7, QuickTime 7.1.2, iTunes 6.0.5, Security
Update 2006-004 and Xcode 2.4.

Interestingly before all this trouble I was able to install PHP 5.1.5
through MacPorts with no errors, and it also enables mbstring. I don't
know why it works there and not when I compile manually (even using
their exact same configure options).



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/38452

-- 
Edit this bug report at http://bugs.php.net/?id=38452&edit=1


#38452 [Asn->Ana]: cvs build fails @ --enable-mbstring

2006-09-14 Thread hirokawa
 ID:   38452
 Updated by:   [EMAIL PROTECTED]
 Reported By:  openmacnews at gmail dot com
-Status:   Assigned
+Status:   Analyzed
 Bug Type: Compile Failure
 Operating System: OSX 10.4.7
 PHP Version:  5CVS-2006-08-14 (CVS)
 Assigned To:  hirokawa
 New Comment:

Since PHP 5.2, a newer version of oniguruma (multibyte regex library for
Ruby) has been bundled.
This problem may be caused by that change.
I am asking the original author of oniguruma about
this problem.




Previous Comments:


[2006-09-06 20:13:02] ralph at cs dot cf dot ac dot uk

See Bug #34977.



[2006-09-05 21:28:54] ralph at cs dot cf dot ac dot uk

This seems like a rerun of a bug I reported about a year
ago, which has come back to rear its ugly head again.

In 5.2RC3, MacOS 10.4.7, Xcode 2.4, PPC Mac G4 I get:

In file included from /usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:37:
/usr/lib/gcc/powerpc-apple-darwin8/4.0.1/include/varargs.h:4:2: error: #error "GCC no longer implements <varargs.h>."
/usr/lib/gcc/powerpc-apple-darwin8/4.0.1/include/varargs.h:5:2: error: #error "Revise your code to use <stdarg.h>."
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c: In function 'onig_error_code_to_str':
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:196: error: parse error before 'va_dcl'
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:214: error: parse error before 'OnigErrorInfo'
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c: In function 'onig_snprintf_with_pattern':
/usr/local/src/php-5.2.0RC3/ext/mbstring/oniguruma/regerror.c:271: error: parse error before 'va_dcl'
make: *** [ext/mbstring/oniguruma/regerror.lo] Error 1



[2006-09-02 19:54:52] seth at pricepages dot org

I just got this error with the latest CVS of PHP 5.2.x. I'm 
running XCode 2.4. Is this going to be fixed soon?



[2006-08-24 20:48:00] php-bug-38452 at ryandesign dot com

I successfully compiled PHP 5.1.4 on Mac OS X 10.4.6 PPC G4 with a
certain set of configure options on June 21, 2006, and today I wanted
to upgrade to 5.1.5 but got the aforementioned error. I went back and
tried to compile the exact same 5.1.4 that worked before, and got the same
error. So nothing changed in the PHP code, but something changed in
the system. According to my installer receipts, since June 21, I have
installed Mac OS X 10.4.7, QuickTime 7.1.2, iTunes 6.0.5, Security
Update 2006-004 and Xcode 2.4.

Interestingly before all this trouble I was able to install PHP 5.1.5
through MacPorts with no errors, and it also enables mbstring. I don't
know why it works there and not when I compile manually (even using
their exact same configure options).



[2006-08-16 15:34:13] openmacnews at gmail dot com

>I bet openmacnews upgraded XCode between Aug 4th and 14th...

yup.

> I have however discovered something else: it works fine on 
Intel OS X with XCode 2.4 (MacBook). I'm seeing this problem 
on a Dual G4, also with XCode 2.4. Looking more like a gcc PPC issue to
me.

hmm ...

all my tests ARE on PPC -- no MacIntels.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/38452

-- 
Edit this bug report at http://bugs.php.net/?id=38452&edit=1


#37724 [Asn->Ana]: mb_detect_encoding returns wrong result when text contains a trailing accent

2006-09-14 Thread hirokawa
 ID:   37724
 Updated by:   [EMAIL PROTECTED]
 Reported By:  oylbqelmhfbxzg at mailinator dot com
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.4.2
 Assigned To:  hirokawa
 New Comment:

Could you show me the mbstring part of your php.ini?
Please also show me a simple script that reproduces your problem.
I executed this tiny script, and it works fine.
(with Fedora Linux 5, PHP 5.1.5)




Previous Comments:


[2006-07-23 12:05:59] [EMAIL PROTECTED]

Rui, does this exist also in PHP 5.x branches/HEAD?
If so and this is real bug, update the version. :)



[2006-06-07 08:54:27] oylbqelmhfbxzg at mailinator dot com

Description:

Since bug 36994 was closed..

Both
 $string = "testö"
in a utf-8 text file, and 
 $string = "testö"
in an iso-8859-1 file (converted using iconv) return "UTF-8" with
mb_detect_encoding, even when strict is on.






-- 
Edit this bug report at http://bugs.php.net/?id=37724&edit=1


#24106 [Bgs]: UTF8 to SJIS bug

2006-09-14 Thread hirokawa
 ID:   24106
 Updated by:   [EMAIL PROTECTED]
 Reported By:  richard at enfour dot co dot jp
 Status:   Bogus
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.3.2
 Assigned To:  hirokawa
 New Comment:

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php




Previous Comments:


[2003-06-30 07:45:11] [EMAIL PROTECTED]

I also tested on Linux using PHP 4.3.3RC1.


The output byte code is E748+90D5, as you expect.
I think it works fine.



[2003-06-28 09:16:53] [EMAIL PROTECTED]

I tested with a tiny script using PHP 4.3.3RC1 on Windows 2000.

The output byte code is E748+90D5, as you expect.
I think it works fine.






[2003-06-10 02:00:57] richard at enfour dot co dot jp

It may be elsewhere, but I found a case where UTF-8 to
SJIS mb_convert_encoding mashes a Japanese text string.

The string is the kanji for "souseki".
Unicode:
U+8E2A U+8DE1

In SJIS it should be:
E748+90D5
but gets mashed.

EUC works...
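The expected bytes can be cross-checked outside PHP. The sketch below uses Python's built-in `shift_jis` and `euc_jp` codecs, which are independent of mbstring and libmbfl, so it only shows what a conforming converter should produce for these two code points and says nothing about what a given PHP build does. The helper name is hypothetical.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded form of text as an upper-case hex string."""
    return text.encode(encoding).hex().upper()


# The two code points quoted in the report.
souseki = "\u8e2a\u8de1"

utf8_hex = hex_bytes(souseki, "utf-8")
sjis_hex = hex_bytes(souseki, "shift_jis")
eucjp_hex = hex_bytes(souseki, "euc_jp")

print("UTF-8    :", utf8_hex)
print("Shift_JIS:", sjis_hex)
print("EUC-JP   :", eucjp_hex)
```

Comparing the Shift_JIS line with the output of mb_convert_encoding() on the same input shows whether the converter or some later step (for example, output encoding settings) is mangling the string.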




-- 
Edit this bug report at http://bugs.php.net/?id=24106&edit=1


#36994 [Asn->Bgs]: mb_detect_encoding returns wrong result with trailing accent

2006-04-17 Thread hirokawa
 ID:   36994
 Updated by:   [EMAIL PROTECTED]
 Reported By:  ynynmzvqofeaz at mailinator dot com
-Status:   Assigned
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.4.2
 Assigned To:  hirokawa
 New Comment:

It is not a bug; it is the specified behavior.
You should use 'strict' mode in mb_detect_encoding()
if you need it to return the correct result.

mb_detect_encoding() treats the string as a byte stream.
{0x61,0x63,0x63,0x65,0x6e,0x74,0x75,0xe9} is accepted as a
UTF-8 byte stream.
In this case, 0xe9 is treated as the first byte of a
multibyte character that has not been completed yet.

{0x61,0x63,0x63,0x65,0x6e,0x74,0x75,0xe9,0x65} is an invalid
UTF-8 byte stream because 0xe9 0x65 is an invalid byte sequence in
UTF-8.

If you need to exclude the incomplete multibyte character from
detection, please try the 'strict' option, for example:
echo mb_detect_encoding($s1 , 'UTF-8, ISO-8859-1',true);
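The difference between an incomplete and an invalid byte stream can be reproduced with any streaming UTF-8 decoder. The sketch below uses Python's incremental decoder as a stand-in; it is not how libmbfl implements detection, and the function name and the three result labels are hypothetical.

```python
import codecs


def classify_utf8(data: bytes) -> str:
    """Classify bytes as 'valid', 'incomplete' or 'invalid' UTF-8."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # Stream mode: a trailing lead byte is buffered, not rejected.
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return "invalid"
    try:
        # End of input: anything still buffered is a truncated character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return "incomplete"
    return "valid"


trailing_accent = bytes([0x61, 0x63, 0x63, 0x65, 0x6E, 0x74, 0x75, 0xE9])
accent_then_e = trailing_accent + bytes([0x65])

print(classify_utf8(trailing_accent))  # lead byte with nothing after it
print(classify_utf8(accent_then_e))    # lead byte followed by a non-continuation byte
```

A non-strict detector stops after the first step and so accepts the first stream; a strict detector also performs the end-of-input check.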



Previous Comments:


[2006-04-10 11:53:02] ynynmzvqofeaz at mailinator dot com

Ignore the last comment. Do this:

Create two files with the following content, and name them test_iso1
and test_utf8:


Make sure the encoding is correct:
$ file test_iso1
should return iso-8859-1
$ file test_utf8
should return utf-8

If they do not return the correct encoding, use iconv to convert them,
e.g.
$ iconv -f utf-8 -t iso-8859-1 test_iso1 >test_iso1.fixed
or
$ iconv -f iso-8859-1 -t utf-8 test_utf8 >test_utf8.fixed

Now run each script. The test_iso1 script should return a type of iso1,
the test_utf8 script should return a type of utf8.

Workaround: append an extra character to the end of the string, and
then remove it(!)



[2006-04-10 07:02:09] ynynmzvqofeaz1 at mailinator dot com

*sigh*



use iconv -f utf-8 -t iso-8859-1 INFILE > OUTFILE
if "file OUTFILE" says utf-8.



[2006-04-06 11:49:43] ynynmzvqofeaz at mailinator dot com

Description:

mb_detect_encoding returns wrong result when text contains a trailing
accent.
See http://www.php.net/manual/en/function.mb-detect-encoding.php#55228






-- 
Edit this bug report at http://bugs.php.net/?id=36994&edit=1


#36140 [Ana->Fbk]: mb_encode_mimeheader not working properly

2006-04-17 Thread hirokawa
 ID:   36140
 Updated by:   [EMAIL PROTECTED]
 Reported By:  sugan_b at yahoo dot co dot in
-Status:   Analyzed
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: FC3
 PHP Version:  5.1.2
 Assigned To:  hirokawa
 New Comment:

Please test this code.
The string passed as the first argument should be encoded in the internal encoding
(EUC-JP in this case).
You should not convert the string into ISO-2022-JP
using mb_convert_encoding()

php.ini:
 mbstring.language = Japanese
 mbstring.internal_encoding = EUC-JP

sample.php:




Previous Comments:


[2006-04-10 13:07:09] [EMAIL PROTECTED]

Rui, this might be good candidate for fixing. :)



[2006-02-17 11:15:42] sugan_b at yahoo dot co dot in

Sorry for the late response.
This is my understanding. Please correct me if I am wrong.
The mb_encode_mimeheader function performs its base64 encoding in
mbfilter.c. Base64 encodes the input from the front, 3 bytes at a
time; if fewer than 3 bytes remain at the end, the final group is
padded with "=" (0x3d). mb_encode_mimeheader does this, but
according to RFC 2047 an encoded word longer than 75 bytes must be
split across multiple lines. mb_encode_mimeheader does not take this
line splitting into account, so the encoded characters are corrupted
and reach the mail recipient incorrectly.

I think that is the reason for  the subject value to get garbled.
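
The splitting rule discussed in this comment can be sketched as follows. This is not how mbstring implements it; encode_header_words() is a hypothetical helper that assumes UTF-8 input (a stateful charset such as ISO-2022-JP would need extra care), a script file saved as UTF-8, and PHP 7.4 or later for mb_str_split().

```php
<?php
/**
 * Hypothetical helper: split $text (assumed to be valid UTF-8) into RFC 2047
 * base64 encoded-words, none longer than 75 characters.
 */
function encode_header_words(string $text): array
{
    $prefix = '=?UTF-8?B?';
    $suffix = '?=';
    // Characters left for base64 data inside one encoded-word.
    $maxBase64 = 75 - strlen($prefix) - strlen($suffix);
    // Base64 turns every 3 input bytes into 4 output characters.
    $maxBytes = intdiv($maxBase64, 4) * 3;

    $words = [];
    $chunk = '';
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // Start a new encoded-word rather than split a multibyte character.
        if ($chunk !== '' && strlen($chunk) + strlen($char) > $maxBytes) {
            $words[] = $prefix . base64_encode($chunk) . $suffix;
            $chunk = '';
        }
        $chunk .= $char;
    }
    if ($chunk !== '') {
        $words[] = $prefix . base64_encode($chunk) . $suffix;
    }
    return $words;
}

$subject = str_repeat('メールのSubject', 5);
$words = encode_header_words($subject);

// Folded header value: encoded-words separated by CRLF plus a space.
echo implode("\r\n ", $words), "\n";
```

Cutting only on character boundaries means every encoded-word decodes to complete characters on its own, which is what RFC 2047 asks for.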



[2006-02-06 15:09:18] [EMAIL PROTECTED]

Could you please check it with this simpler script?

php.ini:
 mbstring.language = Japanese
 mbstring.internal_encoding = EUC-JP

sample.php:

Does it work or not?
If the result is a correct MIME-encoded string,
mb_encode_mimeheader() works fine.




[2006-01-25 10:58:04] [EMAIL PROTECTED]

Rui, can you check this out please?



[2006-01-24 12:18:54] sugan_b at yahoo dot co dot in

The actual result obtained is:
【問合】
メールのSubjectが8;z2=$1$7$F$7$^$$$^$9



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/36140

-- 
Edit this bug report at http://bugs.php.net/?id=36140&edit=1


#36140 [Asn->Ana]: mb_encode_mimeheader not working properly

2006-04-17 Thread hirokawa
 ID:   36140
 Updated by:   [EMAIL PROTECTED]
 Reported By:  sugan_b at yahoo dot co dot in
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: FC3
 PHP Version:  5.1.2
 Assigned To:  hirokawa


Previous Comments:


[2006-04-10 13:07:09] [EMAIL PROTECTED]

Rui, this might be good candidate for fixing. :)



[2006-02-17 11:15:42] sugan_b at yahoo dot co dot in

Sorry for the late response.
This is my understanding. Please correct me if I am wrong.
The mb_encode_mimeheader function performs its base64 encoding in
mbfilter.c. Base64 encodes the input from the front, 3 bytes at a
time; if fewer than 3 bytes remain at the end, the final group is
padded with "=" (0x3d). mb_encode_mimeheader does this, but
according to RFC 2047 an encoded word longer than 75 bytes must be
split across multiple lines. mb_encode_mimeheader does not take this
line splitting into account, so the encoded characters are corrupted
and reach the mail recipient incorrectly.

I think that is the reason for  the subject value to get garbled.



[2006-02-06 15:09:18] [EMAIL PROTECTED]

Could you please check it with this simpler script?

php.ini:
 mbstring.language = Japanese
 mbstring.internal_encoding = EUC-JP

sample.php:

Does it work or not?
If the result is a correct MIME-encoded string,
mb_encode_mimeheader() works fine.




[2006-01-25 10:58:04] [EMAIL PROTECTED]

Rui, can you check this out please?



[2006-01-24 12:18:54] sugan_b at yahoo dot co dot in

The actual result obtained is:
【問合】
メールのSubjectが8;z2=$1$7$F$7$^$$$^$9



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/36140

-- 
Edit this bug report at http://bugs.php.net/?id=36140&edit=1


#36489 [Asn->Ana]: Mimeheaders not properly encoded/decoded

2006-03-12 Thread hirokawa
 ID:   36489
 Updated by:   [EMAIL PROTECTED]
 Reported By:  saeven at saeven dot net
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5.1.2
 Assigned To:  hirokawa
 New Comment:

I executed your code using PHP 5.1.2/Fedora Core3,
and I get the expected result.

 test_тест

My php.ini setting is,
 mbstring.detect_order => auto
 mbstring.encoding_translation => Off
 mbstring.func_overload => 0
 mbstring.http_input => auto
 mbstring.http_output => pass
 mbstring.internal_encoding => ISO-8859-1
 mbstring.language => neutral
 mbstring.script_encoding => no value
 mbstring.strict_detection => Off
 mbstring.substitute_character => no value

Please show me your php.ini settings that are related to mbstring.




Previous Comments:


[2006-03-01 08:52:22] [EMAIL PROTECTED]

Assigned to the maintainer.



[2006-02-22 21:05:40] saeven at saeven dot net

..you'll have to imagine that the htmlencoded entities are utf8
characters.



[2006-02-22 21:03:43] saeven at saeven dot net

Your system htmlencoded the utf8 content. Let me try again:

Code:
---
$encode = mb_encode_mimeheader( "test_тест",
"utf-8", 'Q' );
echo mb_decode_mimeheader( $encode );


Expected Result:
--
test_тест;


Actual Result:

test_



[2006-02-22 21:00:24] saeven at saeven dot net

Description:

mbstring is improperly encoding/decoding mimeheaders.

Reproduce code:
---
$encode = mb_encode_mimeheader( "test_тест",
"utf-8", 'Q' );
echo mb_decode_mimeheader( $encode );

Expected result:

test_тест

Actual result:
--
test_





-- 
Edit this bug report at http://bugs.php.net/?id=36489&edit=1


#34119 [NoF->Csd]: mb_ereg chokes on \x80-\xF7

2006-02-16 Thread hirokawa
 ID:   34119
 Updated by:   [EMAIL PROTECTED]
 Reported By:  ondrej at sury dot org
-Status:   No Feedback
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.4.0
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2006-01-01 01:00:06] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".



[2005-12-24 06:20:10] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip

It works fine for me.
Please check it by CVS snapshot of PHP 4.4.

Code:


Result in PHP 4.4.2RC2-dev (Linux FedoraCore4):
 The username contains an illegal character.




[2005-12-24 02:32:29] [EMAIL PROTECTED]

Rui, check this out please.



[2005-08-13 20:29:39] [EMAIL PROTECTED]

Moriyoshi, I guess you need to backport something..?



[2005-08-13 13:12:22] ondrej at sury dot org

Description:

mb_ereg prints invalid regular expression error on certain characters.

This is fixed in php 5.0.4.

More information could be found at:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=278044

Reproduce code:
---
$name = "user/1/viewuser/1/edit";
if (mb_ereg("[^\x80-\xF7 [:alnum:[EMAIL PROTECTED]", $name)) print('The username
contains an illegal character.');

Expected result:

The username contains an illegal character.

Actual result:
--
Warning: mb_ereg(): mbregex compile err: invalid regular expression in
/var/www/mb_ereg.php on line 4






-- 
Edit this bug report at http://bugs.php.net/?id=34119&edit=1


#32062 [NoF->Bgs]: mbstring fails to match encoding with some locale settings

2006-02-16 Thread hirokawa
 ID:   32062
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   No Feedback
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5CVS, 4CVS (2005-02-22)
 Assigned To:  hirokawa
 New Comment:

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php




Previous Comments:


[2005-12-31 01:00:04] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".



[2005-12-23 15:43:27] [EMAIL PROTECTED]

mbstring (libmbfl) uses strcasecmp(), which depends on the locale.
If the specified locale is not supported by the system,
encoding name matching fails.
This is not a problem in mbstring but a problem with the
system settings.
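
To illustrate the point about locale-dependent comparison, here is a minimal sketch of an ASCII-only, locale-independent match for encoding names. It is written in PHP for illustration only and is not libmbfl's code; ascii_lower() and encoding_names_match() are hypothetical names.

```php
<?php
// Fold only the 26 ASCII letters, ignoring the current locale entirely.
function ascii_lower(string $s): string
{
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

// Hypothetical helper: compare two encoding names case-insensitively.
function encoding_names_match(string $a, string $b): bool
{
    return ascii_lower($a) === ascii_lower($b);
}

var_dump(encoding_names_match('ISO-8859-1', 'iso-8859-1')); // bool(true)
var_dump(encoding_names_match('UTF-8', 'utf-7'));           // bool(false)
```

Because the mapping table is fixed, "I" always folds to "i", even under a Turkish locale where locale-aware folding treats that pair differently.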






[2005-12-21 23:24:01] [EMAIL PROTECTED]

Rui, check this too if you don't mind. :)



[2005-02-22 16:17:51] [EMAIL PROTECTED]

Right, there were typos. The reproduce code should've 
been





[2005-02-22 15:25:03] [EMAIL PROTECTED]

tr_TR == Turkish, and ISO-8859-1 is not a valid character set of that
locale, no?



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/32062

-- 
Edit this bug report at http://bugs.php.net/?id=32062&edit=1


#19909 [NoF->Bgs]: No PCRE support for mbstring

2006-02-16 Thread hirokawa
 ID:   19909
 Updated by:   [EMAIL PROTECTED]
 Reported By:  paul at honeylocust dot com
-Status:   No Feedback
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: any
 PHP Version:  4.3.0-pre1
 New Comment:

Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.




Previous Comments:


[2002-11-10 18:20:41] [EMAIL PROTECTED]

No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.





[2002-10-14 16:22:40] [EMAIL PROTECTED]

Since when has PCRE been a multibyte string encoding?
Can you be more specific about what you need?

PHP ships with UTF-8 enabled PCRE by default (if you
use the bundled PCRE), and has done since before 4.2
for unix (4.2 and later for windows).

mbstring != PCRE; their functions are completely different.



[2002-10-14 14:48:41] paul at honeylocust dot com

Mbstring doesn't provide support for PCRE.  This is unfortunate since
there is (experimental) UTF-8 support in PCRE which is as simple to
turn on as flipping a compile flag and specifying an option to
pcre_compile()




-- 
Edit this bug report at http://bugs.php.net/?id=19909&edit=1


#36140 [Ana->Fbk]: mb_encode_mimeheader not working properly

2006-02-10 Thread hirokawa
 ID:   36140
 Updated by:   [EMAIL PROTECTED]
 Reported By:  sugan_b at yahoo dot co dot in
-Status:   Analyzed
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: FC3
 PHP Version:  5.1.2
 Assigned To:  hirokawa
 New Comment:

Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.





Previous Comments:


[2006-02-06 15:09:18] [EMAIL PROTECTED]

Could you please check it with this simpler script?

php.ini:
 mbstring.language = Japanese
 mbstring.internal_encoding = EUC-JP

sample.php:

Does it work or not?
If the result is a correct MIME-encoded string,
mb_encode_mimeheader() works fine.




[2006-01-25 10:58:04] [EMAIL PROTECTED]

Rui, can you check this out please?



[2006-01-24 12:18:54] sugan_b at yahoo dot co dot in

The actual result obtained is:
【問合】
メールのSubjectが8;z2=$1$7$F$7$^$$$^$9



[2006-01-24 12:15:12] sugan_b at yahoo dot co dot in

Description:

 A small part of my application (using only PHP 5.1.2 and Apache 2.2.0)
incorporates mailing functionality which uses
"mb_encode_mimeheader()". I am using the ISO-2022-JP charset, but multibyte
characters written in ISO-2022-JP are not sent to the
receiver correctly.



Reproduce code:
---
index.php





test   
To：
Subject：
   



send.php:









Mail Sending has Completed.




Actual result:
--
Actual Result:
The value of Subject field :
 問合】
メールのSubjectが8;z2=$1$7$F$7$^$$$^$9


I have configured php using --enable-mbstring=all option.





-- 
Edit this bug report at http://bugs.php.net/?id=36140&edit=1


#36140 [Asn->Ana]: mb_encode_mimeheader not working properly

2006-02-06 Thread hirokawa
 ID:   36140
 Updated by:   [EMAIL PROTECTED]
 Reported By:  sugan_b at yahoo dot co dot in
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: FC3
 PHP Version:  5.1.2
 Assigned To:  hirokawa
 New Comment:

Could you please check it with this simpler script?

php.ini:
 mbstring.language = Japanese
 mbstring.internal_encoding = EUC-JP

sample.php:

Does it work or not?
If the result is a correct MIME-encoded string,
mb_encode_mimeheader() works fine.



Previous Comments:


[2006-01-25 10:58:04] [EMAIL PROTECTED]

Rui, can you check this out please?



[2006-01-24 12:18:54] sugan_b at yahoo dot co dot in

The actual result obtained is:
【問合】
メールのSubjectが8;z2=$1$7$F$7$^$$$^$9



[2006-01-24 12:15:12] sugan_b at yahoo dot co dot in

Description:

 A small part of my application (using only PHP 5.1.2 and Apache 2.2.0)
incorporates mailing functionality which uses
"mb_encode_mimeheader()". I am using the ISO-2022-JP charset, but multibyte
characters written in ISO-2022-JP are not sent to the
receiver correctly.



Reproduce code:
---
index.php





test   
To：
Subject：
   



send.php:









Mail Sending has Completed.




Actual result:
--
Actual Result:
The value of Subject field :
 問合】
メールのSubjectが8;z2=$1$7$F$7$^$$$^$9


I have configured php using --enable-mbstring=all option.





-- 
Edit this bug report at http://bugs.php.net/?id=36140&edit=1


#35711 [Asn->Csd]: [PATCH] ISO-8859 charset not correctly detected

2005-12-25 Thread hirokawa
 ID:   35711
 Updated by:   [EMAIL PROTECTED]
 Reported By:  matteo at beccati dot com
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:  5.1CVS-2005-12-24 (snap)
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

mbstring.strict_detection is introduced to specify the strict mode
encoding detection.



Previous Comments:


[2005-12-25 10:40:42] [EMAIL PROTECTED]

O.K., 
I will commit your patch with minor modification.




[2005-12-24 14:38:32] [EMAIL PROTECTED]

Rui, could you take a look at this once again?



[2005-12-24 13:59:10] matteo at beccati dot com

I've made a patch which adds an mbstring.strict_detection php.ini flag
that specifies the default behaviour (defaults to off). I have only just
started looking at PHP internals, so I may have made mistakes; make test
passes the mbstring-related checks, and I'll do more tests later.

http://beccati.com/download/mbstring-patch-20051224.txt



[2005-12-24 12:30:08] matteo at beccati dot com

This is great news and I'm really thankful for your help. Now
mb_detect_encoding is correctly working when the strict flag is set,
but...

- There's no way to set the strict flag in mb_convert_encoding; however
one could use mb_detect_encoding with the strict flag as source
charset.

- There's no way to set the strict flag for http_input translation,
which indeed would be much more useful (that's how I found the problem
described here).
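
The workaround mentioned in the first point (strict detection first, then conversion with the detected charset as the source) might look like the sketch below. to_utf8_strict() is a hypothetical helper, not part of mbstring.

```php
<?php
// Hypothetical helper: detect the source charset strictly, then convert.
function to_utf8_strict(string $input, array $candidates): ?string
{
    $from = mb_detect_encoding($input, $candidates, true);
    if ($from === false) {
        return null; // none of the candidate encodings fits
    }
    $converted = mb_convert_encoding($input, 'UTF-8', $from);
    return $converted === false ? null : $converted;
}

// "caf" followed by 0xE8 (ISO-8859-1 "è"); not valid UTF-8 on its own.
$latin1 = "caf\xE8";
$utf8 = to_utf8_strict($latin1, ['UTF-8', 'ISO-8859-1']);

echo bin2hex($utf8), "\n"; // 636166c3a8
```

The order of the candidate list still matters: any byte string is acceptable as ISO-8859-1, so stricter encodings should be listed before it.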



[2005-12-24 02:23:06] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

The character-end detection was introduced in the strict mode
(mb_detect_encoding ($s,$list,TRUE)).
Please try the strict mode.







The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/35711

-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1


#35711 [Asn->Ana]: [PATCH] ISO-8859 charset not correctly detected

2005-12-25 Thread hirokawa
 ID:   35711
 Updated by:   [EMAIL PROTECTED]
 Reported By:  matteo at beccati dot com
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:  5.1CVS-2005-12-24 (snap)
 Assigned To:  hirokawa
 New Comment:

O.K., 
I will commit your patch with minor modification.



Previous Comments:


[2005-12-24 14:38:32] [EMAIL PROTECTED]

Rui, could you take a look at this once again?



[2005-12-24 13:59:10] matteo at beccati dot com

I've made a patch which adds an mbstring.strict_detection php.ini flag
that specifies the default behaviour (defaults to off). I have only just
started looking at PHP internals, so I may have made mistakes; make test
passes the mbstring-related checks, and I'll do more tests later.

http://beccati.com/download/mbstring-patch-20051224.txt



[2005-12-24 12:30:08] matteo at beccati dot com

This is great news and I'm really thankful for your help. Now
mb_detect_encoding is correctly working when the strict flag is set,
but...

- There's no way to set the strict flag in mb_convert_encoding; however
one could use mb_detect_encoding with the strict flag as source
charset.

- There's no way to set the strict flag for http_input translation,
which indeed would be much more useful (that's how I found the problem
described here).



[2005-12-24 02:23:06] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

The character-end detection was introduced in the strict mode
(mb_detect_encoding ($s,$list,TRUE)).
Please try the strict mode.







[2005-12-24 01:03:21] [EMAIL PROTECTED]

Have you ever tried the strict mode (default:FALSE) ?

string mb_detect_encoding ( string str [, mixed encoding_list [, bool
strict]] )




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/35711

-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1


#34119 [Asn->Fbk]: mb_ereg chokes on \x80-\xF7

2005-12-23 Thread hirokawa
 ID:   34119
 Updated by:   [EMAIL PROTECTED]
 Reported By:  ondrej at sury dot org
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.4.0
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip

It works fine for me.
Please check it by CVS snapshot of PHP 4.4.

Code:


Result in PHP 4.4.2RC2-dev (Linux FedoraCore4):
 The username contains an illegal character.



Previous Comments:


[2005-12-24 02:32:29] [EMAIL PROTECTED]

Rui, check this out please.



[2005-08-13 20:29:39] [EMAIL PROTECTED]

Moriyoshi, I guess you need to backport something..?



[2005-08-13 13:12:22] ondrej at sury dot org

Description:

mb_ereg prints invalid regular expression error on certain characters.

This is fixed in php 5.0.4.

More information could be found at:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=278044

Reproduce code:
---
$name = "user/1/viewuser/1/edit";
if (mb_ereg("[^\x80-\xF7 [:alnum:[EMAIL PROTECTED]", $name)) print('The username
contains an illegal character.');

Expected result:

The username contains an illegal character.

Actual result:
--
Warning: mb_ereg(): mbregex compile err: invalid regular expression in
/var/www/mb_ereg.php on line 4






-- 
Edit this bug report at http://bugs.php.net/?id=34119&edit=1


#35711 [Fbk->Csd]: [PATCH] ISO-8859 charset not correctly detected

2005-12-23 Thread hirokawa
 ID:   35711
 Updated by:   [EMAIL PROTECTED]
 Reported By:  matteo at beccati dot com
-Status:   Feedback
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:  5.1.1
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

The character-end detection was introduced in the strict mode
(mb_detect_encoding ($s,$list,TRUE)).
Please try the strict mode.






Previous Comments:


[2005-12-24 01:03:21] [EMAIL PROTECTED]

Have you ever tried the strict mode (default:FALSE) ?

string mb_detect_encoding ( string str [, mixed encoding_list [, bool
strict]] )




[2005-12-20 17:10:56] matteo at beccati dot com

Of course, I agree that 0xe8 is valid if taken as part of a multibyte
character, but I don't think it can be considered valid if the next
bytes are missing (because the string ends prematurely). The iconv
extension raises notices when it finds illegal or incomplete multibyte
characters; I don't see why mbstring should accept as valid UTF-8 a
string which isn't.

The same should apply to other multibyte encodings.



[2005-12-20 15:44:31] [EMAIL PROTECTED]

Please note that encoding detection is not always perfect.
In particular, when the string is too short, detection may give the
wrong result.
In your case, it is not a bug, but the specified behavior.
UTF-8 is a variable-length multibyte encoding;
the length of a character in UTF-8 is from one to six bytes.
Please look at ext/mbstring/libmbfl/filter/mbfilter_utf8.c, around line 249.
0xe8 is valid as the first byte of a 3-byte sequence.
We cannot tell whether 0xe8 is ISO-8859-1 or UTF-8,
because this byte is valid in both encodings.
In this case, the result is chosen
according to the order defined by mb_detect_order().
I suggest using a string of sufficient length
for reliable encoding detection.
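
The lead-byte reasoning above can be made concrete with a small sketch. Note that it follows the current UTF-8 definition (RFC 3629), which limits sequences to four bytes, whereas the comment above refers to the older one-to-six-byte scheme. Both function names are hypothetical and this is not the libmbfl filter.

```php
<?php
// Expected sequence length for a UTF-8 lead byte under RFC 3629.
// Returns 0 if the byte cannot start a sequence.
function utf8_expected_length(int $byte): int
{
    if ($byte <= 0x7F) return 1;
    if ($byte >= 0xC2 && $byte <= 0xDF) return 2;
    if ($byte >= 0xE0 && $byte <= 0xEF) return 3;
    if ($byte >= 0xF0 && $byte <= 0xF4) return 4;
    return 0;
}

// True if the string ends in the middle of a multibyte sequence.
function utf8_has_truncated_tail(string $s): bool
{
    $len = strlen($s);
    $i = $len - 1;
    $steps = 0;
    // Step back over at most three continuation bytes (10xxxxxx).
    while ($i >= 0 && $steps < 3 && (ord($s[$i]) & 0xC0) === 0x80) {
        $i--;
        $steps++;
    }
    if ($i < 0) {
        return false; // empty string, or no candidate lead byte found
    }
    // Truncated if the lead byte promises more bytes than are present.
    return utf8_expected_length(ord($s[$i])) > ($len - $i);
}

var_dump(utf8_expected_length(0xE8));             // int(3)
var_dump(utf8_has_truncated_tail("\xE8"));         // bool(true)
var_dump(utf8_has_truncated_tail("\xE8\xB8\xAA")); // bool(false)
```

A lone 0xe8 is therefore a plausible start of a UTF-8 character but not a complete one, which is the distinction the strict mode draws.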













[2005-12-19 09:03:36] [EMAIL PROTECTED]

Rui, can you check this out please?



[2005-12-19 09:00:50] matteo at beccati dot com

Oops, I just realized that I forgot the -u flag :)

Here is the downloadable patch:

http://beccati.com/download/mbstring-patch-20051219.txt



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/35711

-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1


#35711 [WFx->Fbk]: [PATCH] ISO-8859 charset not correctly detected

2005-12-23 Thread hirokawa
 ID:   35711
 Updated by:   [EMAIL PROTECTED]
 Reported By:  matteo at beccati dot com
-Status:   Wont fix
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:  5.1.1
 Assigned To:  hirokawa
 New Comment:

Have you ever tried the strict mode (default:FALSE) ?

string mb_detect_encoding ( string str [, mixed encoding_list [, bool
strict]] )



Previous Comments:


[2005-12-20 17:10:56] matteo at beccati dot com

Of course, I agree that 0xe8 is valid if taken as part of a multibyte
character, but I don't think it can be considered valid if the next
bytes are missing (because the string ends prematurely). The iconv
extension raises notices when it finds illegal or incomplete multibyte
characters; I don't see why mbstring should accept as valid UTF-8 a
string which isn't.

The same should apply to other multibyte encodings.



[2005-12-20 15:44:31] [EMAIL PROTECTED]

Please note that encoding detection is not always perfect.
In particular, when the string is too short, detection may give the
wrong result.
In your case, it is not a bug, but the specified behavior.
UTF-8 is a variable-length multibyte encoding;
the length of a character in UTF-8 is from one to six bytes.
Please look at ext/mbstring/libmbfl/filter/mbfilter_utf8.c, around line 249.
0xe8 is valid as the first byte of a 3-byte sequence.
We cannot tell whether 0xe8 is ISO-8859-1 or UTF-8,
because this byte is valid in both encodings.
In this case, the result is chosen
according to the order defined by mb_detect_order().
I suggest using a string of sufficient length
for reliable encoding detection.













[2005-12-19 09:03:36] [EMAIL PROTECTED]

Rui, can you check this out please?



[2005-12-19 09:00:50] matteo at beccati dot com

Oops, I just realized that I forgot the -u flag :)

Here is the downloadable patch:

http://beccati.com/download/mbstring-patch-20051219.txt



[2005-12-19 08:48:47] [EMAIL PROTECTED]

Please provide any patches in unified diff format. (like the first
one). And downloadable somewhere.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/35711

-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1


#32062 [Asn->Fbk]: mbstring fails to match encoding with some locale settings

2005-12-23 Thread hirokawa
 ID:   32062
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5CVS, 4CVS (2005-02-22)
 Assigned To:  hirokawa
 New Comment:

mbstring (libmbfl) uses strcasecmp(), which depends on the locale.
If the specified locale is not supported by the system,
encoding name matching fails.
This is not a problem in mbstring but a problem with the
system settings.





Previous Comments:


[2005-12-21 23:24:01] [EMAIL PROTECTED]

Rui, check this too if you don't mind. :)



[2005-02-22 16:17:51] [EMAIL PROTECTED]

Right, there were typos. The reproduce code should've 
been





[2005-02-22 15:25:03] [EMAIL PROTECTED]

tr_TR == Turkish, and ISO-8859-1 is not a valid character set of that
locale, no?



[2005-02-22 06:55:52] [EMAIL PROTECTED]

Description:

mbstring fails to match encoding name against any one of 
the supported encodings with some locale settings.

Irrelevant to bug #29955.

Reproduce code:
---


Expected result:

string(1) "a"
string(1) "a"
string(1) "a"
string(1) "a"

Actual result:
--
string(1) "a"
string(1) "a"

Warning: mb_convert_encoding(): Illegal character 
encoding specified in %s on line %d
string(1) "a"
string(1) "a"





-- 
Edit this bug report at http://bugs.php.net/?id=32062&edit=1


#29955 [Asn->Fbk]: mb_strtoupper() / lower() broken with some locale

2005-12-23 Thread hirokawa
 ID:   29955
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jan at horde dot org
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5CVS, 4CVS (2004-09-02)
 Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php6.0-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php6.0-win32-latest.zip

Turkish language support has been added in CVS HEAD.
When mbstring.language = Turkish,
Turkish case folding is performed in ISO-8859-9.
(upper:0x69 -> 0xdd, lower:0x49->0xfd)
Otherwise, normal case folding is performed.
(upper:0x69 -> 0x49, lower:0x49->0x69)
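
The two mappings listed above can be sketched on raw ISO-8859-9 bytes as follows. This is only an illustration of the byte-level rule, not the mbstring implementation: the function names are hypothetical, and only the ASCII letters plus the Turkish i/I pair are handled.

```php
<?php
const ASCII_LOWER = 'abcdefghijklmnopqrstuvwxyz';
const ASCII_UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Upper-case ISO-8859-9 bytes; other non-ASCII letters are left untouched.
function upper_iso8859_9(string $bytes, bool $turkish): string
{
    if ($turkish) {
        // Turkish: small i (0x69) -> capital I with dot above (0xDD).
        $bytes = str_replace("\x69", "\xDD", $bytes);
    }
    return strtr($bytes, ASCII_LOWER, ASCII_UPPER);
}

// Lower-case ISO-8859-9 bytes; other non-ASCII letters are left untouched.
function lower_iso8859_9(string $bytes, bool $turkish): string
{
    if ($turkish) {
        // Turkish: capital I (0x49) -> dotless small i (0xFD).
        $bytes = str_replace("\x49", "\xFD", $bytes);
    }
    return strtr($bytes, ASCII_UPPER, ASCII_LOWER);
}

echo bin2hex(upper_iso8859_9('i', true)), "\n";  // dd
echo bin2hex(upper_iso8859_9('i', false)), "\n"; // 49
echo bin2hex(lower_iso8859_9('I', true)), "\n";  // fd
echo bin2hex(lower_iso8859_9('I', false)), "\n"; // 69
```

Using fixed tables rather than strtoupper()/strtolower() keeps the result independent of the process locale.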



Previous Comments:


[2005-12-23 14:28:29] [EMAIL PROTECTED]

"man iso-8859-9" will tell you.

"i" maps to "0xdd"
and
"0xfd" maps to "I"

See also:
http://www.eki.ee/letter/chardata.cgi?lang=tr+Turkish&script=latin



[2005-12-23 14:24:06] jan at horde dot org

See http://www.gymel.com/charsets/ISO8859-9.html#U0069 and
http://www.gymel.com/charsets/ISO8859-9.html#U0049 under "Bemerkungen:"
(remarks).



[2005-12-23 14:10:05] [EMAIL PROTECTED]

I don't know which is the standard mapping (0x49 or 0xdd).
In ISO-8859-9 (Turkish),
should the upper case of 'i' (0x69) always be 'I'
with dot (0xdd)?
If so, please point me to some URLs which describe
the mapping.





[2005-12-21 23:22:57] [EMAIL PROTECTED]

Rui, yet another for you. Moriyoshi seems to have vanished..



[2005-08-08 00:18:22] [EMAIL PROTECTED]

Are you going to fix this or not? If not, change the status to 'wont
fix'.




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/29955

-- 
Edit this bug report at http://bugs.php.net/?id=29955&edit=1


#29955 [Asn->Ana]: mb_strtoupper() / lower() broken with some locale

2005-12-23 Thread hirokawa
 ID:   29955
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jan at horde dot org
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  5CVS, 4CVS (2004-09-02)
 Assigned To:  hirokawa
 New Comment:

I don't know which is the standard mapping (0x49 or 0xdd).
In ISO-8859-9 (Turkish),
should the upper case of 'i' (0x69) always be 'I'
with dot (0xdd)?
If so, please point me to some URLs which describe
the mapping.




Previous Comments:


[2005-12-21 23:22:57] [EMAIL PROTECTED]

Rui, yet another for you. Moriyoshi seems to have vanished..



[2005-08-08 00:18:22] [EMAIL PROTECTED]

Are you going to fix this or not? If not, change the status to 'wont
fix'.




[2005-05-13 08:00:26] [EMAIL PROTECTED]

Supporting the Turkish locale would need a complete overhaul of the
entire extension because the locale's character
properties and required case folding behaviour are very
special.

PHP-ICU extension could support anything, but that's 
just an ongoing work by l0t3k.




[2005-05-13 02:26:18] mustafa at deu dot edu dot tr

I get the same results as jan.

I need to get UTF-8 output for consuming a web service and I configured
my php 5.0.4 with --enable-mbstring=all parameter (on linux that has
been set with Turkish locale)

I see that mbstring extension has limited language support in source
code. (German, English, Japanese, Korean, Russian, Chinese)

Is there a way to add our (Turkish) language to source code? Any
references about this extension's source?



[2005-02-22 11:10:06] [EMAIL PROTECTED]

It turned out this is because mbstring doesn't take the 
locale into consideration.





The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/29955

-- 
Edit this bug report at http://bugs.php.net/?id=29955&edit=1


#28899 [Asn->Csd]: mb_substr() and substr() work differently

2005-12-23 Thread hirokawa
 ID:   28899
 Updated by:   [EMAIL PROTECTED]
 Reported By:  mauroi at digbang dot com
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: *
 PHP Version:  5CVS, 4CVS (2004-12-12)
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

With this fix, when mbstring.func_overload is enabled for string
functions, mb_substr() will return FALSE for substr('', 0).
When mbstring.func_overload is disabled, mb_substr() will return ''.



Previous Comments:


[2005-12-21 23:21:15] [EMAIL PROTECTED]

Rui, can you check this please? Seems a bit odd that mbstring
overloaded substr() works differently from the PHP core substr()..




[2005-02-23 16:56:34] drraf at tlen dot pl

If mbstring can overload substr() (when function overloading is on),
then in my opinion mb_substr() should be fixed.



[2005-02-03 03:25:48] [EMAIL PROTECTED]

Whatever is the "logical" behaviour of the function, it doesn't really
matter: We will NOT change the behaviour of substr() at this point.
Thus the only place to change is mbstring. 



[2004-12-20 13:58:20] mauroi at digbang dot com

just to mention it... lot of code written with the mb_* function
overload relies on substr returning a zero length string... changing
substr to work like mb_substr won't break anything (i think)



[2004-12-20 10:28:55] [EMAIL PROTECTED]

The very nature of "substr" is that the function returns 
the specified part of the string whenever the range is 
valid and returns an error status if it is out of range.

If a null string is a valid string entity, then it 
should be able to be referred to by index "0" and thus 
the implementation returns a null string instead of 
false. Or you would say this isn't really logical? :)




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/28899

-- 
Edit this bug report at http://bugs.php.net/?id=28899&edit=1


#35711 [Asn->Ana]: [PATCH] ISO-8859 charset not correctly detected

2005-12-20 Thread hirokawa
 ID:   35711
 Updated by:   [EMAIL PROTECTED]
 Reported By:  matteo at beccati dot com
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:  5.1.1
 Assigned To:  hirokawa
 New Comment:

Please note that encoding detection is not always perfect.
In particular, when the string is too short, detection may give the
wrong result.
In your case, this is not a bug but the specified behaviour.
UTF-8 is a variable-length multibyte encoding format;
the length of a character in UTF-8 is from one to six bytes.
Please look at ext/mbstring/libmbfl/filter/mbfilter_utf8.c, around line 249.
0xe8 is valid as the first byte of a three-byte sequence.
We cannot tell whether 0xe8 is ISO-8859-1 or UTF-8,
because this byte is valid in both encodings.
In this case, the result is chosen according to
the order defined by mb_detect_order().
I suggest using a string of sufficient length
for reliable encoding detection.
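
The lead-byte reasoning above can be sketched in C. This is only an illustration, not the libmbfl filter: the names are hypothetical, it follows RFC 3629 (which limits UTF-8 to four bytes, unlike the older six-byte form mentioned above), and it checks only the lead/continuation structure, not overlong forms or surrogates.

```c
#include <stddef.h>

enum utf8_result { UTF8_OK, UTF8_INCOMPLETE, UTF8_INVALID };

/* Expected sequence length for a lead byte, or 0 if it cannot start one. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Scan a buffer and report whether it is structurally valid, ends in the
 * middle of a multibyte sequence, or contains a misplaced byte. */
enum utf8_result utf8_scan(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        int len = utf8_seq_len(s[i]);
        int j;
        if (len == 0) return UTF8_INVALID;
        for (j = 1; j < len; j++) {
            if (i + (size_t)j >= n) return UTF8_INCOMPLETE;   /* ran off the end */
            if ((s[i + j] & 0xC0) != 0x80) return UTF8_INVALID; /* not 10xxxxxx */
        }
        i += (size_t)len;
    }
    return UTF8_OK;
}
```

With this sketch, an ISO-8859-1 string ending in an accented letter scans as incomplete rather than invalid, which is why a detector that only looks for invalid bytes can still accept it as UTF-8.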

Previous Comments:


[2005-12-19 09:03:36] [EMAIL PROTECTED]

Rui, can you check this out please?



[2005-12-19 09:00:50] matteo at beccati dot com

Oops, I just realized that I forgot the -u flag :)

Here is the downloadable patch:

http://beccati.com/download/mbstring-patch-20051219.txt



[2005-12-19 08:48:47] [EMAIL PROTECTED]

Please provide any patches in unified diff format. (like the first
one). And downloadable somewhere.



[2005-12-16 23:50:13] matteo at beccati dot com

I've made a patch which seems to fix the issue. It basically checks
the filter status during judgement. The status seems to be != 0 only
while a filter is in the middle of a multibyte character. I added a
fallback to the old judgement routine anyway, just in case no matching
encoding is found.

Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -u -r1.7.2.1 mbfilter.c
--- ext/mbstring/libmbfl/mbfl/mbfilter.c    5 Nov 2005 04:49:57 -0000    1.7.2.1
+++ ext/mbstring/libmbfl/mbfl/mbfilter.c    16 Dec 2005 22:46:26 -0000
@@ -575,12 +575,22 @@
 
     for (i = 0; i < num; i++) {
         filter = &flist[i];
-        if (!filter->flag) {
+        if (!filter->flag && !filter->status) {
             encoding = filter->encoding;
             break;
         }
     }
 
+    if (!encoding) {
+        for (i = 0; i < num; i++) {
+            filter = &flist[i];
+            if (!filter->flag) {
+                encoding = filter->encoding;
+                break;
+            }
+        }
+    }
+
     /* cleanup */
     /* dtors should be called in reverse order */
     i = num; while (--i >= 0) {
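
The selection rule in the patch can be read in isolation as a two-pass choice. The following is a simplified sketch under assumptions: the struct is a hypothetical stand-in, not the real libmbfl filter type, and the field meanings follow the description given with the patch.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for a detection filter. */
struct candidate {
    const char *name;  /* encoding name */
    int flag;          /* non-zero: an invalid byte was seen */
    int status;        /* non-zero: input ended inside a multibyte character */
};

/* Pick the first candidate that is both valid and complete; if none
 * qualifies, fall back to the old rule (valid, possibly incomplete). */
const char *pick_encoding(const struct candidate *list, size_t num)
{
    size_t i;
    for (i = 0; i < num; i++) {
        if (!list[i].flag && !list[i].status) return list[i].name;
    }
    for (i = 0; i < num; i++) {
        if (!list[i].flag) return list[i].name;
    }
    return NULL;  /* every candidate saw an invalid byte */
}
```

With a UTF-8 candidate left mid-character and an ISO-8859-1 candidate that finished cleanly, the first pass skips UTF-8 and returns ISO-8859-1, which is the result the report expects.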



[2005-12-16 17:18:27] matteo at beccati dot com

Description:

I was evaluating the mbstring extension because of its ability to
filter and convert input parameters to the correct encoding. During my
test I found out that an ISO-8859-1 string which ends with an
accented character is wrongly detected as UTF-8, even if it ends with
an incomplete multibyte character (using iconv to convert the string
raises such a notice).

Also reproduced with PHP 4.3.11 on FreeBSD 4 and 5.0.2 on Win32.


Reproduce code:
---


Expected result:

Trying: string(7) "Test: à"

Notice: iconv(): Detected an incomplete multibyte character in input
string in test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(8) "Test: Ã "

Trying: string(8) "Test: àa"

Notice: iconv(): Detected an illegal character in input string in
/var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: Ã a"


Actual result:
--
Trying: string(7) "Test: à"

Notice: iconv(): Detected an incomplete multibyte character in input
string in test.php on line 13
Detected encoding: UTF-8
Converted string:string(6) "Test: "

Trying: string(8) "Test: àa"

Notice: iconv(): Detected an illegal character in input string in
/var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: Ã a"






-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1


#35307 [Asn->Ana]: By the unexpected header can be injected at the mb_send_mail

2005-11-21 Thread hirokawa
 ID:   35307
 Updated by:   [EMAIL PROTECTED]
 Reported By:  s dot masugata at digicom dot dnp dot co dot jp
-Status:   Assigned
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Solaris8
 PHP Version:  5CVS, 4CVS (2005-11-21) (snap)
 Assigned To:  hirokawa


Previous Comments:


[2005-11-21 09:10:31] [EMAIL PROTECTED]

Assigned to the maintainer.



[2005-11-21 02:13:25] s dot masugata at digicom dot dnp dot co dot jp

Description:

An unexpected header can be injected through the mb_send_mail function.
The mail function checks "To" and "Subject" for unexpected control
codes.
However, the mb_send_mail function does not perform this check.


With the function overload feature, the mail function is replaced by
the mb_send_mail function.
Therefore, I think the mb_send_mail function needs the same check as
the mail function.

It is "To" that seems to need the check.
This report is for PHP4, but PHP5 needs the same correction.
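
The kind of check being asked for can be sketched as follows. This is an illustration under assumptions, not the code of mail() or of the patch below: the function name is hypothetical, and the sketch only reports whether a value is safe, leaving the reaction (reject or replace) to the caller.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 when a header value contains no control characters other than
 * folding sequences (CRLF or LF followed by a space or tab), else 0.
 * Note that a bare tab counts as a control character here. */
int header_value_is_safe(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        if (c == '\r' && s[i + 1] == '\n' && (s[i + 2] == ' ' || s[i + 2] == '\t')) {
            i += 3;   /* folded line: allowed */
            continue;
        }
        if (c == '\n' && (s[i + 1] == ' ' || s[i + 1] == '\t')) {
            i += 2;   /* folded line with bare LF: allowed */
            continue;
        }
        if (iscntrl(c)) return 0;   /* e.g. CRLF starting a new header */
        i++;
    }
    return 1;
}
```

A "To" value such as "user@example.com\r\nBcc: x@example.com" is reported as unsafe, because the CRLF is not followed by white space and would therefore start a new header line.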


--- php-4.4.2RC1/ext/mbstring/mbstring.c,orig   2005-11-05 10:14:05.0 +0900
+++ php-4.4.2RC1/ext/mbstring/mbstring.c        2005-11-21 09:42:42.0 +0900
@@ -3460,6 +3460,22 @@
  *  Sends an email message with MIME scheme
  */
 #if HAVE_SENDMAIL
+#define SKIP_LONG_HEADER_SEP_MBSTRING(str, pos) \
+    if (str[pos] == '\r' && str[pos + 1] == '\n' && (str[pos + 2] == ' ' || str[pos + 2] == '\t')) { \
+        pos += 3; \
+        while (str[pos] == ' ' || str[pos] == '\t') { \
+            pos++; \
+        } \
+        continue; \
+    } \
+    else if (str[pos] == '\n' && (str[pos + 1] == ' ' || str[pos + 1] == '\t')) { \
+        pos += 2; \
+        while (str[pos] == ' ' || str[pos] == '\t') { \
+            pos++; \
+        } \
+        continue; \
+    } \
+
 PHP_FUNCTION(mb_send_mail)
 {
     int argc, n;
@@ -3475,6 +3491,8 @@
     mbfl_memory_device device;  /* automatic allocateable buffer for additional header */
     const mbfl_language *lang;
     int err = 0;
+    char *to_r;
+    int to_len, i;
 
     /* initialize */
     mbfl_memory_device_init(&device, 0, 0);
@@ -3501,6 +3519,29 @@
     convert_to_string_ex(argv[0]);
     if (Z_STRVAL_PP(argv[0])) {
         to = Z_STRVAL_PP(argv[0]);
+        to_len = Z_STRLEN_PP(argv[0]);
+        if (to_len > 0) {
+            to_r = estrndup(to, to_len);
+            for (; to_len; to_len--) {
+                if (!isspace((unsigned char) to_r[to_len - 1])) {
+                    break;
+                }
+                to_r[to_len - 1] = '\0';
+            }
+            for (i = 0; to_r[i]; i++) {
+                if (iscntrl((unsigned char) to_r[i])) {
+                    /* According to RFC 822, section 3.1.1 long headers may be separated into
+                     * parts using CRLF followed at least one linear-white-space character ('\t' or ' ').
+                     * To prevent these separators from being replaced with a space, we use the
+                     * SKIP_LONG_HEADER_SEP_MBSTRING to skip over them.
+                     */
+                    SKIP_LO

#35253 [Asn->Csd]: Length of the encoded character string violates a RFC.

2005-11-18 Thread hirokawa
 ID:   35253
 Updated by:   [EMAIL PROTECTED]
 Reported By:  s dot masugata at digicom dot dnp dot co dot jp
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Solaris8
 PHP Version:  5CVS, 4CVS (2005-11-17) (snap)
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2005-11-18 09:13:27] [EMAIL PROTECTED]

Rui, this is yours, I presume. :)




[2005-11-18 09:04:06] s dot masugata at digicom dot dnp dot co dot jp

Thank you for the reply. :)

Sorry, the sample script I posted was mistaken.
It should be as follows:
";
   mail( "[EMAIL PROTECTED]", "TEST Subject", "TEST Body",
$Cc );

?>

Cc:
=?EUC-JP?B?pKKkoqSipKKkoqSipKKkoqSipKKkoqSipKKkoqSipKKkoqSipKKkoqSipKI=?=

=?EUC-JP?B?pKKkoqSipKKkoqSipKKkoqSipKKkoqSipKKkoqSipKKkoqSipKKkoqSipKI=?=
 =?EUC-JP?B?pKKkog==?= <[EMAIL PROTECTED]>

Because the mb_encode_mimeheader function doesn't take the field
identifier "Cc: " into account, the length of the first line exceeds 76
characters.
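
The length budget involved can be worked out with a small sketch. It assumes the limits from RFC 2047 (76 characters per header line, 75 per encoded-word) and Base64 ("B") encoding; the function name is hypothetical and this is not the mbstring implementation.

```c
#include <stddef.h>
#include <string.h>

/* How many raw bytes fit into one Base64 encoded-word
 * ("=?charset?B?...?=") when `used_cols` columns of the line are already
 * taken, for example by the field name "Cc: ". */
size_t b64_word_capacity(size_t used_cols, const char *charset)
{
    size_t overhead = strlen(charset) + 7;   /* "=?" + "?B?" + "?=" */
    size_t room;
    if (overhead >= 75 || used_cols + overhead >= 76) return 0;
    room = 76 - used_cols - overhead;        /* columns left on this line */
    if (room > 75 - overhead) room = 75 - overhead;  /* encoded-word limit */
    return (room / 4) * 3;                   /* Base64: 4 chars carry 3 bytes */
}
```

For EUC-JP, a first line that starts with "Cc: " (4 columns) has room for 42 raw bytes, while a continuation line that starts with one space has room for 45. An encoder that ignores the field name fills the first line as if it were a continuation line, which pushes it past the limit. A real encoder must additionally avoid splitting a multibyte character across two encoded-words, which this sketch does not handle.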

I found a patch which solves this problem.
http://www.geocities.jp/rui_hirokawa/php/patch/php4_mb_mime_offset.patch.txt

I would be happy if you committed it. :)
Rui knows the details.



[2005-11-17 14:17:22] [EMAIL PROTECTED]

Please put the reproduce script somewhere and paste the link here, so
it won't get corrupted by the bug system.



[2005-11-17 13:21:43] s dot masugata at digicom dot dnp dot co dot jp

I tried php5-200511170730, but the behaviour doesn't change.



[2005-11-17 09:21:34] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5-win32-latest.zip





The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/35253

-- 
Edit this bug report at http://bugs.php.net/?id=35253&edit=1


#33720 [Fbk->Csd]: mb_encode_mimeheader does not work

2005-11-08 Thread hirokawa
 ID:   33720
 Updated by:   [EMAIL PROTECTED]
 Reported By:  s dot masugata at digicom dot dnp dot co dot jp
-Status:   Feedback
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Solaris8
 PHP Version:  4.4.0
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2005-11-07 23:07:18] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip





[2005-08-17 10:18:02] misiek at dione dot ids dot pl

Argh, these characters in $str are s' and l/ in iso8859-2 and should be
written in 8-bit not as html entities.



[2005-08-17 10:13:56] misiek at dione dot ids dot pl

This function doesn't work for me in a different situation, either:



should produce:
costam =?ISO-8859-2?Q?U=BFytkownik=20co=B6tam=20inne?=
(and this works on php 5.0.4)

but on 4.4.0 it produces:
costam U|ytkownik co[tam inne



[2005-07-18 02:22:31] [EMAIL PROTECTED]

Moriyoshi: Somehow it looks like you never fixed #32311 anywhere, just
added a NEWS entry ???





[2005-07-16 05:14:58] s dot masugata at digicom dot dnp dot co dot jp

Description:

http://bugs.php.net/bug.php?id=32311
mb_encode_mimeheader stopped working as a side effect of the fix for
this problem.


Reproduce code:
---



Expected result:

string(34) "=?ISO-2022-JP?B?GyRCST1CahsoQg==?="

Actual result:
--
string(2) "hL"





-- 
Edit this bug report at http://bugs.php.net/?id=33720&edit=1


#33720 [Asn->Fbk]: mb_encode_mimeheader does not work

2005-11-07 Thread hirokawa
 ID:   33720
 Updated by:   [EMAIL PROTECTED]
 Reported By:  s dot masugata at digicom dot dnp dot co dot jp
-Status:   Assigned
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Solaris8
 PHP Version:  4.4.0
-Assigned To:  moriyoshi
+Assigned To:  hirokawa
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip





-- 
Edit this bug report at http://bugs.php.net/?id=33720&edit=1


#34830 [Fbk->Csd]: hirokawa

2005-11-04 Thread hirokawa
 ID:   34830
 Updated by:   [EMAIL PROTECTED]
-Summary:  mail() does not fetch mail.force_extra_parameters
 Reported By:  [EMAIL PROTECTED]
-Status:   Feedback
+Status:   Closed
 Bug Type: PHP options/info functions
 Operating System: Linux
 PHP Version:  4.4.1RC1
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2005-10-31 01:25:49] [EMAIL PROTECTED]

And same in understandable english please?



[2005-10-11 16:56:18] [EMAIL PROTECTED]

Description:

mail.force_extra_parameters became effective by mb_send_mail() by PHP
4.4.1RC1.
I also want mail() to confirm this.

mb_send_mail() is the upper compatible function of mail().
Therefore, this is not "The addition of a feature", it is "fixed bug".









-- 
Edit this bug report at http://bugs.php.net/?id=34830&edit=1


#31987 [Asn->Csd]: hirokawa

2005-02-19 Thread hirokawa
 ID:   31987
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Windows XP PRo SP2
 PHP Version:  5CVS-2005-02-15 (dev)
 Assigned To:  fujimoto
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:


[2005-02-19 09:01:08] [EMAIL PROTECTED]

Here is a patch to solve this problem,
I don't know if it has any side-effect.

*** Zend/zend_language_scanner.l    13 Feb 2005 13:50:48 -0000    1.120
--- Zend/zend_language_scanner.l    19 Feb 2005 07:42:35 -0000
***************
*** 135,140 ****
--- 135,144 ----
      SCNG(script_org_size) = 0;
      SCNG(script_filtered) = NULL;
      SCNG(script_filtered_size) = 0;
+     SCNG(input_filter) = NULL;
+     SCNG(output_filter) = NULL;
+     SCNG(script_encoding) = NULL;
+     SCNG(internal_encoding) = NULL;
  #endif /* ZEND_MULTIBYTE */
  }
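
The idea behind the patch, resetting optional hooks to NULL so that callers can test them before use, can be sketched in isolation. The types and names below are hypothetical and much reduced; they are not the Zend scanner globals, and the sketch assumes that callers check the pointer before invoking it.

```c
#include <stddef.h>

/* Hypothetical, reduced model of scanner globals holding optional filters. */
typedef size_t (*filter_fn)(const char *in, size_t len);

struct scanner_globals {
    filter_fn input_filter;
    filter_fn output_filter;
};

/* Reset every optional hook so later code can test it before calling. */
void scanner_reset(struct scanner_globals *g)
{
    g->input_filter = NULL;
    g->output_filter = NULL;
}

/* Apply the output filter only when one is installed. */
size_t filtered_length(const struct scanner_globals *g, const char *s, size_t len)
{
    if (g->output_filter != NULL) {
        return g->output_filter(s, len);
    }
    return len;   /* no filter installed: text passes through unchanged */
}
```

Without the reset, the pointer holds whatever was in memory, and calling through it is undefined behaviour. The value filtered_len = -858993460 in the backtrace of this report is 0xCCCCCCCC, the pattern Visual C++ debug builds write into uninitialised stack memory.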



[2005-02-15 16:05:12] [EMAIL PROTECTED]

Description:

When the ZEND_MULTIBYTE option is defined on Windows XP Pro SP2
and PHP is compiled as an Apache 2 module,
PHP 5.0.x crashes in phpinfo().

Here is the backtrace from VC++ 6 for PHP 5.0.4dev:

zif_phpinfo()
 -> php_print_info(-1 TSRMLS_CC);
   -> zend_html_puts(zend_version, 
   strlen(zend_verison TSRMLS_CC)
 -> (L66)
LANG_SCNG(output_filter)(&filtered, 
   &filtered_len, s, len TSRMLS_CC)

 filtered = 0x "";
 &filtered = 0x04fcf54c
 filtered_len = -858993460
 s = "Zend Engine v2.1.0-dev,..."
 len = 66


Reproduce code:
---



Expected result:

PHP-related info.



Actual result:
--
PHP 5.0.4dev executed as Apache 2 module will crash.





-- 
Edit this bug report at http://bugs.php.net/?id=31987&edit=1


#25995 [Asn->Csd]: mbstring configuration variables are marked PHP_INI_ALL, but this is false.

2003-11-02 Thread hirokawa
 ID:   25995
 Updated by:   [EMAIL PROTECTED]
-Summary:  multipart/form-date file upload problem.
 Reported By:  s dot masugata at digicom dot dnp dot co dot jp
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
-Operating System: FreeBSD/Linux/Solaris(sparc)
+Operating System: Linux
-PHP Version:  4.3.4RC2
+PHP Version:  4.3.0
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

In case this was a PHP problem, snapshots of the sources are packaged
every three hours; this change will be in the next snapshot. You can
grab the snapshot at http://snaps.php.net/.
 
In case this was a documentation problem, the fix will show up soon at
http://www.php.net/manual/.

In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites in short time.
 
Thank you for the report, and for helping us make PHP better.

This problem is fixed in PHP4.3.4RC4 and it will be applied also for
PHP 4.3.4 Release.
 


Previous Comments:


[2003-10-28 09:26:18] [EMAIL PROTECTED]

Assigned to Rui, he already had a patch.
It was decided not to go in 4.3.4 though.



[2003-10-26 18:31:04] s dot masugata at digicom dot dnp dot co dot jp

Description:

./configure:
--enable-zend-multibyte
--enable-mbstring
--enable-mbregex

php.ini:
output_buffering = Off
output_handler =
mbstring.language = Japanese
mbstring.encoding_translation = On
mbstring.internal_encoding= EUC-JP
mbstring.http_input   = pass
mbstring.http_output  = pass
mbstring.detect_order = SJIS-win,SJIS,eucJP-win,EUC_JP,UTF-8,UTF-7,ISO-2022-JP,JIS,ASCII
mbstring.substitute_character = none
mbstring.func_overload= 1
mbstring.script_encoding  = SJIS


The value of $_POST disregards the settings in php.ini and 
is converted into the internal encoding.

With this setup, it should not be converted into the internal encoding.
However, it is converted.

I only want to use the zend-multibyte feature that converts the script 
encoding into the internal encoding; I do not need the feature that 
also converts input data into the internal encoding.

The submitted input data may be corrupted by this unwanted 
conversion (for example, model-specific special characters).


I think this behaviour is a bug, not the specification.

It seems to be fixed in the snapshot built on Oct 26,
2003 20:30 GMT. Will it also be fixed when PHP 4.3.4 
is released?

thank you.






-- 
Edit this bug report at http://bugs.php.net/?id=25995&edit=1


#24309 [Asn->Csd]: mb_detect_encoding return EUC-JP for invalid EUC-JP char sequence

2003-07-13 Thread hirokawa
 ID:   24309
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jc at mega-bucks dot co dot jp
-Status:   Assigned
+Status:   Closed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.3.3RC1
 Assigned To:  hirokawa
 New Comment:

This bug has been fixed in CVS.

In case this was a PHP problem, snapshots of the sources are packaged
every three hours; this change will be in the next snapshot. You can
grab the snapshot at http://snaps.php.net/.
 
In case this was a documentation problem, the fix will show up soon at
http://www.php.net/manual/.

In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites in short time.
 
Thank you for the report, and for helping us make PHP better.

I made a mistake.
B7F6 is a correct byte code in EUC-JP, but
BA7E is not.
So the string is not a correct EUC-JP byte sequence.

mb_detect_encoding() can choose the best candidate in the 
encoding list, but it can't detect corruption.
mb_detect_encoding() assumes the bytes are not corrupted, and it
stops detection as soon as only one 
candidate remains.

I added a 'strict detection mode' to the CVS tree to detect corrupted
strings.
You should specify TRUE as the third argument of mb_detect_encoding() to
use strict detection mode.
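
What a strict check has to verify for EUC-JP can be sketched as follows. This is an illustration with a hypothetical function name, not the libmbfl filter, and it checks only the byte structure, not whether each code is actually assigned.

```c
#include <stddef.h>

/* Bytes allowed in a two-byte (JIS X 0208) EUC-JP character. */
static int eucjp_trail(unsigned char c) { return c >= 0xA1 && c <= 0xFE; }

/* Returns 1 when every byte belongs to a structurally valid EUC-JP
 * character, 0 otherwise. */
int eucjp_is_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        if (c < 0x80) {                 /* ASCII */
            i += 1;
        } else if (c == 0x8E) {         /* half-width katakana: 0x8E + 0xA1..0xDF */
            if (i + 1 >= n || s[i + 1] < 0xA1 || s[i + 1] > 0xDF) return 0;
            i += 2;
        } else if (c == 0x8F) {         /* JIS X 0212: 0x8F + two bytes 0xA1..0xFE */
            if (i + 2 >= n || !eucjp_trail(s[i + 1]) || !eucjp_trail(s[i + 2])) return 0;
            i += 3;
        } else if (eucjp_trail(c)) {    /* JIS X 0208: two bytes 0xA1..0xFE */
            if (i + 1 >= n || !eucjp_trail(s[i + 1])) return 0;
            i += 2;
        } else {
            return 0;                   /* 0x80..0xA0 (other than 0x8E/0x8F) or 0xFF */
        }
    }
    return 1;
}
```

For the bytes B7 F6 BA 7E from this report, the first pair passes, but 0x7E is outside 0xA1..0xFE and cannot follow the lead byte 0xBA, so the whole string is rejected.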


Previous Comments:


[2003-07-07 06:15:52] [EMAIL PROTECTED]

Assigned to the one person who knows what this is about.. :)




[2003-07-01 21:39:00] jc at mega-bucks dot co dot jp

hirokawa wrote:

"URL decoded byte sequance of 'search_words=%B7%F6%BA%7E' is
B7E6+BA7E, which is correct EUC-JP character sequence."

You mean the decoded sequence is B7F6+BA7E, not B7E6+BA7E, right?

"B7E6+BA7E [...] is correct EUC-JP byte sequence."

Again do you mean that B7F6+BA7E is correct EUC-JP? I don't think it is
...

Thanks.



[2003-06-30 20:01:11] jc at mega-bucks dot co dot jp

Are you sure? ^_^

I am not an encoding expert so if you say that it is a valid sequence I
believe you but ...

I am using postgreSQL as a database and it says that it is not a valid
EUC sequence. So either PHP is wrong or the database is wrong :)

Here is my test code:

  echo "Checking $string ...";
  $sql = "select id from products where name like '$string'";
  $conn = pg_connect("host=$IP port=5432 dbname=$DB user=postgres");
  $res  = pg_query($conn, $sql);
  $err_msg = pg_last_error($conn);
  if (preg_match("/Invalid EUC_JP character sequence found/",
$err_msg)) {
echo "NOT VALID";
  }

The error message returned by the DB is:

pg_query(): Query failed: ERROR:  Invalid EUC_JP character sequence
found (0xba7e)

The output is:

Checking 喧� ...
NOT VALID

I'll post this to the postgreSQL developer's list also in case it is a
bug in postgreSQL.

If you are certain that this character sequence is valid, can you point
me to a resource I can use to show the postgreSQL team that they have
a bug that needs fixing?

Thanks!



[2003-06-30 07:49:30] [EMAIL PROTECTED]

It is not a bug of mbstring.
0xb7,0xf6,0xba,0x7e is a correct byte sequence of EUC-JP.




[2003-06-28 09:40:13] [EMAIL PROTECTED]

The URL-decoded byte sequence of 'search_words=%B7%F6%BA%7E' is
B7E6+BA7E, which is a correct EUC-JP character sequence.

Encoding detection is not perfect; it may make a mistake if the
string is too short.

But I believe encoding detection of mbstring works fine in this case.
B7E6+BA7E is not a correct byte sequence in SJIS, UTF-8, or ISO2022-JP.
It is a correct EUC-JP byte sequence.







The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/24309

-- 
Edit this bug report at http://bugs.php.net/?id=24309&edit=1



#24309 [Ana->Bgs]: mb_detect_encoding return EUC-JP for invalid EUC-JP char sequence

2003-06-30 Thread hirokawa
 ID:   24309
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jc at mega-bucks dot co dot jp
-Status:   Analyzed
+Status:   Bogus
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.3.3RC1
 Assigned To:  hirokawa
 New Comment:

It is not a bug of mbstring.
0xb7,0xf6,0xba,0x7e is a correct byte sequence of EUC-JP.



Previous Comments:


[2003-06-28 09:40:13] [EMAIL PROTECTED]

The URL-decoded byte sequence of 'search_words=%B7%F6%BA%7E' is
B7E6+BA7E, which is a correct EUC-JP character sequence.

Encoding detection is not perfect; it may make a mistake if the
string is too short.

But I believe encoding detection of mbstring works fine in this case.
B7E6+BA7E is not a correct byte sequence in SJIS, UTF-8, or ISO2022-JP.
It is a correct EUC-JP byte sequence.







[2003-06-24 02:52:51] jc at mega-bucks dot co dot jp

Description:

I've just run into a strange "bug". I have a form on my web site that
takes input from the user and then uses that to do a search of a
postgresql database.

The form is set to be EUC-JP, but this weekend a user submitted a query
that postgres rejected because it "contains invalid EUC-JP" characters.
Luckily the error was logged and I was able to track it down.

I thought that maybe the user had entered some bad characters in the
form or used some strange encoding, so I had better check that the
encoding of the submitted form data really is EUC-JP using
mb_detect_encoding(). But unfortunately mb_detect_encoding() says that
the invalid string *is* in EUC-JP!?

The query string as it appears in the URL is:

search_words=%B7%F6%BA%7E

In the script that parses this query I have put the following:

$words = $_GET["words"];
$enc = mb_detect_encoding($aI["words"]);
echo "encoding is $enc and the query is ($words)";die;

The result is:

encoding is EUC-JP and the query is (喧?)

As you can see the query string is *not* a valid EUC-JP sequence ...

Reproduce code:
---
$words = $_GET["words"];
$enc = mb_detect_encoding($aI["words"]);
echo "encoding is $enc and the query is ($words)";die;

Expected result:

SJIS (?) or Undefined.

mb_detect_encoding() does not specify what it returns if an invalid
character sequence for which the encoding cannot be detected is passed
in.

In the above case the character sequence is valid SJIS I believe ...

Actual result:
--
EUC-JP





-- 
Edit this bug report at http://bugs.php.net/?id=24309&edit=1



#24106 [Ana->Bgs]: mbstring configuration variables are marked PHP_INI_ALL, but this is false.

2003-06-30 Thread hirokawa
 ID:   24106
 Updated by:   [EMAIL PROTECTED]
-Summary:  UTF8 to SJIS bug
 Reported By:  richard at enfour dot co dot jp
-Status:   Analyzed
+Status:   Bogus
 Bug Type: mbstring related
-Operating System: MacOSX
+Operating System: Linux
-PHP Version:  4.3.2
+PHP Version:  4.3.0
 Assigned To:  hirokawa
 New Comment:

I tested also on Linux using PHP 4.3.3RC1.


the output byte code is E748+90D5, as you are expecting.
I think it works fine.


Previous Comments:


[2003-06-28 09:16:53] [EMAIL PROTECTED]

I tested by a tiny script using PHP 4.3.3RC1 on Windows2000,

the output byte code is E748+90D5, as you are expecting.
I think it works fine.






[2003-06-10 02:00:57] richard at enfour dot co dot jp

The problem may be elsewhere, but I found a case where UTF-8 to 
SJIS mb_convert_encoding mashes a Japanese text string.

The string is the kanji for "souseki"
Unicode:
U+8E2A U+8DE1

In SJIS it should be:
E748+90D5
but gets mashed.

EUC works...




-- 
Edit this bug report at http://bugs.php.net/?id=24106&edit=1



#24309 [Opn->Ana]: mb_detect_encoding return EUC-JP for invalid EUC-JP char sequence

2003-06-28 Thread hirokawa
 ID:   24309
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jc at mega-bucks dot co dot jp
-Status:   Open
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.3.3RC1
 Assigned To:  hirokawa



-- 
Edit this bug report at http://bugs.php.net/?id=24309&edit=1



#24309 [Opn]: mb_detect_encoding return EUC-JP for invalid EUC-JP char sequence

2003-06-28 Thread hirokawa
 ID:   24309
 Updated by:   [EMAIL PROTECTED]
 Reported By:  jc at mega-bucks dot co dot jp
 Status:   Open
 Bug Type: mbstring related
 Operating System: Linux
 PHP Version:  4.3.3RC1
 New Comment:

The URL-decoded byte sequence of 'search_words=%B7%F6%BA%7E' is
B7E6+BA7E, which is a correct EUC-JP character sequence.

Encoding detection is not perfect; it may make a mistake if the
string is too short.

But I believe encoding detection of mbstring works fine in this case.
B7E6+BA7E is not a correct byte sequence in SJIS, UTF-8, or ISO2022-JP.
It is a correct EUC-JP byte sequence.






Previous Comments:


[2003-06-24 02:52:51] jc at mega-bucks dot co dot jp

Description:

I've just run into a strange "bug". I have a form on my web site that
takes input from the user and then uses that to do a search of a
postgresql database.

The form is set to be EUC-JP, but this weekend a user submitted a
query
that postgres reject because it "contains invalid EUC-JP" characters.
Luckily the error was logged and I was able to track it down.

I thought that maybe the user had entered some bad characters in the
form or used some strange encoding so I should better check to make
sure
that the encoding of the submitted form data really is EUC-JP using
mb_detect_encoding(). But unfortunately mb_detect_encoding() says that
the invalid string *is* in EUC-JP!?

The query string is as it appears in the URL is:

search_words=%B7%F6%BA%7E

In the script that parses this query I have put the following:

$words = $_GET["words"];
$enc = mb_detect_encoding($aI["words"]);
echo "encoding is $enc and the query is ($words)";die;

The result is:

encoding is EUC-JP and the query is (喧?)

As you can see the query string is *not* a valid EUC-JP sequence ...
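
The claim can be checked by hand against the EUC-JP byte ranges. The following is a minimal sketch and not mbstring's detection logic: is_structurally_valid_eucjp() is a hypothetical helper name, and it only checks byte structure (ASCII, 0x8E followed by A1-DF, 0x8F followed by two bytes in A1-FE, and two-byte pairs in A1-FE), not whether each code point is actually assigned.

```php
<?php
// Hypothetical helper (not a PHP built-in): checks EUC-JP byte structure only.
function is_structurally_valid_eucjp($s)
{
    $n = strlen($s);
    $i = 0;
    while ($i < $n) {
        $b = ord($s[$i]);
        if ($b <= 0x7F) {                       // ASCII, no trail bytes
            $need = 0; $lo = 0; $hi = 0;
        } elseif ($b === 0x8E) {                // SS2: half-width katakana
            $need = 1; $lo = 0xA1; $hi = 0xDF;
        } elseif ($b === 0x8F) {                // SS3: JIS X 0212, two trail bytes
            $need = 2; $lo = 0xA1; $hi = 0xFE;
        } elseif ($b >= 0xA1 && $b <= 0xFE) {   // JIS X 0208 lead byte
            $need = 1; $lo = 0xA1; $hi = 0xFE;
        } else {
            return false;                       // byte cannot start a character
        }
        // Every trail byte must exist and fall inside the allowed range.
        for ($k = 1; $k <= $need; $k++) {
            if ($i + $k >= $n) {
                return false;
            }
            $t = ord($s[$i + $k]);
            if ($t < $lo || $t > $hi) {
                return false;
            }
        }
        $i += $need + 1;
    }
    return true;
}

$query = urldecode('%B7%F6%BA%7E');             // bytes B7 F6 BA 7E
var_dump(is_structurally_valid_eucjp($query));  // bool(false)
```

Under these ranges B7 F6 is a well-formed pair, but the lead byte BA is followed by 7E, which is outside A1-FE, so the check fails on the second character.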

Reproduce code:
---
// The query parameter is named "search_words" (see the URL above).
$words = $_GET["search_words"];
$enc = mb_detect_encoding($words);
echo "encoding is $enc and the query is ($words)";die;

Expected result:

SJIS (?) or Undefined.

The documentation for mb_detect_encoding() does not specify what it
returns when it is passed an invalid character sequence whose encoding
cannot be detected.

In the above case the character sequence is valid SJIS I believe ...

Actual result:
--
EUC-JP





-- 
Edit this bug report at http://bugs.php.net/?id=24309&edit=1



#24106 [Opn->Ana]: UTF8 to SJIS bug

2003-06-28 Thread hirokawa
 ID:   24106
 Updated by:   [EMAIL PROTECTED]
 Reported By:  richard at enfour dot co dot jp
-Status:   Open
+Status:   Analyzed
 Bug Type: mbstring related
 Operating System: MacOSX
 PHP Version:  4.3.2
 New Comment:

I tested with a tiny script using PHP 4.3.3RC1 on Windows 2000;
the output byte sequence is E748+90D5, as you expect.
I think it works fine.





Previous Comments:


[2003-06-10 02:00:57] richard at enfour dot co dot jp

The cause may be elsewhere, but I found a case where the UTF-8 to
SJIS mb_convert_encoding mangles a Japanese text string.

The string is the kanji for "souseki".
Unicode:
U+8E2A U+8DE1

In SJIS it should be:
E748+90D5
but it comes out mangled.

EUC works...
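
A minimal sketch of how this case could be checked, assuming the mbstring extension is loaded. The UTF-8 bytes are the standard encoding of U+8E2A and U+8DE1, and the expected SJIS value is taken from this report rather than verified independently.

```php
<?php
// UTF-8 encoding of U+8E2A U+8DE1 (the two kanji from the report).
$utf8     = "\xE8\xB8\xAA\xE8\xB7\xA1";
// Expected SJIS bytes E748+90D5, as stated in the report.
$expected = 'e74890d5';

if (function_exists('mb_convert_encoding')) {
    $sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
    $hex  = bin2hex($sjis);
    echo "SJIS bytes: $hex\n";
    echo ($hex === $expected) ? "matches the report\n" : "differs from the report\n";
} else {
    echo "mbstring is not available\n";
}
```

Printing the result with bin2hex() avoids any terminal or browser re-encoding of the output, so the comparison is made on the raw bytes.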




-- 
Edit this bug report at http://bugs.php.net/?id=24106&edit=1



Bug #16959: ereg_replace returns wrong characters

2002-05-01 Thread hirokawa

From: [EMAIL PROTECTED]
Operating system: Linux and Win2K
PHP version:  4.2.0
PHP Bug Type: Regexps related
Bug description:  ereg_replace returns wrong characters

This is a small script for testing.


In PHP <= 4.1.x, this script returns an empty string,
but PHP 4.2.0 and PHP 4.2.1RC1 return '\1'.
The return value in PHP >= 4.2.0 should be corrected.
ereg_replace() is used for session ID propagation in the GET mode of
PHPlib, which does not work well because it is affected by this problem.
 
-- 
Edit bug report at http://bugs.php.net/?id=16959&edit=1




Bug #15925 Updated: ext/gd/config.m4 broken in HEAD

2002-03-07 Thread hirokawa

 ID:   15925
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   Critical
+Status:   Closed
 Bug Type: GD related
 Operating System: doesn't matter a single bit
 PHP Version:  4.0CVS-2002-03-07
 New Comment:

This bug has been fixed in CVS.

I reverted ext/gd/config.m4 to the version from PHP 4.1.2.
The gd2/freetype2 detection issue is solved by this fix.

However, the GD/FreeType issue Rasmus mentioned still exists and
needs to be fixed.



Previous Comments:


[2002-03-07 02:28:15] [EMAIL PROTECTED]

Guys, ext/gd/config.m4 needs to be fixed before 4.2 goes out.  Right now
it does a terrible job detecting gd2/freetype2 TTF functions as the tests
are missing the required -lfreetype.  Rolling back to config.m4 from the
4.1.2 release makes everything work just fine.

-Rasmus




-- 
Edit this bug report at http://bugs.php.net/?id=15925&edit=1