#35647 [NoF-Opn]: tidy does not produce valid utf8 when the encoding is specified in the config

2006-01-27 bugs at nikmakepeace dot com
 ID:   35647
 User updated by:  bugs at nikmakepeace dot com
 Reported By:  bugs at nikmakepeace dot com
-Status:   No Feedback
+Status:   Open
 Bug Type: XML related
 Operating System: FC3
 PHP Version:  5.1.1
 New Comment:

The source is available at
http://www.nikmakepeace.com/testcases/tidy-utf8.phps

Be sure to force your browser's character encoding to UTF-8 before
copying it.

Note also that changing the last line to

  echo tidy_repair_string($dirty, $config, 'utf8');

produces the desired results, but that should not be necessary.


Previous Comments:


[2005-12-20 01:00:05] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to Open.



[2005-12-12 22:05:50] [EMAIL PROTECTED]

Put the data somewhere on the Net and paste the link here, please.




[2005-12-12 18:44:35] bugs at nikmakepeace dot com

Description:

If you specify UTF-8 encoding using the config options 'char-encoding',
'input-encoding' and 'output-encoding', tidy converts HTML entities into
their single-byte Latin-1 equivalents rather than into the correct
multi-byte UTF-8 sequences (or simply leaving them as entities).


The result is that &nbsp; is converted into 0xA0, &eacute; into 0xE9, and
so on. These bytes are not valid UTF-8, so well-behaved XML parsers,
including PHP's DOM, fail.

Specifying 'utf8' as the third (encoding) parameter of
tidy_repair_string() works correctly.
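
A byte-level sketch may make the failure described above concrete. It is written in Python rather than PHP and does not call tidy at all; it only compares the Latin-1 and UTF-8 encodings of the characters involved and checks whether the resulting bytes decode as UTF-8. The helper name is_valid_utf8 is hypothetical.

```python
# Compare single-byte Latin-1 (what tidy emitted, per the report) with
# multi-byte UTF-8 (what was expected) for the characters involved.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for ch in ("\u00a0", "\u00e9", "\u00e8"):  # &nbsp;, &eacute;, &egrave;
    # e.g. U+00E9 prints latin-1=e9 utf-8=c3a9
    print(f"U+{ord(ch):04X}  latin-1={ch.encode('latin-1').hex()}  "
          f"utf-8={ch.encode('utf-8').hex()}")

# "Béatrice" as tidy produced it (Latin-1 bytes) versus as expected (UTF-8).
actual = "B\u00e9atrice".encode("latin-1")
expected = "B\u00e9atrice".encode("utf-8")
print(is_valid_utf8(actual))    # False
print(is_valid_utf8(expected))  # True
```

In PHP itself, a comparable check on tidy's output could use mb_check_encoding($output, 'UTF-8'), assuming the mbstring extension is available (the function exists from PHP 5.1.3, so not in the 5.1.1 used here).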

Reproduce code:
---
<?php
$dirty='<a
href="http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html">B&eacute;atrice
Dalle t&eacute;moigne au proc&egrave;s de son mari accus&eacute; de
viol</a><br/>
<small><nobr><a
href="http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/">人と差がつく就職活動をしよう</a></nobr>
- <nobr><a
href="http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/">ポイント5倍のクリスマスギフトは12時まで!</a></nobr></small>';

$config['char-encoding']='utf8';
$config['input-encoding']='utf8';
$config['output-encoding']='utf8';
$config['output-xhtml']=true;

echo tidy_repair_string($dirty, $config);
?>


Expected result:

Note the correctly encoded e-acute and e-grave characters in the French text.

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<a href=
"http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html">
Béatrice Dalle témoigne au procès de son mari accusé de
viol</a><br />
<small><nobr><a href=
"http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/">
人と差がつく就職活動をしよう</a></nobr> - <nobr><a href=
"http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/">
ポイント5倍のクリスマスギフトは12時まで!</a></nobr></small>
</body>
</html>


Actual result:
--
Note how each e-acute and e-grave has been replaced with a single Latin-1
byte, which is not valid UTF-8 and is displayed below as �.

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<a href=
"http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html">
B�atrice Dalle t�moigne au proc�s de son mari accus� de
viol</a><br />
<small><nobr><a href=
"http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/">
人と差がつく就職活動をしよう</a></nobr> -
<nobr><a href=
"http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/">
ポイント5倍のクリスマスギフトは12時まで!</a></nobr></small>
</body>
</html>
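
The � characters in the actual result are most likely how a UTF-8 viewer rendered tidy's Latin-1 bytes; that is an assumption about the tool used to capture the output, not something the report states. The Python sketch below reproduces the effect by encoding the expected French line as Latin-1 and then decoding it leniently as UTF-8, which substitutes U+FFFD (the replacement character) for each invalid byte.

```python
# Encode the French text the way tidy did (one Latin-1 byte per accented
# letter), then decode it as UTF-8 with errors="replace", as a lenient
# viewer would. Each stray Latin-1 byte becomes U+FFFD.
expected_line = ("B\u00e9atrice Dalle t\u00e9moigne au proc\u00e8s "
                 "de son mari accus\u00e9 de")

raw = expected_line.encode("latin-1")          # é -> 0xE9, è -> 0xE8
shown = raw.decode("utf-8", errors="replace")  # invalid byte -> U+FFFD

print(ascii(shown))            # escapes U+FFFD as \ufffd for plain-ASCII output
print(shown.count("\ufffd"))   # 4
```

Each accented letter maps to exactly one replacement character, matching the four � in the listing, because 0xE9 and 0xE8 are UTF-8 lead bytes that are not followed by continuation bytes.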



#36171 [NEW]: $_FILES array keys have periods replaced with underscores

2006-01-26 bugs at nikmakepeace dot com
From: bugs at nikmakepeace dot com
Operating system: Fedora Core 3
PHP version:  5.1.2
PHP Bug Type: Variables related
Bug description:  $_FILES array keys have periods replaced with underscores

Description:

After a file upload, the $_FILES array does not faithfully reflect the
structure of the form from which the upload was made. An input with the
valid HTML name of 'a.b' becomes the key 'a_b' in the $_FILES array.

The same is true of the other request superglobals.

The PHP manual says that an array index can be any string, and the HTML 4
Recommendation says that the name attribute is CDATA.

Now that register_globals is on its last legs in the community, could you
make it so that the dot transformation happens only when you explicitly
call extract() or import_request_variables()?
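
The renaming can be modelled in a few lines. This is a simplified Python sketch, not PHP's actual source: it assumes that only the base of the name (the text before the first '[') is rewritten, with '.' and ' ' becoming '_', and it deliberately ignores other cases such as leading spaces and unmatched brackets. The function name mangle_request_name is hypothetical.

```python
def mangle_request_name(name: str) -> str:
    """Simplified model of PHP's rewriting of an incoming request variable name.

    Assumption: only the base name (the text before the first '[') is changed,
    and within it every '.' and ' ' becomes '_'. Bracketed sub-keys are kept.
    """
    base, bracket, rest = name.partition("[")
    base = base.replace(".", "_").replace(" ", "_")
    return base + bracket + rest


for field in ("a.b", "my file", "upload[a.b]"):
    print(f"{field!r} -> {mangle_request_name(field)!r}")
```

The rule exists because, under register_globals, a request name became a global variable name, and neither '.' nor ' ' is legal in a PHP variable name; once names are only used as array keys, that constraint no longer applies, which is the point of the request above.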

Reproduce code:
---
<form enctype="multipart/form-data" method="POST">
<input name="a.b" type="file" />
<input type="submit" />
</form>

Expected result:

Something like this:

$_FILES is array (
  'a.b' => array (
    'name' => '5364-16.jpg',
    'type' => 'image/jpeg',
    'tmp_name' => '/tmp/php7vvvc0',
    'error' => 0,
    'size' => 66554,
  ),
)

Actual result:
--
$_FILES is array (
  'a_b' => array (
    'name' => '5364-16.jpg',
    'type' => 'image/jpeg',
    'tmp_name' => '/tmp/php7vvvc0',
    'error' => 0,
    'size' => 66554,
  ),
)



#35647 [NEW]: tidy does not produce valid utf8 when the encoding is specified in the config

2005-12-12 bugs at nikmakepeace dot com
From: bugs at nikmakepeace dot com
Operating system: FC3
PHP version:  5.1.1
PHP Bug Type: Unknown/Other Function
Bug description:  tidy does not produce valid utf8 when the encoding is
specified in the config
