From:             mbjr at mbjr dot hu
Operating system: Linux
PHP version:      5.1.2
PHP Bug Type:     Feature/Change Request
Bug description:  UTF-8 DeAccentizer

Description:
------------
Although UTF-8 is becoming widely supported, many users in countries whose
languages use accented letters still type search strings without any
accents or special characters, out of habit from the older systems.

At the moment, the only way to produce an accent-free string is to call
strtr() manually with a hand-written mapping for every such character.
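For illustration, the manual workaround can be sketched as a lookup table, here in Python because the report contains no code. The names `deaccent_table`, `HUNGARIAN_FROM` and `HUNGARIAN_TO` are hypothetical. The table covers only the Hungarian letters from the expected result, which shows the drawback: every character that is not listed, such as Ỗ, passes through unchanged.

```python
# Hypothetical helper: a hand-written lookup table, the Python analogue of
# calling PHP's strtr() with an explicit from/to mapping.
HUNGARIAN_FROM = "ÁáÉéÍíÓóÖöŐőÚúÜüŰű"
HUNGARIAN_TO = "AaEeIiOoOoOoUuUuUu"

# str.maketrans requires both strings to have the same length.
_TABLE = str.maketrans(HUNGARIAN_FROM, HUNGARIAN_TO)


def deaccent_table(text: str) -> str:
    """Replace only the characters listed in the table; all others pass through."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(deaccent_table("Árvíztűrő tükörfúrógép"))  # Arvizturo tukorfurogep
    print(deaccent_table("Ỗ"))  # unchanged: not in the table
```

Every additional language or accent needs more table entries, which is why the request asks for a built-in function.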

Reproduce code:
---------------
n/a

Expected result:
----------------
Árvíztűrő tükörfúrógép -> Arvizturo tukorfurogep

All of the following should be converted to "O" (and their lowercase
counterparts to "o"):

Ò = capital letter o with grave
Ó = capital letter o with acute
Ô = capital letter o with circumflex
Õ = capital letter o with tilde
Ö = capital letter o with diaeresis
Ō = capital letter o with macron
Ŏ = capital letter o with breve
Ő = capital letter o with double acute
Ơ = capital letter o with horn
Ǒ = capital letter o with caron
Ǫ = capital letter o with ogonek
Ǭ = capital letter o with ogonek and macron
Ȍ = capital letter o with double grave
Ȏ = capital letter o with inverted breve
Ȫ = capital letter o with diaeresis and macron
Ȭ = capital letter o with tilde and macron
Ȯ = capital letter o with dot above
Ȱ = capital letter o with dot above and macron
Ṍ = capital letter o with tilde and acute
Ṏ = capital letter o with tilde and diaeresis
Ṑ = capital letter o with macron and grave
Ṓ = capital letter o with macron and acute
Ọ = capital letter o with dot below
Ỏ = capital letter o with hook above
Ố = capital letter o with circumflex and acute
Ồ = capital letter o with circumflex and grave
Ổ = capital letter o with circumflex and hook above
Ỗ = capital letter o with circumflex and tilde
Ộ = capital letter o with circumflex and dot below
Ớ = capital letter o with horn and acute
Ờ = capital letter o with horn and grave
Ở = capital letter o with horn and hook above
Ỡ = capital letter o with horn and tilde
Ợ = capital letter o with horn and dot below

The 34 characters above are Latin capital letters; there are another 34
lowercase counterparts, so the extended Latin range contains 68 accented
variants of "o" in total.

The same applies to e, u, i and a.
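One possible approach, not taken from this report and sketched in Python rather than PHP, is Unicode canonical decomposition: normalize to NFD so that each precomposed letter splits into a base letter plus combining marks, then drop the marks (general category Mn). The function name `deaccent` is hypothetical. This handles all 68 "o" variants listed above without a per-character table and preserves case, as in the Árvíztűrő example. Letters such as ø, ł and đ have no canonical decomposition and are left unchanged, and the sketch assumes Latin-script input, since some other scripts use nonspacing marks as essential parts of words.

```python
import unicodedata


def deaccent(text: str) -> str:
    """Strip diacritics by canonical decomposition (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents that NFD splits off.
    base_only = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose anything that was not stripped back to NFC.
    return unicodedata.normalize("NFC", base_only)


if __name__ == "__main__":
    print(deaccent("Árvíztűrő tükörfúrógép"))  # Arvizturo tukorfurogep
    capitals = "ÒÓÔÕÖŌŎŐƠǑǪǬȌȎȪȬȮȰṌṎṐṒỌỎỐỒỔỖỘỚỜỞỠỢ"
    print(deaccent(capitals) == "O" * len(capitals))  # True
    print(deaccent(capitals.lower()) == "o" * len(capitals))  # True
    print(deaccent("ø ł đ"))  # unchanged: no canonical decomposition
```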

Actual result:
--------------
n/a

-- 
Edit bug report at http://bugs.php.net/?id=36130&edit=1