Here is the library I promised. The two functions you'll use most are normalize(), which converts a UTF-8 encoded string into the "normalized" (romanized) format, and denormalize(), which converts the romanized format back into UTF-8 encoded Persian text.
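
As a rough illustration of the round trip described above, here is a minimal Python sketch. It is not the attached PHP library: the names toy_normalize and toy_denormalize and the four-letter mapping table are assumptions made up for this example, and the library's real romanization scheme may differ entirely.

```python
# Toy mapping, purely illustrative. The attached library's real
# romanization table is not shown in this message.
TOY_MAP = {"س": "s", "ل": "l", "ا": "a", "م": "m"}
TOY_REVERSE = {latin: persian for persian, latin in TOY_MAP.items()}


def toy_normalize(text: str) -> str:
    """Replace each mapped Persian letter with its Latin stand-in."""
    return "".join(TOY_MAP.get(ch, ch) for ch in text)


def toy_denormalize(text: str) -> str:
    """Reverse toy_normalize by mapping Latin stand-ins back to Persian."""
    return "".join(TOY_REVERSE.get(ch, ch) for ch in text)


original = "سلام"
romanized = toy_normalize(original)
restored = toy_denormalize(romanized)
print(romanized)
print(restored == original)
```

The round trip in this sketch only holds because the toy table is one-to-one and the input contains no Latin letters of its own; a real scheme has to deal with both cases.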

Be aware that if you use it for full-text search indexing in MySQL, the results will be subject to MySQL's own assumptions (such as skipping some words like "is", "am", etc.) and possibly other restrictions.

You may use this library under the terms of the GPL license.

Good luck,
Ehsan Akhgari

Attachment: normalization.php

_______________________________________________
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing
