derek Mon Apr 3 21:39:59 2006 UTC
Modified files:
  /phpdoc/en/reference/mbstring reference.xml
Log:
  Added a few grammatical fixes and provided a more in-depth explanation of why we need mbstring because of the limitations of a byte.
http://cvs.php.net/viewcvs.cgi/phpdoc/en/reference/mbstring/reference.xml?r1=1.22&r2=1.23&diff_format=u
Index: phpdoc/en/reference/mbstring/reference.xml
diff -u phpdoc/en/reference/mbstring/reference.xml:1.22 phpdoc/en/reference/mbstring/reference.xml:1.23
--- phpdoc/en/reference/mbstring/reference.xml:1.22 Sun Sep 4 19:39:18 2005
+++ phpdoc/en/reference/mbstring/reference.xml Mon Apr 3 21:39:59 2006
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.22 $ -->
+<!-- $Revision: 1.23 $ -->
 <!-- Purpose: international -->
 <!-- Membership: bundled -->

@@ -12,12 +12,14 @@
  &reftitle.intro;
  <para>
  While there are many languages in which every necessary character can
- be represented by a one-to-one mapping to a 8-bit value, there are also
+ be represented by a one-to-one mapping to an 8-bit value, there are also
  several languages which require so many characters for written
- communication that cannot be contained within the range a mere byte can
- code. Multibyte character encoding schemes were developed to express
- that many (more than 256) characters in the regular bytewise coding
- system.
+ communication that they cannot be contained within the range a mere byte
+ can code (A byte is made up of eight bits. Each bit can contain only two
+ distinct values, one or zero. Because of this, a byte can only represent
+ 256 unique values (two to the power of eight)). Multibyte character
+ encoding schemes were developed to express more than 256 characters
+ in the regular bytewise coding system.
  </para>
  <para>
  When you manipulate (trim, split, splice, etc.) strings encoded in a
@@ -29,17 +31,12 @@
  most likely loses its original meaning.
  </para>
  <para>
- <literal>mbstring</literal> provides these multibyte specific
- string functions that help you deal with multibyte encodings in PHP,
- which is basically supposed to be used with single byte encodings.
- In addition to that, <literal>mbstring</literal> handles character
- encoding conversion between the possible encoding pairs.
- </para>
- <para>
- <literal>mbstring</literal> is also designed to handle Unicode-based
- encodings such as UTF-8 and UCS-2 and many single-byte encodings
- for convenience (listed below), whereas <literal>mbstring</literal> was
- originally developed for use in Japanese web pages.
+ <literal>mbstring</literal> provides multibyte specific string functions
+ that help you deal with multibyte encodings in PHP. In addition to that,
+ <literal>mbstring</literal> handles character encoding conversion between
+ the possible encoding pairs. <literal>mbstring</literal> is designed to
+ handle Unicode-based encodings such as UTF-8 and UCS-2 and many
+ single-byte encodings for convenience (listed below).
  </para>

  <section id="mbstring.php4.req">
@@ -115,14 +112,14 @@
  </note>
  <note>
  <para>
- If you have some database connected with PHP, it is recommended that
- you use the same character encoding for both database and the
+ If you are connecting to a database with PHP, it is recommended that
+ you use the same character encoding for both the database and the
  <literal>internal encoding</literal> for ease of use and better
  performance.
  </para>
  <para>
  If you are using PostgreSQL, the character encoding used in the
- database and the one used in the PHP may differ as it supports
+ database and the one used in PHP may differ as it supports
  automatic character set conversion between the backend and the frontend.
  </para>
  </note>
@@ -175,7 +172,7 @@
  </simpara>
  <para>
  There is no way to control HTTP input character
- conversion from PHP script. To disable HTTP input character
+ conversion from a PHP script. To disable HTTP input character
  conversion, it has to be done in &php.ini;.
  <example>
  <title>
@@ -207,14 +204,14 @@
  There are several ways to enable output character encoding conversion.
  One is using &php.ini;, another is using
  <function>ob_start</function> with
- <function>mb_output_handler</function> as
+ <function>mb_output_handler</function> as the
  <literal>ob_start</literal> callback function.
  </para>
  <note>
  <para>
  PHP3-i18n users should note that <literal>mbstring</literal>'s
  output conversion differs from PHP3-i18n. Character encoding is
- converted using output buffer.
+ converted using an output buffer.
  </para>
  </note>
  </listitem>
@@ -268,7 +265,7 @@
  <literal>mbstring</literal> functions.
  </simpara>
  <para>
- The following character encoding is supported in this PHP
+ The following character encodings are supported in this PHP
  extension:
  </para>
  <itemizedlist>
@@ -330,11 +327,11 @@
  <listitem><simpara>KOI8-R</simpara></listitem>
  </itemizedlist>
  <para>
- &php.ini; entry, which accepts encoding name,
- accepts "<literal>auto</literal>" and
- "<literal>pass</literal>" also.
- <literal>mbstring</literal> functions, which accepts encoding
- name, and accepts "<literal>auto</literal>".
+ Any &php.ini; entry which accepts an encoding name
+ can also use the values "<literal>auto</literal>" and
+ "<literal>pass</literal>".
+ <literal>mbstring</literal> functions which accept an encoding
+ name can also use the value "<literal>auto</literal>".
  </para>
  <para>
  If "<literal>pass</literal>" is set, no character
@@ -358,13 +355,13 @@
  </title>
  <para>
  You might often find it difficult to get an existing PHP application
- work in a given multibyte environment. That's mostly because lots of
- PHP applications out there are written with the standard
- string functions such as <function>substr</function>, which are
- known to not properly handle multibyte-encoded strings.
+ to work in a given multibyte environment. This happens because most
+ PHP applications out there are written with the standard string
+ functions such as <function>substr</function>, which are known to
+ not properly handle multibyte-encoded strings.
  </para>
  <para>
- mbstring supports 'function overloading' feature which enables
+ mbstring supports a 'function overloading' feature which enables
  you to add multibyte awareness to such an application without
  code modification by overloading multibyte counterparts on
  the standard string functions. For example,
@@ -374,13 +371,13 @@
  single-byte encodings to a multibyte environment in many cases.
  </para>
  <para>
- To use the function overloading, set
+ To use function overloading, set
  <literal>mbstring.func_overload</literal> in &php.ini; to a positive
  value that represents a combination of bitmasks specifying the
  categories of functions to be overloaded. It should be set
  to 1 to overload the <function>mail</function> function. 2 for string
  functions, 4 for regular expression functions. For example,
- if is set for 7, mail, strings and regular expression functions should
+ if it is set to 7, mail, strings and regular expression functions will
  be overloaded. The list of overloaded functions are shown below.
  <table>
  <title>Functions to be overloaded</title>
@@ -475,18 +472,13 @@
  <section id="mbstring.ja-basic">
  <title>Basics of Japanese multi-byte encodings</title>
  <para>
- It is often said quite hard to figure out how Japanese texts are
- handled in the computer. This is not only because Japanese characters
- can only be represented by multibyte encodings, but because different
- encoding standards are adopted for different purposes / platforms.
- Moreover, not a few character set standards are used there, which
- are slightly different from one another. Those facts have often led
- developers to inevitable mess-up.
- </para>
- <para>
- To create a working web application that would be put in the Japanese
- environment, it is important to use the proper character encoding and
- character set for the task in hand.
+ Japanese characters can only be represented by multibyte encodings,
+ and multiple encoding standards are used depending on platform and
+ text purpose. To make matters worse, these encoding standards
+ differ slightly from one another. In order to create a web
+ application which would be usable in a Japanese environment, a
+ developer has to keep these complexities in mind to ensure that the
+ proper character encodings are used.
  </para>
  <para>
  <itemizedlist>
@@ -495,18 +487,19 @@
  </listitem>
  <listitem>
  <simpara>
- Most of multibyte characters often appear twice as wide as
- a single-byte character on display. Those characters are called
- "zen-kaku" in Japanese which means "full width", and the other
- (narrower) characters are called "han-kaku" - means half width.
- However the graphical properties of the characters depend on
- the glyphs of the type faces used to display them or print them out.
+ Most Japanese multibyte characters appear twice as wide as
+ single-byte characters. These characters are called "
+ zen-kaku" in Japanese, which means "full width".
+ Other, narrower, characters are called "han-kaku",
+ which means "half width". The graphical properties
+ of the characters, however, depends upon the type faces used
+ to display them.
  </simpara>
  </listitem>
  <listitem>
  <simpara>
  Some character encodings use shift(escape) sequences defined
- in ISO2022 to switch the code map of the specific code area
+ in ISO-2022 to switch the code map of the specific code area
  (<literal>00h</literal> to <literal>7fh</literal>).
  </simpara>
  </listitem>
@@ -533,10 +526,10 @@
  <section id="mbstring.ref">
  <title>References</title>
  <para>
- Multibyte character encoding schemes and the related issues are very
- complicated. There should be too few space to cover in sufficient details.
- Please refer to the following URLs and other resources for
- further readings.
+ Multibyte character encoding schemes and their related issues are
+ fairly complicated, and are beyond the scope of this documentation.
+ Please refer to the following URLs and other resources for further
+ information regarding these topics.
  <itemizedlist>
  <listitem>
  <para>