mbstring reference.xml

Derek Ford Mon, 03 Apr 2006 14:40:21 -0700

derek           Mon Apr  3 21:39:59 2006 UTC


  Modified files:              
    /phpdoc/en/reference/mbstring       reference.xml 
  Log:
  Added a few grammatical fixes and provided a more in-depth explanation of why 
we need mbstring because of the limitations of a byte.

http://cvs.php.net/viewcvs.cgi/phpdoc/en/reference/mbstring/reference.xml?r1=1.22&r2=1.23&diff_format=u
Index: phpdoc/en/reference/mbstring/reference.xml
diff -u phpdoc/en/reference/mbstring/reference.xml:1.22 phpdoc/en/reference/mbstring/reference.xml:1.23
--- phpdoc/en/reference/mbstring/reference.xml:1.22     Sun Sep  4 19:39:18 2005
+++ phpdoc/en/reference/mbstring/reference.xml  Mon Apr  3 21:39:59 2006
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.22 $ -->
+<!-- $Revision: 1.23 $ -->
 <!-- Purpose: international -->
 <!-- Membership: bundled -->
 
@@ -12,12 +12,14 @@
     &reftitle.intro;
     <para>
      While there are many languages in which every necessary character can
-     be represented by a one-to-one mapping to a 8-bit value, there are also
+     be represented by a one-to-one mapping to an 8-bit value, there are also
      several languages which require so many characters for written
-     communication that cannot be contained within the range a mere byte can
-     code. Multibyte character encoding schemes were developed to express
-     that many (more than 256) characters in the regular bytewise coding
-     system.
+     communication that they cannot be contained within the range a mere byte
+     can code. (A byte is made up of eight bits, and each bit can hold only
+     one of two values, zero or one. A byte can therefore represent only
+     256 unique values, two to the power of eight.) Multibyte character
+     encoding schemes were developed to express more than 256 characters
+     in the regular bytewise coding system.
     </para>
     <para>
      When you manipulate (trim, split, splice, etc.) strings encoded in a
@@ -29,17 +31,12 @@
      most likely loses its original meaning.
     </para>
     <para>
-     <literal>mbstring</literal> provides these multibyte specific
-     string functions that help you deal with multibyte encodings in PHP,
-     which is basically supposed to be used with single byte encodings.
-     In addition to that, <literal>mbstring</literal> handles character
-     encoding conversion between the possible encoding pairs.
-    </para>
-    <para>
-     <literal>mbstring</literal> is also designed to handle Unicode-based
-     encodings such as UTF-8 and UCS-2 and many single-byte encodings
-     for convenience (listed below), whereas <literal>mbstring</literal> was
-     originally developed for use in Japanese web pages.
+     <literal>mbstring</literal> provides multibyte-specific string functions
+     that help you deal with multibyte encodings in PHP. In addition to that, 
+     <literal>mbstring</literal> handles character encoding conversion between 
+     the possible encoding pairs. <literal>mbstring</literal> is designed to 
+     handle Unicode-based encodings such as UTF-8 and UCS-2 and many 
+     single-byte encodings for convenience (listed below).
     </para>
 
     <section id="mbstring.php4.req">
@@ -115,14 +112,14 @@
      </note>
      <note>
       <para>
-       If you have some database connected with PHP, it is recommended that
-       you use the same character encoding for both database and the
+       If you are connecting to a database with PHP, it is recommended that
+       you use the same character encoding for both the database and the
        <literal>internal encoding</literal> for ease of use and better
        performance.
       </para>
       <para>
        If you are using PostgreSQL, the character encoding used in the
-       database and the one used in the PHP may differ as it supports
+       database and the one used in PHP may differ as it supports
        automatic character set conversion between the backend and the frontend.
       </para>
      </note>
@@ -175,7 +172,7 @@
         </simpara>
         <para> 
          There is no way to control HTTP input character
-         conversion from PHP script. To disable HTTP input character
+         conversion from a PHP script. To disable HTTP input character
          conversion, it has to be done in &php.ini;.
          <example>
           <title>
@@ -207,14 +204,14 @@
          There are several ways to enable output character encoding
          conversion. One is using &php.ini;, another
          is using <function>ob_start</function> with
-         <function>mb_output_handler</function> as
+         <function>mb_output_handler</function> as the 
          <literal>ob_start</literal> callback function.
         </para>
         <note>
          <para>
           PHP3-i18n users should note that <literal>mbstring</literal>'s output
           conversion differs from PHP3-i18n. Character encoding is
-          converted using output buffer.
+          converted using an output buffer.
          </para>
         </note>
        </listitem>
@@ -268,7 +265,7 @@
      <literal>mbstring</literal> functions.
     </simpara>
     <para>
-     The following character encoding is supported in this PHP
+     The following character encodings are supported in this PHP
      extension: 
     </para>
     <itemizedlist>
@@ -330,11 +327,11 @@
      <listitem><simpara>KOI8-R</simpara></listitem>
     </itemizedlist>
     <para>
-     &php.ini; entry, which accepts encoding name,
-     accepts &quot;<literal>auto</literal>&quot; and
-     &quot;<literal>pass</literal>&quot; also.
-     <literal>mbstring</literal> functions, which accepts encoding
-     name, and accepts &quot;<literal>auto</literal>&quot;.
+     Any &php.ini; entry which accepts an encoding name
+     can also use the values &quot;<literal>auto</literal>&quot; and
+     &quot;<literal>pass</literal>&quot;.
+     <literal>mbstring</literal> functions which accept an encoding
+     name can also use the value &quot;<literal>auto</literal>&quot;.
     </para>
     <para>
      If &quot;<literal>pass</literal>&quot; is set, no character
@@ -358,13 +355,13 @@
     </title>
     <para>
      You might often find it difficult to get an existing PHP application
-     work in a given multibyte environment. That's mostly because lots of
-     PHP applications out there are written with the standard
-     string functions such as <function>substr</function>, which are
-     known to not properly handle multibyte-encoded strings.
+     to work in a given multibyte environment. This happens because most 
+     PHP applications out there are written with the standard string 
+     functions such as <function>substr</function>, which are known to 
+     not properly handle multibyte-encoded strings.
     </para>
     <para>
-     mbstring supports 'function overloading' feature which enables
+     mbstring supports a 'function overloading' feature which enables
      you to add multibyte awareness to such an application without
      code modification by overloading multibyte counterparts on
      the standard string functions. For example,
@@ -374,13 +371,13 @@
      single-byte encodings to a multibyte environment in many cases.
     </para>
     <para>
-     To use the function overloading, set
+     To use function overloading, set
      <literal>mbstring.func_overload</literal> in &php.ini; to a
      positive value that represents a combination of bitmasks specifying
      the categories of functions to be overloaded. It should be set
      to 1 to overload the <function>mail</function> function. 2 for string
      functions, 4 for regular expression functions. For example,
-     if is set for 7, mail, strings and regular expression functions should
+     if it is set to 7, mail, strings and regular expression functions will
      be overloaded. The list of overloaded functions are shown below.
      <table>
       <title>Functions to be overloaded</title>
@@ -475,18 +472,13 @@
    <section id="mbstring.ja-basic">
     <title>Basics of Japanese multi-byte encodings</title>
     <para>
-     It is often said quite hard to figure out how Japanese texts are
-     handled in the computer. This is not only because Japanese characters
-     can only be represented by multibyte encodings, but because different
-     encoding standards are adopted for different purposes / platforms.
-     Moreover, not a few character set standards are used there, which
-     are slightly different from one another. Those facts have often led
-     developers to inevitable mess-up.
-    </para>
-    <para> 
-     To create a working web application that would be put in the Japanese
-     environment, it is important to use the proper character encoding and
-     character set for the task in hand.
+     Japanese characters can only be represented by multibyte encodings, 
+     and multiple encoding standards are used depending on platform and 
+     text purpose. To make matters worse, these encoding standards 
+     differ slightly from one another. In order to create a web 
+     application which would be usable in a Japanese environment, a 
+     developer has to keep these complexities in mind to ensure that the
+     proper character encodings are used.
     </para>
     <para>
      <itemizedlist>
@@ -495,18 +487,19 @@
       </listitem>
       <listitem>
        <simpara>
-        Most of multibyte characters often appear twice as wide as 
-        a single-byte character on display. Those characters are called
-        "zen-kaku" in Japanese which means "full width", and the other
-        (narrower) characters are called "han-kaku" - means half width.
-        However the graphical properties of the characters depend on
-        the glyphs of the type faces used to display them or print them out.
+        Most Japanese multibyte characters appear twice as wide as
+        single-byte characters. These characters are called
+        &quot;zen-kaku&quot; in Japanese, which means &quot;full width&quot;.
+        Other, narrower, characters are called &quot;han-kaku&quot;,
+        which means &quot;half width&quot;. The graphical properties
+        of the characters, however, depend upon the typefaces used
+        to display them.
        </simpara>
       </listitem>
       <listitem>
        <simpara>
         Some character encodings use shift(escape) sequences defined
-        in ISO2022 to switch the code map of the specific code area
+        in ISO-2022 to switch the code map of the specific code area
         (<literal>00h</literal> to <literal>7fh</literal>).
        </simpara>
       </listitem>
@@ -533,10 +526,10 @@
    <section id="mbstring.ref">
     <title>References</title>
     <para>
-     Multibyte character encoding schemes and the related issues are very
-     complicated. There should be too few space to cover in sufficient details.
-     Please refer to the following URLs and other resources for
-     further readings.
+     Multibyte character encoding schemes and their related issues are
+     fairly complicated, and are beyond the scope of this documentation.
+     Please refer to the following URLs and other resources for further
+     information regarding these topics.
      <itemizedlist>
       <listitem>
        <para>

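The revised introduction makes two points: a byte can encode only 256 values, and a multibyte string manipulated bytewise loses its meaning. A minimal PHP sketch can illustrate both. It assumes the mbstring extension is loaded and that the script file is saved as UTF-8. The sample string and variable names are chosen only for illustration.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded and this file is saved as UTF-8.

// A byte has 8 bits, each either 0 or 1, so it can hold 2^8 distinct values.
$valuesPerByte = pow(2, 8);
echo $valuesPerByte, "\n";                        // 256

// Three Japanese characters; each one takes 3 bytes in UTF-8.
$text = "日本語";
echo strlen($text), "\n";                         // 9 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n";             // 3 (characters)

// substr() counts bytes, so the first 2 bytes are only part of the first character.
$broken = substr($text, 0, 2);
var_dump(mb_check_encoding($broken, 'UTF-8'));    // bool(false): not valid UTF-8

// mb_substr() counts characters, so whole characters are kept.
$intact = mb_substr($text, 0, 2, 'UTF-8');
echo $intact, "\n";                               // 日本
```

Byte-oriented functions such as substr() corrupt multibyte text in this way. The mbstring counterparts, or the function overloading described in the diff, are meant to avoid that.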
