hirokawa Thu Jun 28 23:20:29 2001 EDT
Modified files:
/phpdoc/en/functions mbstring.xml
Log:
fixed some typos.
Index: phpdoc/en/functions/mbstring.xml
diff -u phpdoc/en/functions/mbstring.xml:1.2 phpdoc/en/functions/mbstring.xml:1.3
--- phpdoc/en/functions/mbstring.xml:1.2 Sun Jun 24 11:27:21 2001
+++ phpdoc/en/functions/mbstring.xml Thu Jun 28 23:20:28 2001
@@ -1,117 +1,305 @@
<reference id="ref.mbstring">
<title>Multi-Byte String Functions</title>
- <titleabbrev>Multi-Byte String</titleabbrev>
+ <titleabbrev>
+ Multi-Byte String
+ </titleabbrev>
<partintro>
&warn.experimental;
<sect1 id="mb-intro">
<title>Introduction</title>
<warning>
<simpara>
- This module is EXPERIMENTAL. Function name/API is subject to be
- changed. Current conversion filter supports Japanese only.
+ This module is EXPERIMENTAL. Function name/API is subject to
+ change. Current conversion filter supports Japanese only.
</simpara>
</warning>
<para>
- There are many languages that all characters cannot be expressed
+ There are many languages in which not all characters can be expressed
by single byte. Multi-byte character codes are used to express
many characters for many languages. <literal>mbstring</literal>
is developed to handle Japanese characters. However, many
<literal>mbstring</literal> functions are able to handle
- character codes other than Japanese.
+ character encoding other than Japanese.
</para>
<para>
- Multi-byte character encoding represents single character with
+ A multi-byte character encoding represents a single character with
consecutive bytes. Some character encoding has shift(escape)
- sequences to start/end multi-byte character string. Therefore,
+ sequences to start/end multi-byte character strings. Therefore, a
multi-byte character string may be destroyed when it is divided
- and/or counted, unless multi-byte character encoding safe method
- is used. <literal>mbstring</literal> functions support multi-byte
- character safe string functions and other utility functions such
- as conversion functions.
+ and/or counted unless a multi-byte character encoding safe method
+ is used. This module provides multi-byte character safe string
+ functions and other utility functions such as conversion
+ functions.
</para>
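+ <para>
+ As a minimal sketch of why this matters, the following compares a
+ byte-oriented function with its multi-byte safe counterpart. It
+ assumes UTF-8 data and uses <function>mb_strlen</function> and
+ <function>mb_substr</function> from this module; the variable
+ names are illustrative only.
+ <informalexample>
+ <programlisting role="php">
+<?php
+
+// Three Japanese characters written as UTF-8 byte escapes, so the
+// result does not depend on how this file itself is saved.
+$str = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
+
+// strlen() counts bytes, mb_strlen() counts characters.
+$bytes = strlen($str);             // 9
+$chars = mb_strlen($str, "UTF-8"); // 3
+
+// substr() may cut in the middle of a character;
+// mb_substr() cuts on character boundaries.
+$broken = substr($str, 0, 4);             // 4 bytes, ends mid-character
+$safe   = mb_substr($str, 0, 2, "UTF-8"); // first 2 characters (6 bytes)
+
+echo "$bytes bytes, $chars characters\n";
+
+?>
+ </programlisting>
+ </informalexample>
+ </para>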
+ <para>
+ Since PHP is basically designed for ISO-8859-1, some multi-byte
+ character encodings do not work well with PHP. Therefore, it is
+ important to set <literal>mbstring.internal_encoding</literal> to
+ a character encoding that works with PHP.
+ </para>
+ <para>
+ PHP4 Character Encoding Requirements
+ </para>
+ <para>
+ <itemizedlist>
+ <listitem>
+ <simpara>
+ Per byte encoding
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Single byte characters in range of <literal>00h-7fh</literal>
+ which is compatible with <literal>ASCII</literal>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Multi-byte characters without <literal>00h-7fh</literal>
+ </simpara>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ The following are examples of internal character encodings that
+ work with PHP and ones that do NOT work with PHP.
+ <informalexample>
+ <programlisting>
- <sect2 id="mb-ja-basic">
- <title>Basics for Japanese multi-byte character</title>
+Character encodings that work with PHP:
+ISO-8859-*, EUC-JP, UTF-8
+
+
+Character encodings that do NOT work with PHP:
+JIS, SJIS
+ </programlisting>
+ </informalexample>
+ </para>
+ <para>
+ A character encoding that does not work with PHP may be converted
+ with <literal>mbstring</literal>'s HTTP input/output conversion
+ feature/function.
+ </para>
+ <note>
+ <para>
+ SJIS should not be used for internal encoding unless the reader
+ is familiar with parser/compiler, character encoding and
+ character encoding issues.
+ </para>
+ </note>
+ <note>
<para>
- Most Japanese characters need more than 1 byte for a
- character. In addition to this, several character encodings are
- used under Japanese environment. There are EUC-JP, Shift_JIS and
- ISO-2022-JP character encoding. As Unicode is getting popular,
- UTF-8 is used also. To develop Web application for Japanese
- environment, it is important to use these character codes depend
- on its purpose, HTTP input/output, RDBMS and E-mail.
+ If you use a database with PHP, it is recommended that you use the
+ same character encoding for both the database and <literal>internal
+ encoding</literal> for ease of use and better performance.
+ </para>
+ <para>
+ If you are using PostgreSQL, it supports a character
+ encoding that is different from the backend character encoding. See
+ the PostgreSQL manual for details.
</para>
+ </note>
+
+ <sect2 id="mb-enable">
+ <title>How to Enable mbstring</title>
<para>
+ <literal>mbstring</literal> is an extended module. You must
+ enable the module with the <literal>configure</literal> script. Refer
+ to the <link linkend="installation">Install</link> section for
+ details.
+ </para>
+ <simpara>
+ The following configure options are related to
+ <literal>mbstring</literal> module.
+ </simpara>
+ <para>
<itemizedlist>
- <listitem>
- <simpara>
- Storage for a character can be upto four bytes
- </simpara>
- </listitem>
<listitem>
- <simpara>
- A multi-byte character usually has twice of width compare to
- single byte characters. Wider character is called "zen-kaku"
- - meaning full width, narrower character called "han-kaku" -
- meaning half width. "zen-kaku" characters are fixed width
- usually.
- </simpara>
+ <para>
+ <option role="configure">--enable-mbstring</option> : Enable
+ <literal>mbstring</literal> functions. This option is
+ required to use <literal>mbstring</literal> functions.
+ </para>
</listitem>
<listitem>
- <simpara>
- Some character encoding defines shift sequence for
- entering/exiting multi-byte character strings.
- </simpara>
+ <para>
+ <option role="configure">--enable-mbstr-enc-trans</option> :
+ Enable HTTP input character encoding conversion using
+ <literal>mbstring</literal> conversion engine. If this
+ feature is enabled, HTTP input character encoding may be
+ converted to <literal>mbstring.internal_encoding</literal>
+ automatically.
+ </para>
</listitem>
+ </itemizedlist>
+ </para>
+ </sect2>
+
+ <sect2 id="mb-conv">
+ <title>HTTP Input and Output</title>
+ <para>
+ HTTP input/output character encoding conversion may also convert
+ binary data. Users are expected to control character
+ encoding conversion if binary data is used for HTTP
+ input/output.
+ </para>
+ <para>
+ If <literal>enctype</literal> for HTML form is set to
+ <literal>multipart/form-data</literal>,
+ <literal>mbstring</literal> does not convert character encoding
+ in POST data. In that case, strings need to be
+ converted to the internal character encoding by the script.
+ </para>
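+ <para>
+ A minimal sketch of such a manual conversion, assuming the
+ submitted encoding is one that "<literal>auto</literal>" can
+ detect. <literal>to_internal</literal> is a hypothetical helper
+ name used for illustration, not part of
+ <literal>mbstring</literal>.
+ <informalexample>
+ <programlisting role="php">
+<?php
+
+// Hypothetical helper: convert one submitted string to the
+// current internal encoding. "auto" asks mbstring to detect the
+// source encoding; pass an explicit encoding name if it is known.
+function to_internal($value)
+{
+    return mb_convert_encoding($value, mb_internal_encoding(), "auto");
+}
+
+// Example use on a single field value (an ASCII string here).
+$field = to_internal("plain text");
+
+?>
+ </programlisting>
+ </informalexample>
+ </para>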
+ <para>
+ <itemizedlist>
<listitem>
<simpara>
- Database may allocate storage for characters that differs
- from size used in PHP even if the same character encoding is
- used. (For example, PostgreSQL)
+ HTTP Input
</simpara>
+ <para> There is no way to control HTTP input character
+ conversion from a PHP script. HTTP input character
+ conversion can only be disabled in <literal>php.ini</literal>.
+ <example>
+ <title>
+ Disable HTTP input conversion in php.ini
+ </title>
+ <programlisting role="php">
+
+;; Disable HTTP Input conversion
+mbstring.http_input = pass
+ </programlisting>
+ </example>
+ </para>
+ <para>
+ When using PHP as an Apache module, it is possible to
+ override PHP ini setting per Virtual Host in
+ <literal>httpd.conf</literal> or per directory with
+ <literal>.htaccess</literal>. Refer to the <link
+ linkend="configuration">Configuration</link> section and
+ Apache Manual for details.
+ </para>
</listitem>
<listitem>
<simpara>
- E-mail is supposed to use ISO-2022-JP.
+ HTTP Output
</simpara>
- </listitem>
- <listitem>
<para>
- "i-mode" web site is supposed to use Shift_JIS.
+ There are several ways to enable output character encoding
+ conversion. One is using <literal>php.ini</literal>, another
+ is using <function>ob_start</function> with
+ <function>mb_output_handler</function> as
+ <literal>ob_start</literal> callback function.
</para>
+ <note>
+ <para>
+ For PHP3-i18n users, <literal>mbstring</literal>'s output
+ conversion differs from PHP3-i18n. Character encoding is
+ converted using output buffer.
+ </para>
+ </note>
</listitem>
</itemizedlist>
</para>
+ <para>
+ <example>
+ <title><literal>php.ini</literal> setting example</title>
+ <programlisting role="php">
+
+;; Enable output character encoding conversion for all PHP pages
+
+;; Enable Output Buffering
+output_buffering = On
+
+;; Set mb_output_handler to enable output conversion
+output_handler = mb_output_handler
+ </programlisting>
+ </example>
+ </para>
+ <para>
+ <example>
+ <title>Script example</title>
+ <programlisting role="php">
+
+<?php
+
+// Enable output character encoding conversion only for this page
+
+// Set HTTP output character encoding to SJIS
+mb_http_output('SJIS');
+
+// Start buffering and specify "mb_output_handler" as
+// callback function
+ob_start('mb_output_handler');
+
+?>
+ </programlisting>
+ </example>
+ </para>
</sect2>
<sect2 id="mb-code">
- <title>Supported character encodings</title>
+ <title>Supported Character Encoding</title>
+ <simpara>
+ Currently, the following character encodings are supported by the
+ <literal>mbstring</literal> module. A character encoding may
+ be specified for <literal>mbstring</literal> functions'
+ <literal>encoding</literal> parameter. </simpara>
+ <para>
+ The following character encoding is supported in this PHP
+ extension :
+ </para>
<para>
- Following character encodings are supported in this PHP
- extension : <literal>UCS-4</literal>,
- <literal>UCS-4BE</literal>, <literal>UCS-4LE</literal>,
- <literal>UCS-2</literal>, <literal>UCS-2BE</literal>,
- <literal>UCS-2LE</literal>, <literal>UTF-32</literal>,
- <literal>UTF-32BE</literal>, <literal>UTF-32LE</literal>,
- <literal>UCS-2LE</literal>, <literal>UTF-16</literal>,
- <literal>UTF-16BE</literal>, <literal>UTF-16LE</literal>,
- <literal>UTF-8</literal>, <literal>UTF-7</literal>,
- <literal>ASCII</literal>, <literal>EUC-JP</literal>,
- <literal>SJIS</literal>, <literal>eucJP-win</literal>,
- <literal>SJIS-win</literal>,
- <literal>ISO-2022-JP</literal>(<literal>JIS</literal>),
+ <literal>UCS-4</literal>, <literal>UCS-4BE</literal>,
+ <literal>UCS-4LE</literal>, <literal>UCS-2</literal>,
+ <literal>UCS-2BE</literal>, <literal>UCS-2LE</literal>,
+ <literal>UTF-32</literal>, <literal>UTF-32BE</literal>,
+ <literal>UTF-32LE</literal>,
+ <literal>UTF-16</literal>, <literal>UTF-16BE</literal>,
+ <literal>UTF-16LE</literal>, <literal>UTF-8</literal>,
+ <literal>UTF-7</literal>, <literal>ASCII</literal>,
+ <literal>EUC-JP</literal>, <literal>SJIS</literal>,
+ <literal>eucJP-win</literal>, <literal>SJIS-win</literal>,
+ <literal>ISO-2022-JP</literal>, <literal>JIS</literal>,
<literal>ISO-8859-1</literal>, <literal>ISO-8859-2</literal>,
<literal>ISO-8859-3</literal>, <literal>ISO-8859-4</literal>,
<literal>ISO-8859-5</literal>, <literal>ISO-8859-6</literal>,
<literal>ISO-8859-7</literal>, <literal>ISO-8859-8</literal>,
<literal>ISO-8859-9</literal>, <literal>ISO-8859-10</literal>,
<literal>ISO-8859-13</literal>, <literal>ISO-8859-14</literal>,
- <literal>ISO-8859-15</literal>.
+ <literal>ISO-8859-15</literal>, <literal>byte2be</literal>,
+ <literal>byte2le</literal>, <literal>byte4be</literal>,
+ <literal>byte4le</literal>, <literal>BASE64</literal>,
+ <literal>7bit</literal>, <literal>8bit</literal> and
+ <literal>UTF7-IMAP</literal>.
+ </para>
+ <para>
+ <literal>php.ini</literal> entries that accept an encoding name
+ also accept "<literal>auto</literal>" and
+ "<literal>pass</literal>".
+ <literal>mbstring</literal> functions that accept an encoding
+ name also accept "<literal>auto</literal>".
+ </para>
+ <para>
+ If "<literal>pass</literal>" is set, no character
+ encoding conversion is performed.
+ </para>
+ <para>
+ If "<literal>auto</literal>" is set, it is expanded to
+ "<literal>ASCII,JIS,UTF-8,EUC-JP,SJIS</literal>".
+ </para>
+ <para>
+ See also <function>mb_detect_order</function>.
</para>
+ <note>
+ <para>
+ "Supported character encoding" does not mean that it
+ works as an internal character encoding.
+ </para>
+ </note>
</sect2>
<sect2 id="mb-ini">
- <title> php.ini settings </title>
+ <title>php.ini settings</title>
<para>
<itemizedlist>
<listitem>
@@ -122,63 +310,311 @@
</listitem>
<listitem>
<simpara>
- <literal>mbstring.http_input</literal> defines default HTTP input
- character encoding.
+ <literal>mbstring.http_input</literal> defines default HTTP
+ input character encoding.
</simpara>
</listitem>
<listitem>
<simpara>
- <literal>mbstring.http_output</literal> defines default HTTP output
- character encoding.
+ <literal>mbstring.http_output</literal> defines default HTTP
+ output character encoding.
</simpara>
</listitem>
<listitem>
<simpara>
- <literal>mbstring.detect_order</literal> defines default character
- encoding detection order.
+ <literal>mbstring.detect_order</literal> defines default
+ character code detection order. See also
+ <function>mb_detect_order</function>.
</simpara>
</listitem>
<listitem>
<simpara>
- <literal>mbstring.substitute_character</literal> defines character
- to substitute for invalid character codes.
+ <literal>mbstring.substitute_character</literal> defines
+ character to substitute for invalid character encoding.
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
+ Web browsers are supposed to use the same character encoding
+ when submitting a form. However, browsers may not use the same
+ character encoding. See <function>mb_http_input</function> to
+ detect character encoding used by browsers.
+ </para>
+ <para>
+ If <literal>enctype</literal> is set to
+ <literal>multipart/form-data</literal> in HTML forms,
+ <literal>mbstring</literal> does not convert character encoding
+ in POST data. The user must convert them in the script, if
+ conversion is needed.
+ </para>
+ <para>
+ Although browsers are smart enough to detect the character encoding
+ in HTML, it is better to set <literal>charset</literal> in the HTTP
+ header. Change <literal>default_charset</literal> according to
+ character encoding.
+ </para>
+ <para>
<example>
<title><literal>php.ini</literal> setting example</title>
- <programlisting role="php.ini">
+ <programlisting role="php">
+
;; Set default internal encoding
+;; Note: Make sure to use a character encoding that works with PHP
mbstring.internal_encoding = UTF-8 ; Set internal encoding to UTF-8
-;; Set default HTTP input character code
-mbstring.http_input = auto ; Set HTTP input to auto
-; or
-; mbstring.http_input = SJIS ; Set HTTP input to SJIS
-; mbstring.http_input = eucjp-win, sjis-win, UTF-8 ; Specify order
-
-;; Set default HTTP output character code
-mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
-
-;; Set default character code detection order
-mbstring.detect_order = auto ; Set HTTP output to auto
-; or
-; mbstring.detect_order = eucjp-win, sjis-win, UTF-8 ; Specify order
+;; Set default HTTP input character encoding
+;; Note: Script cannot change http_input setting.
+mbstring.http_input = pass ; No conversion.
+mbstring.http_input = auto ; Set HTTP input to auto
+ ; "auto" is expanded to
+ ; "ASCII,JIS,UTF-8,EUC-JP,SJIS"
+mbstring.http_input = SJIS ; Set HTTP input to SJIS
+mbstring.http_input = UTF-8,SJIS,EUC-JP ; Specify order
+
+;; Set default HTTP output character encoding
+mbstring.http_output = pass ; No conversion
+mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
+
+;; Set default character encoding detection order
+mbstring.detect_order = auto ; Set detect order to auto
+mbstring.detect_order = ASCII,JIS,UTF-8,SJIS,EUC-JP ; Specify order
;; Set default substitute character
-mbstring.substitute_character = 12307 ; Specify character code
-; or
-; mbstring.substitute_character = none ; Null character
-; mbstring.substitute_character = long ; Long
+mbstring.substitute_character = 12307 ; Specify Unicode value
+mbstring.substitute_character = none ; Do not print character
+mbstring.substitute_character = long ; Long Example: U+3000,JIS+7E7E
</programlisting>
</example>
</para>
+ <para>
+ <example>
+ <title><literal>php.ini</literal> setting for <literal>EUC-JP</literal> users</title>
+ <programlisting role="php">
+
+;; Disable Output Buffering
+output_buffering = Off
+
+;; Set HTTP header charset
+default_charset = EUC-JP
+
+;; Set HTTP input encoding conversion to auto
+mbstring.http_input = auto
+
+;; Convert HTTP output to EUC-JP
+mbstring.http_output = EUC-JP
+
+;; Set internal encoding to EUC-JP
+mbstring.internal_encoding = EUC-JP
+
+;; Do not print invalid characters
+mbstring.substitute_character = none
+ </programlisting>
+ </example>
+ </para>
+ <para>
+ <example>
+ <title><literal>php.ini</literal> setting for <literal>SJIS</literal> users</title>
+ <programlisting role="php">
+
+;; Enable Output Buffering
+output_buffering = On
+
+;; Set mb_output_handler to enable output conversion
+output_handler = mb_output_handler
+
+;; Set HTTP header charset
+default_charset = Shift_JIS
+
+;; Set http input encoding conversion to auto
+mbstring.http_input = auto
+
+;; Convert to SJIS
+mbstring.http_output = SJIS
+
+;; Set internal encoding to EUC-JP
+mbstring.internal_encoding = EUC-JP
+
+;; Do not print invalid characters
+mbstring.substitute_character = none
+ </programlisting>
+ </example>
+ </para>
</sect2>
+
+ <sect2 id="mb-ja-basic">
+ <title>Basics for Japanese multi-byte character</title>
+ <para>
+ Most Japanese characters need more than 1 byte per character. In
+ addition, several character encoding schemes are used in a
+ Japanese environment. There are EUC-JP, Shift_JIS(SJIS) and
+ ISO-2022-JP(JIS) character encodings. As Unicode becomes popular,
+ UTF-8 is also used. To develop Web applications for a Japanese
+ environment, it is important to use the character set suited to
+ the task at hand, whether HTTP input/output, RDBMS or e-mail.
+ </para>
+ <para>
+ <itemizedlist>
+ <listitem>
+ <simpara>Storage for a character can be up to four
+ bytes</simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ A multi-byte character is usually twice the width of a
+ single-byte character. Wider characters are called
+ "zen-kaku" - meaning full width, narrower characters are
+ called "han-kaku" - meaning half width. "zen-kaku" characters
+ are usually fixed width.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Some character encodings define shift(escape) sequences for
+ entering/exiting multi-byte character strings.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ ISO-2022-JP must be used for SMTP/NNTP.
+ </simpara>
+ </listitem>
+ <listitem>
+ <para>
+ "i-mode" web site is supposed to use SJIS.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </sect2>
+
+ <sect2 id="mb-ref">
+ <title>References</title>
+ <para>
+ Multi-byte character encoding and its related issues are very
+ complex. It is impossible to cover them in sufficient detail
+ here. Please refer to the following URLs and other resources for
+ further reading.
+ <itemizedlist>
+ <listitem>
+ <para>
+ Unicode/UTF/UCS/etc
+ </para>
+ <para>
+ <literal>http://www.unicode.org/</literal>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Japanese/Korean/Chinese character
+ information
+ </para>
+ <para>
+ <literal>
+ ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
+ </literal>
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </sect2>
+
</sect1>
</partintro>
+ <refentry id="function.mb-language">
+ <refnamediv>
+ <refname>mb_language</refname>
+ <refpurpose>
+ Set/Get current language
+ </refpurpose>
+ </refnamediv>
+ <refsect1>
+ <title>Description</title>
+ <funcsynopsis>
+ <funcprototype>
+ <funcdef>string
+ <function>mb_language</function></funcdef>
+ <paramdef>string
+ <parameter><optional>language</optional></parameter></paramdef>
+ </funcprototype>
+ </funcsynopsis>
+ <para>
+ <function>mb_language</function> sets language. If
+ <parameter>language</parameter> is omitted, it returns current
+ language as string.
+ </para>
+ <para>
+ <parameter>language</parameter> setting is used for encoding
+ e-mail messages. Valid languages are "Japanese",
+ "ja", "English", "en" and "uni"
+ (UTF-8). <function>mb_send_mail</function> uses this setting to
+ encode e-mail.
+ </para>
+ <para> The encoding settings for each language are ISO-2022-JP/Base64 for
+ Japanese, UTF-8/Base64 for uni, ISO-8859-1/quoted printable for
+ English.
+ </para>
+ <para>
+ Return Value: If <parameter>language</parameter> is set and
+ <parameter>language</parameter> is valid, it returns
+ TRUE. Otherwise, it returns FALSE. When
+ <parameter>language</parameter> is omitted, it returns language
+ name as string. If no language is set previously, it returns
+ FALSE.
+ </para>
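+ <para>
+ The behaviour described above can be sketched as follows. This is
+ an illustration only, assuming "Japanese" is accepted as a valid
+ language; the variable names are arbitrary.
+ <informalexample>
+ <programlisting role="php">
+<?php
+
+// Set the language used for encoding e-mail messages.
+$ok = mb_language("Japanese");   // TRUE when the language is valid
+
+// Called without an argument, the current language name is returned.
+$current = mb_language();
+
+?>
+ </programlisting>
+ </informalexample>
+ </para>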
+ <para>
+ See also <function>mb_send_mail</function>.
+ </para>
+ </refsect1>
+ </refentry>
+
+ <refentry id="function.mb-parse-str">
+ <refnamediv>
+ <refname>mb_parse_str</refname>
+ <refpurpose>
+ Parse GET/POST/COOKIE data and set global variable
+ </refpurpose>
+ </refnamediv>
+ <refsect1>
+ <title>Description</title>
+ <funcsynopsis>
+ <funcprototype>
+ <funcdef>string
+ <function>mb_parse_str</function>
+ </funcdef>
+ <paramdef>string
+ <parameter>encoded_string</parameter>
+ </paramdef>
+ <paramdef>array
+ <parameter><optional>result</optional></parameter>
+ </paramdef>
+ </funcprototype>
+ </funcsynopsis>
+ <para>
+ <function>mb_parse_str</function> parses GET/POST/COOKIE data and
+ sets global variables. Since PHP does not provide raw POST/COOKIE
+ data, it can only be used for GET data for now. It parses URL
+ encoded data, detects encoding, converts encoding to internal
+ encoding and sets values to <parameter>result</parameter> array or
+ global variables.
+ </para>
+ <para>
+ <parameter>encoded_string</parameter>: URL encoded data.
+ </para>
+ <para>
+ <parameter>result</parameter>: Array that contains decoded and
+ character encoding converted values.
+ </para>
+ <para>
+ Return Value: It returns TRUE for success or FALSE for failure.
+ </para>
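+ <para>
+ A minimal usage sketch, assuming ASCII-only data so that encoding
+ detection is not an issue. The query string and its field names
+ are made up for illustration.
+ <informalexample>
+ <programlisting role="php">
+<?php
+
+// Illustrative URL encoded data; the field names are made up.
+$query = "name=PHP&lang=ja";
+
+// Decoded values are stored in $result instead of global variables.
+$ok = mb_parse_str($query, $result);
+
+// $result now holds the keys "name" and "lang".
+
+?>
+ </programlisting>
+ </informalexample>
+ </para>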
+ <para>
+ See also <function>mb_detect_order</function>,
+ <function>mb_internal_encoding</function>.
+ </para>
+ </refsect1>
+ </refentry>
+
<refentry id="function.mb-internal-encoding">
<refnamediv>
<refname>mb_internal_encoding</refname>
@@ -211,7 +647,7 @@
<parameter>encoding</parameter>: Character encoding name
</para>
<para>
- Return Value: If encoding is
+ Return Value: If <parameter>encoding</parameter> is
set, <function>mb_internal_encoding</function> returns
<literal>TRUE</literal> for success, otherwise returns
<literal>FALSE</literal>. If <parameter>encoding</parameter> is
@@ -232,7 +668,7 @@
<para>
See also <function>mb_http_input</function>,
<function>mb_http_output</function>,
- <function>mb_detect_order</function>
+ <function>mb_detect_order</function>.
</para>
</refsect1>
</refentry>
@@ -270,7 +706,7 @@
<para>
See also <function>mb_internal_encoding</function>,
<function>mb_http_output</function>,
- <function>mb_detect_order</function>
+ <function>mb_detect_order</function>.
</para>
</refsect1>
</refentry>
@@ -294,9 +730,10 @@
If <parameter>encoding</parameter> is set,
<function>mb_http_output</function> sets HTTP output character
encoding to <parameter>encoding</parameter>. Output after this
- function is converted to <parameter>encoding</parameter>.
- <function>mb_http_output</function> returns TRUE for success and
- FALSE for failure.
+ function is converted to <parameter>encoding</parameter>.
+ <function>mb_http_output</function> returns
+ <literal>TRUE</literal> for success and <literal>FALSE</literal>
+ for failure.
</para>
<para>
If <parameter>encoding</parameter> is omitted,
@@ -306,7 +743,7 @@
<para>
See also <function>mb_internal_encoding</function>,
<function>mb_http_input</function>,
- <function>mb_detect_order</function>
+ <function>mb_detect_order</function>.
</para>
</refsect1>
</refentry>
@@ -331,11 +768,12 @@
<para>
<function>mb_detect_order</function> sets automatic character
encoding detection order to <parameter>encoding-list</parameter>.
- It returns TRUE for success, FALSE for failure.
+ It returns <literal>TRUE</literal> for success,
+ <literal>FALSE</literal> for failure.
</para>
<para>
<parameter>encoding-list</parameter> is array or comma separated
- list of character encodings. ("auto" is expanded to
+ list of character encoding. ("auto" is expanded to
"ASCII, JIS, UTF-8, EUC-JP, SJIS")
</para>
<para>
@@ -346,6 +784,42 @@
This setting affects <function>mb_detect_encoding</function> and
<function>mb_send_mail</function>.
</para>
+ <note>
+ <para>
+ <literal>mbstring</literal> currently implements the following
+ encoding detection filters. If there is an invalid byte sequence
+ for the following encodings, encoding detection will fail.
+ </para>
+ <simpara>
+ <literal>UTF-8</literal>, <literal>UTF-7</literal>,
+ <literal>ASCII</literal>,
+ <literal>EUC-JP</literal>, <literal>SJIS</literal>,
+ <literal>eucJP-win</literal>, <literal>SJIS-win</literal>,
+ <literal>JIS</literal>, <literal>ISO-2022-JP</literal>
+ </simpara>
+ <para>
+ For <literal>ISO-8859-*</literal>, <literal>mbstring</literal>
+ always detects as <literal>ISO-8859-*</literal>.
+ </para>
+ <para>
+ For <literal>UTF-16</literal>, <literal>UTF-32</literal>,
+ <literal>UCS-2</literal> and <literal>UCS-4</literal>, encoding
+ detection will always fail.
+ </para>
+ <para>
+ <example>
+ <title>Useless detect order example</title>
+ <programlisting>
+; Always detect as ISO-8859-1
+detect_order = ISO-8859-1, UTF-8
+
+; Always detect as UTF-8, since ASCII/UTF-7 values are
+; valid for UTF-8
+detect_order = UTF-8, ASCII, UTF-7
+ </programlisting>
+ </example>
+ </para>
+ </note>
<para>
<example>
<title><function>mb_detect_order</function> examples</title>
@@ -368,7 +842,7 @@
See also <function>mb_internal_encoding</function>,
<function>mb_http_input</function>,
<function>mb_http_output</function>
- <function>mb_send_mail</function>
+ <function>mb_send_mail</function>.
</para>
</refsect1>
</refentry>
@@ -393,7 +867,7 @@
substitution character when input character encoding is invalid
or character code does not exist in output character
encoding. Invalid characters may be substituted with null (no output),
- string or hex value (Unicode character code value).
+ string or integer value (Unicode character code value).
</para>
<para>
This setting affects <function>mb_detect_encoding</function>
@@ -410,16 +884,17 @@
</listitem>
<listitem>
<simpara>
- "long" : Output hex value (Example: U+3000,JIS+7E7E)
+ "long" : Output character code value (Example:
+ U+3000,JIS+7E7E)
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Return Value: If <parameter>substchar</parameter> is set, it
- returns TRUE for success, otherwise returns FALSE. If
- <parameter>substchar</parameter> is not set, it returns Unicode
- value or
+ returns <literal>TRUE</literal> for success, otherwise returns
+ <literal>FALSE</literal>. If <parameter>substchar</parameter> is
+ not set, it returns Unicode value or
"<literal>none</literal>"/"<literal>long</literal>".
</para>
<para>
@@ -461,9 +936,29 @@
<function>ob_start</function> callback
function. <function>mb_output_handler</function> converts
characters in output buffer from internal character encoding to
- HTTP output character encoding.
+ HTTP output character encoding.
+ </para>
+ <para>
+ In 4.0.7 or later, this handler adds a charset HTTP header
+ when the following conditions are met:
</para>
<para>
+ <itemizedlist>
+ <listitem>
+ <simpara><literal>Content-Type</literal> is not set by
+ header()</simpara>
+ </listitem>
+ <listitem>
+ <simpara>Default MIME type begins with
+ <literal>text/</literal></simpara>
+ </listitem>
+ <listitem>
+ <simpara><literal>http_output</literal> setting is other than
+ pass</simpara>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
<parameter>contents</parameter> : Output buffer contents
</para>
<para>
@@ -483,8 +978,8 @@
</para>
<note>
<para>
- If you want to output some binary data such as image from php
- script, you must set output encoding to "pass" using
+ If you want to output some binary data such as image from PHP
+ script, you must set output encoding to "pass" using
<function>mb_http_output</function>.
</para>
</note>
@@ -520,7 +1015,7 @@
$outputenc = "sjis-win";
mb_http_output($outputenc);
ob_start("mb_output_handler");
-Header("Content-Type: text/html; charset=" . mb_preferred_mime_name($outputenc));
+header("Content-Type: text/html; charset=" . mb_preferred_mime_name($outputenc));
</programlisting>
</example>
</para>
@@ -550,6 +1045,11 @@
counted as 1.
</para>
<para>
+ <parameter>encoding</parameter> is character encoding for
+ <parameter>str</parameter>. If <parameter>encoding</parameter> is
+ omitted, internal character encoding is used.
+ </para>
+ <para>
See also <function>mb_internal_encoding</function>,
<function>strlen</function>.
</para>
@@ -567,7 +1067,7 @@
<title>Description</title>
<funcsynopsis>
<funcprototype>
- <funcdef>string <function>mb_strpos</function></funcdef>
+ <funcdef>int <function>mb_strpos</function></funcdef>
<paramdef>string <parameter>haystack</parameter></paramdef>
<paramdef>string <parameter>needle</parameter></paramdef>
<paramdef>int
@@ -605,7 +1105,7 @@
</para>
<para>
<parameter>encoding</parameter> is character encoding name. If it
- is not specified, internal character encoding is used.
+ is omitted, internal character encoding is used.
</para>
<para>
See also <function>mb_strpos</function>,
@@ -626,7 +1126,7 @@
<title>Description</title>
<funcsynopsis>
<funcprototype>
- <funcdef>string <function>mb_strrpos</function></funcdef>
+ <funcdef>int <function>mb_strrpos</function></funcdef>
<paramdef>string <parameter>haystack</parameter></paramdef>
<paramdef>string <parameter>needle</parameter></paramdef>
<paramdef>string
@@ -649,7 +1149,7 @@
0. Second character position is 1.
</para>
<para>
- If <parameter>encoding</parameter> is not set, internal encoding
+ If <parameter>encoding</parameter> is omitted, internal encoding
is assumed. <function>mb_strrpos</function> accepts
<literal>string</literal> for <parameter>needle</parameter> where
<function>strrpos</function> accepts only character.
@@ -709,7 +1209,7 @@
omitted, internal character encoding is used.
</para>
<para>
- See also <function>mb_struct</function>,
+ See also <function>mb_strcut</function>,
<function>mb_internal_encoding</function>.
</para>
</refsect1>
@@ -822,7 +1322,7 @@
<title>Description</title>
<funcsynopsis>
<funcprototype>
- <funcdef>string <function>mb_strmwidth</function></funcdef>
+ <funcdef>string <function>mb_strimwidth</function></funcdef>
<paramdef>string <parameter>str</parameter></paramdef>
<paramdef>int <parameter>start</parameter></paramdef>
<paramdef>int <parameter>width</parameter></paramdef>
@@ -833,7 +1333,7 @@
</funcprototype>
</funcsynopsis>
<para>
- <function>mb_strmwidth</function> truncates string
+ <function>mb_strimwidth</function> truncates string
<parameter>str</parameter> to specified
<parameter>width</parameter>. It returns truncated string.
</para>
@@ -1164,6 +1664,12 @@
before conversion for success, FALSE for failure.
</para>
<para>
+ <function>mb_convert_variables</function> joins strings in an Array
+ or Object to detect encoding, since encoding detection tends to
+ fail for short strings. Therefore, it is impossible to mix
+ encodings in a single array or object.
+ </para>
+ <para>
If <parameter>from-encoding</parameter> is specified by
array or comma separated string, it tries to detect encoding from
<parameter>from-encoding</parameter>. When
@@ -1172,7 +1678,9 @@
</para>
<para>
<parameter>vars (3rd and larger)</parameter> is reference to
- variable to be converted. String, Array and Object are accepted.
+ variable to be converted. String, Array and Object are accepted.
+ <function>mb_convert_variables</function> assumes all parameters
+ have the same encoding.
</para>
<para>
<example>
@@ -1296,7 +1804,8 @@
convert.
</para>
<para>
- <parameter>encoding</parameter> is character encoding.
+ <parameter>encoding</parameter> is character encoding. If it is
+ omitted, internal character encoding is used.
</para>
<para>
<example>
@@ -1323,7 +1832,7 @@
<refnamediv>
<refname>mb_send_mail</refname>
<refpurpose>
- Send mail with ISO-2022-JP character code. (Japanese specific)
+ Send encoded mail.
</refpurpose>
</refnamediv>
<refsect1>
@@ -1344,7 +1853,8 @@
</funcsynopsis>
<para>
<function>mb_send_mail</function> sends email. Headers and
- message are converted and encoded in ISO-2022-JP.
+ message are converted and encoded according to
+ <function>mb_language</function> setting.
<function>mb_send_mail</function> is a wrapper
function of <function>mail</function>. See
<function>mail</function> for details.
@@ -1361,21 +1871,23 @@
<parameter>message</parameter> is mail message.
</para>
<para>
- string <parameter>additional_headers</parameter> is inserted at
- the end of the header. This is typically used to add
- extra headers. Multiple extra headers are separated with a
+ <parameter>additional_headers</parameter> is inserted at
+ the end of the header. This is typically used to add extra
+ headers. Multiple extra headers are separated with a
newline(\n).
</para>
<para>
- It returns TRUE for success, otherwise it returns FALSE.
+ <parameter>additional_parameter</parameter> is an MTA command line
+ parameter. It is useful when setting the correct Return-Path
+ header when using sendmail.
</para>
<para>
- <parameter>additional_parameter</parameter> is added this
- data to the call to the mailer by PHP. This is useful when
- setting the correct Return-Path header when using sendmail.
+ It returns <literal>TRUE</literal> for success, otherwise it
+ returns <literal>FALSE</literal>.
</para>
<para>
- See also: <function>mail</function>.
+ See also: <function>mb_language</function>,
+ <function>mail</function>.
</para>
</refsect1>
</refentry>