ID: 18169
Comment by: [EMAIL PROTECTED]
Reported By: [EMAIL PROTECTED]
Status: Analyzed
Bug Type: MSSQL related
Operating System: Windows 2000 Server
PHP Version: 4.1.2
New Comment:
If you're using PHP on a Windows platform you can use the PHP COM
extension to communicate with SQL Server via ADO. The PHP COM
extension can translate UTF-8 to UCS-2 and back if you pass CP_UTF8
as the third constructor parameter:
$oDb = new COM('ADODB.Connection', NULL, CP_UTF8);
This way you can use Unicode UTF-8 within PHP and Unicode UCS-2 within
SQL Server with all the translations done for you automatically.
HTH, Freddy Vulto
Previous Comments:
------------------------------------------------------------------------
[2002-07-06 07:08:48] [EMAIL PROTECTED]
Thanks Marko
-I guess this means that if you are to use binary (ie. unicode) data,
then COM/ADO is your only option, if SQL Server is the database of your
choice.
From yohgaki's answer, I guess also the multibyte encoding
functionality lacks proper Unicode support -am I correct in assuming
that we will have to move to PHP4.2.x and do our own encoding/decoding
through the Win32 API then?
------------------------------------------------------------------------
[2002-07-05 05:34:22] [EMAIL PROTECTED]
PHP's mssql extension uses the Microsoft SQL Server's C
API, the "DB-Library for C". Specifically, SQL queries are
sent to the server using the dbcmd() function. This
function is not binary safe, so inserting UCS2 text or
images or any binary data is likely to fail.
The DB-Library for C has separate, binary-safe APIs for
entering text and images, but they are complicated and
difficult to seamlessly integrate to the current mssql
extension. Look up the documentation for dbwritetext() if
you feel like implementing this change.
UTF-8 and UTF-7 are, IIRC, the only Unicode encodings that
are guaranteed not to include null characters. They are,
therefore, the only encodings that can be reliably used
with PHP's mssql extension at this time.
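
To illustrate the point about null characters, here is a minimal sketch,
assuming the mbstring extension is loaded. The helper name contains_nul()
and the sample word are made up for this example; it only inspects bytes
in PHP and does not call DB-Library.

```php
<?php
// Hypothetical helper: true if the byte string contains a NUL (0x00) byte.
function contains_nul($bytes)
{
    return strpos($bytes, "\0") !== false;
}

// "Grøn" written out as UTF-8 bytes (sample text, chosen for illustration).
$utf8 = "Gr\xC3\xB8n";

// The same text as little-endian UCS-2; requires the mbstring extension.
$ucs2 = mb_convert_encoding($utf8, 'UCS-2LE', 'UTF-8');

var_dump(contains_nul($utf8)); // bool(false)
var_dump(contains_nul($ucs2)); // bool(true)

// Offset of the first NUL: a routine that treats the buffer as a
// NUL-terminated C string would see only the bytes before this offset.
var_dump(strpos($ucs2, "\0")); // int(1)
```

Every character below U+0100 carries a zero high byte in UCS-2, so even
plain ASCII text is affected.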
------------------------------------------------------------------------
[2002-07-05 04:21:52] [EMAIL PROTECTED]
You are probably right. However, Unicode is central to making
world-wide web applications, and all major database vendors have this
possibility.
I find it to be a hindrance to wider deployment of large-scale,
worldwide php applications.
Does anyone know if it is only the MSSQL module? -are there any plans
to look into this issue?
What are the future directions for PHP and Unicode support?
------------------------------------------------------------------------
[2002-07-05 04:14:38] [EMAIL PROTECTED]
Wide char encodings (UCS2/UCS4/UTF16/UTF32) don't work well with current
PHP. I guess the SQL Server module is using strlen() or the like, which
cannot be used with wide chars...
Fixing this is not simple at all.
------------------------------------------------------------------------
[2002-07-04 18:10:24] [EMAIL PROTECTED]
I have a problem converting UTF-8 (web character encoding) to UCS2
(Microsoft Windows character encoding) using PHP, and storing this in
the Microsoft SQL Server 2000 database.
My setup is:
Windows 2000 Server, with Apache 1.3.24/PHP 4.1.1 and Microsoft SQL
Server 2000
Now, as a result of Microsoft's Q232580, I will have to do conversion
between UTF-8 and UCS-2. For this, I thought I would use the Multibyte
String functions.
However, this does not seem to work.
I am absolutely sure, that I input UTF-8 encoded data into my string,
and then I do:
$ucs2string=mb_convert_encoding($string,"UCS2","UTF-8");
$sqlStmt="insert into testtbl (tekst) values(N'".($ucs2string)."')";
$rs=$DBCon->Execute($sqlStmt);
When I access the database, then I will see something stored, that does
not resemble the input at all (most times, I see Japanese/Chinese
characters?!??). Furthermore, the insert sometimes comes up with an
error, and consequently stores nothing.
To me, it seems like either one of these (or both) are flawed:
1. The Multibyte String encoding function does not work properly (ie.
encoding from UTF-8 to UCS-2 does not happen correctly).
2. The PHP MSSQL driver does not handle unicode data properly, even
though the target column in the database is specified as Unicode and N
is prepended to the string before insert.
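
Hypothesis 1 can be checked in isolation, without touching the database,
by hex-dumping the converted string. A rough sketch, assuming mbstring is
loaded; the sample string is invented here, and the byte order is named
explicitly rather than relying on whatever a plain "UCS2" label selects.

```php
<?php
// Sample input: "Aø" as UTF-8 bytes (U+0041, U+00F8).
$string = "A\xC3\xB8";

// Convert with an explicit byte order and hex-dump the result.
$be = bin2hex(mb_convert_encoding($string, 'UCS-2BE', 'UTF-8'));
$le = bin2hex(mb_convert_encoding($string, 'UCS-2LE', 'UTF-8'));

echo $be, "\n"; // 004100f8
echo $le, "\n"; // 4100f800
```

If the dump matches the expected code units, the conversion itself is
fine. Note that a byte-order mismatch alone may produce CJK-looking
output: the bytes 00 41 read in the opposite order are U+4100, which lies
in a CJK ideograph block.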
This leads me to use ADO (as in the example above), storing UTF-8
encoded data into SQL Server -this is a very short term solution, as
data are not sortable in the database (some of it looks like garbage
because of the missing encoding).
------------------------------------------------------------------------