ID:               18169
Comment by:       [EMAIL PROTECTED]
Reported By:      [EMAIL PROTECTED]
Status:           Analyzed
Bug Type:         MSSQL related
Operating System: Windows 2000 Server
PHP Version:      4.1.2
New Comment:

If you're using PHP on a Windows platform, you can use the PHP COM extension to communicate with SQL Server via ADO. The PHP COM extension can translate UTF-8 to UCS-2 and back if you specify the code page as the third parameter:

    $oDb = new COM('ADODB.Connection', NULL, CP_UTF8);

This way you can use Unicode UTF-8 within PHP and Unicode UCS-2 within SQL Server, with all the translations done for you automatically.

HTH,
Freddy Vulto


Previous Comments:
------------------------------------------------------------------------

[2002-07-06 07:08:48] [EMAIL PROTECTED]

Thanks Marko - I guess this means that if you want to use binary (i.e. Unicode) data, then COM/ADO is your only option if SQL Server is the database of your choice.

From yohgaki's answer, I guess the multibyte encoding functionality also lacks proper Unicode support - am I correct in assuming that we will have to move to PHP 4.2.x and do our own encoding/decoding through the Win32 API then?

------------------------------------------------------------------------

[2002-07-05 05:34:22] [EMAIL PROTECTED]

PHP's mssql extension uses Microsoft SQL Server's C API, the "DB-Library for C". Specifically, SQL queries are sent to the server using the dbcmd() function. This function is not binary safe, so inserting UCS-2 text, images, or any other binary data is likely to fail.

The DB-Library for C has separate, binary-safe APIs for entering text and images, but they are complicated and difficult to integrate seamlessly into the current mssql extension. Look up the documentation for dbwritetext() if you feel like implementing this change.

UTF-8 and UTF-7 are, IIRC, the only Unicode encodings that are guaranteed not to include null characters. They are, therefore, the only encodings that can be reliably used with PHP's mssql extension at this time.

------------------------------------------------------------------------

[2002-07-05 04:21:52] [EMAIL PROTECTED]

You are probably right.
However, Unicode is central to building worldwide web applications, and all major database vendors offer this possibility. I find this to be a hindrance to wider deployment of large-scale, worldwide PHP applications.

Does anyone know whether it is only the MSSQL module? Are there any plans to look into this issue? What are the future directions for PHP and Unicode support?

------------------------------------------------------------------------

[2002-07-05 04:14:38] [EMAIL PROTECTED]

Wide char encodings (UCS2/UCS4/UTF16/UTF32) don't work well with current PHP. I guess the SQL Server module is using strlen() or the like, which cannot be used with wide chars... Fixing this is not simple at all.

------------------------------------------------------------------------

[2002-07-04 18:10:24] [EMAIL PROTECTED]

I have a problem converting UTF-8 (web character encoding) to UCS-2 (Microsoft Windows character encoding) using PHP and storing the result in a Microsoft SQL Server 2000 database.

My setup is: Windows 2000 Server with Apache 1.3.24/PHP 4.1.1 and Microsoft SQL Server 2000.

Now, as a result of Microsoft's Q232580, I will have to do conversion between UTF-8 and UCS-2. For this, I thought I would use the Multibyte String functions. However, this does not seem to work. I am absolutely sure that I input UTF-8 encoded data into my string, and then I do:

    $ucs2string=mb_convert_encoding($string,"UCS2","UTF-8");
    $sqlStmt="insert into testtbl (tekst) values(N'".($ucs2string)."')";
    $rs=$DBCon->Execute($sqlStmt);

When I access the database, I see something stored that does not resemble the input at all (most times, I see Japanese/Chinese characters?!??). Furthermore, the insert sometimes fails with an error and consequently stores nothing.

To me, it seems that one (or both) of these is flawed:

1. The Multibyte String encoding function does not work properly (i.e. encoding from UTF-8 to UCS-2 does not happen correctly).
2. The PHP MSSQL driver does not handle Unicode data properly, even though the target column in the database is specified as Unicode and N is prepended to the string before insert.

This leads me to use ADO (as in the example above), storing UTF-8 encoded data in SQL Server - this is a very short-term solution, as the data are not sortable in the database (some of it looks like garbage because of the missing encoding).

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=18169&edit=1
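
The binary-safety point raised in the comments above can be illustrated with a minimal sketch. It assumes a current PHP with the mbstring extension loaded (not the PHP 4.1 discussed in this thread). The helper c_strlen() is hypothetical: it only mimics how a C function such as strlen() measures a NUL-terminated string, and it is not part of the mssql extension or of DB-Library.

```php
<?php
// Sketch only: shows why UCS-2 text is truncated by NUL-terminated C string
// handling while UTF-8 text is not. Requires the mbstring extension.

// Hypothetical helper: the length a C strlen() call would report,
// i.e. the number of bytes before the first NUL byte.
function c_strlen(string $bytes): int
{
    $pos = strpos($bytes, "\0");
    return $pos === false ? strlen($bytes) : $pos;
}

$utf8 = "Abc";                                           // ASCII is valid UTF-8
$ucs2 = mb_convert_encoding($utf8, "UCS-2LE", "UTF-8");  // two bytes per character

echo bin2hex($ucs2), "\n";   // 410062006300
echo c_strlen($utf8), "\n";  // 3
echo c_strlen($ucs2), "\n";  // 1
```

Every ASCII character in the UCS-2 string carries a zero high byte, so a NUL-terminated API sees the string end after the first byte. That would be consistent with the truncated or garbled inserts described in the original report, although the sketch does not exercise the mssql extension itself.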