RE: Java, SQL, Unicode and Databases
Microsoft is very COM-based for its actual data access methods, and COM uses BSTRs, which are BOM-less UTF-16. Because of that, the actual storage format of any database ends up irrelevant, since the data will be converted to UTF-16 anyway. Given that this is what the data layers do, performance is certainly better if there does not have to be an extra call to the Windows MultiByteToWideChar to convert UTF-8 to UTF-16. So from a Windows perspective, not only is it no trouble, but it is also the best possible solution!

In any case, I know plenty of web people who *do* encode their strings in SQL Server databases as UTF-8 for web applications, since UTF-8 is their preference. They are willing to take the hit of "converting themselves" because when data is being read it is faster to go through no conversions at all.

Michael

--
From: [EMAIL PROTECTED][SMTP:[EMAIL PROTECTED]]
Sent: Friday, June 23, 2000 7:55 AM
To: Unicode List
Cc: Unicode List; [EMAIL PROTECTED]
Subject: Re: Java, SQL, Unicode and Databases

I think that this is also true for DB2 using UTF-8 as the database encoding. From an application perspective, MS SQL Server is the one that gives us the most trouble, because it doesn't support UTF-8 as a database encoding for char, etc.

Joe

Kenneth Whistler [EMAIL PROTECTED] on 06/22/2000 06:42:20 PM
To: "Unicode List" [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Joe Ross/Tivoli Systems)
Subject: Re: Java, SQL, Unicode and Databases

Jianping responded:

Tex, Oracle doesn't have a special requirement for the datatype in the JDBC driver if you use UTF8 as the database character set. In this case, all the text datatypes in JDBC will support Unicode data.

The same thing is, of course, true for Sybase databases using UTF-8 as the database character set, accessing them through a JDBC driver.

But I think Tex's question is aimed at the much murkier area of what the various database vendors' strategies are for dealing with UTF-16 Unicode as a datatype.
In that area, the answers for what a cross-platform application vendor needs to do and for how JDBC drivers might abstract differences in database implementations are still unclear.

--Ken
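The conversion step Michael describes can be sketched outside of Windows. This is only an illustration, not the actual COM or OLE DB code path: the helper name `utf8_to_bomless_utf16` is hypothetical, and Python's codecs stand in for the MultiByteToWideChar call. It shows what "BOM-less UTF-16" means in bytes and that text stored as UTF-8 needs one extra decode and re-encode before it can be handed over as UTF-16.

```python
import codecs


def utf8_to_bomless_utf16(utf8_bytes: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as little-endian UTF-16 without a BOM.

    Hypothetical helper: it mimics, in spirit, the extra conversion a data
    layer would perform for text stored as UTF-8.
    """
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


sample = "Grüße, 世界"
stored_utf8 = sample.encode("utf-8")        # what a UTF-8 column would hold
wide = utf8_to_bomless_utf16(stored_utf8)   # what a UTF-16 consumer expects

# The explicit "utf-16-le" codec never writes a byte order mark ...
has_bom = wide.startswith(codecs.BOM_UTF16_LE)
# ... whereas the generic "utf-16" codec prepends one in native byte order.
generic_has_bom = sample.encode("utf-16").startswith(
    (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
)

print(f"UTF-8 bytes: {len(stored_utf8)}, UTF-16 bytes: {len(wide)}")
print(f"BOM in utf-16-le output: {has_bom}, BOM in utf-16 output: {generic_has_bom}")
```

Text that is already stored as UTF-16 skips the decode and re-encode step entirely, which is the performance point made above.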
Re: Java, SQL, Unicode and Databases
Yes, version 7. It requires us to use a different data type (nchar) if we want to store multilingual text as UTF-16. We want our applications to be database vendor independent so that customers can use any database under the covers. If all databases supported UTF-8 as an encoding for char, we could support multilingual data in the same way for all vendors. As it is, we have to use a different schema for MS SQL Server than we do for the others.

Joe

"Tex Texin" [EMAIL PROTECTED] on 06/23/2000 11:50:06 AM
To: Joe Ross/Tivoli Systems@Tivoli Systems
cc: Unicode List [EMAIL PROTECTED], Hossein Kushki@IBMCA, Vladimir Dvorkin [EMAIL PROTECTED], Steven Watt [EMAIL PROTECTED]
Subject: Re: Java, SQL, Unicode and Databases

Joe,

Can you expand on this a bit more? Privately if you prefer. Do you mean version 7 of MS SQL Server? I assume if it doesn't have UTF-8, it uses UTF-16. How does this being the storage encoding become problematic?

tex

[EMAIL PROTECTED] wrote:

I think that this is also true for DB2 using UTF-8 as the database encoding. From an application perspective, MS SQL Server is the one that gives us the most trouble, because it doesn't support UTF-8 as a database encoding for char, etc.

Joe

Kenneth Whistler [EMAIL PROTECTED] on 06/22/2000 06:42:20 PM
To: "Unicode List" [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Joe Ross/Tivoli Systems)
Subject: Re: Java, SQL, Unicode and Databases

Jianping responded:

Tex, Oracle doesn't have a special requirement for the datatype in the JDBC driver if you use UTF8 as the database character set. In this case, all the text datatypes in JDBC will support Unicode data.

The same thing is, of course, true for Sybase databases using UTF-8 as the database character set, accessing them through a JDBC driver.

But I think Tex's question is aimed at the much murkier area of what the various database vendors' strategies are for dealing with UTF-16 Unicode as a datatype.
In that area, the answers for what a cross-platform application vendor needs to do and for how JDBC drivers might abstract differences in database implementations are still unclear.

--Ken

--
Tex Texin
Director, International Products
Progress Software Corp.
14 Oak Park, Bedford, MA 01730 USA
+1-781-280-4271
+1-781-280-4655 (Fax)
[EMAIL PROTECTED]

http://www.progress.com  The #1 Embedded Database
http://www.SonicMQ.com  JMS Compliant Messaging - Best Middleware Award
http://www.aspconnections.com  Leading provider in the ASP marketplace

Progress Globalization Program (New URL)
http://www.progress.com/partners/globalization.htm

Come to the Panel on Open Source Approaches to Unicode Libraries at the Sept. Unicode Conference
http://www.unicode.org/iuc/iuc17
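Joe's schema problem can be made concrete with a small sketch. Everything here is illustrative: the vendor keys, the `MULTILINGUAL_CHAR_TYPE` table, and the `column_ddl` helper are hypothetical names, and the mapping assumes the non-Microsoft databases were created with UTF-8 as the database character set, as described in the thread. Length semantics (bytes versus characters) may also differ between vendors and are not modeled.

```python
# Column type used for multilingual text, per vendor.
# Assumption: Oracle, Sybase and DB2 are configured with a UTF-8 database
# character set, so plain CHAR can hold Unicode data. SQL Server 7 has no
# UTF-8 option for CHAR, so the national character type is needed instead.
MULTILINGUAL_CHAR_TYPE = {
    "oracle": "CHAR",
    "sybase": "CHAR",
    "db2": "CHAR",
    "mssql": "NCHAR",
}


def column_ddl(vendor: str, column: str, length: int) -> str:
    """Return a column definition fragment for multilingual text."""
    try:
        sql_type = MULTILINGUAL_CHAR_TYPE[vendor.lower()]
    except KeyError:
        raise ValueError(f"unknown vendor: {vendor!r}") from None
    return f"{column} {sql_type}({length})"


def needs_separate_schema(vendor: str) -> bool:
    """True when the vendor cannot share the common CHAR-based schema."""
    return MULTILINGUAL_CHAR_TYPE[vendor.lower()] != "CHAR"


for vendor in MULTILINGUAL_CHAR_TYPE:
    print(vendor, "->", column_ddl(vendor, "title", 40))
```

With this mapping only the SQL Server entry diverges, which is the single exception that forces a second schema.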
RE: Java, SQL, Unicode and Databases
The datatype *does* matter in the sense that you would use UTF-16 data fields (NTEXT, NCHAR, and NVARCHAR) and access them with your favorite data access method, which will convert as needed to whatever format it uses. You will never know or care what the underlying engine stores.

The web site stuff will not work for you, since you would have to do the extra conversions to do the data mining, so you would probably go with plan "A".

My general point is that OLE DB to an Oracle UTF-8 field and to a SQL Server UTF-16 field both return the same type of data: UTF-16. So COM in this case is hiding the differences.

Michael

--
From: [EMAIL PROTECTED][SMTP:[EMAIL PROTECTED]]
Sent: Friday, June 23, 2000 2:27 PM
To: Michael Kaplan (Trigeminal Inc.)
Cc: Unicode List; [EMAIL PROTECTED]
Subject: RE: Java, SQL, Unicode and Databases

Michael, are you saying that the data type (char or nchar) doesn't matter? Are you saying that if we just use UTF-16 or wchar_t interfaces to access the data all will be fine and we will be able to store multilingual data even in fields defined as char? Maybe things aren't as bad as I feared.

With respect to the web applications you describe, do they store the UTF-8 as binary data? This wouldn't work for us, since we want other data mining applications to be able to access the same data.

Thanks,
Joe

"Michael Kaplan (Trigeminal Inc.)" [EMAIL PROTECTED] on 06/23/2000 10:41:39 AM
To: Unicode List [EMAIL PROTECTED], Joe Ross/Tivoli Systems@Tivoli Systems
cc: Hossein Kushki@IBMCA
Subject: RE: Java, SQL, Unicode and Databases

Microsoft is very COM-based for its actual data access methods, and COM uses BSTRs, which are BOM-less UTF-16. Because of that, the actual storage format of any database ends up irrelevant, since the data will be converted to UTF-16 anyway. Given that this is what the data layers do, performance is certainly better if there does not have to be an extra call to the Windows MultiByteToWideChar to convert UTF-8 to UTF-16.
So from a Windows perspective, not only is it no trouble, but it is also the best possible solution!

In any case, I know plenty of web people who *do* encode their strings in SQL Server databases as UTF-8 for web applications, since UTF-8 is their preference. They are willing to take the hit of "converting themselves" because when data is being read it is faster to go through no conversions at all.

Michael

--
From: [EMAIL PROTECTED][SMTP:[EMAIL PROTECTED]]
Sent: Friday, June 23, 2000 7:55 AM
To: Unicode List
Cc: Unicode List; [EMAIL PROTECTED]
Subject: Re: Java, SQL, Unicode and Databases

I think that this is also true for DB2 using UTF-8 as the database encoding. From an application perspective, MS SQL Server is the one that gives us the most trouble, because it doesn't support UTF-8 as a database encoding for char, etc.

Joe

Kenneth Whistler [EMAIL PROTECTED] on 06/22/2000 06:42:20 PM
To: "Unicode List" [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Joe Ross/Tivoli Systems)
Subject: Re: Java, SQL, Unicode and Databases

Jianping responded:

Tex, Oracle doesn't have a special requirement for the datatype in the JDBC driver if you use UTF8 as the database character set. In this case, all the text datatypes in JDBC will support Unicode data.

The same thing is, of course, true for Sybase databases using UTF-8 as the database character set, accessing them through a JDBC driver.

But I think Tex's question is aimed at the much murkier area of what the various database vendors' strategies are for dealing with UTF-16 Unicode as a datatype. In that area, the answers for what a cross-platform application vendor needs to do and for how JDBC drivers might abstract differences in database implementations are still unclear.

--Ken
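Joe's worry about other applications reading UTF-8 that a web application stored on its own can be illustrated with plain byte handling. This is a sketch under assumptions, not a statement about any particular driver: the function names are hypothetical, and Windows-1252 is picked only as an example of the legacy code page a second tool might assume for the column.

```python
def store_as_utf8(text: str) -> bytes:
    """What the web application writes when it does the conversion itself."""
    return text.encode("utf-8")


def read_as_web_app(raw: bytes) -> str:
    """The web application knows the bytes are UTF-8 and decodes them correctly."""
    return raw.decode("utf-8")


def read_as_other_tool(raw: bytes, assumed_code_page: str = "cp1252") -> str:
    """A tool unaware of the convention interprets the bytes in a legacy code page."""
    return raw.decode(assumed_code_page)


original = "café"
raw = store_as_utf8(original)

print(read_as_web_app(raw))      # round-trips: café
print(read_as_other_tool(raw))   # mojibake: cafÃ©
```

The stored bytes are only meaningful to readers that share the UTF-8 convention, which is why declaring the column with a Unicode datatype and letting the access layer convert is the safer choice when several applications share the data.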