RE: Java, SQL, Unicode and Databases

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)

Microsoft is very COM-based for its actual data access methods and COM
uses BSTRs that are BOM-less UTF-16. Because of that, the actual storage
format of any database ends up irrelevant since it will be converted to
UTF-16 anyway.

Given that this is what the data layers do, performance is certainly better
if there does not have to be an extra call to the Windows
MultiByteToWideChar to convert UTF-8 to UTF-16. So from a Windows
perspective, not only is it no trouble, but it is also the best possible
solution!
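
To make the conversion step concrete, here is a minimal sketch in Java
rather than Win32 (it does not call MultiByteToWideChar itself). It decodes
UTF-8 bytes into a String, which holds UTF-16 code units, and then encodes
that String as little-endian UTF-16 without a byte-order mark. The class and
method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8ToUtf16 {
    // Decode UTF-8 bytes into a Java String, which holds UTF-16 code units.
    static String fromUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encode as little-endian UTF-16 with no byte-order mark.
    static byte[] toBomlessUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5}; // U+65E5 as UTF-8
        String s = fromUtf8(utf8);
        byte[] utf16 = toBomlessUtf16(s);
        System.out.println(utf8.length + " UTF-8 bytes -> " + utf16.length + " UTF-16 bytes");
    }
}
```

Every read from a UTF-8 store into a UTF-16 API pays for a decode like
fromUtf8; a store that already holds UTF-16 can skip it.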

In any case, I know plenty of web people who *do* encode their strings in
SQL Server databases as UTF-8 for web applications, since UTF-8 is their
preference. They are willing to take the hit of "converting themselves"
because when data is being read it is faster to go through no conversions at
all.
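
A rough sketch of what "converting themselves" could look like on the
application side, assuming the UTF-8 text is kept in a byte-oriented column
and the application does all encoding and decoding. The class and method
names are hypothetical, and no database call is shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class Utf8Column {
    // Encode text to UTF-8 before writing it to a byte-oriented column.
    static byte[] encodeForStorage(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode stored bytes strictly; malformed UTF-8 raises an exception
    // instead of being silently replaced with U+FFFD.
    static String decodeFromStorage(byte[] stored) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(stored))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String original = "Gr\u00FC\u00DFe \u65E5\u672C";
        byte[] stored = encodeForStorage(original);
        System.out.println(decodeFromStorage(stored).equals(original));
    }
}
```

A web application that serves UTF-8 pages could send the stored bytes as
they are and would only need decodeFromStorage when it has to work with the
text itself.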

Michael

 --
 From: [EMAIL PROTECTED][SMTP:[EMAIL PROTECTED]]
 Sent: Friday, June 23, 2000 7:55 AM
 To:   Unicode List
 Cc:   Unicode List; [EMAIL PROTECTED]
 Subject:  Re: Java, SQL, Unicode and Databases
 
 
 
 I think that this is also true for DB2 using UTF-8 as the database
 encoding. From an application perspective, MS SQL Server is the one that
 gives us the most trouble, because it doesn't support UTF-8 as a database
 encoding for char, etc.
 Joe
 
 Kenneth Whistler [EMAIL PROTECTED] on 06/22/2000 06:42:20 PM
 
 To:   "Unicode List" [EMAIL PROTECTED]
 cc:   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
       (bcc: Joe Ross/Tivoli Systems)
 Subject:  Re: Java, SQL, Unicode and Databases
 
 
 
 
 Jianping responded:
 
 
  Tex,
 
  Oracle doesn't have any special requirement for the datatype in the JDBC
  driver if you use UTF8 as the database character set. In this case, all
  the text datatypes in JDBC will support Unicode data.
 
 
 The same thing is, of course, true for Sybase databases using UTF-8
 as the database character set, accessing them through a JDBC driver.
 
 But I think Tex's question is aimed at the much murkier area
 of what the various database vendors' strategies are for dealing
 with UTF-16 Unicode as a datatype. In that area, the answers for
 what a cross-platform application vendor needs to do and for how
 JDBC drivers might abstract differences in database implementations
 are still unclear.
 
 --Ken
 
 
 



Re: Java, SQL, Unicode and Databases

2000-06-23 Thread Joe_Ross



Yes, version 7. It requires us to use a different data type (nchar) if we want
to store multilingual text as UTF-16. We want our applications to be database
vendor independent so that customers can use any database under the covers. If
all databases supported UTF-8 as an encoding for char, we could support
multilingual data in the same way for all vendors. As it is, we have to use a
different schema for MS SQL Server than we do for the others.
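
A small sketch of the kind of per-vendor schema branching this forces,
assuming the other databases are configured with a UTF-8 database character
set. The vendor list, table name, column name, and length are invented for
illustration, and real DDL differs between vendors in more ways than the
type name.

```java
class UnicodeColumnTypes {
    // Hypothetical vendor list; only the SQL Server case differs here.
    enum Vendor { ORACLE_UTF8, DB2_UTF8, SYBASE_UTF8, MS_SQL_SERVER_7 }

    // Pick the column type that can hold multilingual text for each vendor.
    static String textType(Vendor vendor, int length) {
        if (vendor == Vendor.MS_SQL_SERVER_7) {
            return "NVARCHAR(" + length + ")"; // national character type
        }
        return "VARCHAR(" + length + ")"; // relies on a UTF-8 database character set
    }

    // Build a CREATE TABLE statement for an illustrative one-column table.
    static String createTable(Vendor vendor) {
        return "CREATE TABLE notes (body " + textType(vendor, 200) + ")";
    }

    public static void main(String[] args) {
        for (Vendor v : Vendor.values()) {
            System.out.println(v + ": " + createTable(v));
        }
    }
}
```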
Joe


"Tex Texin" [EMAIL PROTECTED] on 06/23/2000 11:50:06 AM

To:   Joe Ross/Tivoli Systems@Tivoli Systems
cc:   Unicode List [EMAIL PROTECTED], Hossein Kushki@IBMCA, Vladimir Dvorkin
  [EMAIL PROTECTED], Steven Watt [EMAIL PROTECTED]
Subject:  Re: Java, SQL, Unicode and Databases




Joe,

Can you expand on this a bit more? Privately if you prefer.
Do you mean version 7 of MS SQL Server?

I assume that if it doesn't have UTF-8, it uses UTF-16. How does this,
as the storage encoding, become problematic?
tex


--


Tex Texin Director, International Products

Progress Software Corp.        +1-781-280-4271
14 Oak Park                    +1-781-280-4655 (Fax)
Bedford, MA 01730  USA         [EMAIL PROTECTED]

http://www.progress.com        The #1 Embedded Database
http://www.SonicMQ.com         JMS Compliant Messaging - Best Middleware Award
http://www.aspconnections.com  Leading provider in the ASP marketplace

Progress Globalization Program (New URL)
http://www.progress.com/partners/globalization.htm


Come to the Panel on Open Source Approaches to Unicode Libraries at
the Sept. Unicode Conference
http://www.unicode.org/iuc/iuc17






RE: Java, SQL, Unicode and Databases

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)

The datatype *does* matter, in the sense that you would use UTF-16 data
fields (NTEXT, NCHAR, and NVARCHAR) and access them with your favorite data
access method, which will convert as needed to whatever format it uses. You
will never know or care what the underlying engine stores.

The web site stuff will not work for you since you would have to do the
extra conversions to do the data mining, so you would probably go with plan
"A".

My general point is that OLE DB to an Oracle UTF-8 field and to a SQL Server
UTF-16 field both return the same type of data: UTF-16. So COM in this
case is hiding the differences.
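
As an analogy only, here is a minimal Java sketch of an access layer that
hides the storage encoding. It is not the OLE DB or COM API; StoredField and
read are hypothetical names. Two fields hold the same text, one as UTF-8 and
one as UTF-16, and the caller gets the same String from both.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingHidingReader {
    // Hypothetical stand-in for a stored text field and its storage encoding.
    static final class StoredField {
        final byte[] bytes;
        final Charset storageEncoding;

        StoredField(byte[] bytes, Charset storageEncoding) {
            this.bytes = bytes;
            this.storageEncoding = storageEncoding;
        }
    }

    // The access layer decodes to a String (UTF-16 in memory), so callers
    // never see the storage encoding.
    static String read(StoredField field) {
        return new String(field.bytes, field.storageEncoding);
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E";
        StoredField utf8Field = new StoredField(
                text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        StoredField utf16Field = new StoredField(
                text.getBytes(StandardCharsets.UTF_16LE), StandardCharsets.UTF_16LE);
        System.out.println(read(utf8Field).equals(read(utf16Field)));
    }
}
```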

Michael

 --
 From: [EMAIL PROTECTED][SMTP:[EMAIL PROTECTED]]
 Sent: Friday, June 23, 2000 2:27 PM
 To:   Michael Kaplan (Trigeminal Inc.)
 Cc:   Unicode List; [EMAIL PROTECTED]
 Subject:  RE: Java, SQL, Unicode and Databases
 
 
 
 Michael, are you saying that the data type (char or nchar) doesn't matter?
 Are you saying that if we just use UTF-16 or wchar_t interfaces to access
 the data all will be fine and we will be able to store multilingual data
 even in fields defined as char? Maybe things aren't as bad as I feared.
 
 With respect to the web applications you describe, do they store the UTF-8
 as binary data? This wouldn't work for us, since we want other data mining
 applications to be able to access the same data.
 
 Thanks,
 Joe
 
 "Michael Kaplan (Trigeminal Inc.)" [EMAIL PROTECTED] on 06/23/2000
 10:41:39 AM
 
 To:   Unicode List [EMAIL PROTECTED], Joe Ross/Tivoli Systems@Tivoli
 Systems
 cc:   Hossein Kushki@IBMCA
 Subject:  RE: Java, SQL, Unicode and Databases
 
 
 
 
 Microsoft is very COM-based for its actual data access methods and COM
 uses BSTRs that are BOM-less UTF-16. Because of that, the actual storage
 format of any database ends up irrelevant since it will be converted to
 UTF-16 anyway.
 
 Given that this is what the data layers do, performance is certainly better
 if there does not have to be an extra call to the Windows
 MultiByteToWideChar to convert UTF-8 to UTF-16. So from a Windows
 perspective, not only is it no trouble, but it is also the best possible
 solution!
 
 In any case, I know plenty of web people who *do* encode their strings in
 SQL Server databases as UTF-8 for web applications, since UTF-8 is their
 preference. They are willing to take the hit of "converting themselves"
 because when data is being read it is faster to go through no conversions
 at all.
 
 Michael
 