Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Sandeep Krishna

Hi,

Thanks for responding.

When you mention a change in the registry, could you elaborate on where
exactly in the registry it is and what changes are required?

My registry setting shows NLS = American_English.UTF8.

Is this the setting you meant, or is it something to do with the charset
entries autodetect and autodetect_all (in classid...Mimedatabasecharset..)?

Please do elaborate.

Regards,

Sandeep



- Original Message -
From: Kedar Moghe [EMAIL PROTECTED]
To: 'Sandeep Krishna' [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 11:20 AM
Subject: RE: unicode + oracle query... (suggestions needed...)


Sandeep,

I think you need to set the registry charset to UTF8 on the machine where the
database is installed. We were getting the same problem when we used to send
UTF-8 strings to an Oracle database after conversion from Shift-JIS to UTF-8.
At that time, too, the byte sequence of the retrieved string was changed, and
some of the bytes were replaced with BF.

Regards,

Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 11:36 AM
To: Unicode List
Subject: unicode + oracle query... (suggestions needed...)


Hi,

I have been trying to use ASP pages (UTF-8 encoding) to write Unicode
characters to an Oracle DB table (VARCHAR2 field) and then retrieve them.
(I used UTF-8 encoding both for writing to the database and for
retrieving and displaying.)

There were some surprising observations:

* Each Unicode character was taking 7 bytes in the database (instead of
the expected 2 or 3).
* Some Unicode characters (or rather code points) such as U+F95F were being
encoded in UTF-8 as EF A5 BF, when they should have been encoded as
EF A5 9F. In fact, many Unicode characters whose encoded form should have
had a byte in the range 80..9F had that byte changed to BF, which
resulted in incorrect retrieval.

I was unable to find the reasons for these strange occurrences.
Please suggest what the causes could be.

regards,

Sandeep.




***
SANDEEP KRISHNA
Member Technical Staff (Priceline.com)
H.C.L. Technologies Limited
A-1 CD, Sector -16, NOIDA, UP, India.
Ph:  91-11-91-4516321 (extn. 1062)
Fax: 91-11-91-4510713, 4510226
E-Mail : [EMAIL PROTECTED]
mailto:[EMAIL PROTECTED]
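
One possible reading of the BF symptom, offered as an assumption rather than a diagnosis: 0xBF is the inverted question mark in ISO 8859-1, a character Oracle commonly substitutes when a byte cannot be converted between character sets. The Python sketch below only reproduces the reported byte pattern; `lossy_convert` is a hypothetical stand-in for whatever conversion happens between the ASP page and the database, not Oracle code.

```python
# Correct UTF-8 for U+F95F, the code point from the report.
expected = "\uf95f".encode("utf-8")

def lossy_convert(data: bytes) -> bytes:
    """Hypothetical model: every byte in 0x80..0x9F becomes 0xBF."""
    return bytes(0xBF if 0x80 <= b <= 0x9F else b for b in data)

stored = lossy_convert(expected)
print(expected.hex(" ").upper())  # EF A5 9F
print(stored.hex(" ").upper())    # EF A5 BF
```

If the real cause is a conversion of this kind, the data is being treated as single-byte text somewhere on the path, which would point at a character set mismatch rather than at the UTF-8 encoder.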






Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Sandeep Krishna

Hi,

I am thoroughly confused.
The registry actually shows 3 entries for NLS_LANG under Oracle,
on both the WEB SERVER and the DATABASE SERVER,
so that makes too many combinations.

Can someone indicate which of these NLS_LANG entries have to be set to
"AMERICAN_AMERICA.UTF8" and, if some of them don't need this, what exactly
they should contain?

Please suggest the necessary measures.

regards,

Sandeep




- Original Message -
From: Bob Verbrugge [EMAIL PROTECTED]
To: Sandeep Krishna [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 1:30 PM
Subject: Re: unicode + oracle query... (suggestions needed...)


Sandeep,

You probably need to change the NLS_LANG Oracle setting in the registry.
Look under
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE for this setting and change the
character set part to UTF8.

Bob.
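
For reference, an NLS_LANG value has the form LANGUAGE_TERRITORY.CHARSET, so "the character set part" is whatever follows the dot. The Python sketch below shows that split; the helper names `parse_nls_lang` and `with_charset` are made up for illustration, and the starting value WE8ISO8859P1 is only an assumed example, not a setting reported in this thread.

```python
def parse_nls_lang(value: str):
    """Split 'LANGUAGE_TERRITORY.CHARSET' into its three parts."""
    locale, _, charset = value.partition(".")
    language, _, territory = locale.partition("_")
    return language, territory, charset

def with_charset(value: str, charset: str) -> str:
    """Return the same NLS_LANG value with only the charset replaced."""
    language, territory, _ = parse_nls_lang(value)
    return f"{language}_{territory}.{charset}"

print(parse_nls_lang("AMERICAN_AMERICA.UTF8"))
print(with_charset("AMERICAN_AMERICA.WE8ISO8859P1", "UTF8"))  # AMERICAN_AMERICA.UTF8
```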


- Original Message -
From: "Sandeep Krishna" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 9:16 AM
Subject: Re: unicode + oracle query... (suggestions needed...)



RE: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Kedar Moghe

Sandeep,

I think you need to change it in the following three places:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\NLS_LANG
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\ALL_HOMES\ID0\NLS_LANG
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME0\NLS_LANG

Best of luck

Regards,

Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 5:45 PM
To: Carl W. Brown; Bob Verbrugge; Kedar Moghe
Cc: [EMAIL PROTECTED]
Subject: Re: unicode + oracle query... (suggestions needed...)



Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Sandeep Krishna

I mean: should all the entries be changed in both the Web server machine's
registry and the Oracle Database server machine's registry, or only in one
of them?
In our setup, my machine is the Web server and the Oracle server is a
separate machine.
Please clarify.

regards,

Sandeep
- Original Message -
From: Kedar Moghe [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 3:21 PM
Subject: RE: unicode + oracle query... (suggestions needed...)



RE: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Kedar Moghe

Only the registry entries on the database machine, not any other entry.

Regards,

Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 6:21 PM
To: Kedar Moghe
Cc: [EMAIL PROTECTED]
Subject: Re: unicode + oracle query... (suggestions needed...)



Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Michael \(michka\) Kaplan

Sandeep,

Can you explain exactly what you are doing to get the data from ASP into the
Oracle database? Perhaps post the ASP code? Like most scripting languages,
VBScript and JScript both support UCS-2, and it is usually the Oracle
ODBC or OLE DB driver that has the job of converting the text from UCS-2 to
UTF-8. I wonder whether what you are seeing is some type of "double
conversion".

So the things that would be interesting to know:

1) The data access method to Oracle
2) Version of the driver being used
3) A sample of the code/script being used
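
To illustrate what a "double conversion" could look like, here is a small Python sketch. It assumes the already encoded UTF-8 bytes are mistaken for single-byte text (ISO 8859-1 is chosen here purely as an example) and then converted to UTF-8 a second time; `double_encode` is a hypothetical helper, not a description of what any Oracle driver does.

```python
def double_encode(text: str, assumed: str = "latin-1") -> bytes:
    """Encode to UTF-8, misread the bytes as `assumed`, encode again."""
    utf8 = text.encode("utf-8")
    misread = utf8.decode(assumed)   # each byte becomes one character
    return misread.encode("utf-8")   # high bytes now take two bytes each

once = "\uf95f".encode("utf-8")
twice = double_encode("\uf95f")
print(len(once), once.hex(" ").upper())    # 3 EF A5 9F
print(len(twice), twice.hex(" ").upper())  # 6 C3 AF C2 A5 C2 9F
```

Under this assumption a three-byte character grows to six bytes, so a double conversion would be consistent with characters taking far more storage than expected, though the exact count depends on which single-byte character set is assumed.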

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/

- Original Message -
From: "Sandeep Krishna" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 3:12 AM
Subject: Re: unicode + oracle query... (suggestions needed...)



FW: Implementation of Unicode

2000-09-27 Thread Magda Danish (Unicode)



-Original Message-
From: McGonigle, Laurence [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 26, 2000 8:51 PM
To: '[EMAIL PROTECTED]'
Subject: Implementation of Unicode


Hi, we are a large government organisation in Western Australia and require
some advice on the use and implementation of Unicode.  The business area in
question is the Registry of Births, Deaths & Marriages which is a government
agency within the Ministry of Justice.  This agency needs to register all
births, deaths and marriages in the state of Western Australia and has a
policy on what characters it will accept and register for a name.  In
summary the policy has forced the use of an old DOS based Code Page Set
(i.e. Latin 850).  The agency would like to continue to restrict input of
names to characters that appear on this Code Page Set.  With the migration
to a new system planned for February 2001 (system will also be available on
the Internet) it is envisaged that we will need to implement Unicode if we
are to continue to use the characters of the Latin 850 Code Page Set.

Given the above, my question is as follows:
*   Is it possible to implement the Unicode standard but easily restrict
the input of characters to those currently available in the Latin 850 Code
Page Set?

*   Further, if we decide in the future to allow other characters to be
input, is there an easy method available to permit the use of the additional
characters?

I look forward to your response.



 Thank-you
 
 
 Laurence McGonigle
 Project Manager
 Ministry of Justice (Information Services Directorate)
 Ph: 9264 1614
 E-mail: [EMAIL PROTECTED]
 



RE: [idn] nameprep forbidden characters

2000-09-27 Thread Jonathan Rosenne

See my comments inline.

Jony

 -Original Message-
 From: Mark Davis [mailto:[EMAIL PROTECTED]]
 Sent: Sunday, September 17, 2000 10:40 PM
 To: Jonathan Rosenne
 Cc: Unicode List; [EMAIL PROTECTED]; Edmon; [EMAIL PROTECTED]
 Subject: Re: [idn] nameprep forbidden characters


 I'm not trying to argue with you on this issue -- it may very
 well be best for points to be ignored. But I do want to
 understand the situation a bit better. My questions below should
 not be taken as rhetorical criticism, but simply as questions for
 clarification.

 For others, I am also interested in the situation vis-a-vis
 Arabic, whether we should treat it the same as Hebrew in terms of
 the vowel marks (fatha, etc.).

 Mark

 Jonathan Rosenne wrote:

  Why should case be ignored in English?

 Except for an extremely small set of edge cases (such as Polish
 vs polish, God vs god), there is no extra meaning attached to case.

In the context of identifiers such as domain names, I believe the
justification for ignoring case in English is related to convenience and
user friendliness.

Unless it is a leftover from the 6-bit days.


  In Hebrew, points are optional. The word is the same with them
 and without them, or with just some of them.

 I had thought that there were many words with the same base
 letters, but different pronunciations (and meaning), and that
 different vowels would be used for the different pronunciations.
 That's the way for Arabic, and I had assumed it was the same for
 Hebrew. Is that not the case? From the base
 letters in each word are the vowels always predictable, so that
 they are completely optional?

There are homonyms in Hebrew, just as there are in most languages. Some can
be resolved with points, some cannot. Some platforms support points, some do
not, and some do but at some inconvenience. Newspapers can use points, and
do it sparingly, mainly to disambiguate homonyms - say about once per sheet.


  In addition, not all systems support them, and when they do
 most users don't know how to type them. It isn't easy - see
 http://www.qsm.co.il/NewHebrew/wniqud.htm
 
  A domain owner could publish it with points, to clarify the
 pronunciation, but many users would type it without them or even
 get them wrong.

 Do you think that it is a realistic case, that a domain owner
 would need to use points in that manner, and that a significant
 fraction of domain owners would do this?

Not a large number.


  The issue has been discussed at the Hebrew WG of the SII and I
 think there is general agreement on this issue. We plan a paper
 some time in the future.
 
  I feel that when identifiers are case sensitive, such as in C,
 there may be a case for respecting points, although this would
 cause a problem with cross-system portability, but where case is
 ignored, such as in domain names, the emphasis is more on the
 pronunciation rather than the exact spelling.

 I didn't quite get the last sentence. I had thought that the
 vowel marks were used to get the exact pronunciation. If that is
 not true, it may be part of my misunderstanding of the situation.

Points are more than pronunciation, because in modern Hebrew we do not
distinguish between long and short vowels and we do not pronounce the Dagesh
except in three letters.



In summary, we have two alternatives: to disallow points, or to allow them
and ignore them. I think the latter is more friendly.
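
The "allow them and ignore them" alternative can be sketched in Python as a comparison that drops the points before matching. This is an illustration only: `strip_hebrew_points` is a hypothetical helper, it also removes cantillation marks, and it is not the nameprep algorithm.

```python
import unicodedata

def strip_hebrew_points(text: str) -> str:
    """Drop Hebrew combining marks (category Mn in U+0591..U+05C7)."""
    return "".join(
        ch for ch in text
        if not ("\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn")
    )

# "shalom" typed with points (shin dot, qamats, holam) and without them.
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
plain = "\u05e9\u05dc\u05d5\u05dd"
print(strip_hebrew_points(pointed) == plain)  # True
```

With such a step applied to both the registered name and the typed name, a user who types the points, omits them, or gets some of them wrong would still reach the same name.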


  Jony
 
   -Original Message-
   From: Mark Davis [mailto:[EMAIL PROTECTED]]
   Sent: Sunday, September 17, 2000 7:58 PM
   To: Unicode List
   Cc: [EMAIL PROTECTED]; Edmon
   Subject: Re: [idn] nameprep forbidden characters
  
  
   I am curious why you feel so strongly that the Hebrew points
   should be ignored
   in domain names. Prima facie, it seems that there is little harm
   in treating
   them no differently from other characters. What problem would
 arise if the
   domain was ABC.COM and I could not get it by typing AB*C.COM?
   (Here uppercase
   stands for Hebrew, and * for a point.) Conversely, if someone
 really did
   register AB*C.COM, would it be a problem that I couldn't get to
   that location by
   typing ABC.COM?
  
   It is my understanding that the vowels are rarely used, and that
   people really
   wouldn't use them in registered domain names anyway. It seems
   that if someone
   did take the trouble to type in the points, that there would be a
   reason for
   their making such a distinction.
  
   I'd appreciate it if you could help me to understand the issue
   more clearly.
  
   Mark
  
   Jonathan Rosenne wrote:
  
We should distinguish "punctuation", like 060C Arabic Comma, and
"diacritics", such as 064E Arabic Fatha. Diacritics is probably
   the wrong
word. I have the impression that you were referring to the latter.
   
For Hebrew, my opinion is that from the point of view of the user,
punctuation should be forbidden, while diacritics such as
 the vowels and
other combining 

RE: Implementation of Unicode

2000-09-27 Thread Carl W. Brown

Dear Sir,

Since Unicode is a superset of codepage 850, you can certainly filter out any
other characters.  I suggest that you filter these out as close to the
keying as possible.  Since you are using codepage 850 data, you might want
to look at using UTF-8 for storing your data, since most characters will be
the same as one-byte ASCII characters.  Only the characters in CP 850 that
are > 0x7F will need more than one byte: two bytes for the accented letters,
three for the box-drawing characters.  Use the 16-bit form of Unicode (UCS-2
or UTF-16, which is fixed width for this repertoire) for internal processing
such as string scanning and the like.

Carl
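
A minimal Python sketch of such a filter, assuming the check is made by trying to encode each character with Python's built-in `cp850` codec. The function name `allowed` and the `extra` parameter are made up for illustration; `extra` shows one way the accepted set could later be widened without changing the storage format.

```python
def allowed(text: str, extra: frozenset = frozenset()) -> bool:
    """True if every character is in code page 850 or in `extra`."""
    for ch in text:
        if ch in extra:
            continue
        try:
            ch.encode("cp850")
        except UnicodeEncodeError:
            return False
    return True

print(allowed("Zo\u00eb M\u00fcller"))                              # True
print(allowed("\u0141ukasz"))                                       # False
print(allowed("\u0141ukasz", frozenset("\u0141\u0142")))            # True

# UTF-8 storage cost for an ASCII letter, an accented letter, a box-drawing line.
for ch in "A\u00e9\u2500":
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")))
```

Note that the codec also accepts control characters, so a real input filter would probably want to reject those separately.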


-Original Message-
From: Magda Danish (Unicode) [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 8:47 AM
To: Unicode List
Subject: FW: Implementation of Unicode






Implementation of isLetter()

2000-09-27 Thread John O'Conner

Should an isLetter() implementation return true for "Nl" characters as
well as the usual "L*"?

Regards,
John
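
[Illustration, not part of the original message. A minimal sketch of the two
possible policies, assuming Python's unicodedata module. The is_letter name
and the include_letter_numbers flag are hypothetical and only make the choice
explicit. They do not answer which behaviour is correct.]

```python
import unicodedata


def is_letter(ch: str, include_letter_numbers: bool = False) -> bool:
    """Return True for general categories L* and, optionally, Nl."""
    category = unicodedata.category(ch)
    if category.startswith("L"):  # Lu, Ll, Lt, Lm, Lo
        return True
    return include_letter_numbers and category == "Nl"


if __name__ == "__main__":
    roman_one = "\u2160"  # ROMAN NUMERAL ONE, general category Nl
    print(unicodedata.category(roman_one))                    # Nl
    print(is_letter(roman_one))                               # False
    print(is_letter(roman_one, include_letter_numbers=True))  # True
```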





RE: [idn] nameprep forbidden characters

2000-09-27 Thread Jonathan Rosenne

See my comments inline.

Jony

 -Original Message-
 From: Mark Davis [mailto:[EMAIL PROTECTED]]
 Sent: Sunday, September 17, 2000 10:40 PM
 To: Jonathan Rosenne
 Cc: Unicode List; [EMAIL PROTECTED]; Edmon; [EMAIL PROTECTED]
 Subject: Re: [idn] nameprep forbidden characters


 I'm not trying to argue with you on this issue -- it may very
 well be best for points to be ignored. But I do want to
 understand the situation a bit better. My questions below should
 not be taken as rhetorical criticism, but simply as questions for
 clarification.

 For others, I am also interested in the situation vis-a-vis
 Arabic, whether we should treat it the same as Hebrew in terms of
 the vowel marks (fatha, etc.).

 Mark

 Jonathan Rosenne wrote:

  Why should case be ignored in English?

 Except for an extremely small set of edge cases (such as Polish
 vs polish, God vs god), there is no extra meaning attached to case.

In the context of identifiers such as domain names, I believe the
justification for ignoring case in English is related to convenience and
user friendliness.

Unless it is a leftover from the 6-bit days.


  In Hebrew, points are optional. The word is the same with them
 and without them, or with just some of them.

 I had thought that there were many words with the same base
 letters, but different pronunciations (and meaning), and that
 different vowels would be used for the different pronunciations.
 That's the way for Arabic, and I had assumed it was the same for
 Hebrew. Is that not the case? Are the vowels always predictable
 from the base letters in each word, so that they are completely
 optional?

There are homonyms in Hebrew, just as there are in most languages. Some can
be resolved with points, some cannot. Some platforms support points, some do
not, and some do but at some inconvenience. Newspapers can use points, and
do it sparingly, mainly to disambiguate homonyms - say about once per sheet.


  In addition, not all systems support them, and when they do
 most users don't know how to type them. It isn't easy - see
 http://www.qsm.co.il/NewHebrew/wniqud.htm
 
  A domain owner could publish it with points, to clarify the
 pronunciation, but many users would type it without them or even
 get them wrong.

 Do you think that it is a realistic case, that a domain owner
 would need to use points in that manner, and that a significant
 fraction of domain owners would do this?

Not a large number.


  The issue has been discussed at the Hebrew WG of the SII and I
 think there is general agreement on this issue. We plan a paper
 some time in the future.
 
  I feel that when identifiers are case sensitive, such as in C,
 there may be a case for respecting points, although this would
 cause a problem with cross-system portability, but where case is
 ignored, such as in domain names, the emphasis is more on the
 pronunciation rather than the exact spelling.

 I didn't quite get the last sentence. I had thought that the
 vowel marks were used to get the exact pronunciation. If that is
 not true, it may be part of my misunderstanding of the situation.

Points are more than pronunciation, because in modern Hebrew we do not
distinguish between long and short vowels and we do not pronounce the Dagesh
except in three letters.



In summary, we have two alternatives: to disallow points, or to allow them
and ignore them. I think the latter is more friendly.
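
[Illustration, not part of the original message. A minimal sketch of the two
alternatives, assuming Python and assuming that "points" means every
nonspacing mark (category Mn) in the range U+0591..U+05C7. That range also
covers the cantillation marks. All function names are hypothetical.]

```python
import unicodedata

# Assumed range: Hebrew accents and points, U+0591 through U+05C7.
HEBREW_MARKS_START = 0x0591
HEBREW_MARKS_END = 0x05C7


def is_hebrew_point(ch: str) -> bool:
    """True for nonspacing marks in the assumed Hebrew range."""
    return (HEBREW_MARKS_START <= ord(ch) <= HEBREW_MARKS_END
            and unicodedata.category(ch) == "Mn")


def contains_points(label: str) -> bool:
    """Alternative 1: detect points so that the label can be rejected."""
    return any(is_hebrew_point(ch) for ch in label)


def strip_points(label: str) -> str:
    """Alternative 2: accept points but ignore them when comparing."""
    return "".join(ch for ch in label if not is_hebrew_point(ch))


if __name__ == "__main__":
    pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"  # shalom, pointed
    plain = "\u05e9\u05dc\u05d5\u05dd"                      # shalom, unpointed
    print(contains_points(pointed))        # True
    print(strip_points(pointed) == plain)  # True
```

Under the second alternative, the registered name and the typed name would
both pass through strip_points before comparison, so AB*C.COM and ABC.COM
resolve to the same entry.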


  Jony
 
   -Original Message-
   From: Mark Davis [mailto:[EMAIL PROTECTED]]
   Sent: Sunday, September 17, 2000 7:58 PM
   To: Unicode List
   Cc: [EMAIL PROTECTED]; Edmon
   Subject: Re: [idn] nameprep forbidden characters
  
  
   I am curious why you feel so strongly that the Hebrew points
   should be ignored in domain names. Prima facie, it seems that
   there is little harm in treating them no differently from other
   characters. What problem would arise if the domain was ABC.COM
   and I could not get it by typing AB*C.COM? (Here uppercase stands
   for Hebrew, and * for a point.) Conversely, if someone really did
   register AB*C.COM, would it be a problem that I couldn't get to
   that location by typing ABC.COM?
  
   It is my understanding that the vowels are rarely used, and that
   people really wouldn't use them in registered domain names anyway.
   It seems that if someone did take the trouble to type in the
   points, that there would be a reason for their making such a
   distinction.
  
   I'd appreciate it if you could help me to understand the issue
   more clearly.
  
   Mark
  
   Jonathan Rosenne wrote:
  
We should distinguish "punctuation", like 060C Arabic Comma, and
"diacritics", such as 064E Arabic Fatha. Diacritics is probably the
wrong word. I have the impression that you were referring to the latter.

For Hebrew, my opinion is that from the point of view of the user,
punctuation should be forbidden, while diacritics such as the vowels
and other combining