Re: unicode + oracle query....... (suggestions needed...)
hi,

thanks for responding. But when you mention a change in the registry, could you elaborate on where exactly in the registry and what changes are required? My registry setting shows NLS = American_English.UTF8. Is this the setting you indicated, or is it something to do with the charset entries autodetect and autodetect_all (in classid...Mimedatabasecharset..)?

Please do elaborate.

regards,
Sandeep

- Original Message -
From: Kedar Moghe [EMAIL PROTECTED]
To: 'Sandeep Krishna' [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 11:20 AM
Subject: RE: unicode + oracle query... (suggestions needed...)

Sandeep,

I think you need to set the registry charset to UTF8 on the machine where the database is installed. We were getting the same problem when we used to send UTF-8 strings to the Oracle database after conversion from Shift-JIS to UTF-8. In that case too, the byte sequence of the retrieved string was changed and some of the bytes were replaced with BF.

Regards,
Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 11:36 AM
To: Unicode List
Subject: unicode + oracle query... (suggestions needed...)

hi

Actually I have been trying to use ASPs (UTF-8 encoding) to write Unicode characters to an Oracle DB table (varchar2 field) and then retrieve them back. (I used UTF-8 encoding both for writing to the database and for retrieving and displaying.)

There were some amazing observations:

* Each Unicode character was taking 7 bytes in the database (instead of the expected 2 or 3).
* Some Unicode characters (or rather code points) like 'F95F', when encoded in UTF-8, were being encoded as EF A5 BF when they should have been encoded as EF A5 9F. In fact, many Unicode characters whose encoded form had to have a byte in the range (80..9F) were somehow being changed to BF, thus resulting in incorrect retrieval.

I was unable to find the reasons for these strange occurrences. Please suggest what the causes could be.

regards,
Sandeep.

***
SANDEEP KRISHNA
Member Technical Staff (Priceline.com)
H.C.L. Technologies Limited
A-1 CD, Sector -16, NOIDA, UP, India.
Ph: 91-11-91-4516321 (extn. 1062)
Fax: 91-11-91-4510713, 4510226
E-Mail : [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
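For anyone wanting to check the byte values reported in the original post, here is a minimal sketch in Python (not the ASP/Oracle setup from the thread). It confirms that U+F95F encodes to EF A5 9F in UTF-8, and it simulates one possible cause of the BF bytes: a byte-wise conversion in which every byte in 0x80..0x9F is treated as unmappable and replaced by 0xBF. The function `lossy_single_byte_pass` and its replacement rule are assumptions made up for illustration; nothing in the thread confirms that this is what Oracle actually did.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def lossy_single_byte_pass(data: bytes) -> bytes:
    """Hypothetical model of a lossy charset conversion.

    Assumption for illustration only: each byte in 0x80..0x9F is treated
    as unmappable and replaced with 0xBF; all other bytes pass through.
    """
    return bytes(0xBF if 0x80 <= b <= 0x9F else b for b in data)


good = "\uF95F".encode("utf-8")
bad = lossy_single_byte_pass(good)

print(utf8_hex("\uF95F"))                  # EF A5 9F
print(" ".join(f"{b:02X}" for b in bad))   # EF A5 BF
```

If the stored bytes match the second line, that at least suggests the UTF-8 bytes are being passed through a single-byte character set conversion somewhere between the script and the table.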
Re: unicode + oracle query....... (suggestions needed...)
hi...

I am thoroughly confused. Actually the registry entries for Oracle show 3 entries for NLS_LANG, and that too at both the WEB SERVER end and the DATABASE SERVER end, so that makes too many combinations.

Can someone indicate which of these NLS_LANG entries have to be set to "AMERICAN_AMERICA.UTF8", and if some of them don't need this, what exactly should be there?

Please suggest the necessary measures.

regards,
Sandeep

- Original Message -
From: Bob Verbrugge [EMAIL PROTECTED]
To: Sandeep Krishna [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 1:30 PM
Subject: Re: unicode + oracle query... (suggestions needed...)

Sandeep,

You probably need to change the NLS_LANG Oracle setting in the registry. Look under HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE for this setting and change the character set part to UTF8.

Bob.

- Original Message -
From: "Sandeep Krishna" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 9:16 AM
Subject: Re: unicode + oracle query... (suggestions needed...)

hi, thanks for responding. But when you mention a change in the registry, could you elaborate on where exactly in the registry and what changes are required? My registry setting shows NLS = American_English.UTF8. Is this the setting you indicated, or is it something to do with the charset entries autodetect and autodetect_all (in classid...Mimedatabasecharset..)? Please do elaborate.

regards,
Sandeep
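Since Bob's advice is to change only "the character set part" of NLS_LANG, it may help to see how the value breaks down. NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET. The sketch below is plain Python string handling, not an Oracle API; the helper names `parse_nls_lang` and `with_charset` are made up for this example, and the starting value AMERICAN_AMERICA.WE8ISO8859P1 is only an assumed example of a non-UTF8 setting.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split a LANGUAGE_TERRITORY.CHARSET string into its three parts."""
    locale_part, dot, charset = value.rpartition(".")
    language, underscore, territory = locale_part.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"not in LANGUAGE_TERRITORY.CHARSET form: {value!r}")
    return NlsLang(language, territory, charset)


def with_charset(value: str, charset: str) -> str:
    """Return value with only the character set part replaced."""
    parsed = parse_nls_lang(value)
    return f"{parsed.language}_{parsed.territory}.{charset}"


print(parse_nls_lang("AMERICAN_AMERICA.UTF8"))
print(with_charset("AMERICAN_AMERICA.WE8ISO8859P1", "UTF8"))
```

The language and territory parts are left untouched on purpose, since the suggestion in the thread concerns only the character set.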
RE: unicode + oracle query....... (suggestions needed...)
Sandeep,

I think you need to change it at the following three places:

HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\NLS_LANG
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\ALL_HOMES\ID0\NLS_LANG
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME0\NLS_LANG

Best of luck.

Regards,
Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 5:45 PM
To: Carl W. Brown; Bob Verbrugge; Kedar Moghe
Cc: [EMAIL PROTECTED]
Subject: Re: unicode + oracle query... (suggestions needed...)

hi...

I am thoroughly confused. Actually the registry entries for Oracle show 3 entries for NLS_LANG, and that too at both the WEB SERVER end and the DATABASE SERVER end, so that makes too many combinations.

Can someone indicate which of these NLS_LANG entries have to be set to "AMERICAN_AMERICA.UTF8", and if some of them don't need this, what exactly should be there?

Please suggest the necessary measures.

regards,
Sandeep
Re: unicode + oracle query....... (suggestions needed...)
I mean: all the entries in both the Web server machine's registry and the Oracle database server machine's registry, or in just one of them? In our setup, my machine is the Web server and the Oracle server is a separate machine.

Please clarify.

regards,
Sandeep

- Original Message -
From: Kedar Moghe [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 3:21 PM
Subject: RE: unicode + oracle query... (suggestions needed...)

Sandeep,

I think you need to change it at the following three places:

HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\NLS_LANG
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\ALL_HOMES\ID0\NLS_LANG
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME0\NLS_LANG

Best of luck.

Regards,
Kedar
RE: unicode + oracle query....... (suggestions needed...)
Only the registry entries on the database machine. Not any other entry.

Regards,
Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 6:21 PM
To: Kedar Moghe
Cc: [EMAIL PROTECTED]
Subject: Re: unicode + oracle query... (suggestions needed...)

I mean: all the entries in both the Web server machine's registry and the Oracle database server machine's registry, or in just one of them? In our setup, my machine is the Web server and the Oracle server is a separate machine.

Please clarify.

regards,
Sandeep
Re: unicode + oracle query....... (suggestions needed...)
Sandeep,

Can you explain exactly what you are doing to get the data from ASP into the Oracle database? Perhaps post the ASP code?

Like most scripting languages, VBScript and JScript both support UCS-2, and it is usually the Oracle ODBC or OLE DB driver that has the job of converting the text from UCS-2 to UTF-8. I wonder if what you are seeing is some type of "double conversion"?

So the things that would be interesting to know:

1) The data access method to Oracle
2) Version of the driver being used
3) A sample of the code/script being used

michka

a new book on internationalization in VB at http://www.i18nWithVB.com/

- Original Message -
From: "Sandeep Krishna" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 3:12 AM
Subject: Re: unicode + oracle query... (suggestions needed...)

I mean: all the entries in both the Web server machine's registry and the Oracle database server machine's registry, or in just one of them? In our setup, my machine is the Web server and the Oracle server is a separate machine.

Please clarify.

regards,
Sandeep
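To make the "double conversion" idea concrete, here is a rough Python sketch of what it could look like. It assumes that text already encoded as UTF-8 is mistaken for a single-byte character set and then converted to UTF-8 a second time. The choice of Latin-1 as the misread character set and the function name `double_encode` are assumptions for illustration only; the driver behaviour in the thread is not known.

```python
def double_encode(text: str, misread_as: str = "latin-1") -> bytes:
    """Encode to UTF-8, misread the bytes as a single-byte charset,
    then encode the misread text to UTF-8 again."""
    once = text.encode("utf-8")
    misread = once.decode(misread_as)   # each byte becomes one character
    return misread.encode("utf-8")


once = "\uF95F".encode("utf-8")
twice = double_encode("\uF95F")

print(len(once), " ".join(f"{b:02X}" for b in once))    # 3 EF A5 9F
print(len(twice), " ".join(f"{b:02X}" for b in twice))  # 6 C3 AF C2 A5 C2 9F
```

Under these assumptions a three-byte character grows to six bytes, which would explain inflated storage in general but does not by itself reproduce the 7 bytes per character reported in the original post.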
FW: Implementation of Unicode
-Original Message-
From: McGonigle, Laurence [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 26, 2000 8:51 PM
To: '[EMAIL PROTECTED]'
Subject: Implementation of Unicode

Hi,

We are a large government organisation in Western Australia and require some advice on the use and implementation of Unicode.

The business area in question is the Registry of Births, Deaths and Marriages, which is a government agency within the Ministry of Justice. This agency needs to register all births, deaths and marriages in the state of Western Australia and has a policy on what characters it will accept and register for a name. In summary, the policy has forced the use of an old DOS-based Code Page Set (i.e. Latin 850). The agency would like to continue to restrict input of names to characters that appear on this Code Page Set.

With the migration to a new system planned for February 2001 (the system will also be available on the Internet), it is envisaged that we will need to implement Unicode if we are to continue to use the characters of the Latin 850 Code Page Set.

Given the above, my questions are as follows:

* Is it possible to implement the Unicode standard but easily restrict the input of characters to those currently available in the Latin 850 Code Page Set?
* Further, if we decide in the future to allow other characters to be input, is there an easy method available to permit the use of the additional characters?

I look forward to your response.

Thank you,

Laurence McGonigle
Project Manager
Ministry of Justice (Information Services Directorate)
Ph: 9264 1614
E-mail: [EMAIL PROTECTED]
RE: [idn] nameprep forbidden characters
See my comments inline.

Jony

-Original Message-
From: Mark Davis [mailto:[EMAIL PROTECTED]]
Sent: Sunday, September 17, 2000 10:40 PM
To: Jonathan Rosenne
Cc: Unicode List; [EMAIL PROTECTED]; Edmon; [EMAIL PROTECTED]
Subject: Re: [idn] nameprep forbidden characters

I'm not trying to argue with you on this issue -- it may very well be best for points to be ignored. But I do want to understand the situation a bit better. My questions below should not be taken as rhetorical criticism, but simply as questions for clarification.

For others, I am also interested in the situation vis-a-vis Arabic, whether we should treat it the same as Hebrew in terms of the vowel marks (fatha, etc.).

Mark

Jonathan Rosenne wrote:

Why should case be ignored in English?

Except for an extremely small set of edge cases (such as Polish vs polish, God vs god), there is no extra meaning attached to case. In the context of identifiers such as domain names, I believe the justification for ignoring case in English is related to convenience and user friendliness.

Unless it is a leftover from the 6 bit days.

In Hebrew, points are optional. The word is the same with them and without them, or with just some of them.

I had thought that there were many words with the same base letters, but different pronunciations (and meaning), and that different vowels would be used for the different pronunciations. That's the way for Arabic, and I had assumed it was the same for Hebrew. Is that not the case? From the base letters in each word are the vowels always predictable, so that they are completely optional?

There are homonyms in Hebrew, just as there are in most languages. Some can be resolved with points, some cannot. Some platforms support points, some do not, and some do but at some inconvenience. Newspapers can use points, and do it sparingly, mainly to disambiguate homonyms - say about once per sheet.

In addition, not all systems support them, and when they do most users don't know how to type them.

It isn't easy - see http://www.qsm.co.il/NewHebrew/wniqud.htm

A domain owner could publish it with points, to clarify the pronunciation, but many users would type it without them or even get them wrong.

Do you think that it is a realistic case, that a domain owner would need to use points in that manner, and that a significant fraction of domain owners would do this?

Not a large number. The issue has been discussed at the Hebrew WG of the SII and I think there is general agreement on this issue. We plan a paper some time in the future.

I feel that when identifiers are case sensitive, such as in C, there may be a case for respecting points, although this would cause a problem with cross-system portability, but where case is ignored, such as in domain names, the emphasis is more on the pronunciation rather than the exact spelling.

I didn't quite get the last sentence. I had thought that the vowel marks were used to get the exact pronunciation. If that is not true, it may be part of my misunderstanding of the situation.

Points are more than pronunciation, because in modern Hebrew we do not distinguish between long and short vowels and we do not pronounce the Dagesh except in three letters.

In summary, we have two alternatives: to disallow points, or to allow them and ignore them. I think the latter is more friendly.

Jony

-Original Message-
From: Mark Davis [mailto:[EMAIL PROTECTED]]
Sent: Sunday, September 17, 2000 7:58 PM
To: Unicode List
Cc: [EMAIL PROTECTED]; Edmon
Subject: Re: [idn] nameprep forbidden characters

I am curious why you feel so strongly that the Hebrew points should be ignored in domain names. Prima facie, it seems that there is little harm in treating them no differently from other characters.

What problem would arise if the domain was ABC.COM and I could not get it by typing AB*C.COM? (Here uppercase stands for Hebrew, and * for a point.)

Conversely, if someone really did register AB*C.COM, would it be a problem that I couldn't get to that location by typing ABC.COM?

It is my understanding that the vowels are rarely used, and that people really wouldn't use them in registered domain names anyway. It seems that if someone did take the trouble to type in the points, that there would be a reason for their making such a distinction.

I'd appreciate it if you could help me to understand the issue more clearly.

Mark

Jonathan Rosenne wrote:

We should distinguish "punctuation", like 060C Arabic Comma, and "diacritics", such as 064E Arabic Fatha. Diacritics is probably the wrong word. I have the impression that you were referring to the latter.

For Hebrew, my opinion is that from the point of view of the user, punctuation should be forbidden, while diacritics such as the vowels and other combining
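The "allow them and ignore them" alternative discussed in this thread can be sketched as a comparison that drops Hebrew points before matching. This is only an illustration in Python and not the nameprep algorithm. It assumes that "points" means the nonspacing marks (general category Mn) in U+0591..U+05C7, which leaves Hebrew punctuation such as the maqaf (U+05BE) untouched. The function names are made up for the example.

```python
import unicodedata

# Assumed range for Hebrew accents and points (illustration only).
HEBREW_MARKS_FIRST = 0x0591
HEBREW_MARKS_LAST = 0x05C7


def strip_hebrew_points(label: str) -> str:
    """Remove nonspacing marks (Mn) in the assumed Hebrew mark range."""
    return "".join(
        ch
        for ch in label
        if not (
            HEBREW_MARKS_FIRST <= ord(ch) <= HEBREW_MARKS_LAST
            and unicodedata.category(ch) == "Mn"
        )
    )


def same_label(a: str, b: str) -> bool:
    """Compare two labels with Hebrew points ignored."""
    return strip_hebrew_points(a) == strip_hebrew_points(b)


pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"  # shalom, with points
plain = "\u05E9\u05DC\u05D5\u05DD"                      # shalom, without points
print(same_label(pointed, plain))  # True
```

Under the other alternative (disallowing points), the same check could be used to reject a label whenever stripping changes it.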
RE: Implementation of Unicode
Dear Sir,

Since Unicode is a superset of code page 850, you certainly can filter out any other characters. I suggest that you filter these out as close to the keying as possible.

Since you are using code page 850 data, you might want to look at using UTF-8 for storing your data, since most characters will be the same as one-byte ASCII characters. Only a few special characters (the characters in CP 850 that are above 0x7F) will take more than one byte, typically two.

Use the fixed-width form (UCS-2 or UTF-16) of Unicode for internal processing such as string scanning and the like.

Carl

-Original Message-
From: Magda Danish (Unicode) [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 8:47 AM
To: Unicode List
Subject: FW: Implementation of Unicode

-Original Message-
From: McGonigle, Laurence [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 26, 2000 8:51 PM
To: '[EMAIL PROTECTED]'
Subject: Implementation of Unicode

Hi,

We are a large government organisation in Western Australia and require some advice on the use and implementation of Unicode.

The business area in question is the Registry of Births, Deaths and Marriages, which is a government agency within the Ministry of Justice. This agency needs to register all births, deaths and marriages in the state of Western Australia and has a policy on what characters it will accept and register for a name. In summary, the policy has forced the use of an old DOS-based Code Page Set (i.e. Latin 850). The agency would like to continue to restrict input of names to characters that appear on this Code Page Set.

With the migration to a new system planned for February 2001 (the system will also be available on the Internet), it is envisaged that we will need to implement Unicode if we are to continue to use the characters of the Latin 850 Code Page Set.

Given the above, my questions are as follows:

* Is it possible to implement the Unicode standard but easily restrict the input of characters to those currently available in the Latin 850 Code Page Set?
* Further, if we decide in the future to allow other characters to be input, is there an easy method available to permit the use of the additional characters?

I look forward to your response.

Thank you,

Laurence McGonigle
Project Manager
Ministry of Justice (Information Services Directorate)
Ph: 9264 1614
E-mail: [EMAIL PROTECTED]
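The filtering suggested above could look roughly like the following Python sketch. It builds the allowed set from Python's built-in "cp850" codec, taking bytes 0x20..0xFF except 0x7F; that byte range is an assumption, and it still includes the box-drawing characters, which a real naming policy would probably exclude. The names `cp850_repertoire` and `rejected_chars` are made up for this example.

```python
def cp850_repertoire() -> frozenset:
    """Characters of code page 850 for bytes 0x20..0xFF, except 0x7F."""
    raw = bytes(b for b in range(0x20, 0x100) if b != 0x7F)
    return frozenset(raw.decode("cp850"))


ALLOWED = cp850_repertoire()


def rejected_chars(name: str, allowed=ALLOWED) -> list:
    """Return the characters of name that are not in the allowed set."""
    return [ch for ch in name if ch not in allowed]


print(rejected_chars("Jos\u00E9 Mu\u00F1oz"))   # [] (all in CP 850)
print(rejected_chars("Wa\u0142\u0119sa"))       # two characters outside CP 850

# Permitting more characters later is a matter of widening the set.
extended = ALLOWED | {"\u0142", "\u0119"}
print(rejected_chars("Wa\u0142\u0119sa", extended))  # []

# Storage cost in UTF-8: ASCII stays one byte, e-acute takes two.
print(len("e".encode("utf-8")), len("\u00E9".encode("utf-8")))  # 1 2
```

Keeping the allowed set as data rather than tying it to the storage encoding is what makes the second question (adding characters later) easy: the data is stored as Unicode either way, and only the set changes.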
Implementation of isLetter()
Should an isLetter() implementation return true for "Nl" characters as well as the usual "L*"? Regards, John
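For experimenting with the two possible answers, here is a small Python sketch built on the standard `unicodedata` module. It does not settle the question; it just makes the treatment of "Nl" (letter number) characters a parameter. The function name `is_letter` and the flag `include_letter_numbers` are made up for this example.

```python
import unicodedata


def is_letter(ch: str, include_letter_numbers: bool = False) -> bool:
    """True for general categories L* (Lu, Ll, Lt, Lm, Lo).

    With include_letter_numbers=True, category Nl also counts.
    """
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return True
    return include_letter_numbers and category == "Nl"


print(is_letter("A"))                                   # True  (Lu)
print(is_letter("5"))                                   # False (Nd)
print(is_letter("\u2167"))                              # False (Nl, ROMAN NUMERAL EIGHT)
print(is_letter("\u2167", include_letter_numbers=True)) # True
```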
RE: [idn] nameprep forbidden characters
See my comments inline. Jony -Original Message- From: Mark Davis [mailto:[EMAIL PROTECTED]] Sent: Sunday, September 17, 2000 10:40 PM To: Jonathan Rosenne Cc: Unicode List; [EMAIL PROTECTED]; Edmon; [EMAIL PROTECTED] Subject: Re: [idn] nameprep forbidden characters I'm not trying to argue with you on this issue -- it may very well be best for points to be ignored. But I do want to understand the situation a bit better. My questions below should not be taken as rhetorical criticism, but simply as questions for clarification. For others, I am also interested in the situation vis-a-vis Arabic, whether we should treat it the same as Hebrew in terms of the vowel marks (fatha, etc.). Mark Jonathan Rosenne wrote: Why should case be ignored in English? Except for an extremely small set of edge cases (such as Polish vs polish, God vs god), there is no extra meaning attached to case. In the context of identifiers such as domain names, I believe the justification for ignoring case in English is related to convenience and user friendliness. Unless it is a leftover form the 6 bit days. In Hebrew, points are optional. The word is the same with them and without them, or with just some of them. I had thought that there were many words with the same base letters, but different pronunciations (and meaning), and that different vowels would be used for the different pronunciations. That's the way for Arabic, and I had assumed it was the same for Hebrew. Is that not the case? From the base letters in each word are the vowels always predictable, so that they are completely optional? There are homonyms in Hebrew, just as there are in most languages. Some can be resolved with points, some cannot. Some platforms support points, some do not, and some do but at some inconvenience. Newspapers can use points, and do it sparingly, mainly to disambiguate homonyms - say about once per sheet. In addition, not all systems support them, and when they do most users don't know how to type them. 
It isn't easy - see http://www.qsm.co.il/NewHebrew/wniqud.htm

A domain owner could publish it with points, to clarify the pronunciation, but many users would type it without them or even get them wrong.

Do you think that it is a realistic case, that a domain owner would need to use points in that manner, and that a significant fraction of domain owners would do this?

Not a large number. The issue has been discussed at the Hebrew WG of the SII and I think there is general agreement on this issue. We plan a paper some time in the future.

I feel that when identifiers are case sensitive, such as in C, there may be a case for respecting points, although this would cause a problem with cross-system portability, but where case is ignored, such as in domain names, the emphasis is more on the pronunciation rather than the exact spelling.

I didn't quite get the last sentence. I had thought that the vowel marks were used to get the exact pronunciation. If that is not true, it may be part of my misunderstanding of the situation.

Points are more than pronunciation, because in modern Hebrew we do not distinguish between long and short vowels and we do not pronounce the Dagesh except in three letters.

In summary, we have two alternatives: to disallow points, or to allow them and ignore them. I think the latter is more friendly.

Jony

-Original Message-
From: Mark Davis [mailto:[EMAIL PROTECTED]]
Sent: Sunday, September 17, 2000 7:58 PM
To: Unicode List
Cc: [EMAIL PROTECTED]; Edmon
Subject: Re: [idn] nameprep forbidden characters

I am curious why you feel so strongly that the Hebrew points should be ignored in domain names. Prima facie, it seems that there is little harm in treating them no differently from other characters.

What problem would arise if the domain was ABC.COM and I could not get it by typing AB*C.COM? (Here uppercase stands for Hebrew, and * for a point.)
Conversely, if someone really did register AB*C.COM, would it be a problem that I couldn't get to that location by typing ABC.COM?

It is my understanding that the vowels are rarely used, and that people really wouldn't use them in registered domain names anyway. It seems that if someone did take the trouble to type in the points, that there would be a reason for their making such a distinction.

I'd appreciate it if you could help me to understand the issue more clearly.

Mark

Jonathan Rosenne wrote:

We should distinguish "punctuation", like 060C Arabic Comma, and "diacritics", such as 064E Arabic Fatha. Diacritics is probably the wrong word. I have the impression that you were referring to the latter.

For Hebrew, my opinion is that from the point of view of the user, punctuation should be forbidden, while diacritics such as the vowels and other combining