do all browsers support UTF-8 encoding???
hi guys!! Can someone tell me whether all browsers (at least IE 2, 3.0 and Netscape...) support encoding/decoding of UTF-8? Also, can there be a browser (say, a primitive version of a Chinese Netscape) that supports Big5 encoding but not UTF-8? This info is crucial, as I expect all users of the site to work with UTF-8 encoding only... so if there is a user whose browser doesn't support UTF-8, or supports Big5 but not UTF-8, then this is trouble. Anyone with some idea on this issue?

regards,
Sandeep

***
SANDEEP KRISHNA
Member Technical Staff (Priceline.com)
H.C.L. Technologies Limited
A-1 CD, Sector -16, NOIDA, UP, India.
Ph: 91-11-91-4516321 (extn. 1062)
Fax: 91-11-91-4510713, 4510226
E-Mail : [EMAIL PROTECTED]
~Don't frown, because you never know who's falling in love with your smile!~
Re: do all browsers support UTF-8 encoding???
hi... well, as per your suggestion, I shouldn't send UTF-8 encoded text and should instead send text in the local encodings (Big5, GB..., Shift-JIS, etc.). But doesn't that imply that I need separate middle-tier support for each locale? That is, I would dedicate separate web servers to particular locales (each one reading from and writing to the DB server in one particular encoding, say, Big5). My setup doesn't give me the liberty of separate web servers for separate locales (business rules), so I don't think that solution holds for my case. Any elaborations/clarifications?

regards,
Sandeep

- Original Message -
From: [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Cc: Unicode List [EMAIL PROTECTED]
Sent: Wednesday, October 04, 2000 10:22 PM
Subject: Re: do all browsers support UTF-8 encoding???

Hi Sandeep,

Maybe this wasn't clear, but... IE 2, 3, 4.x and Netscape 2, 3, and 4.x will not display Chinese characters using the UTF-8 encoding as installed. They set the font for the UTF-8 encoding to "Times New Roman" and therefore display black squares (the "empty glyph") for all Chinese characters.

A lot of us think that you should not send UTF-8 to the browser if you are concerned about having large numbers of people with older browser versions (and cannot ensure that they all set the font to something more appropriate, i.e. in a controlled environment such as an intranet). This appears to be your case.

Short story:
Work in Unicode (your choice, UTF-8 or UTF-16) at the server.
Send UTF-8 to "modern" browsers (IE 5.x, NN 6.x).
Send legacy encodings (such as Big5) to older browsers.
Send UTF-8 to browsers serving languages that are compatible with UTF-8 (Latin script languages in Western and Central Europe mostly).

Regards,
Addison

===
Addison P. Phillips, Principal Consultant
Inter-Locale LLC, http://www.inter-locale.com
Los Gatos, CA, USA
mailto:[EMAIL PROTECTED]
+1 408.210.3569 (mobile)
+1 408.904.4762 (fax)
===
Globalization Engineering Consulting Services

On Wed, 4 Oct 2000, Sandeep Krishna wrote:
[original question snipped]
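Addison's "short story" can be sketched as follows: the page text stays Unicode on the server and only the final encoding step varies per request. The browser test (looking for MSIE 5+ or Netscape 6+ in the User-Agent string), the language-to-charset table and all function names are assumptions for illustration, not anything prescribed in this thread:

```python
import re
from typing import Tuple

# Hypothetical fallback table: user's language -> legacy charset for older browsers.
LEGACY_CHARSETS = {
    "zh-tw": "big5",
    "zh-cn": "gb2312",
    "ja": "shift_jis",
    "ko": "euc-kr",
}


def is_modern_browser(user_agent: str) -> bool:
    """Rough User-Agent check for IE 5.x+ or Netscape 6.x+ (an assumption, not a spec)."""
    m = re.search(r"MSIE (\d+)", user_agent)
    if m:
        return int(m.group(1)) >= 5
    m = re.search(r"Netscape\d?/(\d+)", user_agent)  # e.g. "Netscape6/6.0"
    if m:
        return int(m.group(1)) >= 6
    return False


def choose_charset(user_agent: str, lang: str) -> str:
    """Pick the output charset for one response."""
    if is_modern_browser(user_agent):
        return "utf-8"
    # Older browser: use the legacy charset for the user's language if we have one;
    # otherwise (e.g. Latin-script European languages) fall back to UTF-8.
    return LEGACY_CHARSETS.get(lang.lower(), "utf-8")


def render(page_text: str, user_agent: str, lang: str) -> Tuple[str, bytes]:
    """Encode Unicode page text for one browser; return (Content-Type, body bytes)."""
    charset = choose_charset(user_agent, lang)
    # Characters missing from a legacy charset become numeric references like &#54620;
    body = page_text.encode(charset, errors="xmlcharrefreplace")
    return "text/html; charset=" + charset, body
```

In this sketch the same server code and the same Unicode data serve every locale; supporting another legacy charset means adding a table entry rather than another web server.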
Re: do all browsers support UTF-8 encoding???
(This is just an extension of my previous query.) One more thing: can I convert the encoding of a Chinese character from Big5 to UTF-8? I mean, how exactly do I map a character in Big5 to the same character in UTF-8?

[Quoted copy of my last query, with Addison's reply, snipped.]
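Addison's advice to work in Unicode at the server also answers the mapping question: there is no arithmetic relationship between Big5 codes and Unicode, so a Big5 character is converted by looking it up in a Big5-to-Unicode table, and the resulting code point is then encoded as UTF-8 by a fixed algorithm. A minimal Python sketch, assuming Python's built-in `big5` codec as the mapping table (vendor variants such as Microsoft's code page 950 differ for some codes, and Python ships a separate `cp950` codec for those); the function name is illustrative:

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Convert Big5-encoded bytes to UTF-8 by way of Unicode."""
    text = data.decode("big5")   # table lookup: Big5 code -> Unicode character
    return text.encode("utf-8")  # fixed algorithm: Unicode code point -> UTF-8 bytes


zhong_big5 = b"\xa4\xa4"            # the character 中 (U+4E2D) in Big5
zhong_utf8 = big5_to_utf8(zhong_big5)
print(zhong_utf8.hex(" ").upper())  # E4 B8 AD
```

Bytes that are not valid Big5 make `decode` raise `UnicodeDecodeError`, which is usually preferable to silently storing garbage.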
how does a chinese keyboard input....
hi guys, can anyone tell me how a Chinese keyboard takes input? I mean, Chinese has more than a thousand characters, and its script is mostly ideographic (characters represent symbols), so how can a keyboard fit such a volume of characters onto its limited number of keys? Or are the keyboards based on character sets that are phonetic?

One more thing: does pressing a key on such a keyboard return a Unicode value, or ASCII values that the OS or the word processor interprets as mapped Chinese characters? Kindly clarify...

regards,
Sandeep
re: encoding??????
Hi, thanks a lot for providing solutions to many of my problems, but I shall take the liberty of asking some more.

* I have been trying to use ASPs (UTF-8 encoding) to write Unicode characters to an Oracle DB table (VARCHAR2 field) and then retrieve them back. (I used UTF-8 encoding both for writing to the database and for retrieving and displaying.) There were some amazing observations:
* each Unicode character was taking 7 bytes in the database (instead of the expected 2 or 3);
* some Unicode characters (or rather code points) like U+F95F, when encoded in UTF-8, were being encoded as EF A5 BF, when they should have been encoded as EF A5 9F. In fact, many Unicode characters whose encoded form should have had a byte in the range 80..9F were somehow having that byte changed to BF, resulting in incorrect retrieval.

I was unable to find the reasons for these strange occurrences. Then someone suggested changes in the registry. The registry entries for Oracle show 3 entries for NLS_LANG, on both the web server end and the database server end, so that makes too many combinations... And finally, how does changing the registry to AMERICAN_AMERICA.UTF8 impact the database storage or display process? Kindly suggest.

thanks and regards,
Sandeep

- Original Message -
From: [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Thursday, September 28, 2000 2:18 PM
Subject: Re: Encoding

On Wed, 27 Sep 2000, Sandeep Krishna wrote:
> can someone tell me...what does the Encoding in the browser (IE5) imply.

That's a good question. Internet Explorer 5 is relatively advanced in the area of handling different encodings. It seems to honor the encoding ("charset") as advertised in HTTP headers, and it seems to try to make an educated guess (based on the actual content of data) when no encoding is specified. The details are somewhat obscure and undocumented, though.
IE 5 _also_ lets the user override its guess of the encoding; a good thing, since a great many pages are still sent without proper designation of the encoding. The Encoding menu on IE 5 has a dual purpose: you can check from it what encoding the browser has assumed when interpreting the data (from the HTTP headers, from META tags which try to simulate them, by heuristic guessing, or by the user's explicit selection) - you'll see that alternative checked - or you can make your own guess of what the encoding really is.

> does it mean that the Encoding (say UTF-8 or Chinese Big5) shall be used for encoding/decoding any data (page) to be displayed or sent

Basically, for interpreting the data that the browser has to display. (It may affect e.g. how data sent via forms is encoded by the browser, but I've never studied that side of the matter.)

> I mean, if I use an encoding like Big5, how does it encode a Chinese character... similar to UTF-8 or differently?

Do you mean as an IE 5 user, or as a Web document author? If you, as a user, change the selection in the Encoding menu, you're telling the browser to treat the data according to that encoding. Whether it makes sense depends on how the data has actually been encoded. Big5, also known as "Traditional Chinese", is not a Unicode encoding at all, so it is surely different from UTF-8. For a short characterization of Big5, see http://www.dpliv.com/nckuaa/tech/bg5hist.html

> and can I display a Korean character... using Big5?

Depends on what you mean by a "Korean character". I suppose you mean Hangul. As far as I know, Big5 doesn't contain them. For Hangul, you can use either some Korean standard, or Unicode (see part "10.4 Hangul" in the Unicode standard).

There are various practical considerations. For example, software used by people in Korea might be better equipped to handle data encoded according to a Korean standard. People elsewhere might cope with Unicode encoded data better.
(For example, my IE 4, in a fairly vanilla Windows environment with a few Unicode fonts installed, can display UTF-8 encoded Korean texts just fine - if only I could understand them! - but doesn't seem to be able to handle any Korean encoding.) To gain maximum audience, as a Web author, you might consider making your documents available in both (or several) encodings and linking them together (for obvious reasons, with link texts in plain ASCII, which probably means you'd have to use plain English), so that people can try the other version if the first one is illegible.

> please explain the Encoding part??

It's a somewhat confusing issue, and not directly related to Unicode (though it naturally affects Unicode encodings too). If you mean the encoding concept in general, perhaps my http://www.hut.fi/u/jkorpela/chars.html#encoding illustrates a bit; that tutorial of mine contains references which you might find more readable presentations on the topic - mine
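The two points made above about Big5 can be checked directly: the same Chinese character has unrelated byte sequences in Big5 and UTF-8, and a Hangul syllable has no Big5 code at all but does have one in EUC-KR (an encoding of the Korean standard KS X 1001, used here only as an example of "some Korean standard") and in UTF-8. A small Python sketch using the standard codecs; the helper name is illustrative:

```python
from typing import Optional


def encoded_hex(ch: str, charset: str) -> Optional[str]:
    """Return the bytes of ch in the given charset as hex, or None if it has no code there."""
    try:
        return ch.encode(charset).hex(" ").upper()
    except UnicodeEncodeError:
        return None


# The same Chinese character gets unrelated byte sequences in Big5 and UTF-8.
print(encoded_hex("中", "big5"))    # A4 A4
print(encoded_hex("中", "utf-8"))   # E4 B8 AD

# A Hangul syllable (U+D55C): Big5 has no code for it; EUC-KR and UTF-8 do.
for cs in ("big5", "euc-kr", "utf-8"):
    print(cs, encoded_hex("한", cs))
```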
Re: unicode + oracle query....... (suggestions needed...)
hi, thanks for responding. But when you mention a change in the registry, could you elaborate on where exactly in the registry, and what changes are required? My registry setting shows NLS = American_English.UTF8. Is this the setting you meant, or is it something to do with the charset entries autodetect and autodetect_all (in classid...Mimedatabasecharset..)? Please do elaborate.

regards,
Sandeep

- Original Message -
From: Kedar Moghe [EMAIL PROTECTED]
To: 'Sandeep Krishna' [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 11:20 AM
Subject: RE: unicode + oracle query... (suggestions needed...)

Sandeep,

I think you need to set the registry charset to UTF8 where the database is installed. We were getting the same problem when we used to send UTF-8 strings to the Oracle database after converting them from Shift-JIS to UTF-8. At that time, too, the byte sequence of the retrieved string was getting changed, and some of the bytes were getting replaced with BF.

Regards,
Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 11:36 AM
To: Unicode List
Subject: unicode + oracle query... (suggestions needed...)

[original question snipped]
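The BF symptom that both Sandeep and Kedar report can be reproduced with a toy model. This is only an illustration under a hypothetical assumption: some conversion step treats the UTF-8 bytes as characters of a single-byte character set that has no mapping for 0x80-0x9F and writes 0xBF ('¿' in ISO-8859-1) in their place. It is not Oracle's actual conversion code, and the names are made up. It also shows why retrieval goes wrong silently: EF A5 BF is still valid UTF-8, just for a different character.

```python
SUBSTITUTE = 0xBF  # '¿' in ISO-8859-1; the byte reported in this thread


def lossy_single_byte_pass(data: bytes) -> bytes:
    """Toy model: treat each byte as a character of a single-byte charset in which
    0x80-0x9F have no mapping, and substitute 0xBF for those bytes."""
    return bytes(SUBSTITUTE if 0x80 <= b <= 0x9F else b for b in data)


sent = "\uF95F".encode("utf-8")          # EF A5 9F
stored = lossy_single_byte_pass(sent)
print(sent.hex(" ").upper(), "->", stored.hex(" ").upper())  # EF A5 9F -> EF A5 BF
print("read back as U+%04X" % ord(stored.decode("utf-8")))   # read back as U+F97F
```

Characters whose UTF-8 form has no byte in 80..9F (for example 中, E4 B8 AD) pass through this model unchanged, which matches the report that only "many", not all, characters were damaged.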
Re: unicode + oracle query....... (suggestions needed...)
hi... I'm thoroughly confused. The registry entries for Oracle show 3 entries for NLS_LANG, on both the WEB SERVER end and the DATABASE SERVER end, so that makes too many combinations... Can someone indicate which of these NLS_LANG entries have to be set to "AMERICAN_AMERICA.UTF8", and, if some of them don't need this, what exactly should be there? Please suggest the necessary measures.

regards,
Sandeep

- Original Message -
From: Bob Verbrugge [EMAIL PROTECTED]
To: Sandeep Krishna [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 1:30 PM
Subject: Re: unicode + oracle query... (suggestions needed...)

Sandeep,

You probably need to change the NLS_LANG Oracle setting in the registry. Look under HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE for this setting and change the character set part to UTF8.

Bob.

- Original Message -
From: "Sandeep Krishna" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 9:16 AM
Subject: Re: unicode + oracle query... (suggestions needed...)

[quoted text of my earlier message, Kedar's reply and the original question snipped]
Re: unicode + oracle query....... (suggestions needed...)
Do you mean all the entries in both the web server machine's registry and the Oracle database server machine's registry, or in just one of them? In our setup, my machine is the web server and the Oracle server is a separate machine. Please clarify.

regards,
Sandeep

- Original Message -
From: Kedar Moghe [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, September 27, 2000 3:21 PM
Subject: RE: unicode + oracle query... (suggestions needed...)

Sandeep,

I think you need to make the change at the following three places:
HKEY_LOCAL_MACHINE\ORACLE\NLS_LANG
HKEY_LOCAL_MACHINE\ORACLE\ALL_HOMES\ID0\NLS_LANG
HKEY_LOCAL_MACHINE\ORACLE\HOME0\NLS_LANG

Best of luck.

Regards,
Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 5:45 PM
To: Carl W. Brown; Bob Verbrugge; Kedar Moghe
Cc: [EMAIL PROTECTED]
Subject: Re: unicode + oracle query... (suggestions needed...)

[quoted text of my previous message and earlier replies snipped]
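The values quoted in this thread (AMERICAN_AMERICA.UTF8, American_English.UTF8) follow the NLS_LANG pattern LANGUAGE_TERRITORY.CHARACTERSET, and Bob's advice concerns only the part after the dot. As we understand Oracle's documentation, that part declares the character set used by the Oracle client software (here, the web server running the ASP pages); Oracle converts between it and the database character set, which is chosen when the database is created. The helper below is a hypothetical sketch that only splits such a value; it does not read the registry:

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split an NLS_LANG value of the form LANGUAGE_TERRITORY.CHARSET.
    Parts that are missing come back as empty strings."""
    lang_terr, _, charset = value.strip().partition(".")
    language, _, territory = lang_terr.partition("_")
    return NlsLang(language.upper(), territory.upper(), charset.upper())


for value in ("AMERICAN_AMERICA.UTF8", "American_English.UTF8"):
    parsed = parse_nls_lang(value)
    print(value, "->", parsed, "| client charset is UTF8:", parsed.charset == "UTF8")
```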
unicode + oracle query....... (suggestions needed...)
hi, I have been trying to use ASPs (UTF-8 encoding) to write Unicode characters to an Oracle DB table (VARCHAR2 field) and then retrieve them back. (I used UTF-8 encoding both for writing to the database and for retrieving and displaying.) There were some amazing observations:
* each Unicode character was taking 7 bytes in the database (instead of the expected 2 or 3);
* some Unicode characters (or rather code points) like U+F95F, when encoded in UTF-8, were being encoded as EF A5 BF, when they should have been encoded as EF A5 9F. In fact, many Unicode characters whose encoded form should have had a byte in the range 80..9F were somehow having that byte changed to BF, resulting in incorrect retrieval.

I was unable to find the reasons for these strange occurrences. Please suggest what could be causing them.

regards,
Sandeep.
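The expected UTF-8 bytes for U+F95F can be checked by hand from the UTF-8 bit layout. A minimal sketch for the Basic Multilingual Plane (the function name is illustrative; it deliberately rejects surrogates and code points above U+FFFF rather than handling them):

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Hand-encode one BMP code point (U+0000..U+FFFF, surrogates excluded) as UTF-8."""
    if not 0 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("U+%04X is outside what this sketch handles" % cp)
    if cp < 0x80:                        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    return bytes([0xE0 | (cp >> 12),     # 1110xxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


encoded = utf8_encode_bmp(0xF95F)
print(encoded.hex(" ").upper())             # EF A5 9F
assert encoded == "\uF95F".encode("utf-8")  # agrees with Python's own encoder
```

Two things follow from the layout: no BMP character takes more than 3 bytes in UTF-8 (CJK ideographs take exactly 3), and every trailing byte lies in 80..BF, so the 80..9F bytes being replaced are ordinary continuation bytes that a correct UTF-8 path must pass through unchanged.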
testing .....
Please ignore this; it is just for testing my membership in the mailing list.
performance.....
Guys... any idea how the use of Unicode affects database performance on an NT setup? Will things slow down because of it?
help on Unicode
hi friends... we are new to Unicode. We are aware of the basic concepts of Unicode and UTF-8 encoding, but as far as implementing Unicode on platforms like Visual C++ or Visual Basic is concerned, we are pretty much in the dark. If someone could help us in this regard, that would be just great...

thanks and regards,
Sandeep Krishna and Santosh S.N.