RE: Mixing languages on a Web site
Only in the sense that Arial is more attractive than Times New Roman. For on-screen display of small amounts of text, a Gothic font is better because of the low resolution of displays, but for larger amounts of printed text, a Mincho font is preferred. Newspapers (and Word documents) are all set in Mincho for that reason.

If you install both fonts, make sure you get SP5 or later for NT4, which fixes a problem NT4 has handling several very large fonts on the system at the same time.

One thing to note is that there are different versions of MS Gothic and MS Mincho with different CJK coverage. Notably, version 2.3 of these fonts, which ships with Win98J and with all language versions of Win2000, has JIS X 0212 CJK coverage. Older versions (NT4) covered only JIS X 0208. I am not sure which version ships with the IE language packs, but it is probably a smaller (older) one for size reasons.

Regarding Mike Ayers's question about usage, the Global IMEs appear in the list of installed keyboards (represented by a two-letter icon in the taskbar tray). They appear only if you are using an application that supports the Global IME (IE4/5, Word2000, Outlook 98/2000 mail, Outlook Express 4/5, etc.).

As far as I know, there is almost no documentation in English on how to use IMEs. The Office2000 Proofing Tools manual has one page for each language, but comprehensive documentation in English does not exist (I would love to be proven wrong).

Chris Pratley
Group Program Manager
Microsoft Word

-----Original Message-----
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: July 1, 2000 7:00 AM
To: Unicode List
Subject: Re: Mixing languages on a Web site

If you mean the Active IMM, you can install the Japanese lang support provided by IE5 as well, as it does the same thing (installs a font and code page support). In fact the cp files have more recent dates, I think.

In fact, the font it installs (MS Gothic) is generally considered to be more attractive than the LangPack font (MS Mincho).

michka
Re: Mixing languages on a Web site
Hi Mike

To use Microsoft's Global IME for Japanese on NT4, there is one very important step you need to do: install NT4 Japanese support. There are a few articles about it in the Microsoft Knowledge Base. I have the URLs at work, but don't have them with me at the moment.

On the Win NT4 CD-ROM there is a folder somewhere called langpacks. Use Windows Explorer to look in it; there is a file called japanese.inf. Right-click on it and a pop-up menu will appear. One of the menu items is 'Install'. Select this, and it will install NT4's Japanese language support.

This should be installed before the Global IME for Japanese, otherwise it will not work ... at least that's the story ...

ciao
Andrew

Andrew Cunningham
[EMAIL PROTECTED]
Re: Mixing languages on a Web site
If you mean the Active IMM, you can install the Japanese lang support provided by IE5 as well, as it does the same thing (installs a font and code page support). In fact the cp files have more recent dates, I think.

In fact, the font it installs (MS Gothic) is generally considered to be more attractive than the LangPack font (MS Mincho).

michka
Re: Mixing languages on a Web site
To prove #4 will work, see http://www.trigeminal.com/samples/provincial.html

Along with 102 other languages, this page includes both Japanese and Turkish. UTF-8 is what makes that possible.

michka

----- Original Message -----
From: [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Thursday, June 29, 2000 10:19 PM
Subject: Mixing languages on a Web site

I am mixing Japanese and Turkish letters on my site.

1) How do I convert Latin-* text to UTF-8 text?
2) How do I convert Shift-JIS text to UTF-8 text?
3) How do I mark text as UTF-8?
4) Will people actually be able to SEE BOTH the Japanese AND the Turkish?
5) Is there a little "formatted in Unicode" logo I can put on my site?
6) Is there a "Unicode Help" site so people like me don't have to post these questions on lists like these?

I bet 1 and 2 could be done with CGI scripts, and 3 is trivial.
Re: Mixing languages on a Web site
[EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
>> 3) How do I mark text as UTF-8?
>
> In your head section:
>
> <meta http-equiv="content-type" content="text/html; charset=utf-8">
>
> Theoretically, you don't need this: Unicode (UTF-16 or UTF-8) are the default for the web. In practice, however, each different browser behaves in a slightly different way, so it can be a good idea to use the explicit declaration.

Hmmm. Writing from the top of my head (which is *not* the good way to go in such a list), I understood that Unicode was the default character set, meaning that &#65; is supposed to be a Latin 'A' and &#x431; is supposed to be Cyrillic 'б'.

OTOH, I believe that for upward compatibility, the default encoding (i.e. how the actual bytes are supposed to be understood) is supposed to be iso-8859-1, not utf-8. (And if the file begins with ÿþ or þÿ, the browser is advised to test whether reading it as utf-16 is not a better idea...)

>> 4) Will people actually be able to SEE BOTH the Japanese AND the Turkish?
>
> Yes, provided they have a UTF-8 enabled browser and a font with all necessary glyphs.

Well, with current generation browsers (IE5 or Netscape 6), it can even work with a font for Japanese and a different font for Turkish.

>> 6) Is there a "Unicode Help" site so people like me don't have to post these questions on lists like these?
>
> I think this mailing list is the proper place [...]

Yes, but wouldn't it be a very good idea to summarize these answers in some FAQ at the Unicode (or W3C) site? That would allow Web sites everywhere to link to the relevant information in a convenient way, particularly for the poorer guys who cannot afford to test all the cases with all browsers (also, it could then be easily translated into a bunch of languages).

Perhaps this pertains more to W3C than Unicode, though.

Antoine
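The two mechanisms mentioned in this message, the byte order mark check and numeric character references, can be illustrated with a short sketch. It is written in Python 3 purely as an assumption (the thread names no language), `sniff_bom` and `decode_page` are hypothetical helper names, and the iso-8859-1 fallback mirrors the belief stated in the message rather than the documented behaviour of any particular browser. UTF-32 byte order marks are deliberately ignored.

```python
import codecs
import html

def sniff_bom(data: bytes):
    """Guess an encoding from a leading byte order mark.

    Returns (encoding, bom_length), or (None, 0) when no BOM is present.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE, seen as "ÿþ" in Latin-1
        return "utf-16-le", 2
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF, seen as "þÿ" in Latin-1
        return "utf-16-be", 2
    return None, 0

def decode_page(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode a page using its BOM if it has one, else the fallback encoding."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)

# Numeric character references always name Unicode code points,
# whatever encoding the bytes of the page happen to use.
decoded_refs = html.unescape("&#65; &#x431;")
```

Here `decoded_refs` holds the Latin capital A (U+0041) and the Cyrillic small letter be (U+0431), independent of how the surrounding file is encoded.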
Re: Mixing languages on a Web site
This is very much like how we did the multilingual content in http://www.unicode.org/unicode/standard/WhatIsUnicode.html, which currently has English, French, German, Italian, Russian, and Arabic, with more to follow.

Mark

Herman Ranes wrote:
> [EMAIL PROTECTED] wrote:
>> I am mixing Japanese and Turkish letters on my site.
>>
>> 1) How do I convert Latin-* text to UTF-8 text?
>> 2) How do I convert Shift-JIS text to UTF-8 text?
>
> I suppose you do not mean dynamic / server side conversion, but text preparation only. You can use MS Internet Explorer 5.0:
>
> - Load the text
> - Select the code page so that the text displays properly: View - Encoding - More ...
> - Save the Unicode text file: Save as - Encoding: UTF-8
>
>> 3) How do I mark text as UTF-8?
>
> If you cannot configure the server to include the appropriate HTTP header info, you can instead / in addition use the following META tag in the HTML code:
>
> <HEAD>
> <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
> </HEAD>
>
>> 4) Will people actually be able to SEE BOTH the Japanese AND the Turkish?
>
> If fonts with the required repertoire are installed, they will! (Mozilla / Netscape 6.0 requires no more preparations -- it will pick substitutions from the available fonts.)
>
> If the HTML text is language-tagged, and the browser is correctly configured, the text may be displayed in the correct font styles. (Japanese, not Chinese.)
>
> A page which contains both Japanese and Esperanto, in UTF-8 and with language tags:
>
> http://www.hist.no/~herman/eo/eo-jp.html
> http://www.hist.no/~herman/eo/eo.html
>
>> 5) Is there a little "formatted in Unicode" logo I can put on my site?
>> 6) Is there a "Unicode Help" site so people like me don't have to post these questions on lists like these?
>>
>> I bet 1 and 2 could be done with CGI scripts, and 3 is trivial.
>
> --
> Herman Ranes
> Høgskolen i Sør-Trøndelag, Avdeling for teknologi, Institutt for elektroteknikk
> N-7004 Trondheim, NORWAY
> Phone +47 73559606, Fax +47 73559581
> [EMAIL PROTECTED]
> http://www.hist.no/~herman/
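For text preparation outside a browser, the same conversion can be scripted. The following is a minimal sketch in Python 3 (an assumption; the thread suggests only IE 5.0 and unspecified CGI scripts). The helper names `to_utf8` and `build_page` and the sample strings are invented for illustration, iso-8859-9 (Latin-5) is assumed as the Turkish source encoding, and the text is not HTML-escaped, which a real script would need to do.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# Sample legacy input, invented for this sketch.
turkish_legacy = "İstanbul'a hoş geldiniz".encode("iso-8859-9")  # Latin-5
japanese_legacy = "日本語のテキスト".encode("shift_jis")

def build_page(turkish: bytes, japanese: bytes) -> bytes:
    """Put both converted texts into one UTF-8 page with language tags."""
    return (
        b"<html><head>"
        b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        b"</head><body>"
        b'<p lang="tr">' + to_utf8(turkish, "iso-8859-9") + b"</p>"
        b'<p lang="ja">' + to_utf8(japanese, "shift_jis") + b"</p>"
        b"</body></html>"
    )

page = build_page(turkish_legacy, japanese_legacy)
```

The `lang` attributes correspond to the language tagging described in the message, which lets a browser pick a Japanese rather than a Chinese font style for the second paragraph.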
RE: Mixing languages on a Web site
> From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
> Sent: Friday, June 30, 2000 4:28 AM
>
> To prove #4 will work, see http://www.trigeminal.com/samples/provincial.html
>
> Along with 102 other languages, this page includes both Japanese and Turkish. UTF-8 is what makes that possible.
>
> michka

I checked it out, and with IE5 I can now view almost all of it. There are 5 lines that I cannot view and for which there are no fonts available, but otherwise great. Netscape does not show nearly as many (hints?).

On a possibly entirely unrelated subject, I downloaded Microsoft's IMEs for Chinese and Japanese, hoping to learn to use them. However, I cannot figure out how to enable them, and can't locate any helpful info on Microsoft's site. I am running NT4. Any tips greatly appreciated.

Thanks,

/|/|ike
RE: Mixing languages on a Web site
Antoine Leca wrote:
> Hmmm. Writing from top of my head (which is *not* the good way to go in such a list), I understood that Unicode was the default character set, [...]

You are right (see http://www.w3.org/International/O-HTML-charset.html).

> OTOH, I believe that for upward compatibility, the default encoding [...] is supposed to be iso-8859-1, [...]

I was wrong, and you are right for HTML as served over HTTP 1.1. The current trend is that HTML has no default encoding (see http://www.w3.org/International/O-HTTP-charset.html) so, yes, the meta tag should always be there in a decent page.

> Well, with current generation browsers (IE5 or Netscape 6), it can even work with a font for Japanese and a different font for Turkish.

Right, using language tagging within the document (http://www.w3.org/International/O-HTML-tags.html).

> Yes, but wouldn't it be a very good idea to summarize these answers in some FAQ at Unicode (or W3C) site, [...]

But it is much easier to bang out inaccurate answers from one's memory :-( Sorry for having been careless one more time.

W3C has that nice section I've been mentioning so far (http://www.w3.org/International/).

Unicode has a FAQ (http://www.unicode.org/unicode/faq/), a technical introduction (http://www.unicode.org/unicode/standard/principles.html), and a glossary (http://www.unicode.org/glossary/index.html).

Probably the Unicode FAQ should be updated periodically with questions asked on *this* list, such as problems authoring web pages, selecting fonts, etc.

Among independent documentation, I would cite at least Roman Czyborra's site (http://www.czyborra.com/), which is remarkably informative.

_ Marco
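Since the message concludes that a page should always declare its encoding, a sketch of how a client might look that declaration up may help. It is written in Python 3 (an assumption), the names `charset_from_header` and `declared_charset` and the regular expression are hypothetical, and the header-before-meta ordering follows the priority described in the HTML 4.01 specification as far as I recall it. A real parser would be more careful than this regex.

```python
import re
from email.message import Message

# Rough pattern for a charset inside a <meta ...> tag; illustrative only.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

def charset_from_header(content_type: str):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased by the library

def declared_charset(content_type, body: bytes):
    """Prefer the HTTP header, then a meta tag near the top of the page."""
    if content_type:
        charset = charset_from_header(content_type)
        if charset:
            return charset
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # no declaration found: the caller has to guess
```

For example, `declared_charset("text/html; charset=UTF-8", b"")` yields `"utf-8"`, while a page served as plain `text/html` falls back to whatever its meta tag declares.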
Re: Mixing languages on a Web site
On 06/30/2000 08:25:47 AM [EMAIL PROTECTED] wrote:

> ... a few are missing (Ethiopic, for example). But it's got most of them (and I would love to fill in the blanks if there is anyone who has sources for the missing languages!).

Just a few? Most of them? Not by a long shot! (Cf. http://www.sil.org/ethnologue/) I regret, though, that I can't easily offer you text for any of the other 6,700 languages (probably only 1/3 of which are written).

"Ethiopic" is not the name of a language, by the way. Or were you counting scripts rather than languages?

I'm inclined to think that counting country-specific varieties as separate languages is artificially stretching things. I really doubt that someone from Guatemala could complain of someone from Argentina, "¿Por qué no puede simplemente hablar Español de Guatemala?" Do Australians, Canadians, Brits, etc., or Germans, Austrians, etc., make similar complaints of one another?

A fun page, nonetheless!

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: Mixing languages on a Web site
On 06/30/2000 12:09:53 PM [EMAIL PROTECTED] wrote:

> Languages and scripts are often very "politically" involved. I simply chose not to judge people for their contribution, that's all.

And given those considerations, I don't blame you in the least.

- Peter
Re: Mixing languages on a Web site
On 06/30/2000 01:27:18 PM [EMAIL PROTECTED] wrote:

> Peter,
>
> Just read your post to the Unicode list. I'm wondering if your site has any Unicode sample texts available (I'm looking for just about every major script/language). The texts don't have to be long... but I'd like stuff longer than one or two sentences (maybe a couple paragraphs would be great). Any pointers would be much appreciated.

Not yet, really, at least probably not of the type and in a form that you're looking for. There is some data, but it's either in PDF, or it's probably limited in its character repertoire to what is found in European languages. Of the latter, I don't know whether any is encoded in UTF-8. At any rate, try the following links:

http://www.sil.org/silewp/
http://www.sil.org/mexico/pub/publicaciones.htm

Some of our field offices have been working on getting content ready to publish on the web, but I'm not sure how much and of what sort, and it may be that a lot of it will be in PDF for now. I expect we will have a lot more linguistic data from a wide variety of languages in the future, but this will take some time.

With regard to character sets/encodings, most of our researchers have, in the past, worked with custom character sets/encodings where commonly available standards like cp1252 weren't adequate - linguists everywhere have had to do that, so most existing data isn't yet in Unicode. As an organisation, though, we're committed to Unicode, and those of us in our International offices working on technology solutions for the researchers we support are promoting the use of Unicode as linguistic software that supports Unicode becomes available. (We anticipate our first Unicode-enabled language software products will be released late this year or early next year.) So Unicode-encoded data from vernacular languages will start to become more common over the next several years.

I also expect that SIL will be getting involved in cooperative efforts with other major linguistic agencies to start building online archives of linguistic data, and that will likely build heavily on XML and Unicode.

One key issue in putting data from hundreds of languages on the web is fonts and rendering support for complex scripts (which includes IPA and Roman with diacritics). There is also the issue that some minority languages use characters that are not yet part of Unicode, or they may use characters in Unicode but with script behaviour that's slightly different from what occurs in the more commonly known languages (e.g. different glyph shapes or different ligatures and ligation rules). It will just take some time to cross all these bridges.

We do have access to electronic corpora of texts in literally hundreds of minority languages, we know that there would be a lot of interest in those being made available, and we want to start making them available. With personnel resources already stretched and some technical issues still to be worked out, this will take longer than we wish it would.

- Peter
Re: Mixing languages on a Web site
[EMAIL PROTECTED] wrote:

> Probably the Unicode FAQ should be updated periodically with questions asked on *this* list, such as problems authoring web pages, selecting fonts, etc.

I second that. As Unicode is increasingly available to users in operating systems, applications, and on the web, more and more people are going to be looking for answers to practical questions relating to how to use and display Unicode text in their favourite applications or on their web pages. The most obvious place for them to look for these answers is on the Unicode.org site - and the most obvious place for people to ask those questions which they cannot easily find answers to is on this list.

- Chris
Mixing languages on a Web site
I am mixing Japanese and Turkish letters on my site.

1) How do I convert Latin-* text to UTF-8 text?
2) How do I convert Shift-JIS text to UTF-8 text?
3) How do I mark text as UTF-8?
4) Will people actually be able to SEE BOTH the Japanese AND the Turkish?
5) Is there a little "formatted in Unicode" logo I can put on my site?
6) Is there a "Unicode Help" site so people like me don't have to post these questions on lists like these?

I bet 1 and 2 could be done with CGI scripts, and 3 is trivial.