Common Locale Data Repository V1.0 Released!
The OpenI18N WG of the Free Standards Group is pleased to announce that the CLDR (Common XML Locale Data Repository) V1.0 Beta snapshot is available. The CLDR repository gives application developers a consistent, uniform resource for managing the locale-sensitive data used for formatting, parsing, and analysis. It also includes comparison charts that show how locale data differs across platforms. New in this snapshot are initial versions of collation tailoring data. For details on the data, reporting problems, and information on the LDML specification, see: http://oss.software.ibm.com/cvs/icu/~checkout~/locale/CLDR_status.html. Thank you. Helena S Chapman Co-Chair of OpenI18N
Locale Data Markup Language Specification 1.0 is Completed.
The Free Standards Group Open Internationalization Initiative (OpenI18N) has announced the release of the Locale Data Markup Language (LDML) specification, Version 1.0: see http://www.openi18n.org/specs/ldml/. To see the full announcement, please visit http://www.openi18n.org/subgroups/lade/locale/announce.htm. Helena Shih Chapman Co-Chair of OpenI18N / Free Standards Group
Common XML Locale Announcement
The Free Standards Group has released another essential specification that will allow computer and web users worldwide to have one standard for the exchange of culturally sensitive information. Since the global PC market is expected to be double that of the North American market, creating standards that will make it easy for computer users from all over the globe to work with each other is essential. Please see the following announcement for more details. Thank you.

Best regards,
Helena

Helena Shih Chapman
IBM GCoC - San Jose
5600 Cottle Road
Mail Stop: 50-2/B11
San Jose, CA 95193

===

Common XML Locale Specification Released

The Free Standards Group Open Internationalization Initiative, OpenI18N (formerly known as Li18nux), announced the release of the XML specification of the common XML locale data. The Common XML Locale Repository project is a joint effort among the members of the Linux Application Development Environment (aka LADE) Workgroup of the Free Standards Group. The founding members of the workgroup are IBM, Sun and OpenOffice.org. The workgroup is open to additional members, both industry and community.

The purpose of this project is to devise a general XML format for the exchange of culturally sensitive (locale) information for use in application and system development, and to gather, store, and make available data generated in that format.

"Interoperability has been significantly hampered by the lack of any acceptable repository for locale data," said Mark Davis, IBM chief globalization architect. "By having a single format for gathering and comparing data specific to different countries, it will make it far easier for programs and systems to provide consistent results to people all around the globe, no matter what language they speak. To support this effort, we have volunteered to host the initial work on the ICU website (http://oss.software.ibm.com/icu/)."
The LADE Workgroup has finalized the XML specification of the culture information data to be shared by application developers creating globalized software. It is also in the process of creating a set of modular standards such that the culture information repertoire can be used based on one or more components or as a whole, depending on the end users' needs. This approach allows for true scalability.

"The ability to process and present culturally sensitive information has become a significant issue with the popularity of the Web," said Helena Shih Chapman, The Free Standards Group OpenI18N LADE Workgroup leader. "Application developers can now make use of the information provided by the Common XML Locale Repository to provide the correct international behavior to the application end users."

Locale/culture information standards for Linux ensure that Linux and Linux-based software will have the infrastructure necessary to address the advanced needs of world-wide ready software, creating yet another indispensable tool for Linux.

Information on the Common XML Locale Information Repository can be found at http://oss.software.ibm.com/cvs/icu/locale/. To learn more about the LADE Workgroup and how to join, please see http://www.openi18n.org/subgroups/.

About the Free Standards Group

Supported by industry leaders, the Free Standards Group is an independent, vendor-neutral, non-profit organization dedicated to accelerating the use and acceptance of open source technologies through the development, application and promotion of standards. Headquartered in Oakland, Calif., the Free Standards Group fulfills a critical need in the open source development community to have common behavioral specifications, tools and APIs, making development across Linux distributions easier. More information on the Free Standards Group is available at www.freestandards.org.
Re: C Programming for Unicode
My apologies, I didn't realize glibc also supports the Unicode collation algorithm. If so, yes, my statement underestimated the support in glibc quite a bit. Sorry.

- Original Message -
>
> I'm afraid this is a little bit of an understatement for what glibc can do
> (among other things, glibc can do collation like any other C library can
> do with appropriate locales). It would be a good description of iconv
> and friends in glibc (any C library with iconv supporting many encodings).
> In case glibc is too big to install on the target platform, there's
> also a standalone free (LGPLed) libiconv (developed by Bruno Haible)
> that offers iconv(3) for a lot of encodings.
>
> http://clisp.cons.org/~haible/packages-libiconv.html
>
> Jungshik Shin
Re: C Programming for Unicode
There are a few options, depending on what you mean by "supports unicode". If all you care about is code page conversion so that your program can process Unicode code points, glibc is freely available on many platforms: http://www.gnu.org. If your application requires more sophisticated Unicode support, such as collation and word breaking, take a look at ICU: http://oss.software.ibm.com/icu. It is also freely available on many interesting environments. Qt also provides a great set of features, again for free. A more complete list of internationalization libraries can be found at http://www.unicode.org/unicode/onlinedat/products.html. Some of them are commercial products and some are not.

- Original Message -
From: "SoHee Kim" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Tuesday, October 17, 2000 1:54 PM
Subject: C Programming for Unicode

>
> Hi,
>
> I would like to modify an existing C application so that it supports
> unicode.
> Does anybody know of any references or samples that would help?
> Thanks.
>
> SoHee
RE: Unicode on a non-Unicode web page
Hi Paul. I am curious to know whether (1) the ICU conversion code is buggy or (2) the XMLConverter sample is buggy. If you can kindly point out the bugs in the ICU code to us, we would really appreciate that. The XMLConverter sample is not designed and coded to be robust and easy to use, so I would recommend using the uconv application instead. It is in the 'icuapps' module, which is also checked into the CVS repository for ICU. Please feel free to submit your bug report to us at http://oss.software.ibm.com/developerworks/opensource/icu/bugs. Thank you!!

-Original Message-
From: Paul Deuter [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 07, 2000 1:47 PM
To: Unicode List
Subject: RE: Unicode on a non-Unicode web page

Your question is essentially "How do I mix characters encoded in more than one character set on a single page?"

A normal page has one document, and that one document will expect characters to be encoded in the character set specified in the meta tag in the header. It is possible to have a compound document consisting of one or more documents, each in its own FRAME. Each frame will have its own header and can therefore have a different character set than the main page (see example below). It is also possible to use IFRAMEs, which also have their own header. IFRAMEs, however, are not supported by Netscape. These are the only ways I know of using multiple character sets on one page.

Finally, you also have the solution already suggested of encoding everything as UTF-8 and using that as your main character set. I don't know of an easy way of transliterating 8859-2 to UTF-8. The hard way is using Notepad on Windows 2000 on a machine that has 8859-2 as the ANSI character set and saving to UTF-8. There is also an XMLConverter program that comes with the ICU source, but I have found this to be buggy.
FRAMES example:
Simple set of frames
FRM1.HTM: Frame 1 HTML Japanese 'Ü'ê'É Text
FRM1.HTM: Frame 2 HTML Russian åíãåíãåíã Text

Paul Deuter
[EMAIL PROTECTED]

-Original Message-
From: Gary P. Grosso [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 07, 2000 7:32 AM
To: Unicode List
Subject: Unicode on a non-Unicode web page

Hi Unicoders,

I am working on software to emit HTML in the encoding and character set of the user's choice, from SGML/XML documents which can contain any Plane 1 Unicode character. The question is what to do with characters outside the selected encoding. I thought I would use the "numeric" character entity reference, and IE5 at least seems to render that well, but Netscape Communicator 4.6 doesn't.

One way to look at this is: how do I use unicode as an "escape" to include some isolated content on a web page of arbitrary encoding? For example, I have something such as:

Unicode in a Latin 2 page
Článek Úvod Žádný čest čin činěn činů činům činnost činnosti jakmile jako jakož jakožto jazyka jež jediné jednat jednotkou jednotlivec
CYRILLIC CAPITAL LETTER DJE: Ђ
CAPITAL LETTER GAMMA: Γ
HIRAGANA LETTER KA: か
jeho jejich jemu jimi jiného jinému jiných jiným jinými jsou každému každý

which probably looks awful since your email client is not likely set to display Latin 2, but which can also be seen at: http://www.angelfire.com/mi/virtualattic/latin2_test.html

If I change the meta tag to: then Netscape does slightly better (still stumbles over -anything and doesn't display the hiragana, but does display the DJE and GAMMA if I use decimal values) but of course now the Czech words are not displayed properly.

My question(s): Is there some way I can nudge Netscape's browser to display these? Is there a better way to write this admittedly mongrel HTML content? I have heard somewhere that it is possible to change charset choice "on the fly", and if that would work, I would appreciate a pointer to somewhere that says how best to do this.
Thanks in advance for any insights. --- Gary Grosso [EMAIL PROTECTED] Arbortext, Inc. Ann Arbor, MI, USA
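The two approaches raised in this thread (numeric character references for characters outside the page's charset, and re-encoding ISO-8859-2 content as UTF-8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used by anyone in the thread: the function names `to_page_encoding` and `latin2_to_utf8` are hypothetical, and only the codec names and the `xmlcharrefreplace` error handler come from Python's standard library. It says nothing about whether a particular browser will render the references.

```python
# Hypothetical helper names; the codecs and the "xmlcharrefreplace"
# error handler are part of the Python standard library.

def to_page_encoding(text: str, charset: str = "iso-8859-2") -> bytes:
    """Encode text for an HTML page declared as `charset`.

    Characters the charset cannot represent are replaced with decimal
    numeric character references such as &#1026;.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


def latin2_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-2 (Latin 2) bytes as UTF-8."""
    return data.decode("iso-8859-2").encode("utf-8")


if __name__ == "__main__":
    # Czech text fits in Latin 2; DJE, GAMMA and KA do not, so they
    # come out as numeric character references.
    sample = "Článek: Ђ Γ か"
    print(to_page_encoding(sample))

    # The same Czech word, converted from Latin 2 bytes to UTF-8 bytes.
    print(latin2_to_utf8("Článek".encode("iso-8859-2")))
```

If the page is served as UTF-8 instead, every character in the sample can be encoded directly and no references are needed.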
First ICU Developer Workshop Meeting, September 2000, Cupertino, CA -- Register now
First ICU Developer Workshop Meeting
September 11-12, 2000
IBM Emerging Technology Center
Cupertino, California, USA

Unicode is essential to software globalization development. The International Components for Unicode (ICU) is a Java, C and C++ library that provides robust and full-featured Unicode support on a wide variety of platforms. ICU is a collaborative, open-source development project jointly managed by a group of companies and individual volunteers throughout the world, using the Internet and the Web to communicate, plan, and develop the software and documentation.

The First ICU Developer Workshop Meeting is designed to give you a better understanding of the ICU technologies, specifically the C/C++ libraries. It will also show you how to migrate from existing proprietary technologies to ICU.

CONFERENCE WEB SITE
http://oss.software.ibm.com/icu/workshop

CONFERENCE PROGRAM and REGISTRATION
Visit the workshop Web site at http://oss.software.ibm.com/icu/workshop/agenda.html to view the detailed program (complete with abstracts and speaker biographies). Registration is free. Seating is limited and available on a first-come, first-served basis. The ICU workshop committee reserves the right to cancel or reschedule the workshop meetings.

CONFERENCE VENUE
IBM
10275 North DeAnza Blvd.
Cupertino, CA 95014 USA
Tel: +1 408 777 5802
Fax: +1 408 777 5890

* * * * *

Unicode(r) and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.