Common Locale Data Repository V1.0 Released!

2003-10-31 Thread Helena Shih
The OpenI18N WG of the Free Standards Group is pleased to inform you that
the CLDR (Common XML Locale Data Repository) V1.0 Beta snapshot is available.
The CLDR repository provides application developers with a consistent and
uniform resource for managing the locale-sensitive data used for formatting,
parsing, and analysis. It also includes comparison charts that demonstrate
the locale data differences on various platforms.

New in this snapshot are initial versions of collation tailoring data.

For details on the data, reporting problems, and information on the LDML
specification, see:

http://oss.software.ibm.com/cvs/icu/~checkout~/locale/CLDR_status.html.

Thank you.



Helena S Chapman

Co-Chair of OpenI18N





Locale Data Markup Language Specification 1.0 is Completed.

2003-06-25 Thread Helena Shih
The Free Standards Group Open Internationalization Initiative (OpenI18N)
announced the release of the locale data markup language specification
(LDML), Version 1.0: see http://www.openi18n.org/specs/ldml/.

To see the full announcement, please visit
http://www.openi18n.org/subgroups/lade/locale/announce.htm.

Helena Shih Chapman

Co-Chair of OpenI18N / Free Standards Group





Common XML Locale Announcement

2002-11-07 Thread Helena Shih
The Free Standards Group has released another essential specification that
will allow computer and web users worldwide to have one standard for the
exchange of culturally sensitive information. Since the global PC market is
expected to be double that of the North American market, creating standards
that will make it easy for computer users from all over the globe to work
with each other is essential.



Please see the following announcement for more details. Thank you.



Best regards,

Helena

Helena Shih Chapman

IBM GCoC - San Jose

5600 Cottle Road

Mail Stop: 50-2/B11

San Jose, CA 95193

===

Common XML Locale Specification Released

The Free Standards Group Open Internationalization Initiative, OpenI18N
(formerly known as Li18nux) announced the release of the XML specification
of the common XML locale data. The Common XML Locale Repository project is a
joint effort among the members of the Linux Application Development
Environment (aka LADE) Workgroup of the Free Standards Group. The founding
members of the workgroup are IBM, Sun and OpenOffice.org. The workgroup is
open to additional members, both industry and community. The purpose of this
project is to devise a general XML format for the exchange of culturally
sensitive (locale) information for use in application and system
development, and to gather, store, and make available data generated in that
format.

"Interoperability has been significantly hampered by the lack of any
acceptable repository for locale data," said Mark Davis, IBM chief
globalization architect. "By having a single format for gathering and
comparing data specific to different countries, it will make it far easier
for programs and systems to provide consistent results to people all around
the globe, no matter what language they speak. To support this effort, we
have volunteered to host the initial work on the ICU website
(http://oss.software.ibm.com/icu/)."

The LADE Workgroup has finalized the XML specification of the culture
information data to be shared by the application developers creating
globalized software. It is also in the process of creating a set of modular
standards such that the culture information repertoire can be used based on
one or more components or as a whole, depending on the end users' needs.
This approach allows for true scalability.

"The ability to process and present culturally sensitive information has
become a significant issue with the popularity of the Web," said Helena Shih
Chapman, the Free Standards Group OpenI18N LADE Workgroup leader.
"Application developers can now make use of the information provided by the
Common XML Locale Repository to provide the correct international behavior
to the application end users."

Locale/culture information standards for Linux ensure that Linux and
Linux-based software will have the infrastructure necessary to address the
advanced needs of world-wide ready software, creating yet another
indispensable tool for Linux. Information on the Common XML Locale
Information Repository can be found at
http://oss.software.ibm.com/cvs/icu/locale/. To learn more about LADE
Workgroup and how to join, please see http://www.openi18n.org/subgroups/.

About the Free Standards Group

Supported by industry leaders, the Free Standards Group is an independent,
vendor-neutral, non-profit organization dedicated to accelerating the use
and acceptance of open source technologies through the development,
application and promotion of standards. Headquartered in Oakland, Calif.,
the Free Standards Group fulfills a critical need in the open source
development community to have common behavioral specifications, tools and
APIs, making development across Linux distributions easier. More
information on the Free Standards Group is available at
www.freestandards.org.






Re: C Programming for Unicode

2000-10-17 Thread Helena Shih

My apologies, I didn't realize glibc also supports the Unicode collation
algorithm.  If so, yes, my statement underestimated the support in glibc
quite a bit.

Sorry.
- Original Message -
>
> I'm afraid this is a bit of an understatement of what glibc can do
> (among other things, glibc can do collation like any other C library can
> do with appropriate locales). It would be a good description of iconv
> and friends in glibc (or any C library with iconv supporting many
> encodings). In case glibc is too big to install on the target platform,
> there's also a standalone free (LGPLed) libiconv (developed by Bruno
> Haible) that offers iconv(3) for a lot of encodings.
>
>   http://clisp.cons.org/~haible/packages-libiconv.html
>
> Jungshik Shin
>
>




Re: C Programming for Unicode

2000-10-17 Thread Helena Shih

There are a few options, depending on what you mean by "supports unicode".
If all you care about is code page conversion so your program can process
Unicode code points, glibc is freely available on many platforms,
http://www.gnu.org.

If your application requires more sophisticated Unicode support such as
collation and word breaking, take a look at ICU,
http://oss.software.ibm.com/icu.  It's also freely available on many
interesting environments.  Qt also provides a great set of features, again
for free.  A more complete list of internationalization libraries can be
found at http://www.unicode.org/unicode/onlinedat/products.html.  Some of
them are commercial products and some are not.

- Original Message -
From: "SoHee Kim" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Tuesday, October 17, 2000 1:54 PM
Subject: C Programming for Unicode


>
>  Hi,
>
>  I would like to modify an existing C application so that it supports
> unicode.
>  Does anybody know of any references or samples that would help?
>  Thanks.
>
>  SoHee
>
>




RE: Unicode on a non-Unicode web page

2000-09-08 Thread Helena Shih

Hi Paul.  I am curious to know if,

1. The ICU conversion code is buggy, or
2. The XMLConverter sample is buggy.

If you can kindly point out the bugs in the ICU code to us, we would really
appreciate that.  The XMLConverter sample was not designed or coded to be
robust and easy to use.  Instead, I would recommend using the uconv
application in the 'icuapps' module, which is also checked into the CVS
repository for ICU.

Please feel free to submit your bug report to us at
http://oss.software.ibm.com/developerworks/opensource/icu/bugs.  Thank you!!

-Original Message-
From: Paul Deuter [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 07, 2000 1:47 PM
To: Unicode List
Subject: RE: Unicode on a non-Unicode web page


Your question is essentially "How do I mix characters encoded in more than
one character set on a single page?"

A normal page has one document and that one document will expect characters
to be encoded in the character set specified in the meta tag in the header.
It is possible to have a compound document consisting of one or more
documents each in its own FRAME.  Each frame will have its own header and
therefore can have a different character set than the main page (see example
below).  It is also possible to use IFRAMEs which also have their own
header.  IFRAMEs however are not supported by Netscape.  These are the only
ways I know of using multiple character sets on one page.

Finally you also have the solution already suggested of encoding everything
as UTF-8 and using that as your main character set.  I don't know of an easy
way of transliterating 8859-2 to UTF-8.  The hard ways are using Notepad on
Windows 2000 on a machine that has 8859-2 as the ANSI character set and
saving to UTF-8.  There is also an XMLConverter program that comes with the
ICU source - but I have found this to be buggy.
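
For a byte-level conversion, a short script is another option. The following
is only a minimal sketch, assuming Python 3 and its built-in "iso-8859-2"
codec; the function name latin2_to_utf8 is illustrative and does not come
from ICU or any tool mentioned in this thread.

```python
def latin2_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-2 (Latin 2) bytes as UTF-8.

    Hypothetical helper for illustration: decode the input with the
    ISO-8859-2 codec, then encode the resulting text as UTF-8.
    """
    return data.decode("iso-8859-2").encode("utf-8")


# Example: the Czech word "Článek" as it would be stored in a Latin 2 file.
sample = "Článek".encode("iso-8859-2")
converted = latin2_to_utf8(sample)
print(converted.decode("utf-8"))
```

Every byte value from 0xA0 upward is a defined character in ISO-8859-2, so
the decode step does not fail on ordinary Latin 2 text; the page's meta tag
would still need to be changed to declare UTF-8 after conversion.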

FRAMES example:



Simple set of frames

   
   



FRM1.HTM:




Frame 1 HTML


Japanese 'Ü'ê'É Text




FRM2.HTM:




Frame 2 HTML


Russian åíãåíãåíã Text





Paul Deuter
[EMAIL PROTECTED]


-Original Message-
From: Gary P. Grosso [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 07, 2000 7:32 AM
To: Unicode List
Subject: Unicode on a non-Unicode web page


Hi Unicoders,

I am working on software to emit HTML in the encoding
and character set of the user's choice, from SGML/XML
documents which can contain any Plane 0 (BMP) Unicode character.
The question is what to do with characters outside the
selected encoding.  I thought I would use the "numeric"
character entity reference and IE5 at least seems to
render that well, but Netscape Communicator 4.6 doesn't.

One way to look at this is: how do I use unicode as an
"escape" to include some isolated content on a web page
of arbitrary encoding?

For example, I have something such as:


Unicode in a Latin 2 page



Èlánek Úvod ®ádný èest èin èinìn èinù èinùm èinnost èinnosti
jakmile jako jako¾ jako¾to jazyka je¾ jediné jednat jednotkou
jednotlivec
CYRILLIC CAPITAL LETTER DJE: Ђ
CAPITAL LETTER GAMMA: Γ
HIRAGANA LETTER KA: か
jeho jejich jemu jimi jiného jinému jiných jiným jinými jsou ka¾dému
ka¾dý




which probably looks awful since your email client is not likely
set to display Latin 2, but which can also be seen at:

http://www.angelfire.com/mi/virtualattic/latin2_test.html

If I change the meta tag to:

then Netscape does slightly better (still stumbles over &#x-anything
and doesn't display the hiragana, but does display the DJE and GAMMA
if I use decimal values) but of course now the Czech words are not
displayed properly.
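
The approach described above, keeping the page in its chosen encoding and
writing everything else as a numeric character reference, can be sketched as
follows. This is a minimal illustration that assumes Python 3; the function
name encode_with_ncr is hypothetical. Python's "xmlcharrefreplace" error
handler happens to emit decimal references, the form that Netscape is
reported above to handle better than hexadecimal ones.

```python
def encode_with_ncr(text: str, charset: str = "iso-8859-2") -> bytes:
    """Encode text in the target charset, replacing any character the
    charset cannot represent with a decimal reference such as &#1026;.

    Hypothetical helper for illustration only.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


# Czech text fits in Latin 2; DJE, GAMMA and HIRAGANA KA do not.
page_text = "Článek \u0402 \u0393 \u304b"
print(encode_with_ncr(page_text))
```

Whether a given browser renders the referenced characters still depends on
its own support and fonts; the sketch only covers producing the bytes.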

My question(s):

Is there some way I can nudge Netscape's browser to display these?

Is there a better way to write this admittedly mongrel HTML content?
I have heard somewhere that it is possible to change charset choice
"on the fly" and if that would work, I would appreciate a pointer to
somewhere that says how best to do this.

Thanks in advance for any insights.


---
Gary Grosso
[EMAIL PROTECTED]
Arbortext, Inc.
Ann Arbor, MI, USA




First ICU Developer Workshop Meeting, September 2000, Cupertino, CA -- Register now

2000-08-11 Thread Helena Shih




First ICU Developer Workshop Meeting

  September 11-12, 2000

  IBM Emerging Technology Center

   Cupertino, California, USA


Unicode is essential to software globalization development.  The
International Components for Unicode (ICU) is a Java, C and C++ library that
provides robust and full-featured Unicode support on a wide variety of
platforms. ICU is a collaborative, open-source development project jointly
managed by a group of companies and individual volunteers throughout the
world, using the Internet and the Web to communicate, plan, and develop the
software and documentation.

The First ICU Developer Workshop Meeting is designed to give you a better
understanding of the ICU technologies, specifically the C/C++ libraries.  It
will also show you how to migrate from existing proprietary technologies
to ICU.

 CONFERENCE WEB SITE

http://oss.software.ibm.com/icu/workshop

 CONFERENCE PROGRAM and REGISTRATION

Visit the workshop Web site at
http://oss.software.ibm.com/icu/workshop/agenda.html to view the detailed
program (complete with abstracts and speaker biographies).  Registration is
free.  Seating is limited and available on a first-come, first-served basis.
The ICU workshop committee reserves the right to cancel or reschedule the
workshop meetings.

 CONFERENCE VENUE

IBM
10275 North DeAnza Blvd.
Cupertino, CA 95014
USA

Tel:   +1 408 777 5802
Fax:   +1 408 777 5890

*  *  *  *  *

 Unicode(r) and the Unicode logo are registered trademarks of Unicode,
 Inc.  Used with permission.