RE: Indic editing (was: RE: The real solution)

2001-12-20 Thread Roozbeh Pournader


A dead thread, but worth noting:

On Tue, 18 Dec 2001, Marco Cimarosti wrote:

> > Would you kindly tell me how i can construct such input methods and
> > ultimately create fonts.
> 
> Er... It is not so easy to do this kind of things yourself. You should buy
> (or, however, "get") software that properly supports Devanagari.

You can also get Pango (http://www.pango.org/). It's a Free library that 
supports Unicode's Devanagari and other Indic scripts on both Linux and 
Windows.

roozbeh





Call for Papers - 21st Unicode Conference - May 2002 - Dublin

2001-12-20 Thread Misha . Wolf

 Twenty-First International Unicode Conference (IUC21)
   Unicode and the Web: The Global Connection
http://www.unicode.org/iuc/iuc21
May 14-17, 2002
Dublin, Ireland

  > > > > > > >  C A L L   F O R   P A P E R S  < < < < < < <

  Submissions due:       January 11, 2002
  Notification date:     February 1, 2002
  Completed papers due:  February 22, 2002
  (in electronic form and camera-ready paper form)

   * * * * *

The Unicode Standard has become the foundation for all modern text
processing.  It is used on large machines, tiny portable devices, and
for distributed processing across the Internet.  The standard brings
cost-reducing efficiency to international applications and enables the
exchange of text in an ever increasing list of natural languages.

New technologies and innovative Internet applications, as well as the
evolving Unicode Standard, bring new challenges along with their new
capabilities.  This technical conference will explore the opportunities
created by the latest advances and how to leverage them, as well as
potential pitfalls to be aware of, and problem areas that need further
research.

We invite you to submit papers which either define the software of
tomorrow, demonstrate best practice with today's software, or articulate
problems that must be solved before further advances can occur.  Papers
should discuss subjects in the context of Unicode, internationalization
or localization. You can view the programs of previous conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

Conference attendees are generally involved in either the development,
deployment or use of Unicode software or content, or the globalization
of software and the Internet.  They include managers, software
engineers, systems analysts, font designers, graphic designers, content
developers, technical writers, and product marketing personnel.

THEME & TOPICS

Computing with Unicode is the overall theme of the Conference.
Presentations should be geared towards a technical audience.  Topics of
interest include, but are not limited to, the following (within the
context of Unicode, internationalization or localization):

- UTFs: Not enough or too many?
- Security concerns (e.g. avoiding the spoofing of UTF-8 data)
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Portable devices
- Implementing new features of recent versions of Unicode
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- The World Wide Web (WWW)
- Search engines
- Library and archival concerns
- Operating systems
- Databases
- Large scale networks
- Government applications
- Evaluations (case studies, usability studies)
- Natural language processing
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Optimizing performance of systems and applications
- Testing applications
- XML and Web protocols
- Business models for software development (e.g. open source)

SESSIONS

The Conference Program will provide a wide range of sessions including:
- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will last 40 minutes.  In
some cases, two consecutive 40-minute program slots may be devoted to a
single session.

The Workshops/Tutorials will each last approximately three hours.  They
should be designed to stimulate discussion and participation, using
slides and demonstrations.

PUBLICITY

If your paper is accepted, your details will be included in the
Conference brochure and Web pages and the paper itself will appear on a
Conference CD, with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE

The Conference language is English.  All submissions, papers and
presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of a statement of purpose,
   a description of the paper, and your conclusions or final summary.

2. A brief biography.

3. The details listed below:

   SESSION TITLE:             _

                              _

   TITLE (eg Dr/Mr/Mrs/Ms):   _

   NAME:                      _

   JOB TITLE:                 _

   ORGANIZATION/AFFILIATION:  _

   ORGANIZATION'S WWW URL:    _

   OWN WWW URL:               _

   ADDRESS FOR PAPER MAIL:    _

                              _





Unicode 3.2: BETA files updated

2001-12-20 Thread Kenneth Whistler

To all concerned:

The beta files for the Unicode 3.2 version of the Unicode
Character Database currently posted at:

http://www.unicode.org/Public/BETA/Unicode3.2/

have been refreshed again.

This refresh fixes a number of small problems reported to date
in the files and brings the derived data files back in sync
with the main property files.

Additionally, the first revision of all of the documentation
files (UnicodeCharacterData.html, UnicodeData.html, etc.) has
now been added to the directory.

An updated version of the character index, Index-3.2.0d3.txt,
has also been added. This should make it easier to find particular
characters, especially among the large number of math symbols
newly added to Unicode.

As noted on the BETA information page:

http://www.unicode.org/versions/beta.html

any bug reports regarding problems in the data files (or
documentation files) should be addressed to [EMAIL PROTECTED]
and should include "Beta Bug Report" in the subject line.

A separate notice will be sent out when PDUTR #28, Unicode 3.2,
is posted for public review. This should happen quite soon, either
today or tomorrow.

--Ken




Character Model for the World Wide Web

2001-12-20 Thread Misha . Wolf

I'm very pleased to be able to announce the publication of a new Working
Draft of the Character Model for the World Wide Web:
   http://www.w3.org/TR/charmod/

An extract from the document follows:

   Abstract

   This Architectural Specification provides authors of specifications,
   software developers, and content developers with a common reference for
   interoperable text manipulation on the World Wide Web. Topics addressed
   include encoding identification, early uniform normalization, string
   identity matching, string indexing, and URI conventions, building on the
   Universal Character Set, defined jointly by Unicode and ISO/IEC 10646.
   Some introductory material on characters and character encodings is also
   provided.

   Status of this Document

   This section describes the status of this document at the time of its
   publication. Other documents may supersede this document. The latest
   status of this series of documents is maintained at the W3C.

   This is a W3C Working Draft published between the first Last Call
   Working Draft of 26 January 2001 and a planned second Last Call. This
   interim publication is used to document the further progress made on
   addressing the comments received during the first Last Call. A list of
   last call comments with their status can be found in the disposition of
   comments (Members only).

   Work is still ongoing on addressing the comments received during the
   first Last Call. We do not encourage comments on this Working Draft;
   instead we ask reviewers to wait for the second Last Call. We will
   announce the second Last Call on the W3C Internationalization public
   mailing list ([EMAIL PROTECTED]; subscribe). Comments from the
   public and from organizations outside the W3C may be sent to
   [EMAIL PROTECTED] (archive). Comments from W3C Working Groups may
   be sent directly to the Internationalization Interest Group
   ([EMAIL PROTECTED]), with cross-posting to the originating Group, to
   facilitate discussion and resolution.

   Due to the architectural nature of this document, it affects a large
   number of W3C Working Groups, but also software developers, content
   developers, and writers and users of specifications outside the W3C that
   have to interface with W3C specifications.

   This document is published as part of the W3C Internationalization
   Activity by the Internationalization Working Group (Members only), with
   the help of the Internationalization Interest Group. The
   Internationalization Working Group will not allow early implementation
   to constrain its ability to make changes to this specification prior to
   final release. Publication as a Working Draft does not imply endorsement
   by the W3C Membership. It is inappropriate to use W3C Working Drafts as
   reference material or to cite them as other than "work in progress". A
   list of current W3C Recommendations and other technical documents can be
   found at http://www.w3.org/TR.

   For information about the requirements that informed the development of
   important parts of this specification, see Requirements for String
   Identity Matching and String Indexing [CharReq].

Misha Wolf
W3C I18N WG Chair





- ---
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of  the  individual
sender,  except  where  the sender specifically states them to be
the views of Reuters Ltd.




A brief history of www.unicode.org

2001-12-20 Thread Misha . Wolf

See:
   http://web.archive.org/web/*/http://www.unicode.org

Misha









Re: Microsoft input method, 950, and Unicode mapping

2001-12-20 Thread Kevin Bracey

In message <[EMAIL PROTECTED]>
  Asmus Freytag <[EMAIL PROTECTED]> wrote:

> Because of this, you  get better interoperation among CJK code sets with
> using CIRCLED PLUS  instead of EARTH, but at the cost of having obscured
> the semantics (i.e.  compromised interoperation with Unicode-based
> systems).

I see. In constructing my tables, I was trying to identify semantics by
comparing surrounding and other characters in groups, so Earth/Sun was my
choice.

> > I was able to come up with a good Big5 mapping by taking the best ideas
> > from various Big5 and CNS11643 tables on the net, then making sure each
> > of those Unicode compatibility characters was used once, AND IN THE ORDER
> > THEY APPEAR IN UNICODE.
> 
> That's not always a good idea. Unicode order often does not follow any 
> standard, even when characters are intended to map.

But in this case, it seems clear that the correlation is too close to be
coincidental. U+FE30 to U+FE4E can very plausibly be found in order
in CNS11643/Big5. U+FE4F is out of order - the only exception. In the next
group, U+FE50 to U+FE6B again seem to appear in order. I would love to have
this confirmed by whoever placed the characters in Unicode. Here's my deduced
correlation for Big5:

0xA14A  0xFE30  # PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
0xA155  0xFE31  # PRESENTATION FORM FOR VERTICAL EM DASH
0xA157  0xFE32  # PRESENTATION FORM FOR VERTICAL EN DASH
0xA159  0xFE33  # PRESENTATION FORM FOR VERTICAL LOW LINE
0xA15B  0xFE34  # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
0xA15C  0xFE4F  # WAVY LOW LINE
0xA15F  0xFE35  # PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
0xA160  0xFE36  # PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
0xA163  0xFE37  # PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
0xA164  0xFE38  # PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
0xA167  0xFE39  # PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
0xA168  0xFE3A  # PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
0xA16B  0xFE3B  # PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
0xA16C  0xFE3C  # PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
0xA16F  0xFE3D  # PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
0xA170  0xFE3E  # PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
0xA173  0xFE3F  # PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
0xA174  0xFE40  # PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
0xA177  0xFE41  # PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
0xA178  0xFE42  # PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
0xA17B  0xFE43  # PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
0xA17C  0xFE44  # PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
0xA1C6  0xFE49  # DASHED OVERLINE
0xA1C7  0xFE4A  # CENTRELINE OVERLINE
0xA1C8  0xFE4D  # DASHED LOW LINE
0xA1C9  0xFE4E  # CENTRELINE LOW LINE
0xA1CA  0xFE4B  # WAVY OVERLINE
0xA1CB  0xFE4C  # DOUBLE WAVY OVERLINE


0xA14D  0xFE50  # SMALL COMMA
0xA14E  0xFE51  # SMALL IDEOGRAPHIC COMMA
0xA14F  0xFE52  # SMALL FULL STOP
0xA151  0xFE54  # SMALL SEMICOLON
0xA152  0xFE55  # SMALL COLON
0xA153  0xFE56  # SMALL QUESTION MARK
0xA154  0xFE57  # SMALL EXCLAMATION MARK
0xA15A  0xFE58  # SMALL EM DASH
0xA17D  0xFE59  # SMALL LEFT PARENTHESIS
0xA17E  0xFE5A  # SMALL RIGHT PARENTHESIS
0xA1A1  0xFE5B  # SMALL LEFT CURLY BRACKET
0xA1A2  0xFE5C  # SMALL RIGHT CURLY BRACKET
0xA1A3  0xFE5D  # SMALL LEFT TORTOISE SHELL BRACKET
0xA1A4  0xFE5E  # SMALL RIGHT TORTOISE SHELL BRACKET
0xA1CC  0xFE5F  # SMALL NUMBER SIGN
0xA1CD  0xFE60  # SMALL AMPERSAND
0xA1CE  0xFE61  # SMALL ASTERISK
0xA1DE  0xFE62  # SMALL PLUS SIGN
0xA1DF  0xFE63  # SMALL HYPHEN-MINUS
0xA1E0  0xFE64  # SMALL LESS-THAN SIGN
0xA1E1  0xFE65  # SMALL GREATER-THAN SIGN
0xA1E2  0xFE66  # SMALL EQUALS SIGN
0xA242  0xFE68  # SMALL REVERSE SOLIDUS
0xA24C  0xFE69  # SMALL DOLLAR SIGN
0xA24D  0xFE6A  # SMALL PERCENT SIGN
0xA24E  0xFE6B  # SMALL COMMERCIAL AT
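
The ordering claim can be checked mechanically. The following is only a minimal sketch, assuming the table lines keep the "0xBIG5  0xUNICODE  # NAME" layout shown above; the function names parse_mapping and order_breaks are illustrative and not part of any existing tool. It sorts the entries by Big5 code and reports each adjacent pair where the Unicode value goes down instead of up.

```python
import re

# Matches lines such as "0xA14A  0xFE30  # PRESENTATION FORM ..."
LINE_RE = re.compile(r"^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)\s*(?:#\s*(.*))?$")


def parse_mapping(text):
    """Return a list of (big5, unicode, name) tuples found in the table text."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            name = (m.group(3) or "").strip()
            entries.append((int(m.group(1), 16), int(m.group(2), 16), name))
    return entries


def order_breaks(entries):
    """Sort by Big5 code; return adjacent pairs whose Unicode value decreases."""
    ordered = sorted(entries)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[1] < a[1]]


# A four-line excerpt of the first table, used here as sample input.
SAMPLE = """\
0xA159  0xFE33  # PRESENTATION FORM FOR VERTICAL LOW LINE
0xA15B  0xFE34  # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
0xA15C  0xFE4F  # WAVY LOW LINE
0xA15F  0xFE35  # PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
"""

for prev, cur in order_breaks(parse_mapping(SAMPLE)):
    print(f"0x{prev[0]:04X} U+{prev[1]:04X} is followed by "
          f"0x{cur[0]:04X} U+{cur[1]:04X}")
```

On this excerpt the only pair reported is the step from 0xA15C (U+FE4F) to 0xA15F (U+FE35). The two groups interleave in Big5 order, so each table should be passed in separately. Run over the whole first table as listed, the same check would also report the step from 0xA1C9 (U+FE4E) to 0xA1CA (U+FE4B).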

-- 
Kevin Bracey, Principal Software Engineer
Pace Micro Technology plc                Tel: +44 (0) 1223 518566
645 Newmarket Road                       Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom       WWW: http://www.pace.co.uk/