RE: Indic editing (was: RE: The real solution)
A dead thread, but worth noting:

On Tue, 18 Dec 2001, Marco Cimarosti wrote:

> > Would you kindly tell me how i can construct such input methods and
> > ultimately create fonts.
>
> Er... It is not so easy to do this kind of things yourself. You should buy
> (or, however, "get") software that properly supports Devanagari.

You can also get Pango (http://www.pango.org/). It's a Free library that
supports Unicode's Devanagari and other Indic scripts on both Linux and
Windows.

roozbeh
Call for Papers - 21st Unicode Conference - May 2002 - Dublin
Twenty-First International Unicode Conference (IUC21)
Unicode and the Web: The Global Connection
http://www.unicode.org/iuc/iuc21
May 14-17, 2002
Dublin, Ireland

> > > > > > >  C A L L   F O R   P A P E R S  < < < < < < <

Submissions due: January 11, 2002
Notification date: February 1, 2002
Completed papers due: February 22, 2002
(in electronic form and camera-ready paper form)

* * * * *

The Unicode Standard has become the foundation for all modern text
processing. It is used on large machines, tiny portable devices, and for
distributed processing across the Internet. The standard brings
cost-reducing efficiency to international applications and enables the
exchange of text in an ever increasing list of natural languages.

New technologies and innovative Internet applications, as well as the
evolving Unicode Standard, bring new challenges along with their new
capabilities. This technical conference will explore the opportunities
created by the latest advances and how to leverage them, as well as
potential pitfalls to be aware of, and problem areas that need further
research.

We invite you to submit papers which either define the software of
tomorrow, demonstrate best practice with today's software, or articulate
problems that must be solved before further advances can occur. Papers
should discuss subjects in the context of Unicode, internationalization or
localization.

You can view the programs of previous conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

Conference attendees are generally involved in either the development,
deployment or use of Unicode software or content, or the globalization of
software and the Internet. They include managers, software engineers,
systems analysts, font designers, graphic designers, content developers,
technical writers, and product marketing personnel.

THEME & TOPICS

Computing with Unicode is the overall theme of the Conference.
Presentations should be geared towards a technical audience.
Topics of interest include, but are not limited to, the following (within
the context of Unicode, internationalization or localization):

- UTFs: Not enough or too many?
- Security concerns e.g. Avoiding the spoofing of UTF-8 data
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Portable devices
- Implementing new features of recent versions of Unicode
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- The World Wide Web (WWW)
- Search engines
- Library and archival concerns
- Operating systems
- Databases
- Large scale networks
- Government applications
- Evaluations (case studies, usability studies)
- Natural language processing
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Optimizing performance of systems and applications
- Testing applications
- XML and Web protocols
- Business models for software development (e.g. Open source)

SESSIONS

The Conference Program will provide a wide range of sessions including:

- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will be of 40 minute duration.
In some cases, two consecutive 40 minute program slots may be devoted to a
single session.

The Workshops/Tutorials will each last approximately three hours. They
should be designed to stimulate discussion and participation, using slides
and demonstrations.

PUBLICITY

If your paper is accepted, your details will be included in the Conference
brochure and Web pages and the paper itself will appear on a Conference CD,
with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE

The Conference language is English. All submissions, papers and
presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of statement of purpose, paper
   description, and your conclusions or final summary.

2. A brief biography.

3. The details listed below:

   SESSION TITLE: _ _
   TITLE (eg Dr/Mr/Mrs/Ms): _
   NAME: _
   JOB TITLE: _
   ORGANIZATION/AFFILIATION: _
   ORGANIZATION'S WWW URL:_
   OWN WWW URL: _
   ADDRESS FOR PAPER MAIL:_ __
Unicode 3.2: BETA files updated
To all concerned:

The beta files for the Unicode 3.2 version of the Unicode Character
Database currently posted at:

http://www.unicode.org/Public/BETA/Unicode3.2/

have been refreshed again.

This refresh fixed a number of small problems that have been reported to
date in the files and brings the derived data files back in synch with the
main property files.

Additionally, the first revision of all of the documentation files
(UnicodeCharacterData.html, UnicodeData.html, etc.) has now been added to
the directory.

And an updated version of the character index, Index-3.2.0d3.txt, has been
added. This should make it easier to find particular characters,
particularly among the large number of math symbols newly added to Unicode.

As noted on the BETA information page:

http://www.unicode.org/versions/beta.html

any bug reports regarding problems in the data files (or documentation
files) should be addressed to [EMAIL PROTECTED] and should include "Beta
Bug Report" in the subject line.

A separate notice will be sent out when PDUTR #28, Unicode 3.2, is posted
for public review. This should happen quite soon, either today or tomorrow.

--Ken
Character Model for the World Wide Web
I'm very pleased to be able to announce the publication of a new Working
Draft of the Character Model for the World Wide Web:

http://www.w3.org/TR/charmod/

An extract from the document follows:

Abstract

This Architectural Specification provides authors of specifications,
software developers, and content developers with a common reference for
interoperable text manipulation on the World Wide Web. Topics addressed
include encoding identification, early uniform normalization, string
identity matching, string indexing, and URI conventions, building on the
Universal Character Set, defined jointly by Unicode and ISO/IEC 10646. Some
introductory material on characters and character encodings is also
provided.

Status of this Document

This section describes the status of this document at the time of its
publication. Other documents may supersede this document. The latest status
of this series of documents is maintained at the W3C.

This is a W3C Working Draft published between the first Last Call Working
Draft of 26 January 2001 and a planned second Last Call. This interim
publication is used to document the further progress made on addressing the
comments received during the first Last Call. A list of last call comments
with their status can be found in the disposition of comments (Members
only).

Work is still ongoing on addressing the comments received during the first
Last Call. We do not encourage comments on this Working Draft; instead we
ask reviewers to wait for the second Last Call. We will announce the second
Last Call on the W3C Internationalization public mailing list ([EMAIL
PROTECTED]; subscribe). Comments from the public and from organizations
outside the W3C may be sent to [EMAIL PROTECTED] (archive). Comments from
W3C Working Groups may be sent directly to the Internationalization
Interest Group ([EMAIL PROTECTED]), with cross-posting to the originating
Group, to facilitate discussion and resolution.
Due to the architectural nature of this document, it affects a large number
of W3C Working Groups, but also software developers, content developers,
and writers and users of specifications outside the W3C that have to
interface with W3C specifications.

This document is published as part of the W3C Internationalization Activity
by the Internationalization Working Group (Members only), with the help of
the Internationalization Interest Group. The Internationalization Working
Group will not allow early implementation to constrain its ability to make
changes to this specification prior to final release. Publication as a
Working Draft does not imply endorsement by the W3C Membership. It is
inappropriate to use W3C Working Drafts as reference material or to cite
them as other than "work in progress". A list of current W3C
Recommendations and other technical documents can be found at
http://www.w3.org/TR.

For information about the requirements that informed the development of
important parts of this specification, see Requirements for String Identity
Matching and String Indexing [CharReq].

Misha Wolf
W3C I18N WG Chair

- ---
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.
A brief history of www.unicode.org
See: http://web.archive.org/web/*/http://www.unicode.org

Misha
Re: Microsoft input method, 950, and Unicode mapping
In message <[EMAIL PROTECTED]>
Asmus Freytag <[EMAIL PROTECTED]> wrote:

> Because of this, you get better interoperation among CJK code sets with
> using CIRCLED PLUS instead of EARTH, but at the cost of having obscured
> the semantics (i.e. compromised interoperation with Unicode-based
> systems).

I see. In constructing my tables, I was trying to identify semantics by
comparing surrounding and other characters in groups, so Earth/Sun was my
choice.

> > I was able to come up with a good Big5 mapping by taking the best ideas
> > from various Big5 and CNS11643 tables on the net, then making sure each
> > of those Unicode compatibility characters was used once, AND IN THE ORDER
> > THEY APPEAR IN UNICODE.
>
> That's not always a good idea. Unicode order often does not follow any
> standard, even when characters are intended to map.

But in this case, it seems clear that the correlation is too close to be
coincidental. U+FE30 to U+FE4E can very plausibly be found in order in
CNS11643/Big5. U+FE4F is out of order - the only exception. In the next
group, U+FE50 to U+FE6B again appear to be in order.

I would love to have this confirmed by whoever placed the characters in
Unicode.
Here's my deduced correlation for Big5:

0xA14A 0xFE30 # PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
0xA155 0xFE31 # PRESENTATION FORM FOR VERTICAL EM DASH
0xA157 0xFE32 # PRESENTATION FORM FOR VERTICAL EN DASH
0xA159 0xFE33 # PRESENTATION FORM FOR VERTICAL LOW LINE
0xA15B 0xFE34 # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
0xA15C 0xFE4F # WAVY LOW LINE
0xA15F 0xFE35 # PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
0xA160 0xFE36 # PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
0xA163 0xFE37 # PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
0xA164 0xFE38 # PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
0xA167 0xFE39 # PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
0xA168 0xFE3A # PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
0xA16B 0xFE3B # PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
0xA16C 0xFE3C # PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
0xA16F 0xFE3D # PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
0xA170 0xFE3E # PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
0xA173 0xFE3F # PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
0xA174 0xFE40 # PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
0xA177 0xFE41 # PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
0xA178 0xFE42 # PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
0xA17B 0xFE43 # PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
0xA17C 0xFE44 # PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
0xA1C6 0xFE49 # DASHED OVERLINE
0xA1C7 0xFE4A # CENTRELINE OVERLINE
0xA1C8 0xFE4D # DASHED LOW LINE
0xA1C9 0xFE4E # CENTRELINE LOW LINE
0xA1CA 0xFE4B # WAVY OVERLINE
0xA1CB 0xFE4C # DOUBLE WAVY OVERLINE

0xA14D 0xFE50 # SMALL COMMA
0xA14E 0xFE51 # SMALL IDEOGRAPHIC COMMA
0xA14F 0xFE52 # SMALL FULL STOP
0xA151 0xFE54 # SMALL SEMICOLON
0xA152 0xFE55 # SMALL COLON
0xA153 0xFE56 # SMALL QUESTION MARK
0xA154 0xFE57 # SMALL EXCLAMATION MARK
0xA15A 0xFE58 # SMALL EM DASH
0xA17D 0xFE59 # SMALL LEFT PARENTHESIS
0xA17E 0xFE5A # SMALL RIGHT PARENTHESIS
0xA1A1 0xFE5B # SMALL LEFT CURLY BRACKET
0xA1A2 0xFE5C # SMALL RIGHT CURLY BRACKET
0xA1A3 0xFE5D # SMALL LEFT TORTOISE SHELL BRACKET
0xA1A4 0xFE5E # SMALL RIGHT TORTOISE SHELL BRACKET
0xA1CC 0xFE5F # SMALL NUMBER SIGN
0xA1CD 0xFE60 # SMALL AMPERSAND
0xA1CE 0xFE61 # SMALL ASTERISK
0xA1DE 0xFE62 # SMALL PLUS SIGN
0xA1DF 0xFE63 # SMALL HYPHEN-MINUS
0xA1E0 0xFE64 # SMALL LESS-THAN SIGN
0xA1E1 0xFE65 # SMALL GREATER-THAN SIGN
0xA1E2 0xFE66 # SMALL EQUALS SIGN
0xA242 0xFE68 # SMALL REVERSE SOLIDUS
0xA24C 0xFE69 # SMALL DOLLAR SIGN
0xA24D 0xFE6A # SMALL PERCENT SIGN
0xA24E 0xFE6B # SMALL COMMERCIAL AT

--
Kevin Bracey, Principal Software Engineer
Pace Micro Technology plc                  Tel: +44 (0) 1223 518566
645 Newmarket Road                         Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom         WWW: http://www.pace.co.uk/
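
The ordering argument in the message above can be checked mechanically. The
following is only a minimal sketch in Python, not anything from the original
post: it parses lines in the "0xBIG5 0xUNICODE # NAME" layout used in the
table, sorts each Unicode group by Big5 code, and lists the entries whose
Unicode value is lower than that of the entry before them. The names TABLE,
parse_table and descents are hypothetical, and TABLE holds just an excerpt of
the mapping, so the full list would have to be pasted in for a complete check.

```python
import re

# Excerpt of the Big5 -> Unicode table from the post (not the full list).
TABLE = """\
0xA14A 0xFE30 # PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
0xA155 0xFE31 # PRESENTATION FORM FOR VERTICAL EM DASH
0xA15B 0xFE34 # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
0xA15C 0xFE4F # WAVY LOW LINE
0xA15F 0xFE35 # PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
0xA1C6 0xFE49 # DASHED OVERLINE
0xA1C7 0xFE4A # CENTRELINE OVERLINE
0xA1C8 0xFE4D # DASHED LOW LINE
0xA1C9 0xFE4E # CENTRELINE LOW LINE
0xA1CA 0xFE4B # WAVY OVERLINE
0xA1CB 0xFE4C # DOUBLE WAVY OVERLINE
0xA14D 0xFE50 # SMALL COMMA
0xA14E 0xFE51 # SMALL IDEOGRAPHIC COMMA
0xA15A 0xFE58 # SMALL EM DASH
0xA17D 0xFE59 # SMALL LEFT PARENTHESIS
"""

# One mapping per line: Big5 code, Unicode code point, then a comment.
LINE = re.compile(r"^0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4})\s+#")


def parse_table(text):
    """Return a list of (big5, unicode) integer pairs from the table text."""
    pairs = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            pairs.append((int(m.group(1), 16), int(m.group(2), 16)))
    return pairs


def descents(pairs, lo, hi):
    """Entries with Unicode value in [lo, hi] that, once the group is sorted
    by Big5 code, have a lower Unicode value than the entry before them."""
    group = sorted(p for p in pairs if lo <= p[1] <= hi)
    return [cur for prev, cur in zip(group, group[1:]) if cur[1] < prev[1]]


pairs = parse_table(TABLE)

# "Used once": no Unicode code point may appear twice in the table.
assert len({u for _, u in pairs}) == len(pairs)

for lo, hi in ((0xFE30, 0xFE4F), (0xFE50, 0xFE6B)):
    for big5, uni in descents(pairs, lo, hi):
        print(f"U+{lo:04X}..U+{hi:04X}: order drops at 0x{big5:04X} -> U+{uni:04X}")
```

On this excerpt the check reports a drop at 0xA15F, the entry after U+FE4F,
and a second one at 0xA1CA, where U+FE4B follows U+FE4E. It reports none in
the U+FE50 group.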