Re: What characters have baseline?

2002-04-19 Thread Vladimir Ivanov

Philipp Reichmuth wrote:

> I don't think it's fixed 27.5° in handwritten script, it varies quite
> considerably, partly depending on how much text has to fit in the line
> in calligraphy. In ordinary handwriting, the angle easily reaches 45°
> or more.

I couldn't find any reference to such an angle in books, but it is the
opinion of a Persian calligrapher. It has something to do with the Golden
Section: 90° * 0.618 / 2 ≈ 27.8°, close to the 27.5° mentioned. The closer
your handwriting comes to this angle, the more beautiful it is considered
to be (see Golden Section in art).

Vladimir Ivanov




[Attachment: Zub3nst3.png]


Re: PS names for glyphs corresponding to non-BMP chars

2002-04-19 Thread John Hudson

At 13:31 4/19/2002, James H. Cloos Jr. wrote:

>The latest docs I've seen indicate four hex chars in the uni names
>for glyphs corresponding to BMP chars.  What should be done for glyphs
>corresponding to characters in the supplementary planes?

Adobe are supposed to have posted an update to their glyph naming rules.
The basic upshot of this update is that non-BMP characters should be named
using scalar values with the prefix 'u', e.g. u344DE. Ligated glyphs, or
glyphs otherwise representing more than one character, should use the
underscore convention: u344DE_u3456A. Note that the ligature glyph name form
uni04560368 is limited to BMP characters only. Unencoded variant forms
should use the dot convention, as is current with BMP characters: u344DE.alt,
u344DE.swash, etc.

In future software, you should be able to use the 'u' prefix for either BMP 
or non-BMP characters, but for backwards compatibility you probably should 
use 'uni' for the former.
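
To make the conventions concrete, here is a rough sketch in Python (the
helper is my own illustration, not Adobe's reference code):

    # Sketch of the naming rules described above.
    def glyph_name(*codepoints, suffix=None):
        if len(codepoints) == 1 and codepoints[0] <= 0xFFFF:
            name = "uni%04X" % codepoints[0]  # BMP: 'uni' + 4 hex digits
        else:
            # Non-BMP, or ligatures involving non-BMP: 'u' + scalar value,
            # multiple characters joined with underscores.
            name = "_".join("u%04X" % cp for cp in codepoints)
        return name + "." + suffix if suffix else name

    print(glyph_name(0x0456))                 # uni0456
    print(glyph_name(0x344DE))                # u344DE
    print(glyph_name(0x344DE, 0x3456A))       # u344DE_u3456A
    print(glyph_name(0x344DE, suffix="alt"))  # u344DE.alt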

I should let someone from Adobe answer your other questions.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

Last words of Jesuit grammarian Dominique Bouhours:
"I am about to — or I am going to — die; either expression is used."





Re: browsers and unicode surrogates

2002-04-19 Thread Stefan Persson

- Original Message -
From: "Steffen Kamp" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: 19 April 2002 23:25
Subject: Re: browsers and unicode surrogates


> I am not sure that your UTF-16 and UTF-32 test pages really conform to the
> HTML standard. The server states a content type of "text/html" without
> charset information. From the content type a browser should therefore
> expect pure ASCII - at least until it reaches the META tag defining the
> document's character encoding.

I put this in a test HTML file:



IE5.5 identified that as "Unicode." However, it displayed all text after
that point as if it were UTF-8. The same thing happened with .

Stefan







Re: SCSU compression (WAS: RE: Thai word list)

2002-04-19 Thread Markus Scherer

Yves Arrouye wrote:

> Seriously, SCSU is fine for some uses, but in this example it was definitely
> not the best way to achieve a reduction in file size.


Not on its own, right.


> By the 20% you mean an additional 20% by doing SCSU+gzip versus just gzip,
> right?


Yep.

markus





Re: browsers and unicode surrogates

2002-04-19 Thread Steffen Kamp

>I have added a couple more variations of the Unicode supplementary
>characters example page, for utf-16 and utf-32.

I am not sure that your UTF-16 and UTF-32 test pages really conform to the
HTML standard. The server states a content type of "text/html" without
charset information. From the content type a browser should therefore
expect pure ASCII - at least until it reaches the META tag defining the
document's character encoding.

From the HTML 4.01 specification, section 5.2.2:

"The META declaration must only be used when the character encoding is
organized such that ASCII-valued bytes stand for ASCII characters (at
least until the META element is parsed)."

Your documents, however, just start with a BOM, and I couldn't find
anything stating that a BOM is a valid way of specifying the character
encoding. Although some browsers seem to guess the encoding from a BOM
when one is present, I wouldn't rely on them doing so when there are
usually other ways of determining this information.
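
For what it's worth, the sniffing those browsers appear to do is easy to
state; here is a rough Python sketch of my own (the byte sequences are the
standard Unicode BOMs, the function itself is invented for illustration):

    # BOM sniffing sketch. UTF-32 patterns must be tested before UTF-16,
    # because the UTF-32LE BOM FF FE 00 00 starts with FF FE (UTF-16LE).
    def sniff_bom(data):
        boms = [
            (b"\x00\x00\xfe\xff", "utf-32-be"),
            (b"\xff\xfe\x00\x00", "utf-32-le"),
            (b"\xef\xbb\xbf",     "utf-8"),
            (b"\xfe\xff",         "utf-16-be"),
            (b"\xff\xfe",         "utf-16-le"),
        ]
        for bom, encoding in boms:
            if data.startswith(bom):
                return encoding
        return None  # no BOM: fall back on HTTP charset or the META tag

    print(sniff_bom(b"\xff\xfe<\x00"))  # utf-16-le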

To get a second opinion I asked w3.org's online validation service to
check your UTF-16 document with auto-detection of the character encoding.
The validator complained about the BOM as well as (not surprisingly) a
lot of zero (0x00) bytes.
However, when I gave the validator an ASCII-only document with a META tag
specifying UTF-16 as the encoding (just for testing), it said that it does
not yet support this encoding, so I don't fully trust the validator in this
case.

Steffen

-- 
Steffen Kamp
mailto:[EMAIL PROTECTED]
http://homepage.mac.com/earthlingsoft





LAST Call for Papers - 22nd Unicode Conference - Sep 2002 - San Jose, CA

2002-04-19 Thread Misha . Wolf

 Twenty-second International Unicode Conference (IUC22)
 Unicode and the Web: Evolution or Revolution?
http://www.unicode.org/iuc/iuc22
  September 9-13, 2002
  San Jose, California
***
Call for Papers >>> Just 3 weeks to go >>> Send in your submission now!
***
 Submissions due: May 10, 2002
Notification date: May 31, 2002
  Completed papers due: June 21, 2002
(in electronic form and camera-ready paper form)

The software industry continues its rapid growth and change.  In this
year alone, Unicode 3.2 was released and several new proposals for the
Internet and the World Wide Web were promoted to standards.  Web
Services is the latest buzz.  Are the vendors of software that support
these technologies keeping up?  How can you be sure that you are
deploying software components that work well together today and in the
future?  This Conference is where you go to find out.  Experts will
describe the latest changes to the Unicode standard and the other
standards used for e-business today.  You will also learn about the best
practices for utilizing, integrating and deploying these technologies
based on real-world examples and experience.  Demonstrations are often
provided.

We invite you to submit papers which either define the software of
tomorrow, demonstrate best practice with today's software, or articulate
problems that must be solved before further advances can occur.  Papers
should discuss subjects in the context of Unicode, internationalization
or localization.  You can view the programs of previous Conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

Conference attendees are generally involved in either the development,
deployment or use of Unicode software or content, or the globalization
of software and the Internet.  They include managers, software engineers,
systems analysts, font designers, graphic designers, content developers,
technical writers, and product marketing personnel.

THEME & TOPICS

Computing with Unicode is the overall theme of the Conference.
Presentations should be geared towards a technical audience.  Topics of
interest include, but are not limited to, the following (within the
context of Unicode, internationalization or localization):

- Web Services
- XML and related specifications
- The World Wide Web (WWW)
- Portable devices
- UTFs: Not enough or too many?
- Security concerns e.g. Avoiding the spoofing of UTF-8 data
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Implementing new features of recent versions of Unicode
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- Search engines
- Library and archival concerns
- Operating systems
- Databases
- Large scale networks
- Government applications
- Evaluations (case studies, usability studies)
- Natural language processing
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Optimizing performance of systems and applications
- Testing applications
- Business models for software development (e.g. Open source)

SESSIONS

The Conference Program will provide a wide range of sessions including:
- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will be of 40 minute
duration.  In some cases, two consecutive 40 minute program slots may be
devoted to a single session.

The Workshops/Tutorials will each last approximately three hours.  They
should be designed to stimulate discussion and participation, using
slides and demonstrations.

PUBLICITY

If your paper is accepted, your details will be included in the
Conference brochure and Web pages and the paper itself will appear on a
Conference CD, with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE

The Conference language is English.  All submissions, papers and
presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of statement of purpose,
   paper description, and your conclusions or final summary.

2. A brief biography.

3. The details listed below:

   SESSION TITLE: _

   TITLE (e.g. Dr/Mr/Mrs/Ms): _

   NAME: _

   JOB TITLE: _

   ORGANIZATION/AFFILIATION: _

   ORGANIZATION'S WWW URL: _

   OWN WWW URL: _

RE: SCSU compression (WAS: RE: Thai word list)

2002-04-19 Thread Yves Arrouye

> This looks like a nice endorsement of SCSU:

:D

> It saves 59% just as a charset,
> and it saves almost 20% in a system with a "real compression".

I am all for SCSU as a charset (once my tools can view it properly), but
that was not the use there. OTOH there is gzip encoding in HTTP 1.1 :)
Seriously, SCSU is fine for some uses, but in this example it was definitely
not the best way to achieve a reduction in file size.
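
(Concretely, the HTTP 1.1 content-coding exchange would look something like
this - and it operates below the charset level, so it would stack with SCSU:

    Accept-Encoding: gzip                       (request)
    Content-Type: text/plain; charset=SCSU      (response)
    Content-Encoding: gzip                      (response)

The headers are standard HTTP 1.1; the charset label assumes SCSU is
accepted as one.)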

By the 20% you mean an additional 20% by doing SCSU+gzip versus just gzip,
right?

YA





PS names for glyphs corresponding to non-BMP chars

2002-04-19 Thread James H. Cloos Jr.

The latest docs I've seen indicate four hex chars in the uni names
for glyphs corresponding to BMP chars.  What should be done for glyphs
corresponding to characters in the supplementary planes?

Will using five or six hex chars break any software¹ out in the wild?

Also, would setting up lig pairs such as:

C xxx ; WX 0 ; N uniD875 ; B 0 0 0 0 ; L uniDC00 uni1D400 ; L uniDC01 uni1D401 ...

(to use afm syntax) be reasonable to try to kludge surrogate support
into software that doesn't grok them natively?²

-JimC

¹ Any /useful/ software, anyway. :)

² could the whole set of U+D875 U+Dxxx pairs even work in an afm?
  I don't see anything in technote 5004 about line-length limitations
  or support for a continuation escape such as (in C) "\\\n".

P.S.  For the purpose of this note please presume I got the
  utf-16 surrogate pairs correct for those plane 1 characters
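
P.P.S.  For anyone who wants to check such pairs mechanically, the UTF-16
  arithmetic is short. A sketch of mine (following the Unicode standard,
  not anything in technote 5004):

    # Split a supplementary-plane scalar value into a UTF-16 surrogate pair.
    def surrogate_pair(cp):
        assert 0x10000 <= cp <= 0x10FFFF  # supplementary planes only
        v = cp - 0x10000
        high = 0xD800 + (v >> 10)    # high (lead) surrogate
        low  = 0xDC00 + (v & 0x3FF)  # low (trail) surrogate
        return high, low

    print("U+1D400 -> %04X %04X" % surrogate_pair(0x1D400))  # D835 DC00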





RE: Thai word list

2002-04-19 Thread Miikka-Markus Alhonen


On 19-Apr-02 Yves Arrouye wrote:
>> If you can process SCSU, and would appreciate a 59% reduction in file
>> size, try:
>> 
>> http://home.adelphia.net/~dewell/th18057-scsu.txt (135,731 bytes)
> 
> Not to knock down SCSU, but if it had been gzipped instead, the resulting
> file would be about half that size: 70,912 bytes. (The gzipped SCSU-encoded
> file is 57,987 itself).

With bzip2 the result is something I wouldn't have expected when comparing
with gzip: the original file compresses to 61,840 bytes and the SCSU file
to 61,389 bytes, a difference of only 451 bytes. Does bzip2 already do some
kind of SCSU by itself?
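
(In case someone wants to repeat the measurement, a few lines of Python
suffice; the file names below are placeholders for the original and SCSU
versions of the word list:)

    # Compare raw, gzipped and bzip2'ed sizes of both encodings.
    import bz2, gzip

    for path in ("th18057.txt", "th18057-scsu.txt"):  # placeholder names
        data = open(path, "rb").read()
        print("%s  raw: %d  gzip: %d  bzip2: %d" % (
            path, len(data), len(gzip.compress(data)), len(bz2.compress(data))))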

Best regards,
Miikka-Markus Alhonen





Re: Thai word list

2002-04-19 Thread Markus Scherer

Yves Arrouye wrote:

> Not to knock down SCSU, but if it had been gzipped instead, the resulting
> file would be about half that size: 70,912 bytes. (The gzipped SCSU-encoded
> file is 57,987 itself).


This looks like a nice endorsement of SCSU:

It saves 59% just as a charset,
and it saves almost 20% in a system with a "real compression".
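
(Spelled out with the byte counts quoted in this thread:
1 - 57,987/70,912 = 0.18, i.e. gzipped SCSU is about 18% smaller than
gzipped plain text - hence "almost 20%".)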

markus





Re: browsers and unicode surrogates

2002-04-19 Thread jshin




On Fri, 19 Apr 2002, Tom Gewecke wrote:

> >I have added a couple more variations of the Unicode supplementary
> >characters example page, for utf-16 and utf-32.
>
> I had the impression that it was not really practical to use web pages with
> these encodings over the internet, because they do not preserve ascii and
> are not compatible with html.  Could someone enlighten me on this?

  UTF-16 and UTF-32 have the drawbacks you mentioned and may not be
practical. (Personally, I would never put up HTML files in those encodings
other than for testing purposes.) Nonetheless, neither of them is forbidden
by any standard.  Actually, the W3C HTML standard explicitly mentions them
as possible encodings for HTML files.

  With a BOM at the beginning, Netscape 4.x, Netscape 6.x/Mozilla and MS
IE 5.x/6.x can handle them without much problem, except that support for
characters above the BMP varies from browser to browser, as Tex tried to
demonstrate in his test pages. IIRC, none of those browsers has UTF-16 or
UTF-32 visible in the 'Encoding' menu; the UTF-16 and UTF-32 entries get
'exposed' only when the user actually views a page in one of those
encodings.

  Jungshik Shin





Re: browsers and unicode surrogates

2002-04-19 Thread David Starner

On Fri, Apr 19, 2002 at 06:27:36AM -0700, Tom Gewecke wrote:
> I had the impression that it was not really practical to use web pages with
> these encodings over the internet, because they do not preserve ascii and
> are not compatible with html.  Could someone enlighten me on this?

HTML has no such requirement. You can send EBCDIC across the net,
provided your web server sends the right Content-Type. As a practical
matter, most web browsers can detect and handle UTF-16, especially if
it's preceded by a BOM. (Strictly speaking, a Content-Type with charset
information is still required, either in the document or sent by the
server, although browsers may not enforce this in practice.)
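
For a UTF-16 test page, the response header in question would be something
like this (charset label per the IANA registry; nothing browser-specific):

    Content-Type: text/html; charset=utf-16

with the BOM at the start of the document serving as a secondary hint.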

-- 
David Starner - [EMAIL PROTECTED]
"It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side." 
- K's Choice (probably referring to the Internet)




Re: browsers and unicode surrogates

2002-04-19 Thread Tom Gewecke

>I have added a couple more variations of the Unicode supplementary
>characters example page, for utf-16 and utf-32.

I had the impression that it was not really practical to use web pages with
these encodings over the internet, because they do not preserve ascii and
are not compatible with html.  Could someone enlighten me on this?