Re: SC UniPad 0.99 released.
Jungshik Shin wrote:

> On several occasions, I heard about it on this mailing list and
> finally my curiosity drove me to try it. Unfortunately, I was mightily
> disappointed. At first, I was intrigued by their claim that it
> supports Hangul Jamos. I've seen some false claims that Hangul
> Jamos are supported and wanted to see if it really supports them. Well,
> it does not do any better than most other fonts/software that made
> that claim. It just treats them as 'spacing characters' instead of
> combining characters. Basically, it's useless except for making
> Unicode code charts (so is Arial Unicode MS).

This is one of those cases where the verb "support" is so flexible that it loses meaning. UniPad does include glyphs for individual jamos as well as precomposed Hangul syllables, which is more than most non-Korean-specific TrueType fonts can offer. But it does not provide any mechanism for combining jamos into syllables, which of course is required for proper handling of Korean. Again, I don't know of any other mainstream Windows tools or fonts that can do this either (although I'm sure there are Korean-specific tools that can).

> Then, I found its claim that it supports 300 languages (scripts). Wow!
> Does it properly support various South and Southeast Asian scripts?
> Again, it does not. It treats combining characters as spacing
> characters. I don't think users of those scripts would regard SC
> UniPad as supporting their scripts/languages.

UniPad never claims to support 300 scripts. I'm not even sure there are 300 scripts. Probably half of the 300 "supported languages" are written with the Latin script. But again, Jungshik has a good point that true "support" for Devanagari, Khmer, etc. really does imply shaping and combining behavior, similar to what UniPad already provides for Arabic.

> You may want to check out Yudit (http://www.yudit.org).
> Although its author is not so fond of MS Windows,

That's putting it mildly -- he refers to Win32 as a "joke-api" [sic] and brags several times that he "will never touch Windows again."

> it works in MS Windows as well as in Unix/X11.

I haven't downloaded it yet, so I haven't seen whether this is true. I have my doubts, however, based on release notes like the following: "CreateProcess works in an unexpected way so the viewer won't find the file. As a workaround execute yudit from the desktop shortcut." No real Windows application gives a hoot whether you run it from a desktop shortcut, the Start menu, a taskbar button, the Start | Run dialog box, or a command-prompt window.

> It supports South and Southeast Asian scripts, Arabic,
> Hebrew with BIDI, Hangul Jamos (at the same level as Korean MS Office
> XP in terms of the number of syllables made out of Jamos) and many
> other (easier-to-deal-with) writing systems with various input
> methods/keyboards (including Unicode codepoint in hex input). It can
> also represent unrenderable characters with hex code in a box. If it
> lacks support for your script/language and you can code, you may be
> able to add it yourself, either for yourself or with the author's help,
> as I did for Hangul Jamos.

"If you can code" is a big stumbling block for anyone who is not a programmer. But certainly Yudit, like other similar open-source projects, appears to be highly extensible.

-Doug Ewell
Fullerton, California
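For the modern-jamo subset, the combining step UniPad omits is plain arithmetic, defined by the Unicode Standard's Hangul syllable composition; a minimal sketch:

```python
# Composing a leading consonant (L), vowel (V), and optional trailing
# consonant (T) jamo into a precomposed Hangul syllable, using the
# arithmetic from the Unicode Standard's conjoining-jamo behavior.
# This is the step a renderer must perform instead of treating jamos
# as independent spacing characters.

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_jamo(l, v, t=None):
    """Map an L, V(, T) jamo triple to its precomposed syllable."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = (ord(t) - T_BASE) if t else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

print(compose_jamo('\u1100', '\u1161'))            # U+AC00
print(compose_jamo('\u1112', '\u1161', '\u11ab'))  # U+D55C
```

This only covers L+V(+T) sequences of modern jamo; historic jamo clusters (the kind Jungshik added to Yudit) have no precomposed forms and genuinely require font-level stacking.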
Re: The Unicode Technical Committee meeting in Redmond, Washington State, USA.
Kenneth Whistler wrote: >> Is there an official press spokesperson for the meeting please? > > Well, I guess I just nominated myself. ;-) A fine choice. The ability to answer a reporter's questions BEFORE they are asked is a rare gift in the field of press relations, and the mark of a true professional. -Doug Ewell Fullerton, California > > --Ken Whistler > > 16 August 2002 > >> >> William Overington >> >> 21 August 2002
Re: Revised proposal for "Missing character" glyph
At 09:49 PM 8/26/2002 -0400, John Cowan wrote: >Nowadays, experts can detect mismatched character sets from the >nature of the byte barf that appears on their screen. And super-experts can read languages in "byte barf" as it is not random! Barry Caplan http://www.i18n.com
RE: Revised proposal for "Missing character" glyph
Ken,

The little square boxes do not help much if you want to know exactly what the missing characters are. I do, however, feel that any solution to the problem should be Unicode based. If it is left to the vendors, they may display the code page characters and you are guessing again. The tool idea is great, but I do not see how it could be embedded in the OS without changing the application. It will also require user training.

I think that as we move away from code page text, we will find that the next big problem will be characters that are missing from the font or set of fonts. The trick will be to change the set of fonts. This might require trial and error if we do not have good diagnostic tools. Implementing this change will probably be easier than using the special symbols for the script, which will also require special handling and may not catch all errors. This approach will also allow critical text that cannot be redisplayed to be deciphered. This has been a pet peeve of mine, having used the Fujitsu Shift JIS solution and seen it work in a real live situation.

Carl

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Kenneth Whistler
> Sent: Monday, August 26, 2002 2:01 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Revised proposal for "Missing character" glyph
>
> [Resend of a response which got eaten by the Unicode email
> during the system maintenance last week. Carl already responded
> to me on this, but others may not have seen what he was
> responding to. --Ken]
>
> > Proposed unknown and missing character representation. This would be an
> > alternate to the method currently described in 5.3.
> >
> > The missing or unknown character would be represented as a series of
> > vertical hex digit pairs for each byte of the character.
>
> The problem I have with this is that it seems to be an overengineered
> approach that conflates two issues:
>
> a.
What does a font do when requested to display a character > (or sequence) for which it has no glyph. > > b. What does a user do to diagnose text content that may be > causing a rendering failure. > > For the first problem, we already have a widespread approach that > seems adequate. And other correspondents on this topic have pointed > out that the particular approach of displaying up hex numbers for > characters may pose technical difficulties for at least some font > technologies. > > [snip] > > > > > This representation would be recognized by untrained people as > unrenderable > > data or garbage. So it would serve the same function as a missing glyph > > character except that it would be different from normal glyphs > so that they > > would know that something was wrong and the text did not just > happen to have > > funny characters. > > I don't see any particular problem in training people to recognize when > they are seeing their fonts' notdef glyphs. The whole concept of "seeing > little boxes where the characters should be" is not hard to explain to > people -- even to people who otherwise have difficulty with a lot of > computer abstractions. > > Things will be better-behaved when applications finally get past the > related but worse problem of screwing up the character encodings -- > which results in the more typical misdisplay: lots of recognizable > glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that > must be another piece of Korean spam mail in my mail tray.) > > > > > It would aid people in finding the problem and for people with > Unicode books > > the text would be decipherable. If the information was truly > critical they > > could have the text deciphered. > > Rather than trying to engineer a questionable solution into the fonts, > I'd like to step back and ask what would better serve the user > in such circumstances. 
> And an approach which strikes me as a much more useful and extensible
> way to deal with this would be the concept of a "What's This?"
> text accessory. Essentially a small tool that a user could select
> a piece of text with (think of it like a little magnifying glass,
> if you will), which will then pop up the contents selected, deconstructed
> into its character sequence explicitly. Limited versions of such things
> exist already -- such as the tooltip-like popup windows for Asmus'
> Unibook program, which give attribute information for characters
> in the code chart. But I'm thinking of something a little more generic,
> associated with textedit/richedit type text editing areas (or associated
> with general word processing programs).
>
> The reason why such an approach is more extensible is that it is not
> merely focussed on the nondisplayable character glyph issue, but rather
> represents a general ability to "query" text, whether normally
> displayable or not. I could query a black box notdef glyph to find
> out what in the text caused its display; but I could just as well
> query a properly displayed Telugu glyph, for example, to find out
> what it was, as well.
Re: SC UniPad 0.99 released.
On Mon, 26 Aug 2002, William Overington wrote:

> This latest version is SC UniPad 0.99 and is available for free download
> from the following address on the web.
>
> http://www.unipad.org

On several occasions, I heard about it on this mailing list and finally my curiosity drove me to try it. Unfortunately, I was mightily disappointed. At first, I was intrigued by their claim that it supports Hangul Jamos. I've seen some false claims that Hangul Jamos are supported and wanted to see if it really supports them. Well, it does not do any better than most other fonts/software that made that claim. It just treats them as 'spacing characters' instead of combining characters. Basically, it's useless except for making Unicode code charts (so is Arial Unicode MS).

Then, I found its claim that it supports 300 languages (scripts). Wow! Does it properly support various South and Southeast Asian scripts? Again, it does not. It treats combining characters as spacing characters. I don't think users of those scripts would regard SC UniPad as supporting their scripts/languages. Its FAQ 4.2 has the following:

SC> We have to differentiate between the simple inclusion of
SC> the glyphs into the UniPad font and the implementation of special
SC> text processing algorithms. It's definitely our goal to finally support
SC> all CJK (Chinese, Japanese, Korean) characters and all Indic scripts
SC> (Devanagari, Gurmukhi, etc.).

Judging from the above, I think they are well aware that simply including the nominal glyphs for scripts taken from the Unicode code charts in the UniPad font is different from supporting those scripts. In addition, its list of general features makes it clear that it does not support 'combined rendering of non-spacing marks'. I can't help wondering, then, why they list Hindi, Thai, Tibetan, Lao, Bengali and many other South and Southeast Asian languages in the list of supported languages.
> A particularly interesting new feature is that one may hold down the Control
> key and press the Q key and a small dialogue box appears within which one
> may enter the hexadecimal code for any Unicode character. Upon pressing the
> Enter key, that character is entered into the document.

> I first learned of the existence of the UniPad program in a response to a
> question which I asked in this forum, so I am posting this note so that any

You may want to check out Yudit (http://www.yudit.org). Although its author is not so fond of MS Windows, it works in MS Windows as well as in Unix/X11. It supports South and Southeast Asian scripts, Arabic, Hebrew with BIDI, Hangul Jamos (at the same level as Korean MS Office XP in terms of the number of syllables made out of Jamos) and many other (easier-to-deal-with) writing systems with various input methods/keyboards (including Unicode codepoint in hex input). It can also represent unrenderable characters with hex code in a box. If it lacks support for your script/language and you can code, you may be able to add it yourself, either for yourself or with the author's help, as I did for Hangul Jamos.

Jungshik
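The spacing-versus-combining distinction Jungshik draws can be checked directly against the Unicode Character Database; a small sketch (note that conjoining Hangul jamo are category Lo, with their combining behavior defined separately by the standard's conjoining-jamo rules rather than by a mark category):

```python
import unicodedata

# General category tells a renderer how a character should advance the
# pen: Mn (nonspacing mark) and Me (enclosing mark) take no advance
# width of their own. A font/editor that gives such characters their
# own cell is "treating combining characters as spacing characters".
for cp in (0x0E48, 0x093F, 0x1161):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```

Running this shows U+0E48 (Thai tone mark) as Mn, U+093F (Devanagari vowel sign i) as Mc (a *spacing* combining mark, which still reorders visually), and U+1161 (Hangul jungseong a) as Lo.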
Re: Revised proposal for "Missing character" glyph
Kenneth Whistler scripsit: > Things will be better-behaved when applications finally get past the > related but worse problem of screwing up the character encodings -- > which results in the more typical misdisplay: lots of recognizable > glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that > must be another piece of Korean spam mail in my mail tray.) In the old days, experts could detect mismatched serial-line connections based on the nature of the baud barf that the remote system emitted. Nowadays, experts can detect mismatched character sets from the nature of the byte barf that appears on their screen. -- John Cowan [EMAIL PROTECTED] "You need a change: try Canada" "You need a change: try China" --fortune cookies opened by a couple that I know
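The "byte barf" is readable precisely because it is deterministic, not random; a minimal demonstration of the mismatch (Korean UTF-8 bytes misread as Latin-1):

```python
# The same bytes, decoded under the wrong charset, always barf the
# same way -- which is why an expert (or a program) can recognize and
# even reverse the mismatch.
text = "한글"
barf = text.encode("utf-8").decode("latin-1")
print(barf)  # six accented-Latin/control characters, one per UTF-8 byte

# Because Latin-1 maps every byte to a character, the misreading is
# lossless and can be undone:
assert barf.encode("latin-1").decode("utf-8") == text
```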
Re: Romanized Cyrillic bibliographic data--viable fonts?
J. M. Craig wrote,

> ... If anyone has access to
> the Arial Unicode MS font and can check to see if U+FE20 and U+FE21
> combine properly, I'd be grateful--I don't want to spend the money to
> get it if it won't solve the display problem!

Unless a font is fixed width, Latin combining marks currently can't be expected to combine consistently without "smart font technology" support enabled on the system. So, don't blame the Arial Unicode MS font if these glyphs don't always merge well. While awaiting Latin OpenType support, it might be a good idea to take a look at a well-populated fixed-width pan-Unicode font like Everson Mono.

Best regards,

James Kass.
Re: Romanized Cyrillic bibliographic data--viable fonts?
James Kass wrote, > ...would become: > > Unicode 0078 0360 0077 > > U+0360 is the double wide combining tilde. U+0361 is the double wide combining inverted breve. Oops. Best regards, James Kass.
Re: Romanized Cyrillic bibliographic data--viable fonts?
Thanks for the suggestion of U+0361 (I don't think U+0360 is going to do what I want terribly well). I'm assuming that U+0361 IS in your font (I hadn't checked yet). One of the problems with that approach is that I don't have enough control over the conversion algorithm to make that work--or maybe I could make the right ligature half a non-translated character--hmm. I'll have to think about that. At any rate, what I'm working with is an algorithm that is much happier with round-trippable conversions (which the double breve wouldn't give me). So, no, I don't think that'll work. Shoot.

I appreciate your pointing out the copyright issues--I try to take copyrights appropriately seriously. I am in contact with the developer of the font in question (from Agfa/Monotype) and I'm REALLY hoping they'll agree to add the characters in question. If anyone has access to the Arial Unicode MS font and can check to see if U+FE20 and U+FE21 combine properly, I'd be grateful--I don't want to spend the money to get it if it won't solve the display problem!

James Kass wrote:

>J. M. Craig wrote,
>
>>... The ultimate problem is, I can't find an available font
>>that properly supports the combining half marks FE20 and FE21.
>
>Why not use U+0360 and U+0361 instead?
>
>>/ts/
>>Unicode 0078 FE20 0077 FE21
>
>...would become:
>
>Unicode 0078 0360 0077
>
>... or, three characters vs. four characters to write the same thing.
>
>James Kass,
>who is now adding U+FE20 .. U+FE23 to the font here.

Great!

John
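The round-trip property John needs falls out of the half-mark encoding naturally, since each Cyrillic letter maps to a self-delimiting, bracketed Latin cluster; an illustrative sketch (the two-entry mapping table is invented for illustration, not the full LC romanization):

```python
# With FE20/FE21, every romanized cluster is explicitly bracketed, so
# the inverse mapping is unambiguous -- unlike single letters, which
# could be confused with untied digraphs.
TO_LATIN = {
    "\u0446": "t\ufe20s\ufe21",  # ц -> t+s under ligature half marks
    "\u044e": "i\ufe20u\ufe21",  # ю -> i+u under ligature half marks
}
FROM_LATIN = {v: k for k, v in TO_LATIN.items()}

def romanize(text):
    return "".join(TO_LATIN.get(c, c) for c in text)

assert FROM_LATIN[romanize("\u0446")] == "\u0446"  # lossless both ways
```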
Re: Romanized Cyrillic bibliographic data--viable fonts?
J. M. Craig wrote,

> ... The ultimate problem is, I can't find an available font
> that properly supports the combining half marks FE20 and FE21.

Why not use U+0360 and U+0361 instead?

> /ts/
> Unicode 0078 FE20 0077 FE21

...would become:

Unicode 0078 0360 0077

... or, three characters vs. four characters to write the same thing.

> Any suggestions welcomed! Is there a tool out there that will allow you
> to edit a font to add a couple of missing characters?

William Overington has mentioned the Softy editor. Please keep in mind that fonts are copyrighted material, and users are mostly forbidden to modify them, even for internal use purposes. The best way to get characters added to a font is to ask the font's developer.

Best regards,

James Kass,
who is now adding U+FE20 .. U+FE23 to the font here.
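The three-versus-four count above is easy to verify:

```python
# The same ligated pair written two ways: half marks (four code
# points) vs. a single double-width combining mark (three).
half_marks = "\u0078\ufe20\u0077\ufe21"  # x + U+FE20 + w + U+FE21
double_mark = "\u0078\u0361\u0077"       # x + U+0361 + w
print(len(half_marks), len(double_mark))
```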
Re: The Unicode Technical Committee meeting in Redmond, Washington State, USA.
William Overington inquired: > As many readers may know, the Unicode Technical Committee was due to start a > four day meeting yesterday at the Redmond, Washington State, USA campus of > Microsoft, that is, on 20 August 2002. > > Here in England I am interested to know of what is happening and to learn of > news from the meeting. As Sarasvati has indicated, minutes will be publicly posted in a few weeks. See: http://www.unicode.org/unicode/consortium/utc-minutes.html [BTW, the minutes from the February and April/May meetings have actually been approved, although their status has not been updated to "Approved" yet on the website page.] > It is the early hours of the morning in Washington State at present. It is > hoped that when delegates get up for breakfast that they might look in their > emails and make early morning responses, or perhaps arrange for an official > briefing to be posted later in the day. > > If I were conducting a live interview with the committee chairman or with an > official spokesperson I would ask the following questions. Unfortunately, the UTC has not yet arranged its television contract with ESPN, since character encoding has not generally been considered a mass-appeal spectator sport. However, since I did attend the UTC meeting last week, I may be able to provide up-to-date commentary regarding some of the questions which are not better answered by waiting for the official minutes. > * What was discussed yesterday (Tuesday) please, and what formal decisions, > if any, were taken please? Wait for the minutes. > > * How many people attended please? 16 on Tuesday. 18 on Wednesday. Back down to 15(?) on Thursday and Friday. > > * Is it only companies which are full members of the Unicode Consortium who > send delegates to the meeting, or are there also representatives of > organizations who do not vote in decisions present as well? The latter. 
> * Will there be a press statement at the close of the meeting please, and if > so, will it also be posted in the Unicode mailing list please? No, there will not be a press statement. Encoding of a VERTICAL LINE EXTENSION character was not considered of such earth-shattering consequence that it would lead to headlines in the technology press. > * Has there been, or is there on the agenda, any discussion of the wording > in the Unicode specification about the use of the Private Use Area and, if > so, are any changes to that wording being implemented? Not discussed by the UTC last week. This is in the purview of the editorial committee. > > * Has there been, or is there on the agenda, any discussion concerning the > status of the code points U+FFF9 through to U+FFFC please? There has been > some discussion recently in the Unicode mailing list about these code > points, as regards issues of U+FFF9 through to U+FFFB as an issue, the issue > of using U+FFFC as a single issue, and the issue of using U+FFF9 through to > U+FFFC all together. Is the committee discussing these issues at all and, > if so, are they discussing the matter of whether U+FFFC can be used in > sending documents from a sender to a receiver please? Is there any > discussion of a possible rewording, or changing of meaning, of the wording > about the U+FFF9 through to U+FFFC code points in the Unicode specification > please? Not discussed by the UTC last week. This is in the purview of the editorial committee. > > * Are any matters concerning how the Unicode specification interacts with > the way that fonts are implemented being discussed please? Yes. In a general way, this ends up being discussed at every meeting. 
> If so, is due > care being taken that as font format is not, at present, an international > standards matter that therefore the committee must take great care to ensure > that Unicode does not become dependent upon a usage, express or implied, of > the intellectual property rights or format of any particular font format > specification? The UTC always attempts to exercise "due care" in what it considers, but it is unclear just what clarification you are asking for here. The UTC does not standardize font formats. > * Is there any discussion of the possibility of adding further noncharacters > please, considering either or both adding some more noncharacters in plane 0 > and a large block of noncharacters in one of the planes 1 through to 14? No. > * Is the committee discussing the issue of interpretation, namely as to how, > if various people read the published specification so as to have different > meanings, how people may receive a ruling as to the formally correct meaning > of the wording of the specification. This recently arose in relation to the > U+FFFC character and has previously arisen in relation to what is correct > usage of the Private Use Area, so there are at least two areas where the > issue of interpretation has arisen. No. The UTC is a standardization committee, not a court of law. If a problem of interpreta
RE: Revised proposal for "Missing character" glyph
William,

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of William Overington
> Sent: Friday, August 23, 2002 12:55 AM
> To: James Kass; Carl W. Brown; Unicode List
> Cc: [EMAIL PROTECTED]
> Subject: Re: Revised proposal for "Missing character" glyph
>
> James Kass wrote as follows.
>
> quote
>
> For non-BMP, how about a double tall glyph at the left as the
> plane signifier?

A double-high number or letter will look like a standard letter that is just narrower, unless you are displaying text in a narrow font, in which case it will look like a separate character... This will be very confusing. Besides, I don't like mixing bases any more than I like using octal to represent 8-bit bytes. It would be confusing to use base 4, base 8, base 8, base 4, base 8, base 8, etc. How will you display the rest of the data? Will you use 65536 glyphs? That is a monster font. Better would be to use the top 4 bits of the low order 2 bytes, then the bottom 4 bits of the same bytes. In any case, you are going to a lot of trouble to avoid vertical hex, which is the simple solution. Remember: "keep it simple, stupid".

Carl
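For concreteness, Carl's "vertical hex digit pairs" idea can be mocked up as plain text in a few lines (UTF-16BE is an assumption here; the proposal speaks of "each byte of the character" without fixing an encoding form):

```python
# Each byte of the character's encoding becomes one two-row column of
# hex digits -- the rendering a notdef glyph would show, sketched as
# ordinary text rather than as font outlines.
def vertical_hex(ch, encoding="utf-16-be"):
    data = ch.encode(encoding)
    top = " ".join(f"{b:02X}"[0] for b in data)
    bottom = " ".join(f"{b:02X}"[1] for b in data)
    return top + "\n" + bottom

print(vertical_hex("\uac00"))
# A 0
# C 0
```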
Re: Revised proposal for "Missing character" glyph
[Resend of a response which got eaten by the Unicode email during the system maintenance last week. Carl already responded to me on this, but others may not have seen what he was responding to. --Ken]

> Proposed unknown and missing character representation. This would be an
> alternate to the method currently described in 5.3.
>
> The missing or unknown character would be represented as a series of
> vertical hex digit pairs for each byte of the character.

The problem I have with this is that it seems to be an overengineered approach that conflates two issues:

a. What does a font do when requested to display a character (or sequence) for which it has no glyph?

b. What does a user do to diagnose text content that may be causing a rendering failure?

For the first problem, we already have a widespread approach that seems adequate. And other correspondents on this topic have pointed out that the particular approach of displaying hex numbers for characters may pose technical difficulties for at least some font technologies.

[snip]

> This representation would be recognized by untrained people as unrenderable
> data or garbage. So it would serve the same function as a missing glyph
> character except that it would be different from normal glyphs so that they
> would know that something was wrong and the text did not just happen to have
> funny characters.

I don't see any particular problem in training people to recognize when they are seeing their fonts' notdef glyphs. The whole concept of "seeing little boxes where the characters should be" is not hard to explain to people -- even to people who otherwise have difficulty with a lot of computer abstractions.

Things will be better-behaved when applications finally get past the related but worse problem of screwing up the character encodings -- which results in the more typical misdisplay: lots of recognizable glyphs, but randomly arranged into nonsensical junk.
(Ah, yes, that must be another piece of Korean spam mail in my mail tray.) > > It would aid people in finding the problem and for people with Unicode books > the text would be decipherable. If the information was truly critical they > could have the text deciphered. Rather than trying to engineer a questionable solution into the fonts, I'd like to step back and ask what would better serve the user in such circumstances. And an approach which strikes me as a much more useful and extensible way to deal with this would be the concept of a "What's This?" text accessory. Essentially a small tool that a user could select a piece of text with (think of it like a little magnifying glass, if you will), which will then pop up the contents selected, deconstructed into its character sequence explicitly. Limited versions of such things exist already -- such as the tooltip-like popup windows for Asmus' Unibook program, which give attribute information for characters in the code chart. But I'm thinking of something a little more generic, associated with textedit/richedit type text editing areas (or associated with general word processing programs). The reason why such an approach is more extensible is that it is not merely focussed on the nondisplayable character glyph issue, but rather represents a general ability to "query" text, whether normally displayable or not. I could query a black box notdef glyph to find out what in the text caused its display; but I could just as well query a properly displayed Telugu glyph, for example, to find out what it was, as well. This is comparable (although more point-oriented) to the concept of giving people a source display for HTML, so they can figure out what in the markup is causing rendering problems for their rich text content. [snip] > This proposal would provide a standardized approach that vendors could adopt > to clarify missing character rendering and reduce support costs. 
By > including this in the standard we could provide a cross vendor approach. > This would provide a consistent solution. In my opinion, the standard already provides a description of a cross-vendor approach to the notdef glyph problem, with the advantage that it is the de facto, widely adopted approach as well. As long as font vendors stay away from making {p}'s and {q}'s their notdef glyphs, as I think we can safely presume they will, and instead use variants on the themes of hollowed or filled boxes, then the problem of *recognition* of the notdef glyphs for what they are is a pretty marginal problem. And as for how to provide users better diagnostics for figuring out the content of undisplayable text, I suppose the standard could suggest some implementation guidelines there, but this might be a better area to just leave up to competing implementation practice until certain user interface models catch on and get widespread acceptance. --Ken
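Stripped of any UI, Ken's "What's This?" accessory amounts to deconstructing a selection into an explicit, named character sequence; a minimal sketch (the function name and output format are invented for illustration):

```python
import unicodedata

# Given a selected span of text, report each code point with its UCD
# name -- whether the text rendered properly or as notdef boxes.
def whats_this(selection):
    return [
        f"U+{ord(c):04X} {unicodedata.name(c, '<no name: control or unassigned>')}"
        for c in selection
    ]

for line in whats_this("x\u0361w"):
    print(line)
```

Run on the "x + U+0361 + w" sequence from the romanization thread, this lists the base letters and the COMBINING DOUBLE INVERTED BREVE between them, which is exactly the diagnostic a user staring at a black box would need.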
Re: Romanized Cyrillic bibliographic data--viable fonts?
J M Craig wrote as follows. [snipped] >Any suggestions welcomed! Is there a tool out there that will allow you >to edit a font to add a couple of missing characters? You might like to have a look at Softy, which is a shareware font editor for TrueType fonts. Softy can be used to produce new TrueType fonts and to edit existing TrueType fonts. http://users.iclway.co.uk/l.emmett/ There is some more information about Softy, including the correct email address for registrations, at the following page. http://cgm.cs.mcgill.ca/~luc/editors.html Having a look for Softy and Softy font at http://www.yahoo.com might be helpful. I am trying to obtain a copy of the tutorial by "Grumpy", so far without success. I have found the other tutorial and it is very useful. I have had lots of fun with the Softy program and although I have not tried to implement the U+FE20 and U+FE21 which you mention, I have tried various experiments using Softy and have found it a very satisfactory package to use. Softy is shareware, so perhaps you might think it worth a try to find out if it will help you do what you want to achieve. Also, you might like to have a look at the SC UniPad program which I mentioned earlier today in another thread. When I was studying your posting I used SC UniPad to have a look at the various Cyrillic characters which you mentioned. As far as I can tell at present SC UniPad does not position the U+FE20 and U+FE21 characters as you might want them to appear, yet SC UniPad would seem like a good way to key in the text, ready to copy and paste it into another program which would be used to display the thus keyed text using a font of your choice. William Overington 26 August 2002
Re: Romanized Cyrillic bibliographic data--viable fonts?
At 07:27 -0600 2002-08-26, J M Craig wrote: >Any suggestions welcomed! Is there a tool out there that will allow >you to edit a font to add a couple of missing characters? The choices are, in general, buying font programs or hiring someone to modify your font for you. Having said that, it would be nice if the major OSes had better support for Latin than they do. :-) -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: SC UniPad 0.99 released.
William Overington wrote: > A particularly interesting new feature is that one may hold down the > Control key and press the Q key and a small dialogue box appears > within which one may enter the hexadecimal code for any Unicode > character. Upon pressing the Enter key, that character is entered > into the document. SC UniPad contains its own font. In a thread two weeks ago about Alt+NumPad sequences, I did mention that SC UniPad 0.99 would include this Ctrl+Q feature. It's a very handy device; my biggest obstacle so far, in fact, is simply *remembering that it's there* and using it, instead of opening Character Map and clicking on the character, which is what I had to do before (and which is still useful if I needed to browse CM to find the character in the first place). > Please note in particular the buttons in a column down the left hand > side of the display. These alter the way in which some code points > are indicated in the display. For example, if one clicks on the > button labelled FMT (which controls Character Rendering: Formatting > Characters)and selects Picture Glyph, then entry of U+200D into the > text document shows a box with the letters ZWJ in it. And best of all, you can set these rendering options independently for space characters, ASCII controls, other formatting characters (a broad category), characters unsupported in the UniPad font (a dying breed; only Plane 2 is not supported), unassigned code points, unpaired surrogates, and private-use characters. Note that unpaired surrogates are supported for testing purposes, but aren't really a good thing to have lying around. Also note that your choices for private-use characters are a generic picture glyph or a rectangle containing the USV in hex -- sorry, you can't install your own PUA font. 
ALSO, note that the hex-value display option for unassigned code points provides a neat solution to Martin Kochanski's earlier question about .notdef glyphs (and the ensuing discussion where Carl Brown and others suggested 2×2, 2×3, or 3×2 blocks of hex digits). BTW, the View toolbar doesn't have to run down the left side. It's there by default, but you can dock it elsewhere or let it float as a separate window. I have the Convert toolbar on the left side and View on the right because I use Convert more often. > I first learned of the existence of the UniPad program in a response > to a question which I asked in this forum, so I am posting this note > so that any end users of the Unicode system who are at present unaware > of the existence of the UniPad program might know of the opportunity > to have a look at it if they so choose. > > The web site has a facility to request email notification of > developments to SC UniPad. It was by a such requested email > notification that I became aware of the availability of SC UniPad > 0.99. I have asked the main developer of UniPad to post regular update notices on this list, and he says he will do so shortly, when he can put together a more thorough list of the new features in 0.99. Trust me, there are a LOT. ☺ -Doug Ewell Fullerton, California
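Reduced to its essence, a Ctrl+Q-style "enter the code in hex" feature does no more than the following sketch (UniPad's actual validation rules are not documented here; the surrogate/range check is the minimum any implementation needs):

```python
# Parse a hexadecimal Unicode scalar value and emit the character,
# rejecting surrogate code points and out-of-range values.
def from_usv(hex_str):
    cp = int(hex_str, 16)
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    return chr(cp)

print(from_usv("AC00"))        # the syllable at U+AC00
print(repr(from_usv("200D")))  # the (invisible) ZERO WIDTH JOINER
```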
Re: Romanized Cyrillic bibliographic data--viable fonts?
> Gory details:
> ...
> The specified Romanization for each of these Cyrillic characters
> includes a ligature over the top of the two Latin code points in
> question (to indicate that the Latin characters represent a single
> Cyrillic character, presumably).

If you can use horizontal bars over the characters rather than the half-ligature marks, this seems to be supported by most fonts:

http://www.columbia.edu/kermit/st-erkenwald.html

- Frank
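For concreteness, the two encodings under discussion can be written out as code point sequences (a hypothetical lowercase /ts/ example of my own; U+0304 COMBINING MACRON is the "horizontal bar" alternative, which far more fonts can render than the half-ligature marks):

```python
# Half-ligature marks: one tie spanning both letters.
half_ligature = "t\uFE20s\uFE21"   # t + LEFT HALF LIG + s + RIGHT HALF LIG

# Macron fallback: a separate bar over each letter.
macrons = "t\u0304s\u0304"         # t + COMBINING MACRON + s + COMBINING MACRON

for label, s in [("half-ligature", half_ligature), ("macron", macrons)]:
    print(label + ":", " ".join(f"U+{ord(c):04X}" for c in s))
```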
Re: GX Technology
On Sunday, August 25, 2002, at 10:12 PM, K S Rohilla wrote:

> Hi Everybody
>
> I am Working On Open type Font Technology. Pl. tell me any one GX
> Technology.

Well, outside of the fact that what you want to ask about is called Apple Advanced Typography (AAT) now, what is it you need to know? Have you checked Apple's typography site, ?

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/
Romanized Cyrillic bibliographic data--viable fonts?
Anyone at all familiar with bibliographic data (the MARC standards) knows that it can be a real pain to deal with. In this case, the difficulty isn't with the MARC data itself, but with the Library of Congress's Romanization standards and the lack of support for combining half marks in available fonts.

I'm trying to help a client properly display Romanized Cyrillic from MARC data in a Unicode-enabled application. The ultimate problem is that I can't find an available font that properly supports the combining half marks U+FE20 and U+FE21. Alan Wood lists these two on his page of fonts by Unicode range (a truly impressive collection of info, BTW, Mr. Wood):

Arial Unicode MS
Apparently you can only get this with MS Office or Publisher these days -- not a good solution for my client, since their budget is very limited and they'd need it on a bunch of workstations. The most important technical issue is that the marks may not combine properly, and I don't have a copy of the font to test it myself. Does anyone know whether these marks will combine properly with T, t, S, s, I, i, A, a, and U, u when using the MS font?

Naqsh
A cursive font (not practical), and the marks don't appear to combine properly in any case.

Any suggestions welcomed! Is there a tool out there that will allow you to edit a font to add a couple of missing characters?

(A more extensive explanation of the problem follows for those who want the gory details.)
John Craig
Alpha-G Consulting, LLC

Gory details:

The bibliographic data in question follows the Library of Congress Romanization rules (see this link):

http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf

An effective conversion to Unicode for the specified Romanizations of these Cyrillic characters is proving elusive:

/ts/  Unicode 0426 (capital) & 0446 (lower case)
/yu/  Unicode 042E & 044E
/ya/  Unicode 042F & 044F

The specified Romanization for each of these Cyrillic characters includes a ligature over the top of the two Latin code points in question (to indicate, presumably, that the Latin characters represent a single Cyrillic character).

Now, the proper Unicode sequence for what the Library of Congress wants (based on their own documentation of the correspondences between the MARC ANSEL character set and Unicode) requires the use of the combining half marks left-half ligature U+FE20 and right-half ligature U+FE21:

/ts/  Unicode 0074 FE20 0073 FE21
/yu/  Unicode 0069 FE20 0075 FE21
/ya/  Unicode 0069 FE20 0061 FE21

All very well, but the application can't paint it, because the available fonts lack the combining half marks.
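A small illustrative sketch (my own, not from the LC documentation) that builds the half-mark sequences for the three lowercase Cyrillic letters; note that /ts/ pairs t (U+0074) with s (U+0073):

```python
# Lowercase Cyrillic letter -> the two Latin letters of its LC
# Romanization, to be joined by the combining half marks
# U+FE20 (left half) and U+FE21 (right half).
PAIRS = {
    "\u0446": ("t", "s"),  # CYRILLIC SMALL LETTER TSE
    "\u044E": ("i", "u"),  # CYRILLIC SMALL LETTER YU
    "\u044F": ("i", "a"),  # CYRILLIC SMALL LETTER YA
}

def romanize(ch: str) -> str:
    """Return the Unicode sequence with combining half marks."""
    first, second = PAIRS[ch]
    return f"{first}\uFE20{second}\uFE21"

for cyr in PAIRS:
    rom = romanize(cyr)
    print(f"U+{ord(cyr):04X} -> " + " ".join(f"U+{ord(c):04X}" for c in rom))
```

Whether the result displays as a single tie over the two letters is, of course, exactly the font-support problem described above.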
Re: Recent changes to i18n standards
On Fri, 23 Aug 2002 [EMAIL PROTECTED] wrote:

> On 08/23/2002 04:54:58 AM "Doug Ewell" wrote:
>
> > For those who like to keep up on such things, there have been recent
> > changes to the code lists of two important standards related to
> > internationalization -- ISO 639 (language codes) and ISO 3166-2 (codes
> > for country subdivisions).
>
> In addition to the two new code elements in ISO 639-2, there's another
> development of interest in relation to language coding: ISO/TC 37 has
> begun working toward development of a new part to this standard, to be
> designated ISO 639-3, that will provide 3-letter identifiers for all
> known languages. The relationship to part 2 will be that the
> individual-language code elements in part 2 will be a subset of part 3
> (part 2 will continue to have collective-language identifiers but part
> 3 will not). The reason for the subsetting relationship of part 2 to
> part 3 (rather than just adding a bunch of things to part 2) is that
> some user communities (e.g. bibliographers) have indicated a need to
> restrict individual-language identifiers to only developed languages
> with significant bodies of literature. I'm anticipating a time frame
> of about one year for this to be completed (assuming the process goes
> smoothly).
>
> - Peter
>
> ---
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <[EMAIL PROTECTED]>

Monday, August 26, 2002

Peter, I congratulate you and others who reached this reasonable solution.

Regards,
Jim Agenbroad ( [EMAIL PROTECTED] )

"It is not true that people stop pursuing their dreams because they grow old, they grow old because they stop pursuing their dreams." Adapted from a letter by Gabriel Garcia Marquez.

The above are purely personal opinions, not necessarily the official views of any government or any agency of any.

Addresses:
Office: Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.
SC UniPad 0.99 released.
As an end user of Unicode, I was interested to learn recently that the latest version of SC UniPad, a Unicode plain text editor for various PCs, has been released. This latest version is SC UniPad 0.99 and is available for free download from the following address on the web:

http://www.unipad.org

A particularly interesting new feature is that one may hold down the Control key and press the Q key, and a small dialogue box appears within which one may enter the hexadecimal code for any Unicode character. Upon pressing the Enter key, that character is entered into the document. SC UniPad contains its own font.

Please note in particular the buttons in a column down the left hand side of the display. These alter the way in which some code points are indicated in the display. For example, if one clicks on the button labelled FMT (which controls Character Rendering: Formatting Characters) and selects Picture Glyph, then entry of U+200D into the text document shows a box with the letters ZWJ in it.

I first learned of the existence of the UniPad program in a response to a question which I asked in this forum, so I am posting this note so that any end users of the Unicode system who are at present unaware of the existence of the UniPad program might know of the opportunity to have a look at it if they so choose.

The web site has a facility to request email notification of developments to SC UniPad. It was by such a requested email notification that I became aware of the availability of SC UniPad 0.99.

William Overington
26 August 2002