Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread jameskass

Peter Kirk wrote,

 The solution may be a catch-all, but the problem is a real one. Dr 
 Kaufman's response makes it clear that to professionals in the field 
 Everson's proposal is not just questionable but ridiculous. There is 
 certainly some PR work to be done in this area, not name-calling.

Does Dr. Kaufman speak for all professionals in the field, or would
it be fair to say that Dr. Kaufman is speaking for only one such
professional?

Best regards,

James Kass 



Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread jameskass
Peter Constable wrote,

  I'm sure even Youtie would go for this.

Except that she's too busy writing new lyrics for Janis Joplin tunes.

Ernest Cline wrote,

 ... This indicates to me that variation
 sequences are a potential solution that should be considered,
 even if it ends up being rejected in favor of disunification.

In order for Phoenician to be disunified from Hebrew, it must
first have been unified with Hebrew.  This is not the case.

(If anyone can cite from TUS any passage recommending that Phoenician
text should be encoded using Hebrew characters, I'll stand corrected.)

Variation sequences could be very helpful to distinguish variants in
plain text.  But, if every character in an entire text needs to have a
corresponding variant selector in order for the text to render as
expected, then that's a strong argument in favor of a separate encoding.

Variation sequences could be used to distinguish glyph variants between
Phoenician and neo-Punic, though, or even between one neo-Punic variant and another.
If members of any discipline need such granularity in plain text, say
epigraphers or numismatists, then they'll float a proposal and the
proposal can be judged on its merits, as in the sketch below.
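
To make the mechanism concrete, here is a minimal Python sketch.  It is
hypothetical: no variation sequences are registered for these letters, and
U+FE00 is simply the first variation selector; the point is what per-character
selectors cost in plain text.

    ALEF = "\u05D0"   # HEBREW LETTER ALEF, standing in for any base letter
    VS1  = "\uFE00"   # VARIATION SELECTOR-1 (hypothetical use here)

    # A variation sequence is nothing more than a base character followed
    # by a selector in the plain-text stream.
    text = ALEF * 22
    tagged = "".join(ch + VS1 for ch in text)

    # Tagging every letter doubles the length of the text -- the argument
    # made above against per-character selectors.
    assert len(tagged) == 2 * len(text)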

Somebody:  You should use graphics for such distinctions.

Graphics aren't part of plain text.

Somebody:  Well then, you should just use mark-up.

Neither is mark-up.

Best regards,

James Kass



Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

2004-05-17 Thread jameskass

Philippe Verdy wrote,

 How can I get so much difference in Internet Explorer when rendering Ogham
 vertically (look at the truncated horizontal strokes), and is the absence of
 ligatures in Mongolian caused by lack of support in Internet Explorer or the
 version of the Code2000 font that I use (I thought I had the latest version)?

The Ogham text shown in the graphic you attached is not from Code2000.

Apparently, your browser is substituting Ogham glyphs from another font.

The Mongolian positional variants which ligate well are not yet supported
by released versions of Uniscribe (USP10.DLL), as far as I know.

Best regards,

James Kass




Re: ISO-15924 script nodes and UAX#24 script IDs

2004-05-17 Thread jameskass

Philippe Verdy wrote,

 140;Mnda;Mandaean;mandéen  //Is it same as Mende Kikakui Syllabic?

Here's a good scan of the Mandaean alphabet:
http://essenes.net/Nabc.htm

It's not the same as Mende.

Best regards,

James Kass




RE: Archaic-Greek/Palaeo-Hebrew (was, interleaved ordering; was, Phoenician)

2004-05-15 Thread jameskass

Jony Rosenne wrote,

 There is another option - to postpone the decision. If the question is
 controversial, and consent impossible to achieve, this is often the best
 choice.

If it is impossible to achieve a consensus, it's disingenuous to suggest
that a decision be postponed until an agreement is reached.

Rather, if no consent is possible, it's pointless to postpone making
a decision.

Further, when everyone agrees, no decision is required.

Suppose nobody celebrated the Sabbath until all of the World's
religious experts agree on the correct day of the week?

Best regards,

James Kass



Unicode fallback font

2004-05-14 Thread jameskass

Around August of 2002 there was a discussion on this list about
the possibility of having some kind of Unicode fall-back font 
which would have glyphs to display the hex code of any character.

Bob Hallissy has just released such a font for the BMP.  The font is 
now on-line at:
http://scripts.sil.org/UnicodeBMPFallbackFont 

Best regards,

James Kass



RE: interleaved ordering (was RE: Phoenician)

2004-05-14 Thread jameskass

Dean A. Snyder wrote,

 The issue is not what we CAN do; the issue is what will we be FORCED to
 do that already happens right now by default in operating systems,
 Google, databases, etc. without any end user fiddling?

That's the question.  

Since search engines like Google survive based on their ability to serve
users' wants and find what users seek, why wouldn't Google make such
a tailoring?  

I don't have any contacts at Google, so I don't know who to ask.

But, IMHO Google is one of the best search engines available.  From
observation, they seem to roll with the punches quite well.  They 
seem to be first with multilingual and Unicode-based search
capabilities, multilingual user interfaces, and they even have a beta 
translator which has given many hours of amusement.

(Google interface in Hebrew, http://www.google.com/intl/iw/  )

Plus, they clearly *like* to be avant-garde, even if it takes a little
extra work.  (They also have user interfaces in Klingon and various
other interesting languages.  Although many of their language-based
interfaces transliterate to Latin, one suspects that this is only
because of the lack of widespread system support for many complex
scripts, and that this will change when appropriate.)

If giving Phoenician script and Hebrew script equivalence for searching
purposes means that scholars can use their service to find what they
want, it seems only natural that the good folks at Google would do
the job right.
 
 Obviously for the statistically fewer custom applications we would write
 software.

Although perhaps statistically fewer, it would seem just as obvious
that the most useful applications in your work would, out of necessity,
be custom ones.

A custom application, for example, would allow the user to set
a font for showing, say, cuneiform glyphs in the private use
area to display custom file names.  But, a default application
might just substitute an inappropriate font willy-nilly.
 
 But it would seem that encoding defaults should mirror script-user defaults.

Would it be fair to say that people who don't use the Phoenician 
script aren't members of its user community?

Best regards,

James Kass



RE: interleaved ordering (was RE: Phoenician)

2004-05-14 Thread jameskass

Dean A. Snyder wrote,

 You only make a response regarding Google; but that is only one of the
 search engines; and it leaves issues with operating systems and database
 engines still unanswered.

http://www.unicode.org/reports/tr10/#Tailoring

The entire report contains much useful information about what
a default collation table should and should not try to do.  There
are also handy examples illustrating that different users will
have differing expectations even in the same script, as the sketch below suggests.
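
As a small illustration of tailoring, here is a minimal Python sketch.  It
assumes the PyICU bindings are installed; the rule syntax is ICU's, and the
traditional-Spanish 'ch' rule stands in for any script-specific expectation.

    from icu import RuleBasedCollator

    words = ["chaleco", "cuarto", "dama"]

    # Default (code point) order treats 'ch' as plain 'c' + 'h':
    print(sorted(words))                           # chaleco, cuarto, dama

    # A tailoring for traditional Spanish sorts 'ch' as a unit after 'h':
    tailored = RuleBasedCollator("&h < ch")
    print(sorted(words, key=tailored.getSortKey))  # cuarto, chaleco, dama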

Best regards,

James Kass



RE: interleaved ordering (was RE: Phoenician)

2004-05-13 Thread jameskass

Dean A. Snyder asks,

 Why make something we do all the time more difficult and non-standard,
 when what we do now works very well?
 


Please, one thing to remember about default collation is that
it's default.  It's only there when no other instructions exist.

Another thing to remember about collation is that it's best
when tailorable.

Anyone wishing to sort anything will want to impose their
own rules on the sort, and anyone who has done this in the
past has already worked out a method for such imposition.

If you're making a library database, do you want 1984 to
sort under the digit 1, would you prefer that it be sorted
under O for one, or would it be better if it sorted under
N for nineteen?  If the database is for biblios rather than
books, you might prefer that the book title be sorted under
M.  

If someone keys in nineteen eighty four to a search box,
and you want them to be able to find 1984 in your database,
you will program for it.

If you want Richard III to match with Richard the third,
a bit of extra work is required.

If it's your purpose to set up a Hebrew script/Hebrew language
database of Hebrew inscriptions, and the original script used
in the inscription is irrelevant for your purposes, and you are
importing data from multiple sources who may use alternate
encodings, you will 'normalize' the data upon import.  In this
case 'normalize' would include converting the character set
if necessary, transliterating/transcribing to Hebrew characters
if necessary, stripping off points if they're present and not
wanted, and so on.

If you're importing data into a DSS Unicode database, and your
source is using Web Hebrew or another ASCII-masquerade, then
you're already performing normalization.

If you're importing data originally entered in visual order rather
than logical order, you're already normalizing.

If your database includes a field to indicate the original script,
here presuming that the original script is of some interest, and
you want to export something, you'll either export it as Hebrew
text, or you'll 'normalize' it back into the original script on export.

Either way, it's about as hard to program for as allowing for
differences in case, like TROLL vs. troll.  And, in either case,
it should be done by the tools and trivial to the users, although
any application which doesn't allow the user to set preferences
and make rules in such an instance is next to worthless.
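
A minimal Python sketch of that kind of 'normalize' step (the function names
are mine, and a real import routine would add transliteration tables for the
alternate encodings mentioned above):

    import unicodedata

    def strip_points(s):
        # Decompose, then drop all combining marks (vowel points, accents).
        decomposed = unicodedata.normalize("NFD", s)
        return "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))

    def normalize_record(s):
        # Strip unwanted points, then case-fold -- the TROLL vs. troll step.
        return strip_points(s).casefold()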

Best regards,

James Kass




RE: Phoenician

2004-05-09 Thread jameskass

Peter Constable wrote,

 Of things already in
 Unicode, what have been boundary cases between unification and
 de-unification?

Canadian Aboriginal Syllabics?  Old Italic?

Best regards,

James Kass



Re: Phoenician

2004-05-09 Thread jameskass

The author of the web site A Bequest Unearthed, Phoenicia
( http://phoenicia.org )
has kindly given permission for his response to a request for comments 
on the Phoenician proposal to be forwarded to Unicode's public list.

Best regards,

James Kass, 
forwarded message follows...

Hello James,

Thank you for visiting A Bequest Unearthed, Phoenicia and for taking the
time to write such a kind yet very important message.

I am indebted to you for having alerted me to this bit of information.  I
was aware that the proposal was underway though I had never had a chance to
read it.  Further, I was unaware of the attempt to smother Phoenician script
by not allowing it to have its unique and separate Unicode identity.

No one can deny that the modern Hebrew script is very useful in dealing
with Phoenician script in the computer world.  However, Hebrew is not the
only medium script-wise which can be useful for Phoenician, in fact, Aramaic
script as well as its Syriac branch are useful too.  Many scholars find
western Aramaic to be relatively modern Phoenician.  Further, as far as I am
concerned, I find it much easier for me to read Phoenician using the
Phoenician script than to read it using Hebrew.  I cannot recognize all the
Hebrew characters while I can easily see Latin characters in the Phoenician
alphabet.  

With due respect to Hebrew, I believe that it must not substitute for Phoenician
in the computer medium.  Phoenician Canaanite is separate, unique and
independent of any language, despite its similarities with many ancient
languages of the Middle East.

I believe one of the strongest points made in the proposal is this:
 Phoenician is quintessentially illustrative of the historical problem of where
 to draw lines in an evolutionary tree of continuously changing scripts in use
 over thousands of years. The twenty-two letters in the Phoenician block may be
 used, with appropriate font changes, to express Punic, Neo-Punic, Phoenician
 proper, Late Phoenician cursive, Phoenician papyrus, Siloam Hebrew,  Hebrew
 seals, Ammonite, Moabite, and Palaeo-Hebrew. The historical cut that has been
 made here considers the line from Phoenician to Punic to represent a single
 continuous branch of script evolution.

This objection, and the use of Hebrew in place of the Phoenician script,
reminds me of the problem Champollion was faced with when he was trying to decipher
Egyptian Hieroglyphics.  He had access to the Coptic language which is the
closest to ancient Egyptian.  However, at some point in time, Coptic books
were no longer written in Egyptian Hieroglyphics but in Greek script; therefore,
Egyptian was forgotten as a written medium.

Refusing to encode Phoenician and using Hebrew is an intellectual crime
against the Phoenician heritage and history which I very strongly condemn.

I have already planned and started to contact my colleagues in the Aramaic,
Coptic and Syriac computer community to lobby their support in approving the
unicoding of the Phoenician script.

Regretfully, I am not experienced or seasoned in the machinations of lobbying
support among scholars of this field but I will do my best so to do, thanks
to you.

My site, a labor of love for preserving and disseminating information about
my heritage, is continuously growing with new materials as time permits.

Kind regards,
Salim* George Khalaf, Byzantine Phoenician Descendent
* perhaps from Shalim, Phoenician god of dusk
A Bequest Unearthed, Phoenicia -- Encyclopedia Phoeniciana
http://phoenicia.org
Center for Phoenician Studies
Chapel Hill, NC
USA

 Greetings,
 
 Your wonderful web site is keeping me on-line!  Thank you so much
 for making all of this information available on the World wide web.
 
 There's currently a proposal before ISO/Unicode to encode the
 ancient Phoenician script so that it can have a unique range in
 the World's standard for the computer encoding of text.
 
 Interested scholars and users are invited to review this proposal
 and comment upon its merits.
 
 Objections have been raised to this proposal by some scholars that
 the ancient Phoenician writings should be encoded on computers
 using the modern Hebrew script range, and that Phoenician writing
 doesn't need to have its own computer encoding range because there
 is no need to be able to distinguish between modern Hebrew writing
 and ancient Phoenician writing in computer plain text.
 
 There has been a lively discussion about this on the Unicode public
 mailing list recently.  The author of the proposal has said that the
 proposal will be revised.  This is why it is important that scholars and
 other users voice their opinions and why I am writing you.  If you
 have any opinions about this and would like to respond, your response
 would be most welcome and would be forwarded to the responsible
 people.  If you know of anyone interested who would like to
 offer an opinion, please feel free to forward this message along.
 
 The current proposal is on-line in PDF format at:
 

(OT) Sailing Greeks (was Re: New contribution)

2004-05-08 Thread jameskass

Dean Snyder wrote,

 2 Greeks are better sailors.

Evidence supporting this can be seen here:

http://www.greekshops.com/images/ChildrensVideoDVD/popayvideo.jpg
 
 It was a troll.

And a good one!

Best regards,

James Kass



Re: Phoenician

2004-05-08 Thread jameskass

Elaine Keown wrote,

  Hardly.  If the rest of you hadn't agreed with his
  judgments most of the time, the Roadmap might look 
  quite different.  It's more like Potter
  Stewart on pornography.
 
 Who's Potter Stewart?  (I don't own a TV).    Elaine

Potter Stewart doesn't get on TV much these days.

A while ago, when asked to define pornography (or, possibly it
was obscenity?) his response was something like, 'I can't define
it, but I know it when I see it'.

So, his expert supporters could conclude from this that Potter
Stewart was a just and righteous person who spoke the truth
with conviction.

Experts from the opposition, however, could infer that Potter
Stewart must've seen a lot of pornography in order to be such
an expert on distinguishing it.

The above merely to illustrate that experts in any persuasion
seldom agree on everything; if they did -- they couldn't be
contentious.

Best regards,

James Kass




Re: Nice to join this forum....

2004-05-07 Thread jameskass

James Kass wrote,

 Enter the marks above (tone marks) first, then enter marks below.  

My error.  Enter either the marks below first or the marks above
first.  It's equivalent and the display is supposed to be the same
either way.  There was a problem with the font here...

The inside out rule on page 125 (TUS 4.0) shows above marks
coming before below marks in Figure 5-7.  Canonical ordering
(TUS 4.0, p. 84) would reverse this. 
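
In Python terms, a minimal sketch using the standard unicodedata module shows
why the two input orders are equivalent: canonical ordering sorts the marks by
combining class (dot below is class 220, acute is class 230).

    import unicodedata

    below_first = "E\u0323\u0301"   # E + combining dot below + combining acute
    above_first = "E\u0301\u0323"   # E + combining acute + combining dot below

    # Canonical ordering rearranges the marks, so both inputs normalize
    # to the same sequence:
    assert unicodedata.normalize("NFD", above_first) == below_first
    assert (unicodedata.normalize("NFC", above_first)
            == unicodedata.normalize("NFC", below_first))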

Best regards,

James Kass




Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)

2004-05-07 Thread jameskass

Raymond Mercier wrote,

 Isn't it the other way round ?
 I attach a file with three characters all in UTF8, representing CJK(A), CJK
 and CJK(B). The CJK(A) displays in IE6 only if <span lang=ZH>...</span> is
 included, but it *does* handle the CJK(B) without any reference to lang.
 
 In Mozilla all three display without the lang=ZH

Well, I tested here before writing you privately.  I've never been able to
get IE6 to show non-BMP text encoded as UTF-8. And, I've never had a problem
getting IE6 to show CJK-A in UTF-8.

I attach a file with two lines of CJK characters.  The first line is CJK-A,
the second line is CJK-B.  It's just a simple test file.  Here, the first
line displays just fine, the second doesn't.  The second line won't display
even with a FONT FACE inserted.

Also attached is a small gif showing the HTML source as it appears in
NotePad.  So, some Windows apps *can* display CJK-B in UTF-8, but, AFAICT,
IE6 can not.

 Of course to see the CJK(B) you need the font Simsun (Founder Extended).

I don't have the Founder Extended SimSun font, though.

Best regards,

James Kass



㓀㓁㓂㓃㓄㓅㓆㓇㓈㓉㓊㓋㓌㓍㓎㓏

‘’“”•–—˜™š›œžŸ

cjka_b.gif

Re: CJK(B) and IE6 (was Re: Philippe's Management...)

2004-05-07 Thread jameskass

(Many thanks to Raymond Mercier who has helped me resolve the
display problem here with CJK-B, UTF-8, and MSIE6.)

I just got the UTF-8 CJK-B in my test page to display in IE6.

Here's how.  The registry setting for Windows XP allows for a default
font for the BMP, a different font for Plane One, a different font for
Plane Two, and so forth.

NotePad (and, presumably other Windows apps) use this registry
setting for font switching in plain text.  The browser does not
seem to use this particular registry setting.

The registry settings for Internet Explorer only allow for one
font for surrogates.  Naturally, I had that setting as Code2001,
which only tries to cover Plane One. 

So, what I did was add the CJK-B font as the IEFixedFontName value
in the appropriate registry setting, then put PRE tags around the 
CJK-B text.  Voila!

(Of course, I could have made the CJK-B font be the IEProp... font 
name, then it would work without the PRE tags, but I want to have 
the 'best of both worlds', so kept Code2001 as the proportional 
surrogate font.)

So, IE6 *will* display Plane Two material in UTF-8.
IE6 will not display Plane One material in UTF-8.  That's the bug.

(NotePad does display both Plane One and Plane Two UTF-8 text.)

(If you're on Windows, and want to tweak your registry for supplementary
character support, this page by Tex Texin will help...
http://www.i18nguy.com/surrogates.html  )
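
For anyone wanting to script the same tweak, here is a minimal Python sketch.
The key paths follow the XP-era layout described on Tex Texin's page linked
above (script ID 42 is the surrogates entry), and the font names are simply
the ones from this thread; treat all of it as an assumption to verify on your
own system before writing to the registry.

    import winreg

    # Per-plane fallback fonts used by NotePad and other plain-text apps:
    fallback = winreg.CreateKey(
        winreg.HKEY_LOCAL_MACHINE,
        r"SOFTWARE\Microsoft\Windows NT\CurrentVersion"
        r"\LanguagePack\SurrogateFallback")
    winreg.SetValueEx(fallback, "Plane1", 0, winreg.REG_SZ, "Code2001")
    winreg.SetValueEx(fallback, "Plane2", 0, winreg.REG_SZ,
                      "SimSun (Founder Extended)")

    # Internet Explorer's single pair of surrogate fonts:
    ie = winreg.CreateKey(
        winreg.HKEY_CURRENT_USER,
        r"Software\Microsoft\Internet Explorer\International\Scripts\42")
    winreg.SetValueEx(ie, "IEFixedFontName", 0, winreg.REG_SZ,
                      "SimSun (Founder Extended)")
    winreg.SetValueEx(ie, "IEPropFontName", 0, winreg.REG_SZ, "Code2001")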

Best regards,

James Kass



Re: Arid Canaanite Wasteland

2004-05-05 Thread jameskass

Peter Kirk wrote,

 That might help, but living users are better than ones long dead.

If you ask us to dig up members of a dead script's user community,
it shouldn't be surprising if we use a shovel.

Best regards,

James Kass




Re: New contribution

2004-05-05 Thread jameskass

- Original Message - 
From: D. Starner [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, May 03, 2004 9:37 PM
Subject: Re: New contribution


  A possible question to ask which is blatantly leading would be:
  
   Would you have any objections if your bibliographic database
   application suddenly began displaying all of your Hebrew
   book titles using the palaeo-Hebrew script rather than
   the modern Hebrew script and the only way to correct
   the problem would be to procure and install a new font?
 
 Again, change Hebrew to Latin and palaeo-Hebrew to Fraktur and see 
 how many objections you get. Again, no, you can't use archaic forms
 of letters in many situations, but that doesn't mean they aren't
 unified with the modern forms of letters. No one would have procure
 and install a new font, because Arial/Helevica/FreeSans/misc-fixed
 have the modern form of Hebrew and will always have the modern form
 of Hebrew and all other scripts that have a modern form.
 
 I mean, maybe you're right and Phoenician has glyph forms too far from
 Hebrew's to be useful, and it's connected with Syriac and Greek as
 much as Hebrew, but this argument just doesn't fly.

It was only a contrived example of a leading question devised to
elicit a pre-determined specific response and was intended to
be mildly funny.  It was offered in response to a question proposed
by John Hudson, which, although not exactly leading, I considered
unfair.

Yes, it's pretty far-fetched.  But, your response supposes that
bibliographic databases are always displayed in a fixed-width font.
I have a bibliographic database which can display UTF-8 material
in a proportional font.  It works by exporting a record (or, group
of records) in HTML format as a separate file and firing up the 
browser with this on-the-fly page loaded.  Since the database 
application is stone-age, it has no awareness of anything as exotic 
as character sets.  So, in order to edit these UTF-8 records, a record 
is exported in plain text format and my application fires up BabelPad, 
then re-imports to the database from the altered text file.  This is a 
poor man's Unicode enabled multilingual database.  Yeah, it's
kludgey, but it sure does work!

Suppose that,

1)  Phoenician is unified with Hebrew.

2)  A user has a bibliographic database which uses FreeSans.

3)  The FreeSans developer is a Phoenician script enthusiast
who removes the Hebrew glyphs from the font and
replaces them with Phoenician glyphs.

4)  The user updates FreeSans on the system and fails to make
a back-up copy of the font.

5)  Meanwhile, the FreeSans developer has pulled all of the
previous editions of FreeSans off the internet...

Hey, it *could* happen!  (Yeah, and pigs could learn to fly.) 

Best regards,

James Kass




Re:CJK(B) and IE6

2004-05-04 Thread jameskass

Raymond Mercier wrote,

 BabelPad is great, but it chokes in converting all the UTF8 in unihan.txt to
 NCR at one
 go. I wrote a dedicated program to do that.

Options - Advanced Options - (Edit Options) -
Make sure the box for Enable Undo/Redo is not checked.

Yes, when the commas in UNIHAN.TXT were being globally replaced
with middle dots here, BabelPad stopped responding.  But then,
Andrew wrote to the list with a tip about the undo/redo feature.
(Just in time, I was going to write a dedicated program.)

When making global changes in such a large file,
Options - Advanced Options - (Edit Options) -
Make sure the box for Enable Undo/Redo is not checked.

Best regards,

James Kass





Re: New contribution

2004-05-03 Thread jameskass

Please take a look at the attached screen shot taken from:

www.yahweh.org/publications/sny/sn09Chap.pdf 

If anyone can look at the text in the screen shot and honestly
say that they do not believe that it should be possible to
encode it as plain text, then the solution is obvious:

We'll disagree.

Best regards,

James Kass


tetra.gif

Re: Nice to join this forum....

2004-05-03 Thread jameskass

Dele Olawole wrote,

 That is what I have said that gb is a letter, a single letter and not
 combination of letter. Look at this statement -
 
 Gbogbo awon are GB ti de. - All people from Great Britain have arrived.
 Going further to be a bit funny I can say Great Britain o great britain o
 awon ara Great Britain ti de.

Mo gbó̩ Òyìnbó.  (My e-mailer doesn't tag outgoing messages as UTF-8, so
some people have to manually select UTF-8 encoding in their e-mail display
if they want to see it.)

Unicode considers such combinations of letters to be presentation forms
of letters which are already covered in the Unicode Standard.  Although
for the Yoruba language, the gb digraph is treated as a single letter,
for computer encoding it is a string of two characters, g plus b.

 I do not know what you were trying to say concerning the letter g - What
 about gangan, ganganran, gongo, gogongo, gudugudu and etc Since I do not
 know what you were trying to say, I will stop there.

Philippe Verdy had commented on putting a mark under the letter g, and
I only said that Yoruba doesn't use any marks with the letter g.

 I chose the 3rd option and that makes Ariya the best Yoruba font available
 today.

It is exciting to know that you are making good fonts for Yoruba!
Do you have any examples on-line?

Best regards,

James Kass




Re: Nice to join this forum....

2004-05-03 Thread jameskass

Dele Olawole wrote,

 Here are a few Yoruba letters which might not be new to you, so how can you
 equate G+B with GB even if you claim it has significance?  How significant
 is significant?
 
 A B D E E F G GB

Please take a moment to visit this page:
http://www.unicode.org/standard/where/

Notice that the ch digraph as used in Slovak (and Spanish)
is simply encoded as U+0063 plus U+0068.

For more details on characters versus glyphs,
www.unicode.org/versions/Unicode4.0.0/ch02.pdf 

 and

http://www.unicode.org/reports/tr17/#Characters vs. Glyphs

Best regards,

James Kass




Re: Nice to join this forum....

2004-05-03 Thread jameskass

Asmus Freytag wrote,

 This is only true if:
 
 a) there is no visual differentiation

There is no visual differentiation in any of the examples I've ever seen.

 I would like to see a (small) picture of Yoruba text with these digraphs.

I sent a small picture off-list taken from this on-line PDF:
http://www.learnyoruba.com/ORTHOGRAPHY_1.pdf

Wondering about casing, if the gb diagraph appears initially, I have
a booklet for learning Yoruba which includes the proper name of the 
Rt. Rev. Isaac Gbekeleoluwa Abiodun Jadesimi in the bilingual dedication.  
In both the Yoruba and English versions of the dedication, only the 
letter G in Gbekeleoluwa is in upper case.

Best regards,

James Kass




Re: New contribution

2004-05-03 Thread jameskass

John Hudson wrote,

 Again, I'm not opposing the encoding of 'Phoenician' on principle, but I do 
 think it is 
 more complex than Michael's proposal presumes, and that more consultation with 
 potential 
 users is desirable. I think one of the questions asked should be, frankly:
 
   Do you have any objections to encoding text in
   the Phoenician / Old Canaanite letters using
   existing 'Hebrew' characters? If so, what are
   these objections?


That question misses being a 'leading question' slightly.  The easiest
answer for the respondent is No, as then no further explanation on
respondent's part is necessary.  Furthermore, if we are to believe
the allegations about these users, they are already performing this
reprehensible practice, and so have apparently surmounted any
objections they might have once held.

A possible question to ask which is blatantly leading would be:

  Would you have any objections if your bibliographic database
  application suddenly began displaying all of your Hebrew
  book titles using the palaeo-Hebrew script rather than
  the modern Hebrew script and the only way to correct
  the problem would be to procure and install a new font?

A fairer question to ask might be:

 Would you have any objections if the Phoenician script were given
 a separate encoding in the Unicode Standard as long as such an 
 encoding wouldn't interfere with your ability to continue
 encoding texts as you please?

(And to the last, I'd be tempted to add:  If so, what on Earth could those
objections be?)

Best regards,

James Kass




Re: New contribution

2004-05-03 Thread jameskass

John Hudson wrote,

 That said, I am very glad that Ms Anderson's further questions 
 encourage users to review the Phoenician proposal and to comment 
 on its merits.
 

Encouraging users to review the proposal and comment on its merits 
strikes me as a fairer approach than the questions you and I have 
constructed.

Best regards,

James Kass





Re: New contribution

2004-05-03 Thread jameskass

John Cowan wrote,

  (And to the last, I'd be tempted to add:  If so, what on Earth could those
  objections be?)
 
 Expense.  Complication.  Delays while the encoding gets into the Standard
 and thence into popular operating systems, with all the accoutrements
 such as keyboard software.

Those objections are quite generic and could be made just as well
for N'ko, Ol Cemet', Egyptian Hieroglyphics, &c.  

While those objections might be voiced by actual users, none of
those objections should impact the decision making process.

Best regards,

James Kass




Re: Nice to join this forum....

2004-05-03 Thread jameskass

Philippe Verdy wrote,

 From: D. Starner [EMAIL PROTECTED]
  Unicode will not allocate any more codes for characters that can be made
  precomposed, as it would disrupt normalization.
 
 But what about characters that may theoretically be composed with combining
 sequences, but almost always fail to be represented successfully?

Likewise.

 If such a ligature has a distinct semantic from a ligature created by ligaturing
 separate letters for presentation purpose, the character is not a ligature (the
 AE and OE ligated glyphs are distinct abstract characters) .

The gb combination mentioned in the original post is considered a letter
in the Yoruba alphabet.  It is not a ligature, it is a digraph.  Likewise,
in the Spanish alphabet, the ll combination is considered a letter.  It
is also a digraph.  Both of these combinations are already handled by ASCII.

(Note that the AE and OE ligated glyphs *are* ligatures.)

 The case of dot below however should be handled in fonts by proper glyph
 positioning and probably not by new assigned codepoints, unless this is only one
 possible presentation form for an actual distinct abstract character that may
 have other forms without this separate diacritic (for example if g with dot
 below was only one presentation for an abstract character that may be also
 rendered with a small gamma)

Yoruba doesn't use any marks with the letter g.  It does use some diacritics
like acute, grave, and macron to indicate tones.  It also uses a mark below
the letters e, o, and s which alter the pronunciation of those letters.
This is where there remains some controversy.  One faction prefers the use
of a vertical line below which should attach to the base letter, and the 
other faction prefers to use the dot below.

Best regards,

James Kass




Re: Nice to join this forum....

2004-05-03 Thread jameskass

Dele Olawole wrote,

Ẹ ́ the accent is at the edge of the E with dot below - It is the same no
matter which font is used
On this Ọ̀ it almost fell off
éẹ́èẹ̀ - On all these ones they are not on the same level

One reason that it displays badly is that it is encoded incorrectly.

In the first example, you have E plus dot below plus space plus 
combining acute.  This should be E plus combining acute + 
dot below.

Likewise, the encoding is wrong for the other examples.

Ẹ́  or, more properly (depending upon point of view) É̩

(Both of these display perfectly well here.  If they do not display well
there, then, assuming that the UTF-8 text survived transmission, either
your system lacks a proper font, or your system does not support complex
script shaping for the Latin script.  In either case, this is beyond
the scope of Unicode and is considered a display issue.)

Enter the marks above (tone marks) first, then enter marks below.  Don't use
spaces between base letters and marks, that breaks the complex shaping.
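
A minimal Python sketch of the difference (the order of the marks within the
sequence is interchangeable after normalization, as discussed elsewhere in
this thread; the stray space is what breaks the sequence):

    import unicodedata

    wrong = "E\u0323 \u0301"   # E + dot below + SPACE + combining acute:
                               # the acute attaches to the space, not the E.
    right = "E\u0323\u0301"    # one combining sequence: E + dot below + acute

    # The space splits the combining sequence, so the two strings remain
    # distinct under any normalization form:
    assert (unicodedata.normalize("NFC", wrong)
            != unicodedata.normalize("NFC", right))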

For display issue problems, please see:
http://www.unicode.org/help/display_problems.html

Best regards,

James Kass



Re: Arid Canaanite Wasteland

2004-05-02 Thread jameskass

D. Starner wrote,

 
 And there are sites that consider Gaelic and Fraktur separate scripts, 
 including one by Michael Everson. Even if we assume knowledge and competence,
 we still can't assume they're using the same definition for a separate script
 as Unicode does.

I agree with the second statement above, but would like to see the
link to the Everson page(s) mentioned.  Sure, there are people who
consider Roman and Italic to be separate scripts, too.  When someone
requests evidence of how users treat something, we just try to
find that evidence and factor it in accordingly.

 
  Imagine going back in time ten years or so and approaching the
  user community with the concept of a double-byte character
  encoding system which could be used to store and transfer
  electronic data in a standard fashion.  If they'd responded to
  this notion by indicating that their needs were already being
  well-served by web-Hebrew, would the Unicode project have
  been scrapped?
 
 Yes. How many millions of dollars have gone into defining and implementing 
 Unicode? Do you honestly think that Microsoft and IBM and Apple would
 have spent all the money they have if their users were well-served by
 what you call web-Hebrew?


I don't think that the users were well-served by what is called
web Hebrew and never said I did.  Web Hebrew is a standard
which involves what we now call the masquerading of Hebrew
characters as upper-ASCII.

Web Hebrew AD and Web Hebrew Monospace are the names
of TrueType fonts.   Other fonts use the same masquerade, thus
it was an ad-hoc standard.

http://www.brijnet.org/ivrit/webheb.htm
http://www.stanford.edu/~nadav/hebrew.html
http://www.jewfaq.org/alephbet.htm
... and many other pages give info about Web Hebrew.

Quoting from the jewfaq page,

The example of pointed text above uses Snuit's Web Hebrew AD font. 
These Hebrew fonts map to ASCII 224-250, high ASCII characters 
which are not normally available on the keyboard, but this is the 
mapping that most Hebrew websites use. I'm not sure how you use 
those characters on a Mac. In Windows, you can go to ...

 So now if you think that two scripts that are isomorphic and closely related
 should be unified, then you're exerting political pressure?

Since no rational basis for the heated objections to the proposal
seems apparent, political pressure appears to be a likely choice.

Best regards,

James Kass





Re: New contribution

2004-05-02 Thread jameskass

John Hudson wrote,

 This is a silly question, because the whole debate is about what constitutes
 'properly encoded'. The Mesha Stele can be perfectly easily encoded using
 existing Hebrew codepoints and displayed in the Phoenician style with
 appropriate glyphs.
 
 I'm not saying that this is necessarily the best encoding for the Mesha
 Stele, but I'm certainly not convinced that there is anything improper
 about it, or that having a separate encoding for those glyphs would be
 more proper.

There's nothing improper about transliteration.  Likewise, the Phoenician 
inscription of Edessa in Macedonia could be easily encoded using existing 
Hebrew code points, even though its language is Greek.

If one wanted to go through the trouble of setting up OpenType tables
accordingly (to point to redundant glyphs mapped with positional variants
to compensate for default shaping behaviour), the Mesha Stele could 
probably be easily encoded using existing Arabic code points, as well.

Best regards,

James Kass




Re: New contribution

2004-05-02 Thread jameskass

John Hudson wrote,

 Again, you are missing the point because you are *assuming* that encoding
 the Mesha Stele with Unicode Hebrew characters = transliteration, i.e. that
 there is some other encoding that is more proper or even 'true'. The
 contra-argument is that the 'Phoenician' script is identical to the Hebrew
 script, the differences in letterforms being merely glyphic variants. The
 contra-argument disagrees with your premise that encoding the Mesha Stele
 with Hebrew characters is transliteration. You can't proceed past that
 argument simply by restating your premise.

The Mesha Stele and the inscription of Edessa were originally written
in the same script.  If encoding the Edessa inscription using the
Hebrew range would be transliteration, then so would the encoding
of the Mesha Stele in the Hebrew range.

If Phoenician is considered a glyphic variation of modern Hebrew, then
it can also be considered a glyphic variation of modern Greek.  Would
it then follow that modern Greek should have been unified with modern
Hebrew?  (Directionality aside.)

If Unicode were about encoding languages rather than scripts, then 
I would see nothing wrong with encoding the Mesha Stele using
modern Hebrew characters and relegating correct display to a
font switch.

Best regards,

James Kass





Re: For Phoenician

2004-05-01 Thread jameskass

Peter Kirk wrote,

 
 This pedagogical usage is not in plain text, or at least plain text 
 usage has not been demonstrated. I think I asked before and didn't 
 receive an answer: should Unicode encode a script whose ONLY 
 demonstrated usage is in alphabet charts? I think the answer is no, 
 because essentially these charts are graphics of glyphs, not text.
 

I wonder where the folks who made those charts got those glyphs?

Best regards,

James Kass
 



Re: New contribution

2004-05-01 Thread jameskass

Peter Kirk wrote,

 This is based on a historically unproven assumption that this script 
 originated with the Phoenicians. I don't think it's even true that the 
 oldest surviving texts in this script are Phoenician.

Would the oldest surviving texts in the Phoenician script be
in a script other than Phoenician?

 The Mesha Stele (otherwise known as the Moabite Stone) is already 
 available in Hebrew script. What is the need for a separate encoding of 
 the same text?

There are probably other transliterations of the text already available,
too, such as Latin.  Wouldn't it be nice to see the inscription displayed
in its original script, properly encoded?

 Yes, this is what I have been talking about, mostly. Sorry to everyone 
 for not making this clear. I take it as self-evident that a Phoenician 
 etc text to be presented (transliterated if you like) with square 
 Hebrew glyphs should be encoded with the Unicode Hebrew characters. What 
 is in dispute is how a text to be presented with Phoenician or Old 
 Canaanite glyphs should be encoded.

If the current proposal isn't derailed and is eventually accepted, then
such a text should be encoded with Phoenician characters because texts
should be encoded in the scripts in which they were written unless
transliteration is the goal.

If the current proposal is derailed, then such a text should be encoded
in the PUA.

Best regards,

James Kass




Re: Arid Canaanite Wasteland (was: Re: New contribution)

2004-05-01 Thread jameskass

- Original Message - 
From: Peter Kirk [EMAIL PROTECTED]
To: Kenneth Whistler [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, May 01, 2004 9:43 AM
Subject: Re: Arid Canaanite Wasteland (was: Re: New contribution)


Peter Kirk wrote,

 Understood. But on the other hand, the lack of a consensus among *any* 
 people that they have a need for an encoding does seem to imply that 
 there is no need for an encoding. I have yet to see ANY EVIDENCE AT ALL 
 that ANYONE AT ALL has a need for this encoding. So I am asking simply 
 that the proposer demonstrates that there is SOME community of users who 
 actually have a need for this encoding, for plain text rather than 
 graphics. I have asked for this over several months. The new proposal 
 not only fails to demonstrate this, it indicates that the proposer has 
 not even attempted to find any such community of users, because he 
 admits to not contacting any user community.
 

Let's find out how some actual users in the user community deal with 
this controversial issue.

Googling for palaeo-Hebrew brings us this...

http://ebionite.org/fonts.htm

... (it's the second or third hit, depending on how you count) web
site all about fonts and how they can be used to render Hebrew
text on our computers.

The Evyoni web site uses the good old symbol font to depict the 
occasional Greek glyph.

Quoting from the page:
We also use a font that uses upper ASCII to show Hebrew in the same 
manner as Web Hebrew fonts (with the same character assignments) 
but with added features. Included in the font is transliteration symbols 
for Hebrew in two schemes to make it backwards compatible with our 
first special font we used on our sites. And instead of using the square 
script used to represent Hebrew today and over the last few millennia, 
we use Palaeo-Hebrew script. Palaeo-Hebrew has been used in the past to 
archaize, that is, to preserve a link to an earlier state of things. That is 
after all, what we are about, so Palaeo is the perfect script for us to use.

(Note that this site considers Palaeo a separate script, this is quite
clear in the paragraph quoted above.)

<some flippancy>
What a simple solution, using upper-ASCII for non-Latin glyph display.

Why, with that novel approach, we could set up our computers to
handle all kinds of script changes by simply changing the font-in-use
to something different!

Let's clean up our act and get in on this band wagon.  We could start
with so-called Linear-B.  That's just palaeo-Greek, if one prefers
not to refer to the script of the Greeks as Linear, for whatever
reason.  So, we can deprecate the entire Linear-B range and put
notes in the Standard explaining how Linear-B is actually a glyph
or font variant of Greek.  While we're at it, we can do Coptic the
same way, by gosh.

Shoot, if we use that clever upper-ASCII method delineated above,
we can deprecate the Greek range, too.
<end flippancy>

Their home page has a graphic of Hebrew script surrounding a Menorah,
a graphic showing Latin script with diacritics, and a graphic showing 
good, old palaeo-Hebrew.

Let's move on to another web page,
http://www.fossilizedcustoms.com/critic.html
...where the author has been criticized for his choice of using
palaeo-Hebrew characters and is responding...

Lew:   YHWH Elohim used palaeo-Hebrew to write the Torah in the 
stone tablets, so I stand on my choice of characters with Him.  In fact, 
most of the prophets wrote in the archaic, primary Hebrew;  it was 
only during the Babylonian Captivity that the Yahudim took the 
Babylonian Hebrew characters on -- Belshatstsar needed Daniel to 
read this outlandish and ridiculous script, because the Babylonians 
knew nothing of it.   Mosheh, Abraham, Enoch, Dawid, Shlomoh -- 
these men could not read modern Hebrew; they used that outlandish 
and ridiculous palaeo-Hebrew script.  The Great Scroll of Isaiah 
(YeshaYahu) is a copy of the original, and it is on display in the 
Shrine of the Book Museum in Yerushaliyim -- the Name is preserved 
in its original outlandish and ridiculous palaeo-Hebrew script, while 
the rest of the text is in modern Hebrew.

Another user heard from who apparently regards Phoenician
and Hebrew as different scripts.

Let's move on again to...
www.yahweh.org/publications/sny/sn02Chap.pdf
... this PDF which doesn't need to be downloaded because we can
see all we need in the Google blurb:

 ... In most cases he will come across a notation that the personal 
name Yahweh ( hwhy in palaeo-Hebrew and hwhy in Aramaic script) 
has M ... 

It's obvious that the good people at yahweh.org aren't complying with
the upper-ASCII method for displaying non-Latin text in their PDF;
apparently considering that both palaeo-Hebrew and Aramaic script
can best be encoded with regular ASCII.

Moving on,
http://www.geocities.com/stojangr/transliterating___the___ancient.htm
(Sorry, it's geocities.)  ... here's a page all about the Phoenician inscription
of Edessa in Macedonia...

Re: New contribution

2004-05-01 Thread jameskass

Simon Montagu wrote,

 This misses the point. The question is whether the oldest surviving texts
 in the Phoenician script were written by Phoenicians. The fact that it's
 called Phoenician script doesn't prove anything about its origin: it may
 be analogous to the term Arabic numbers, which are Indian in origin but
 reached Europe via the Arabs.

It's an interesting point, and I got it.  Since we're all discussing scripts
and script encoding, and since Peter Kirk had written, I don't think it's 
even true that the oldest surviving texts in this script are Phoenician,
without specifying that he meant '...in this script are in the Phoenician
*language*', I was only having a bit of fun with his wording.

While the fact that it's called Phoenician script doesn't prove anything
about its origin, it might be considered indicative of the path through
which the script was borrowed.  

Best regards,

James Kass




Re: CJK(B) and IE6

2004-05-01 Thread jameskass

The lack of support for supplementary characters expressed in UTF-8
in Internet Explorer is a bug.  As Philippe Verdy mentions, the
Mozilla browser does not have this same bug.  Also it should be 
noted that the Opera browser handles non-BMP UTF-8 just fine.

While working with NCRs may be an ugly nightmare, there are some shortcuts.

The BabelPad editor can easily convert between UTF-8 and NCRs.  Also,
even though Internet Explorer doesn't display the material, it doesn't
destroy the encoded text, either.  It can be copy/pasted from the browser
window into any aware application and retain its content.

The Internet Explorer browser itself can convert between UTF-8 and NCR
encoding forms with the File - Save As command.
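
The round trip is also simple enough to script; here is a minimal Python
sketch (the function names are mine, not from any of the tools mentioned):

    import re

    def to_ncrs(s):
        # Replace each non-ASCII character with a hex NCR like &#x20000;
        return "".join(c if ord(c) < 128 else "&#x%X;" % ord(c) for c in s)

    def from_ncrs(s):
        return re.sub(r"&#x([0-9A-Fa-f]+);",
                      lambda m: chr(int(m.group(1), 16)), s)

    sample = "\U00020000"      # a CJK Extension B character
    assert from_ncrs(to_ncrs(sample)) == sample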

The Windows registry settings allow a default font to be specified for
any plane.  I have one font set for Plane One and a different font
set for Plane Two in my registry, and Windows seems to handle this well.
(Except for the UTF-8 bug in Internet Explorer.)

Note also that it is possible to set a font other than the default font
for displaying non-BMP text, just as it's possible to change the font
in an HTML file, either with CSS or font-face/family tags.  The registry
settings should only be for default, in other words if the application
or mark-up has not specified another font.

I *think* that Windows 2000 uses Unicode always internally and uses an
internal conversion chart if material is non-Unicode like GB-18030.  As
far as I know, this means that GB-18030 support on Win2000 would be 
limited to Unicode's BMP unless the special registry settings were made.  
But, I could be wrong on this.  Since GB-18030 is important to many, it's
very possible that Microsoft already made allowances for this.

Best regards,

James Kass




Re: Public Review Issues Updated

2004-04-30 Thread jameskass

Kenneth Whistler wrote,

 What nobody seems to have noticed yet is that in that same document,
 Rev. J. Owen Dorsey also used an uppercase turned T (the capital
 letter form of U+0287 LATIN SMALL LETTER TURNED T, which also appears
 in this text). Those turned t's were used in Dorsey's orthography of
 Omaha and Ponca texts. 

Turned upper case T is also used in Fraser script.  (Daniels & Bright,
page 582)

Best regards,

James Kass




Re: New contribution

2004-04-30 Thread jameskass

Dean Snyder wrote,

 1) The script is wrongly called Phoenician - the same script was used
 for Old Phoenician, Old Aramaic, Old Hebrew, Moabite, Ammonite, and
 Edomite. That is why I propose it be named [Old] Canaanite.

The Latin script is used for English, German, Tahitian, Apache, etc..
But it remains the Latin script.  Likewise, Phoenician is Phoenician,
even if other users borrowed it.

Dean Snyder wrote,

 Then why were Chinese, Japanese, and Korean unified? 

They weren't.  There are three distinctive writing systems involved
with CJK.  They share some common ideographs and this is where
some unification has been involved.  In the case of ideographic
unification, one can look at the glyphs involved and clearly observe
the similarity.  This is not so with Phoenician and Hebrew, clearly.

Unifying Phoenician and Hebrew would be akin to unifying
Katakana and Hiragana.  *That* would be silly.

Peter Kirk wrote in response to Chris Fynn's Telugu/Kannada comparison:

 Yes, but two wrongs don't make a right. One past mistake of Unicode, or 
 decision it had to take for compatibility reasons, does not create a 
 precedent.

Treating Telugu and Kannada as distinct scripts was not a mistake.

Peter Kirk wrote,

 Not really. Acceptance of the proposal would create an expectation that 
 Phoenician texts should be encoded with the new Phoenician characters, 
 and so that existing practices are wrong and should be changed.

Not necessarily.  The existence of a Cyrillic range doesn't preclude
Latin script users from writing Trotsky.

 ...That 
 expectation is of course not acceptable to scholars. Also not acceptable 
 is the inevitable result that Phoenician texts will be encoded in two 
 different ways, leading to lack of searchability and potentially total 
 confusion.

Chris Fynn previously pointed out a similar issue with Sanskrit texts
written in various Indic scripts.  Having one language encoded in more
than one script is not unprecedented.  Search features can just be
programmed accordingly.

 If there is such a small minority, let us hear from them. As far as I 
 know this is a minority of one.

Please.  When the Phoenician script is approved, I will post a hypertext
version of the Mesha Stele.  
( http://home.att.net/~jameskass/phoeniciantest.htm )

John Hudson provided this scan:

 http://www.tiro.com/view/NorthSemitic.jpg

...which shows the Phoenician script at various stages.  It's a bit misleading,
though.  If the only available reference were this scan, we could infer
that, although the Phoenician language used the letters K, L, and M from
975 to 930 B.C.E., these letters were dropped from the language
by 900 B.C.E. only to be added back into the repertoire by the Moabites
around 830 B.C.E..

Quoting Birnbaum from John Hudson's letter:

  To apply the term Phoenician to the script of the
 Hebrews is hardly suitable. I have therefore coined the
 term Palaeo-Hebrew.

In one sense, it is OK to call Phoenician a Hebrew script, since Phoenician
was used to write Hebrew.  In another sense, calling Phoenician a Hebrew
script would be just as incorrect as calling the Phoenicians Hebrews.

To apply the term Phoenician to the script of the Phoenicians seems
eminently suitable.

Best regards,

James Kass




UNIHAN.TXT

2004-04-30 Thread jameskass

Like UNIHAN.TXT, brevity is not a feature of the following...

Tabs...  In addition to the points Mike made about the tab character having
different semantics depending on the application/platform, I just don't
think a control character like tab belongs in a *.TXT file period.  Although
UNIHAN.TXT is referred to as a database, it isn't.  Rather, it's the raw
material for a database offered in plain-text form.  Still, tabs are arguably
OK.  It's easy enough to strip them out when they're not wanted.  (I'd
rather deal with tabs in a text file which is to be imported into a database
than ASCII quotes.)

Unix -vs- DOS...  I'll stick with the tools I've been using for a quarter century
and their descendants, thanks just the same.  With respect to the idea that a 
text editor is not the proper tool with which to open a *.TXT file, well...

Trivial -vs- non-trivial...  Once the raw data has been imported into a database,
it's trivial to massage or manipulate it.  It's easy enough to generate a CSV
file from a database application, and I've done so.  But, the only reason that
I wanted it in CSV in the first place was to make it easy to import the data
into the database application.  This was *not* trivial to do; it involved a lot
of coding and counting, and a bit of trial-and-error with various field lengths.
Still, the task managed to keep me quiet for a few days...

With a CSV file, importing data from a text file into a database file simply
involves a single line command in the interactive mode (once the database
file structure has been established).  This is true for dBASE, FoxPro, and
related database applications.

Of course, the same kind of single line command can be (and was) used to
import the data from the UNIHAN.TXT file into a database, but this 
produces a huge database file [266844944 bytes] which *still* does not
have proper fields.  It still has one record/one field just like the original
UNIHAN.TXT file.  Which means, if you want to get the information for
a certain field of a certain character, that you have to go skipping through
all 1063127 records checking each one rather than the mere 71098 records
that the database actually requires.  (Of course, you'd use an index file
rather than skipping through all those records in either case.)

But, if you wanted to modify only one field, it's more efficient to skip
through 71098 records reading and modifying only the appropriate field
in the record than to go skipping through all 1063127.  Easier to program, too.
(Suppose you were a purist who wanted to see Stimson's pronunciations using
the actual characters that Stimson used?  Or, say you wanted pronunciations
in lower case rather than upper case and preferred that the tone marks be
superscripted?  Hmmm, maybe you'd want those Japanese pronunciations
in kana instead of romaji...)

So, UNIHAN.TXT is 27592561 bytes, but the CSV text file is 13384544 bytes.
Zipped, UNIHCSV.ZIP is 3477887 bytes.  (The CSV file lacks the initial 802 lines
of comments in the source UNIHAN.TXT file.)

That only cut the size in about half, not as great a savings as I'd imagined.  This
is because many of the fields in the source UNIHAN.TXT are actually
empty, and thus don't occupy a line in the file, while empty fields
in the CSV file still require a single byte for that comma.
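
For anyone repeating the exercise, here is a minimal Python sketch of the
pivot.  The file names and output layout are mine; the input format assumed
is the documented one-entry-per-line "U+XXXX<tab>kField<tab>value" layout.

    import csv
    import collections

    records = collections.defaultdict(dict)
    fields = set()
    with open("UNIHAN.TXT", encoding="utf-8") as src:
        for line in src:
            if line.startswith("#") or not line.strip():
                continue                      # skip comments and blanks
            codepoint, field, value = line.rstrip("\n").split("\t", 2)
            records[codepoint][field] = value
            fields.add(field)

    # One CSV row per character, one column per field; empty fields
    # cost only their comma, as noted above.
    with open("UNIHCSV.CSV", "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out,
                                fieldnames=["codepoint"] + sorted(fields))
        writer.writeheader()
        for codepoint in sorted(records):
            writer.writerow({"codepoint": codepoint, **records[codepoint]})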

D. Starner wrote,

 Because it's a data file, and it's easier to process without all that HTML 
 junk to discard.   

Right on!

John Jenkins wrote,

 Now that UTF-8 support is relatively common, we're moving more and more 
 data in the file to non-ASCII form.

It is a delight to observe this happening already.

 But, changing the format of the file might make it harder for some
 users to find the data they seek.  So, I'm not necessarily proposing
 any change, but rather pointing out that alternatives exist.

 That's the *real* problem.  Goodness knows the current format has real 
 problems, and brevity is not among its virtues.  (OTOH, the format it 
 replaces was brief to the point of being incomprehensible.)  
 Unfortunately, nobody's come up with a good strategy for migrating to 
 something else.

I could send you the CSV file for posting, if you think anyone else would
want it.

Doug Ewell wrote,

 And as John said, converting LF to CRLF is quite a simple task -- it can
 even be done by your FTP client, while downloading the file -- and
 should not be thought of as a deficiency in the current plain-text
 format.

Right.  It's not a deficiency, it simply adds one more step to a multi-step
process for some of us.

Benjamin Peterson wrote,

 Wow -- I'd hate to see your idea of a non-trivial solution!

Me too!

Edward H. Trager wrote,

 People tend to use what they know best, ...

Exactly.

 Absolutely.  The existence of Cygwin makes work on Windows much more tolerable,
 especially since Cygwin provides the OpenSSH client, XFree86, Perl,
 console vim, egrep, etc.  However, I still haven't figured out how to display
 a UTF-8 file with non-latin 

Re: Public Review Issues Updated

2004-04-30 Thread jameskass

John Cowan wrote,

 Ah, I see the next battle line forming:  Is Fraser a separate script, or
 just an oddball application of Latin caps for which we need a few new ones?

Well, the Punic wars may not be over yet.

But, I'd go with Fraser being just an oddball application of Latin caps for
which we need a few new ones.  Like the turned T and reversed K, which seem
to have other uses, too.  Fraser might need some special punctuation-style
characters, or these might be treated as ligature presentation forms of
existing Western punctuation.

Best regards,

James Kass




Re: Fraser

2004-04-30 Thread jameskass

John Cowan wrote,

 Is there an explanation anywhere on the Net?  I don't have D & B.

The Proel page on Miao has a good scan of Fraser script interspersed
with several examples of Pollard script.  Note that Proel fails to make
the distinction between Fraser and Pollard.  The Fraser example 
follows the text that reads, in translation, The figure below shows the
same text, John 3:16, in Lisu characters and in the western Lisu dialect,
spoken in southwestern China.

http://www.proel.org/alfabetos/miaonew.html

But, this example doesn't show the 'punctuation' strings that are
in D & B.

Thank goodness for Omniglot!

http://www.omniglot.com/writing/fraser.htm

Here's the text example from D & B making use of the PUA in
UTF-8 (you might have to manually select UTF-8, my ISP doesn't
tag outgoing e-mails) which can be viewed if you have a certain
font installed...

[from Daniels and Bright p. 582] 
Sample of Hwa (Western) Lisu 

NY N. NU MI: : SI KW ΛW FI DU FI 
U KO_ LO-. YI NY NU J GU YE T VU NY, G⅂_ BV_ 
LO= O: : DE KW L ℲO I RO U TY_ M S NY 
SI  J GU NU W YE T VU NY, YI CƎ. TƎ, 
TƎ,; BE XY, B LO= 

(I just used existing ASCII punctuation in this example.)

Best regards,

James Kass




Re: Fraser

2004-04-30 Thread jameskass

 (I just used existing ASCII punctuation in this example.)

Actually, I used PUA for these tonal marks, too, it appears.

Best regards,

James Kass




Re: Brahmic Unification (was Re: New contribution )

2004-04-30 Thread jameskass

Andrew C. West wrote,

 No, not at all. The charts may show consonant-vowel syllables, but that does not
 mean that I believe that they should be proposed to be encoded as syllables.
 
 What I was saying was that all the glyphs needed for a proposal are nicely laid
 out here, not that there is necessarily a one-to-one correspondence between
 these glyphs and Unicode characters.
 

Furthermore, Jost Gippert (the author of the Tocharian page) has long been
a proponent of Unicode, has worked with other Indic scripts, and has a good
understanding of Unicoding principles.  What more could we ask?

Best regards,

James Kass




Re: New contribution

2004-04-30 Thread jameskass

Dean Snyder wrote,

 In the case of ideographic
 unification, one can look at the glyphs involved and clearly observe
 the similarity.  This is not so with Phoenician and Hebrew, clearly.
 
 Yes it is, for the ancient periods. 

Because the ancient Hebrews used the Phoenician script.

 Hebrew has been frequently used inexactly in the context here as a
 cover term for a wide range of script variants, spanning thousands of
 years. 

That may be, but not by me.

 This is useful in some contexts, but not when we are talking about
 the ancient periods. Hebrew (as a cover term for the scripts used by the
 Israelites down through the millenia) underwent several developmental
 stages. That is why I specifically use the phrase Old Hebrew when
 talking in a Phoenician context. They were contemporary scripts and in
 the earlier periods are practically indistinguishable (as is also Old
 Aramaic). I posted several glyph charts from several scholarly sources on
 the Unicode Hebrew list illustrating the marked similarities (and
 distinctions) that exist between most of the West Semitic diascripts.
 (Multiple columns of which, by the way, are entirely, and conveniently,
 missing from the current proposal.)

Birnbaum apparently coined the phrase "palaeo-Hebrew" because he
didn't like referring to a Hebrew script as "Phoenician".  But, that's
what it was.  When I speak in a Phoenician context, I'm pleased to use
the word "Phoenician".  "Old Hebrew", "palaeo-Hebrew", "Phoenician",
and even "Old Aramaic" are, indeed, practically indistinguishable.

One of the several glyph charts which you kindly provided came here,
to the main Unicode public list.  As I recall, it illustrated similarities
in various scripts which were already unified in the (then) current
proposal.

 Please.  When the Phoenician script is approved, I will post a hypertext
 version of the Meshe Stele.
 
 You can do it right now - just specify one of the nice Phoenician (or
 better here, Moabite) fonts available for the text.


There aren't any, because Phoenician hasn't been encoded yet.

(Couldn't resist, could I?)
 
 
 John Hudson provided this scan:
 
  http://www.tiro.com/view/NorthSemitic.jpg
 
 ...which shows the Phoenician script at various stages.  It's a bit
 misleading,
 though.  If the only available reference were this scan, we could infer
 that, although the Phoenician language used the letters K, L, and M from
 975 to 930 B.C.E., these letters were dropped from the language
 by 900 B.C.E. only to be added back into the repertoire by the Moabites
 around 830 B.C.E..
 
 They're missing from the charts because examples for those particular
 glyphs were not extant in the sparse data available when those charts
 were compiled.
 

The scan John provided shows Phoenician as written by six different
scribes at different times in slightly different places, as far as I can
tell.  (Or, it could have been the same scribe who lived a long life and
moved around a bit.)  The first three lines of the scan are Phoenician 
and labelled as such.  The last line of the scan, "Palaeo-Hebrew", is a name 
coined by Birnbaum for Phoenician used to write Hebrew.  Likewise, 
the Moabite and Aramaic examples are showing the Phoenician script
used to write those languages.

My guess would be that several letters weren't included in all
of the examples because the original examples, some of which were
apparently single inscriptions, were too short to include all the 
letters of the alphabet.

 
 It is the same script shared by the ancient Phoenicians, Hebrews,
 Samaritans, Aramaeans, Moabites, Ammonites, and Edomites.  In short, the
 name Canaanite seems preferable. After the first few centuries of use
 of this script by these peoples, each of the major cultural groups
 developed this shared script along sometimes more, sometimes less,
 independent tracks.
 

If a name like "Canaanite" or "proto-Canaanite" would be preferable, 
then so be it.

Best regards,

James Kass



Re: Unihan.txt and the four dictionary sorting algorithm

2004-04-20 Thread jameskass

Raymond Mercier wrote,

 John Jenkins writes
 Also, even though the full Unihan database is 25+ Mb in size, given the
 cheapness of disk space nowadays, it's not all *that* big, surely.
 
 
 The problem of the size of Unihan has nothing at all to do with the cost of
 storage, and everything to do with the functioning of programs that might
 open and read it.
 Since the lines in Unihan are separated by 0x0A alone, not 0x0D0x0A, this
 means that when opened in notepad the lines are not separated. Notepad does
 have the advantage that the UTF-8 encoding is recognized, and the characters
 are displayed.

UNIHAN.TXT isn't going to get any smaller by itself.  The trend indicates
that it will just keep on growing, even if VS characters are used with CJK.

The DOS editor chokes on such a large text file, so does my older hex
editor.  Thank goodness for BabelPad, otherwise it would've been hard
to insert proper (for my system) line breaks into the file.

The tab character is used in the file.  Arguably, this character should
never appear in a plain text file; rather, it should be converted to an
appropriate number of U+0020 characters by the application on save.
Of course, this would make the file even bigger.

Instead of (for instance) KUA4, why not KUA⁴?

Much of the text in UNIHAN.TXT is redundant, the hex character
is repeated along with each field name over and over again.  

Putting the hex character at the beginning of each line, with one
character per line and CSVs would make UNIHAN.TXT *much* smaller.
Of course, commas would have to be removed from the definition
fields.  (Hmmm, maybe definition field commas could be replaced
with MIDDLE DOT?)

But, changing the format of the file might make it harder for some
users to find the data they seek.  So, I'm not necessarily proposing 
any change, but rather pointing out that alternatives exist.

In spite of its unwieldy size, UNIHAN.TXT is a useful tool and I'm
grateful for its existence.

Best regards,

James Kass




CJK U+3ADA and U+66F6

2004-04-08 Thread jameskass

Is there a difference between U+66F6 and U+3ADA?

The newest UNIHAN.TXT file doesn't have a definition field for
U+66F6.  The glyphs in the Unicode 4.0 book appear identical
for these two characters.  One is placed with radical 72, the
other with radical 73, although UNIHAN.TXT gives both as
having radical 73.  

U+3ADA  kIRGKangXi  0502.080

U+66F6  kKangXi 0502.080

(In UTF-8:)
U+3ADA (㫚)
U+66F6 (曶)
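
For anyone who wants to compare the two entries themselves, a throwaway
script along these lines pulls every field recorded for a given code point
(assuming the usual tab-separated Unihan format):

    # Print every Unihan field recorded for the code points in question.
    targets = {"U+3ADA", "U+66F6"}
    with open("UNIHAN.TXT", encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3 and parts[0] in targets:
                print(parts[0], parts[1], parts[2])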

Best regards,

James Kass




Re: CJK U+3ADA and U+66F6

2004-04-08 Thread jameskass

Asmus Freytag wrote,

 this is the kind of thing that you should report via
 our error reporting form. Here on the open list, it's
 liable to get lost (no-one owns excerpting issues from
 this forum).

Before reporting it through proper channels, I wanted to try
to find out which kind of error it is.  It could either be
a bad glyph in the font(s) or a truly duplicated character.

Radicals 72 and 73 are similar in appearance, but the central
horizontal line in Rad. 73 doesn't meet the vertical line on
the right.  But, when radical 73 is used as a component of
other characters, it often looks just like radical 72.  So,
I'm not sure whether there are two separate characters with
the same top component (U+52FF 勿) over two different radicals, 
or just a duplication.

Best regards,

James Kass




Re: New Currency sign in Unicode

2004-04-01 Thread jameskass

See the currency symbol (the Ghanaian cedi) in use on postage stamps,
http://www.bird-stamps.org/country/ghana.htm
...notice the different glyphs in the third and fourth rows.

Best regards,

James Kass



Re: New Currency sign in Unicode

2004-04-01 Thread jameskass

Jim Allan wrote,

 This web page also has a slashed capital G for the Paraguayan guarani, 
 another symbol not in Unicode.


The guarani symbol has been accepted by the UTC.  Here's
the original proposal:

http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2579.pdf

Best regards,

James Kass




Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread jameskass

John Cowan quoted,

 Well, it depends on what the equivoque "combining marks" in the title of
 Section 7.7 means.  This is where (p. 187) the remarks about SP and NBSP
 appear:
 
 # Marks as Spacing Characters.  By convention, combining marks may be exhibited
 # in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK
 # SPACE.  This approach might be taken, for example, when referring to the
 # diacritical mark itself as a mark, rather than using it in its normal way
 # in text. 

Note the use of "may" and "might" in the quoted text rather than "must".  

The above could be interpreted in part as '... combining marks may be exhibited 
in (apparent) isolation by applying them to U+0020 SPACE, or they may not.'
Such an interpretation might lead people to decide that the approach is
up to the renderer.

Semantics aside, if the default display appearance of a combining mark in isolation
on a certain system is the mark on a dotted circle, then that system should be 
considered conformant when it displays space+mark as dotted_circle+mark.

An observation, FWIW:  on the system here, combiners in Indic scripts get
the dotted circle, but combining diacritics from the (mostly) Western
combining diacritics range don't.  Space + U+0327 displays a stand-alone
cedilla here; no dotted circle.

Best regards,

James Kass




Re: What is the principle?

2004-03-28 Thread jameskass

Asmus Freytag wrote,

 While applications predating VSs have no choice but to treat them as what
 they are (in that context) i.e. unassigned characters, applications of later
 date have no business treating unapproved VS sequences as unassigned 
 *characters*.
 
 The intent of VSs is to mark a difference that falls below the distinction
 between separately encoded characters. Therefore I would expect that by default
 all VS characters are ignored in a full-blown collation implementation, 
 leaving
 open the choice of supporting, say, a fourth level difference between specific
 known variation sequences.
 
 They are also best ignored in any kind of identifier or name matching, as 
 otherwise
 the presence of invisible characters can change the lookup--with all the 
 consequences
 for spoofing and security.

What you're saying makes perfect sense for purposes of forwards
compatibility.  Thanks to both you and Ernest Cline for pointing
this out.

I'd prefer to see some kind of toggle for file/archive searching with
respect to ignoring VS characters, but can't argue with ignoring them
for security/spoofing issues.  Otherwise, the spam problem might well
become even worse.

Good collations are tailorable, so if the default condition is for 
collation to ignore VS characters, that shouldn't cause problems for 
anyone.
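
To make the name-matching point concrete, here's a minimal sketch of the
kind of filtering an application might apply before comparing identifiers.
(The ranges are the variation selector blocks plus the Mongolian free
variation selectors; an illustration, not anybody's shipping code.)

    # Strip variation selectors before matching identifiers, so that
    # invisible characters can't change the result of a lookup.
    def strip_variation_selectors(s):
        return "".join(
            ch for ch in s
            if not (0xFE00 <= ord(ch) <= 0xFE0F       # VS1..VS16
                    or 0xE0100 <= ord(ch) <= 0xE01EF  # VS17..VS256
                    or 0x180B <= ord(ch) <= 0x180D)   # Mongolian FVS1..FVS3
        )

    assert strip_variation_selectors("a\ufe00bc") == "abc"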

Best regards,

James Kass




Re: Printing and Displaying Dependent Vowels

2004-03-28 Thread jameskass

C J Fynn responded to John Hudson,

 If someone wants this,  isn't it possible to put a specific lookup in the font
 so that any dependent vowel following a space character renders as a spacing
 (stand-alone) dependent vowel? Surely a specific lookup should override it being
 displayed on a dotted circle by default.

Has anyone tried this?  Would the space glyph U+0020 be expected to trigger
a look-up in the Tamil GSUB table as if it were a Tamil base character?

The reason that I haven't tried this is because, in the OpenType look-ups here
for the re-ordrant vowel signs of Tamil, the vowel sign is INPUT1 and the
base letter is INPUT2.  This is because the rendering engine has already
re-ordered the character string before this look-up is performed.  It doesn't
seem likely that a rendering engine would re-order a vowel sign before a space.
It could be tested both ways, I suppose...

This seems to be OT for this list, but, here it is, and it will probably keep
popping up from time to time unless clarified.

I can only make inferences and suppositions based on observation of the
behavior and reasoning behind the behavior of the rendering engine used
here, Microsoft's Uniscribe.  People who know all about this do follow
this list, so they're free to offer corrections.

<inference and supposition>

Uniscribe inserts the dotted circle into the display for complex scripts in
order to give a visual indication of an encoding or spelling error.  This seems
quite useful whether text is being entered or merely displayed.

Allowing dependent vowels to follow the space character breaks this utility.
In other words, somebody could write a Tamil word in a web page starting
with the E-vowel-sign (U+0BC6), and there'd be no indication that this is 
improper, either to the author or the visitor.

Someone searching for that word on that page wouldn't find it, and so on.

Maybe some kind of spell-checker should be used by the original author, but,
there seems to be no way to assure that spell-checking was performed by the
author of any web page one visits.

It is the very appearance of that dotted circle unexpectedly in our texts which
alerts us to the fact that we have made a mistake.  That dotted circle jumps out
of the page into our vision exclaiming, "Hey, I'm wrong!  I'm so wrong, don't
even bother running your spell-checker on me!"  This is the basis upon which
Uniscribe renders text which includes dependent vowel signs, not just for Tamil,
but for the other so-called complex scripts, too.  The dotted circle plus the
matra is the default rendering for combining marks *in isolation*.  Uniscribe
seems to rightly treat a vowel sign following a space as being in isolation, and,
how could it do otherwise?  What goes for the space character also seems to
go for any other character which is not a valid character *within the Unicode 
range*.  Again, how could it be otherwise?  If the first character in a string
isn't a Tamil character, there's no reason for the renderer to consult the Tamil
OpenType tables in a font.  If it did, my gosh, imagine all the pointless look-ups
just to display a page which was, for example, mostly Chinese with a few Tamil
phrases.

</inference and supposition>

The good folks engineering the Uniscribe have been most responsive to all kinds
of special requests and pointers related to complex script shaping.

I think asking them to break the existing mechanism in order to support
vowel signs on spaces asks too much, though.

People generating texts for educational purposes will always have special needs.
So, they'll always need to make special effort to get special effects.  Workarounds
concerning the original question have already been suggested.

If this is treated as a Unicode issue rather than a display issue, then one solution
would be for someone to propose a new character, (back on topic a little bit)
COMBINING DOTTED CIRCLE FOR COMBINING MARKS.
Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine
could be changed to insert this new character.  Then, these updated rendering
engines could be distributed and font developers could add the new characters
to fonts and distribute updated fonts.  This might just take a while, but it
wouldn't be too hard to find examples of the character in actual text use to
accompany the proposal...

If it ain't broke, don't fix it.  So, is it 'broke'?

Best regards,

James Kass





RE: Printing and Displaying Dependent Vowels

2004-03-27 Thread jameskass

Peter Jacobi wrote,


 Using the Linux version of Abiword, which uses the Pango renderer,
 both the Code 2000 and the MS Latha font display the vowel signs without the
 unwanted dotted circle. NBSP and normal SPACE give identical results.
 For Code 2000 only, the dotted circle or a similar ersatz glyph (the
 screenshot is
 not that clear) is drawn for the two-part vowel signs U+0BCA, U+0BCB and
 U+0BCC
 between the two parts.

In the compound glyphs of Code2000, U+0B82 TAMIL SIGN ANUSVARA is substituted 
and re-positioned in place of the normal dotted circle found in the default 
glyphs for U+0BCA, U+0BCB, and U+0BCC.  

This is only expected to appear with a rendering system which does not support 
OpenType.  This is because the default glyphs for these surroundrant vowel signs 
would never be drawn on the screen.  Rather, the expected approach from the 
rendering engine is to use the component glyphs for these three vowel signs, such 
as U+0BC7 for the left part of U+0BCA, and U+0BBE for the right-side portion.

If the presence of these default glyphs in Code2000 is making problems, they can
be adjusted.  (Just because I expect a rendering engine to take a certain approach,
doesn't mean that a rendering engine will take that approach!)

On Windows, as others have noted, the rendering engine (Uniscribe) inserts the
dotted circle glyph (if the font has a dotted circle glyph) into the display.  The
dotted circle character is not inserted into the text, of course.

So, if the question is how to make an OpenType font *not* display the dotted
circle on Windows with Uniscribe, one idea would be to add a spacing glyph to
U+25CC (DOTTED CIRCLE) in the font.  This spacing glyph should be a no-contour
glyph, perhaps with the same advance width as U+0020.  I've not tried this,
but it might just work.

Another approach is to simply use a non-OpenType Unicode TrueType font for
Tamil.  The dotted circles don't seem to ever appear unless the font-in-use has
OpenType tables covering the script-in-use.

Best regards,

James Kass




Re: What is the principle?

2004-03-27 Thread jameskass

Asmus Freytag wrote,

 Surly not!

Intentional pun, inadvertent one, or Freudian slip?

 Uninterpreted VS characters should *not* turn into black blobs. If we had
 wanted that to happen, we would have coded different characters.

U+E000 COMBINING BLACK BLOB?  Censors would probably love it.

 What does the collation standard say to do with unassigned codepoints
 anyhow?
 
 Variation selectors are not unassigned characters.

But, they might be regarded as such by any application predating VSs.  And,
likewise for any VS sequences approved after the application was created.

Best regards,

James Kass




Re: tick, tick box, cross, cross box

2004-03-21 Thread jameskass

Avarangal wrote,

 We are in need of tick, tick box, cross and cross box, preferably as symbols with 
 code points.

Here are some symbols with code points which might work:

U+2610 BALLOT BOX
U+2611 BALLOT BOX WITH CHECK
U+2612 BALLOT BOX WITH X
U+22A0 SQUARED TIMES
U+229E SQUARED PLUS
U+2713 CHECK MARK
U+2714 HEAVY CHECK MARK
U+2715 MULTIPLICATION X
U+2716 HEAVY MULTIPLICATION X
U+2717 BALLOT X
U+2718 HEAVY BALLOT X
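
A quick way to preview these on your own system (the names come straight
from Python's copy of the character database):

    import unicodedata

    # Show each candidate symbol next to its Unicode name.
    for cp in (0x2610, 0x2611, 0x2612, 0x22A0, 0x229E,
               0x2713, 0x2714, 0x2715, 0x2716, 0x2717, 0x2718):
        print("U+%04X %s %s" % (cp, chr(cp), unicodedata.name(chr(cp))))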

Best regards,

James Kass




RE: New What is Unicode translation.

2004-03-20 Thread jameskass

Speaking of translations of What is Unicode?, I found this page:

http://asuult.net/badaa/unicode.htm

It is in Mongolian (Cyrillic).

Best regards,

James Kass

 Don,
 
 Offers to translate What is Unicode? to a particular language should
 be addressed to the Unicode office. This can be done through our
 reporting form http://www.unicode.org/reporting.html or by emailing me
 directly.
 
 Magda
 
 PS: For everybody's convenience, we provide an html template for the
 translation as well as a set of translation formatting instructions.
 
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 On
  Behalf Of [EMAIL PROTECTED]
  Sent: Thursday, March 18, 2004 6:30 AM
  To: [EMAIL PROTECTED]
  Subject: Re: New What is Unicode translation.
  
  If someone were interested in translating to an additional
 language(s), to
  whom
  should they write?  TIA...
  
  Don Osborn
  Bisharat.net
  
  Quoting Magda Danish \\(Unicode\\) [EMAIL PROTECTED]:
What is Unicode in Finnish is now online thanks to Jarkko
 Hietaniemi.
  
   Check it out at
   http://www.unicode.org/standard/translations/finnish.html
  



Re: Irish dotless I

2004-03-18 Thread jameskass

Anyone who feels that past monetary contributions towards encoding 
efforts were made based on false pretenses may be able to seek legal 
redress.

There's a certain barrister in Africa who might be able to help in this
regard.  Of course, this barrister works under conditions of strict
confidentiality, so I can't tell you the exact nature of our business
relationship.

Perhaps we should wait and see if the big pile of money actually shows
up in the bank account here before forwarding the barrister's contact
information along.

After all, just because someone puts something into an e-mail...
that doesn't make it true...

Best regards,

James Kass




Re: Investigating: LATIN CAPITAL LETTER J WITH DOT ABOVE

2004-03-18 Thread jameskass

Curtis Clark wrote,

 on 2004-03-18 01:05 Pavel Adamek wrote:
  So it would be convenient to have an empty diacritical mark,
  (COMBINING NOTHING ABOVE)
  which would cause the soft dot of j or i to disappear,
  without adding anything else.
 
 Assuming this could be added to any other character, my mind boggles at 
 the implications, both for decomposition and for rendering. :-)

The glyph could look like the old Pac-Man video game.  It should remain
visible until it has consumed the applicable diacritic, then vanish.

Best regards,

James Kass




Re: About the Kikaku script for Mende, and an existing font for it

2004-03-16 Thread jameskass

Philippe Verdy wrote,

 So it seems that tone marks used in the Latin transcription of Mende are not
 marked in the Kikaku script. It would be interesting to have some book prints
 available to see if there are punctuation signs or symbols to mark word
 separation, as well as digits or numbers (some syllables in the Kikaku script
 closely resemble the European digits, and I wonder if an alternate notation
 was used to mark numbers, or dates, or simply commercial quantities for market
 exchange and accounting, or for marriage dowries, or for customary judicial
 decisions, in countries where most negotiations were performed orally).

Does anyone have access to a copy of the following?:
 Tuchscherer, K.T. 1996. The Kikakui (Mende) syllabary 
 and number writing system. Ph.D., London School of 
 Oriental and African Studies. 

Based on the title, it seems that Kikakui might have additional symbols
for numbers.

Best regards,

James Kass




Re: Battles lost before they begin?

2004-03-14 Thread jameskass

Chris Jacobs wrote,

 If you have the text in UniPad try the following:
 
  Edit > Convert > Decompose Combinations, and then
  Search > Replace, Text to find: \u0323, Replace with: ;. Replace All

Or \u0329, depending on which diacritic was used in the source.

Also, since the mark below can appear along with a combining grave
or acute, those decompositions would have to be considered in the
search and replace operations.
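
Outside UniPad, the same decompose-then-replace trick is easy to script.
A sketch using Python's unicodedata module; whether the mark to swap is
U+0323 or U+0329 is exactly the open question above:

    import unicodedata

    # NFD splits precomposed letters apart, so the mark below can be
    # replaced independently of any combining grave or acute that
    # accompanies it; NFC recomposes whatever still has a composed form.
    def swap_mark_below(text, old="\u0323", new="\u0329"):
        decomposed = unicodedata.normalize("NFD", text)
        return unicodedata.normalize("NFC", decomposed.replace(old, new))

    # U+1EB9 (e with dot below) + combining acute -> e + line below + acute
    print(swap_mark_below("\u1eb9\u0301"))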

 Since the input for the search window already does not have to be the font
 codes a non-conformant font is not as bad as it first seemed.
 I think it should be not that hard to change the search window to let it
 accept unicode too.

Perhaps they've chosen their custom encoding in order to side-step the
mark-below issue.  As you say, it shouldn't be hard to enable Unicode
in their search window.  And, once an agreement within the user community
is reached on the mark below, it shouldn't be that hard to convert their
entire web site and database to Unicode, too!

Best regards,

James Kass




Re: Battles lost before they begin?

2004-03-13 Thread jameskass

Don Osborn wrote about the on-line Yoruba dictionary.

Without some kind of an agreement among Yoruba users as to which combining
mark should be used under certain letters (vert. line or dot), Unicode font 
development for Yoruba is pretty much stymied.  This is really a shame.

It's also too bad that the good folks behind the dictionary project didn't
use an existing 8-bit encoding scheme rather than adding to the disarray.

Best regards,

James Kass




Re: Mende Kikakui syllabary

2004-03-12 Thread jameskass

Konrad T. Tuchscherer, Ph.D. wrote,

 I write to the list from Cameroon where I am conducting research on the Bagam 
 and Bamum scripts.
  
 The Proel page should not be consulted for information on the Mende syllabary 
 (Kikakui) or any other African script (or system of graphic symbolism, like 
 Adinkra).  The Mende syllabary is not pictographic.  The dubious map shows the 
 Mende in Liberia, the Loma in Sierra Leone (they are in Liberia, known as Loma; 
 in Guinea known as Toma), the Bamana in Sierra Leone, and Adinkra in Liberia!!!
  
 As I often explain to my students, any one can publish something on the internet 
 -- lots of unreliable stuff out there!
  

Indeed.

Researching the Bagam and Bamum scripts sounds fascinating.

(I've quoted Dr. Tuchscherer's entire message above, it was clearly 
intended for the list, but does not seem to have appeared there.)

At least Proel's page on bamún shows them in Camerún.  Although Proel's
accuracy is questionable, they often have fairly good scans of some
fairly obscure writing systems.

http://www.proel.org/alfabetos/bamun.html

Sadly, many of their examples of the evolution of the Bamum script
are unclear.

Best regards,

James Kass





Re: Canadian Unified Syllabics

2004-02-10 Thread jameskass


Chris Harvey wrote,

... I want the
 examples on my site to be legible (dot accents non-spaced in the middle of
 syllabics instead of above them aren't really acceptable), and I want the
 characters to look like what speakers are familiar with, otherwise they may
 very well choose not to use the font, keyboards, etc.

 My aim is that people can type their own language on the computer they have
 now. Once OpenType is available on my machine and others, I will release
 fonts which have OpenType tables, calling the same glyphs that are now in
 the PUA. This way, I am trying to make some humble attempt at backward
 compatibility. But for now, if people cannot use the OpenType
 substitutions, what else should I do? 
 
 I am building specific fonts for specific languages, but I wanted one font
 that would display the lot. That way, if someone wanted to use
 languagegeek.com, they would only have to download one font, instead of one
 per language.

These are all laudable goals with understandable intentions.  As far as
*characters* which aren't yet encoded, the PUA really seems to be the
only method.

Since you asked, however, an alternative to the current approach would
be to:

* Encode the pages as compliantly as possible.

* Offer the one font to fit all the pages while awaiting either
   language-specific fonts or OpenType technology availability.

* Note on the pages that the one font aims to cover all syllabics, but that
   language-specific variants exist which can't yet be covered in a single 
   font due to technological limitations.

* Use any combining dots and so forth from the COMBINING
   DIACRITICAL MARKS range.  (A font like Code2000 won't display these
   combiners well due to technology limitations, but, so what?
   In *your* font, you can place the combining glyphs so that
   their default position is acceptable and won't overstrike the
   base glyphs.)

An advantage to doing something like the above is that backwardness
isn't being perpetuated under the guise of backwards-compatibility.
Another merit is that text (aside from necessary PUA matter) is
correct, compliant, interchangeable, and permanent.  Parsers,
search engines, indexing operations, and all the rest, will work
as they should.

A disadvantage of the current approach is that users may be too
easily tempted to also generate text, data, and web pages using
a proprietary encoding.  In the long run, many might view this
as something other than a favor to the user communities.

Best regards,

James Kass




Re: Phonology [was: interesting SIL-document]

2004-02-05 Thread jameskass

John Cowan wrote,

 Arcane Jill scripsit:
 
  Delenn said abso-fragging-lutely dammit on Babylon 5 once. Wasn't that 
  American?
 
 Indeed.  ...

Nope, sorry.  Not American -- Minbari.

For more info on the Minbari, please see:
http://www.sadgeezer.com/babylon5/minbari.htm

Best regards,

James Kass



Re: Panther PUA behavior

2004-02-05 Thread jameskass

Doug Ewell wrote,

 ... On Windows, I can't even rely on
 being able to display real Unicode characters for Vietnamese in places
 like the Start menu or the title bar of the browser, because they're not
 in the one and only font used for each of those places.

For the title bar of the browser,

[Start] -> [Control Panel] -> [Display] -> [Appearance] ->
[Advanced] -> Select "Inactive Title Bar" in the box for "Item",
then select a font from the pop-up list that covers the
encoding and range of characters.  Select a size that looks good.
[OK] -> [Apply] -> [OK]
Then, exit Control Panel and try it.  

Note that there are other font settings besides "Inactive 
Title Bar" that can be changed in that same menu to
customize the appearance of other items.  Also note that
"Inactive Title Bar" seems to apply to active ones, too!

Best regards,

James Kass




Re: Panther PUA behavior

2004-02-05 Thread jameskass

Doug Ewell wrote,

 No, no, I know how ...

I thought you might.

 ... I meant
 that because Windows doesn't do any fancy font switching in title bars
 to cover glyphs that aren't in the selected font ...

It's too bad that these user-selectables don't allow for some kind of
prioritized font list.  For power users.

Best regards,

James Kass



RE: Infix profanity (Very OT) (was Phonology)

2004-02-05 Thread jameskass

Arcane Jill wrote,

 ...However, at the time she said 
 abso-fraggin-lutely, she did so because she was learning how to swear
 in English ...

In this context, your initial observation appears to be spot on!  
Rescind your retraction, I'll recall my rhyme.

Best regards,

James Kass




Re: Examples of Cuneiform Ideographic Descriptor Usage

2004-02-03 Thread jameskass

Dean Snyder wrote,

 In preparation for tomorrow's Unicode Technical Committee meeting, and
 for general review and comments, I have uploaded a 140kb PDF file that
 illustrates some usage examples of the proposed Cuneiform Ideographic
 Descriptors.
 
 http://www.jhu.edu/ice/basesigns/CuneiformDescriptorUsage.pdf
 
 Circumstances have forced me to get this out in a hurry and I know there
 are mistakes in it, but I believe it will still be useful as a point of
 departure for discussion. [As an exercise for the reader, see if you can
 find any mistakes ;-)]
 

In circled 9 and 10, the same code point (1221B) is given for LU2 SQUARED
and LU2 TENU.

In circled 15, the same glyph is used for 1240A INFIX and 1240B OUTFIX.

Glyph descriptors could theoretically be applied to any script.  Once
more than one or two strokes are used to form glyphs, there are bound 
to be recognizable components.

So, I think I understand how the system you are proposing works, although
some of the sequences are less than clear for me, perhaps because I'm not a 
Cuneiform expert.  Please see attached 4KB GIF picture in which graphics 
from the PDF file were borrowed and applied to some Latin glyphs.  What 
I'm not understanding is why this approach should be considered superior 
to the static approach underlying the current Cuneiform proposal.

Cuneiform ideographic descriptors could be quite useful for illustrating
the components of existing Unicode Cuneiform characters as well as
providing a method for scholars to describe hitherto unknown and/or
unencoded characters.

But, I share the concern expressed by others on this list that bringing
up an alternative encoding method for Cuneiform at this stage might
derail the existing proposal, which appears to be on-track.

Best regards,

James Kass
 
ideodesc.gif

Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)

2004-01-20 Thread jameskass
- Original Message - 
From: John Jenkins [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Tuesday, January 20, 2004 9:32 AM
Subject: Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)


John Jenkins wrote,

 1)  U+9CE6 is a traditional Chinese character (a kind of swallow) 
 without a SC counterpart encoded.  However, applying the usual rules 
 for simplifications, it would be easy to derive a simplified form which 
 one could conceivably see in a book printed in the PRC.  Rather than 
 encode the simplified form, the UTC would prefer to represent the SC 
 form using U+9CE6 + a variation selector.

Except that this character is listed in CJK Extension C, on page 612.
(File:  IRGN9285.PDF 08/06/02)

Best regards,

James Kass



Re: Combining down-pointing triangle above?

2004-01-18 Thread jameskass
Doug Ewell wrote,

 Is this just a fancified hacek, or a potential candidate for proposal?
 Naturally, from a Unicode standpoint I'm thinking about a combining
 character, not a precomposed c-with-triangle.

It might be a caron, see:
http://www.chumashlanguage.com/pronun/pronun-00-fr.html

Best regards,

James Kass



Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

2004-01-18 Thread jameskass
Dean Snyder wrote,

 SOMEONE at SOMETIME must have thought that free variation selectors were
 a good idea for Mongolian in Unicode. If the thinking has changed on this
 since then, I would love to hear about why it has changed. Is Mongolian
 functioning well in Unicode or not? If not, what specifically in it is
 broken, or is at least sub-optimal? And what are suggested solutions for
 fixing Mongolian in Unicode if it is indeed problematic?

Andrew C. West offers test pages for both Mongolian and Manchu.

These pages have some of the technical background that you seek
concerning variation selectors and Mongolian, as well as explore
many issues concerning Unicode Mongolian.

There is some good information about Variation Selectors on the
Mongolian page under the heading 
Mongolian Free Variaton Selectors.
(Hello Andrew, ...Typo alert!)

http://uk.geocities.com/BabelStone1357/Test/Mongolian.html

Unicode for Mongolian is working perfectly on many platforms,
(smile) but only if we're discussing Cyrillic script.

Best regards,

James Kass



Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

2004-01-18 Thread jameskass
Dean Snyder wrote,

 Tom Gewecke wrote at 2:26 PM on Sunday, January 18, 2004:
 ... 
 
 Agreed.  I can't imagine that anyone who has ever tried to actually do
 anything with Unicode Mongolian would recommend variation selectors as an
 encoding technique, unless perhaps they wanted to make sure the encoding
 was never implemented.
 
  Could you please elaborate? Has this model not been implemented? Either
 via Unicode or otherwise?
 

Here's how it works:  there are three factions involved.  The OS and
rendering-engine developers, the editor/processor/input developers, 
and the font developers.  Each faction considers that the fancy stuff 
needed for Mongolian rendering should properly be handled through
a combination provided by the other two factions.

Seriously, it's my understanding that implementation guidelines
for Mongolian script and Unicode are still being worked out.

Aside from experimental set-ups, it's unlikely that anyone can
yet correctly (or, even reasonably) display the Mongolian text
on Andrew C. West's test pages.

Best regards,

James Kass
 



U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)

2004-01-05 Thread jameskass

- Original Message - 
From: Peter Kirk [EMAIL PROTECTED]
To: Philippe Verdy [EMAIL PROTECTED]
Cc: Unicode Mailing List [EMAIL PROTECTED]
Sent: Monday, January 05, 2004 8:16 AM
Subject: Re: unicode Digest V4 #3


Peter Kirk wrote,
 
 I note an incorrect glyph for U+0185 in Code2000 and in Arial Unicode 
 MS; this looks like b with no serif at the bottom but should be much 
 shorter, like ь, the Cyrillic soft sign. The Arial Unicode MS glyph for 
 U+04BB is also incorrect - it should look identical to Latin h - but 
 this problem is well known.
 

No comment on U+04BB.  With regards to U+0185, could it be
said that the informative glyph in TUS 2.0, 3.0 and 4.0 is a bit
misleading, or does that glyph represent a variance from the
text(s) with which you're familiar?

http://www.unicode.org/charts/PDF/U0180.pdf
Magnify U0180.pdf to 400% and put the row 0185 - 0195 - 01A5
towards the top of the screen so that the top of U+0185 touches
the screen area border.  Note that the top of U+0185 aligns with
the top of U+0195, suggesting that these glyphs would have the
same height.

In THE LANGUAGES OF THE WORLD by Kenneth Katzner (1975),
the example for Chuang seems to show a glyph covering U+0185
as you describe.  (page 212)

This page uses a scan from THE LANGUAGES OF THE WORLD
as its Chuang example:
http://www.worldlanguage.com/Languages/Chuang.htm

No sample text, no lower case illustration:
http://www.alphabets-world.com/chuang.html

If the informative glyph in TUS *is* misleading, I'll be happy
to make appropriate changes here.

Best regards,

James Kass
 



Re: Saving in Unicode

2004-01-05 Thread jameskass
Jose Rodriguez wrote,

 Can anyone tell me if it is possible to save a file in Unicode format
 through Visual Basic and if so how to do it?
 
 I have a Visual Basic program which converts my client's file from one
 format to another.
 
 However the resulting file must be saved in Unicode.
 
 Please give any help you can or at least point me in the right
 direction.

Internationalization with Visual Basic 
By Michael S. Kaplan 
http://www.i18nwithvb.com/

Best regards,

James Kass



Re: U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)

2004-01-05 Thread jameskass
Michael Everson wrote,

 Well, James, I think it would be A LOT better if we got some actual 
 documents from Zhuangland.

Agreed.  Meanwhile...

The glyphs used in Everson Mono Terminal for U+0185 and U+044C appear
to be identical.

That's good enough for me.  I'll fix things here accordingly.

Best regards,

James Kass



Re: U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)

2004-01-05 Thread jameskass
Kenneth Whistler wrote,

 Note that there are more modern representations of Zhuang that
 dispense with the special tone letters altogether and
 substitute out ordinary Latin letters, in a Pinyin-like
 simplification. See:
 
 http://www.liuzhou.co.uk/liuzhou/language.htm
 
 with a sign showing the substitution of Latin J, H, Z, X, W(?)
 for the 5 Zhuang tone letters.

The chart on this Japanese page about the modern Latin based
Zhuang writing systems appears to confirm that ASCII letters are 
now used for tone marking, but uses the q in place of your 
questioned W.

http://www.geocities.co.jp/NatureLand/3973/zhuangyu_ch06.htm

Best regards,

James Kass



Re: Ancient Northwest Semitic Script

2003-12-27 Thread jameskass
Dean Snyder wrote,

 
 But, in either case it is hoped that the needs of script
 taxonomists and paleographers won't be disregarded.
 
 So Unicode is now prepared to provide support, in plain text, for the
 needs of paleographers?


Practitioners of many sciences need Unicode in order to store and exchange 
information.  Mathematicians have successfully encoded what are essentially 
Latin glyph variants separately for usage as math variables in Plane One, 
including Fraktur and cursive styles.

Epigraphers may elect to classify and codify specific variants for specific
needs.  They could organize and submit a proposal for these requirements
using, say, the existing Unicode mechanism of variation selectors.  If they 
did so, wouldn't the various bodies give such a proposal due consideration?
 
 
  Well I, for one, prefer to read in more paleographically relevant
  renderings; and fonts combined with markup will, of course, take care of
  everything.
 
 That's not very useful in plain text.  Unicode is an encoding standard for 
 plain text.
 
 Fraktur has precisely the same plain text rendering issues.


Indeed it does.  (Unless you're a mathematician, of course!)
 
 Quoting from N2311.PDF:
 
 This document by Michael Everson is particularly revealing and in the end
 damning to his whole attempt at disunification of the Northwest Semitic
 script.


The document by Michael Everson is what I had thought had sparked
this thread.
 
 
 If we compare this list to the taxonomic chart he reproduces on the next
 page (see the attachment), we see convenient, but nevertheless glaring,
 discrepancies between the two. Not mentioned in his list but appearing in
 the chart under Phoenician are Samaritan, Hebrew Square, Arabic, and
 Aramaic - including Nabatean, Palmyrene, Mandaic, Syriac, etc. (See the
 attachment.)


It is an evolutionary chart.
 
 
 Everson's fuller quote here is:
 
 Phoenician is the catch-all for the largest group of related scripts
 including its ancestors, Proto-Sinaitic/Proto-Canaanite. Looking at
 tables 5.1, 5.3, and 5.4 (below) most of the scripts are so similar that
 there doesn't seem to be any point in trying to encode them separately.
 
 But he conveniently excludes any tables for Aramaic, Hebrew Square, and
 Samaritan paleography and also fails to mention the one column out of
 sixteen in these tables that IS devoted to Aramaic.


A possible reason for omitting tables for Hebrew Square is that this
is what is already encoded under HEBREW in Unicode, thus it doesn't
need covering in a proposal for unencoded scripts.  It's also possible 
that full tables for Aramaic were omitted because, as the document 
mentions, further research is required for Aramaic.  Samaritan is 
covered (at least with a chart) in a different document,
http://www.evertype.com/standards/iso10646/pdf/samaritan.pdf
 
 So once again I refer to other tables with broader paleographic attestation
 
 http://www.jhu.edu/ice/ancientnorthwestsemitic/gesenius.gif
 http://www.jhu.edu/ice/ancientnorthwestsemitic/gibson1.gif
 http://www.jhu.edu/ice/ancientnorthwestsemitic/gibson2.gif
 
 and, based on such tables, suggest, in Everson's words, that Looking at
 [THESE tables] most of the scripts are so similar that there doesn't seem
 to be any point in trying to encode them separately.
 

gesenius.gif shows logical divisions between Old Hebrew, Samaritan, Old
Aramaic, and Aramaic-Hebrew.  It would seem to align well with Michael
Everson's N2311.PDF.

gibson1.gif is all about (palaeo-)Hebrew and Moabite, which would seem
to be covered already under Phoenician in N2311.PDF.

gibson2.gif appears to show the evolution of the Aramaic script.  Some 
of the Hebrew legend glyphs at the extreme left bear a passing 
resemblance to some of the Aramaic glyphs.  There is a resemblance
between many of the Aramaic glyphs and many of the Phoenician 
(palaeo-Hebrew) glyphs.  Again, further research is required on
Aramaic.

Here's another interesting chart:
http://phoenicia.org/imgs/evolchar.gif

Quoting Herodotus (translated by Aubrey de Sélincourt):

quote
The Phoenicians who came with Cadmus - amongst whom 
were the Gephyraei - introduced into Greece, after their 
settlement in the country, a number of accomplishments, 
of which the most important was writing, an art till then, 
I think, unknown to the Greeks. At first they used the same 
characters as all the other Phoenicians, but as time went on, 
and they changed their language, they also changed the 
shape of their letters.
end quote

Phoenician shouldn't be unified with either Greek or Hebrew.

Best regards,

James Kass



Re: Ancient Northwest Semitic Script

2003-12-27 Thread jameskass
Peter Kirk wrote,

 Perhaps we should have a special block of Epigraphical Alphanumeric 
 Symbols, to go with the Mathematical..., for which epigraphers can 
 propose all manner of glyph variants which they might find useful, while 
 the rest of us ignore these blocks get on with encoding our texts using 
 the existing Hebrew, Latin etc blocks with markup for glyph variants.

That's an approach which would probably be workable.

Two reasons that variation selectors were mentioned:  we have some
precedent for variation selectors being used for specific glyph
forms of certain math symbol characters.  And, variation selectors
are supposed to be ignored in searching and indexing, more or less.
(They are default-ignorable.)

So, that approach might meet epigraphers' needs while enabling
painless cross-variant searching, and still permit scholars to
get on with encoding their texts as they see fit.

Best regards,

James Kass



Re: Ancient Northwest Semitic Script

2003-12-26 Thread jameskass
Dean Snyder responded to Michael Everson,

 Sounds very similar to the development of the Latin script variants,
 doesn't it?


Aren't there many common threads in the development of writing
systems?
 
 Should Latin be separately encoded?
 
 Latin *has* been separately encoded.
 
Not the Latin that is comparable to the Phoenician we are talking about.

(smile) If you're referring to Old Italic, it's in Plane One.

 Ancient Latin, as a parent script, is roughly analogous to the Phoenician
 under discussion. Ancient Latin does not have a J, U, or W in it, and yet
 Unicode, in the Latin block, has LATIN CAPITAL LETTER J, etc.

Some modern languages use extensions to the Latin script.  Others,
like some Polynesian languages, use only a subset.

 These are typically either paleographers, who are more interested in
 emphasizing glyphic variation than commonality,

Is it possible that paleographers are interested in representing and 
reproducing stone inscriptions accurately?  Could it be said that
paleographers must be aware of commonality as well as variance?

 or they are script
 taxonomists intent on delineating lines of derivation and innovation. 

Taxonomy, from the Greek taxis ("arrangement") + nomos ("law").  It shouldn't
be much of a semantic stretch to say that some Unicoders are taxonomists.  
So, hopefully there's nothing really wrong with taxonomy.

 In
 neither case are they encoders, 

Aren't they?  The process is open and experts of any persuasion are
generally welcomed.  Besides, would it be fair to say that many
paleographers and script taxonomists have been interested in computer 
encoding all along?

 and in neither case do they use the word
 script with that meaning invested in it by Unicodists.

That may be.  But, in either case it is hoped that the needs of script
taxonomists and paleographers won't be disregarded.

 Well I, for one, prefer to read in more paleographically relevant
 renderings; and fonts combined with markup will, of course, take care of
 everything.

That's not very useful in plain text.  Unicode is an encoding standard for 
plain text.

 The same can be said for the Indic and Philippine and other scripts, 
 yet we (properly) encoded them. Some of the nodes on the tree show 
 enough variation to warrant separate encoding.
 
 But not the Phoenician, Punic, Moabite, Ammonite, Old Hebrew, and Old
 Aramaic nodes. In fact, the glyphic, or paleographic, variation is so
 slight at times between texts in these languages and dialects, that it is
 the extra-script evidence that is diagnostic for identification. 

Quoting from N2311.PDF:

quote
Phoenician encompasses:
 Proto-Sinaitic/Proto-Canaanite
 Punic
 Neo-Punic
 Phoenician proper
 Late Phoenician cursive
 Phoenician papyrus
 Siloam Hebrew
 Hebrew seals
 Ammonite
 Moabite
 Palaeo-Hebrew
end quote

quote
...most of the scripts are so similar that there doesn't seem to be any
point to encoding them separately.
end quote

Best regards,

James Kass



Re: Aramaic unification and information retrieval

2003-12-22 Thread jameskass
Quoting from:
http://www.jewishencyclopedia.com/view.jsp?artid=1308letter=A

quote
...  In the letter מ the original bent stem was curved upward still 
more until it reached the upper horizontal stroke, so that the 
final Mem to-day has the form ם. The Palmyrene script possesses 
a final Nun with a lengthened stem; the Nabatean contains similarly 
final Kaph, Nun, Ẓade, and Shin, and further a closed final Mem 
and final He. ...
end quote

So, apparently we have contextual forms which differ a bit between
scripts.  (Hebrew has final KAF, MEM, NUN, PE, and TSADI.)

***

If ancient Hebrew and modern Hebrew were the same script, we
wouldn't need the modifiers, we could just say Hebrew and
everyone would know what we were talking about.

***

The opening line from the Moabite Stone (Mesha Stele) could be
expressed as ANK MSO BN KMSMLD MLK MAB, but that's not
a compelling argument in favor of unifying Phœnician and Latin.
Likewise, the fact that some members of the user communities
often transcribe such inscriptions into modern Hebrew is not
a compelling argument in favor of unifying ancient and modern
Hebrew.

***

If it's perfectly acceptable to write old Aramaic using modern
Hebrew glyphs, would the converse also be true?

In other words, would it be perfectly acceptable to use old Aramaic
glyphs along with cantillation marks and modern Hebrew points to
represent the Bible?  Or, would it be a travesty to do so?

***

If referring generically to many of the Indic scripts won't float
your boat, suppose we consider the Philippine scripts.  Some of
these are arguably glyph variants of each other, yet they
were not unified.  (Well, the punctuation was unified.)

***

Referring to the N2311.PDF document, it should be noted that the
phrase "Further research is required" is used twice in the short
section on Aramaic.  Michael Everson's submission doesn't strike
me as "by gosh and by golly - this is how we're going to do it",
but rather seems to be a preliminary report offering guidelines
derived from respected sources.

***

Ideally, input would be solicited from members of the user
communities who have read Daniels and Bright (as well as other
germane publications) and who know something about computer
encoding and the Unicode Standard.  (smile)  Rara avis.

Best regards,

James Kass



Re: Aramaic unification and information retrieval

2003-12-20 Thread jameskass
Peter Kirk wrote,

  There are no distinctive features other than glyph shapes 
  distinguishing Hebrew, Phoenician, Samaritan and Early Aramaic as  
  proposed in ...

Couldn't the same observation be made about many of the Indic scripts?

Best regards,

James Kass



RE: Swastika to be banned by Microsoft?

2003-12-14 Thread jameskass
verdy_p @ wanadoo.fr wrote,

 ... For now African languages are only representable on
 Windows with Arial Unicode MS ...

What utter nonsense!  Bosh.  Balderdash.  ␈.

Yet another blatantly false statement from a generally unreliable 
source.

This is really tiresome.
 



RE: Swastika to be banned by Microsoft?

2003-12-14 Thread jameskass
 Maybe the Unicode name should not be "swastika" but a transliteration of an
 Asian name (Tibetan, Chinese Pinyin...), ...

How about Sanskrit?

***

The swastika was also used as a symbol in scouting.  
(As in Boy Scouts.)

http://www.pinetreeweb.com/bp-can3.htm

http://www.scouting.milestones.btinternet.co.uk/badges.htm

Best regards,

James Kass



Re: Swastika to be banned by Microsoft?

2003-12-14 Thread jameskass
Mark E. Shoulson wrote,

 I'm embarrassed to admit it, but I find myself thinking that the 
 swastika, THE Nazi swastika, right-facing, tilted 45°, proper ratio of 
 stroke-thickness, the whole deal, should be encoded in Unicode.  As a 
 matter of history: it *is* a symbol of profound significance in the 
 history of the world.

Indeed it is.

Perhaps what is needed is a new combining character.

Maybe some kind of COMBINING REPLACEMENT WHITEWASH 
CHARACTER could be proposed.  It could be applied by the system 
wherever appropriate, as deemed by user preferences or regional 
insistence, in order to obliterate any characters or character strings 
which might offend.

One suggestion for a display glyph would be an ostrich with its 
head buried.

It is said that one who ignores history is doomed to repeat it.

Or, we might consider that the same characters used to represent 
holy books or love poetry can also render 'Mein Kampf'.

Ultimately, the ability to freely and openly exchange information
and ideas may prove to be harmful only to despots and the like.

Best regards,

James Kass



RE: Swastika to be banned by Microsoft?

2003-12-14 Thread jameskass
James Kass wrote,

 Yet another blatantly false statement from a generally unreliable 
 source.

That was not only ad hominem, it was probably redundant, as well,
and I'm sorry for it.  It would have been better left unsaid.

Best regards,

James Kass



RE: Swastika to be banned by Microsoft?

2003-12-14 Thread jameskass
Philippe Verdy wrote,

   ... For now African languages are only representable on
   Windows with Arial Unicode MS ...
  
  What utter nonsense!  Bosh.  Balderdash.
 
 I spoke only of the default core fonts that come with Windows.

It's too bad that Arial Unicode MS is not a Windows default
core font, then.

 So please stop insults...

I'll try to restrain myself.

 ... there was no offense in what I said ...

Except that it was untrue.

Best regards,

James Kass



RE: character map in Microsoft Word

2003-12-12 Thread jameskass
Philippe Verdy wrote,

 Note that Windows keyboard drivers do not support input of Unicode code
 points.

Keyboard DLLs for modern Windows systems are Unicode-based.

 What you have is (below, replace AltGr by Alt+Ctrl on US keyboards that
 don't have an AltGr key):

Alt+Ctrl + any sequence of digits from the numeric key pad produces
nothing at all.  (At least not on Win XP.)

The right-hand Alt key on U.S. keyboards is the AltGr key, even 
though the physical keyboard may not be labelled as such.

Either the right or left Alt key plus digits from the numeric key pad
can be used to insert special characters.

As Chris Jacobs mentioned, in WordPad (on Win XP, at least) Alt plus
8531 (from the numeric key pad) inserts the 1/3 character (U+2153).
Chris said this doesn't work in Outlook Express, though.  It also
doesn't work in Notepad.
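
What that key pad sequence resolves to can be shown with a one-line
sketch (Python here, purely for illustration):

    # Alt plus 8531 names decimal code point 8531, i.e. U+2153.
    assert chr(8531) == "\u2153"
    print(chr(8531))  # prints ⅓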

Best regards,

James Kass
.



Re: Glottal stops (bis) (was RE: Missing African Latin letters (bis))

2003-12-07 Thread jameskass
.
John Hudson wrote,

 ... If I'd been asked to design upper- and lowercase forms from 
 scratch, I would make the cap form the same height as e.g. P, 
 and as massive, and I would make the lowercase form a *descending* 
 letter, with the bowl filling the x-height  and with a straight 
 descender terminating like that of p.

Interesting approach.  This should look quite pleasing in running 
text.

If a new upper case glottal stop character were added to Unicode,
I'd move the existing glottal stop glyph to the new upper case
code point and make a lower case glyph which would match the
't' height and be a bit narrower than the upper case.  This would 
represent a typographic compromise offering a distinction 
between cases while preserving, more or less, user expectations 
for existing data display.

Best regards,

James Kass
.



RE: MS Windows and Unicode 4.0 ?

2003-12-03 Thread jameskass
.
Arcane Jill wrote,

(Ah, well, it was apparently in rich text (or something other
than plain text) format, so I guess I can't copy/paste it
into my reply, and now it isn't visible on the screen, so
I will have to do this from memory...)

 ... calligraphic (is that a word?) ...

Yes.

Best regards,

James Kass
.



Re: MS Windows and Unicode 4.0 ?

2003-12-03 Thread jameskass
.
Edward H. Trager wrote,

 WHY NOT just *give* away the Linear B, Ogham, Cherokee, and lots 
 ...

 However, I would not suggest giving those fonts away to an OS vendor
 like ...

It's hard to sell something you're giving away.

Best regards,

James Kass
.



RE: Oriya: mba / mwa ?

2003-12-01 Thread jameskass
.
Michael Everson wrote,

 You should implement according to what is on page 238 of the Unicode 
 Standard, and if there are people in India who think otherwise they 
 had better argue their case convincingly to the UTC.
 
 I don't personally care which character is used.
 
 I *do*. Someone at the TDIL has decided he's got a bright idea about 
 how to use WA, and that changes the traditional orthography.

The TDIL document was published in April of 2002.  At that time,
page 238 of TUS 4.0 did not exist.  The authors of the Oriya section
of the report really only had the sparse information on page 227 of 
TUS 3.0 upon which to expand.

Perhaps many of us on this list have, in the past, attempted to
extrapolate the direction the consortium might take -- only to
be surprised when a different path was chosen.

Other than the fine work by Maurice Bauhahn on Khmer, the existence
of these comprehensive TDIL reports written by technically-oriented
expert members of the script user communities who also are familiar
with computer encoding issues *and Unicode* appears to be unprecedented.

We should rejoice that these TDIL reports exist and urge the
various authors to contribute to discussions on any edge-case 
issues.

Rather than revising history or revising encoding practices, maybe 
the TDIL reports could be revised where appropriate.

Best regards,

James Kass
.



RE: Complex Combining

2003-12-01 Thread jameskass
.
Jonathan Coxhead wrote,

 ...http://www.doves.demon.co.uk/atomic.html. 

Quoting from the page,
... the longest word you can write upside-down in Unicode 
is 'aftereffect'.) 

In UTF-8:
zʎxʍʌnʇsɹbdouɯլʞſ̣ı̣ɥɓɟəpɔqɐ
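
For anyone who wants to flip a word the same way, a minimal sketch
in Python, using only a few of the mappings from the alphabet above
(letters that need combining marks are omitted):

    # Reverse the string and swap each letter for a rotated look-alike.
    FLIP = {"a": "ɐ", "c": "ɔ", "e": "ə", "f": "ɟ", "r": "ɹ", "t": "ʇ"}

    def upside_down(word):
        return "".join(FLIP.get(ch, ch) for ch in reversed(word))

    print(upside_down("aftereffect"))  # ʇɔəɟɟəɹəʇɟɐ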

Best regards,

James Kass
.



Re: Latin Capital Letter Turned T/K?

2003-11-28 Thread jameskass
 Oh, yes, pictures of the characters: due to the miracles of modern technology,
 I can include them in plain text, but you'll have to stand on your head (-:
 
 T K

LOL.

Aren't these turned letters (and several others) used in the Fraser
script?

Best regards,

James Kass
. 



Re: Oriya: mba / mwa ?

2003-11-28 Thread jameskass
.
Peter Constable wrote,

 The question, then, is how MBA should be encoded: as 
 0B2E MA, 0B4D VIRAMA, 0B2C BA , or as  0B2E MA, 0B4D VIRAMA, 0B71 WA
 ?
 

MA + VIRAMA + BA, according to TUS 4.0, page 238.
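
As escaped strings, the two candidates look like this (a sketch only;
code points as cited in the question above):

    mba_with_ba = "\u0B2E\u0B4D\u0B2C"  # MA + VIRAMA + BA (TUS 4.0, p. 238)
    mba_with_wa = "\u0B2E\u0B4D\u0B71"  # MA + VIRAMA + WA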

Best regards,

James Kass
.



Re: Oriya: nndda / nnta?

2003-11-27 Thread jameskass
.
Michael Everson wrote,

 It would be just so cool if you would say what page and column 
 and line of that document you are referring to.

Subjoined DDA (0B21) and TA (0B24) seem to be mentioned on page 8
of 59 in the right-hand column under the heading Consonant Signs.

...It must be noted here that when the consonant sign  ...  is
attached below NNA (0B23) it is pronounced as DDA (0B21).  In other
words the same consonant sign represents both TA (0B24) and DDA
(0B21).

Ah, there's nothing quite like glyphic ambiguity...

Best regards,

James Kass
.



Re: Oriya: nndda / nnta?

2003-11-27 Thread jameskass
.
Peter Constable wrote,

 The Indian gov't doc at http://tdil.mit.gov.in/ori-guru-telu.pdf 
 describes the conjunct shown in the attached PNG as being pronounced 
 as though NNA + VIRAMA + DDA (0B21). The component attached to the 
 NNA otherwise represents TA (0B24), however.

 My question is this: should this conjunct be encoded as  0B23 NNA, 
 0B4D VIRAMA, 0B24 TA  or as  0B23 NNA, 0B4D VIRAMA, 0B21 DDA ?

Page 13 of 59, right-hand column shows four examples of the
subjoined reduced TA under TA (Sign).

The only example given for the subjoined reduced DDA immediately
follows.  It seems clear from the illustration that the authors of the
document expect that the glyph in question would be encoded as 
NNA + VIRAMA + DDA.

Since base letters TA and DDA are similar in appearance, their
reduced form(s) could be identical.  If this is the case, then
probably NNA + VIRAMA + DDA.  

Or, if it's supposed to be the reduced form of TA and is only
*pronounced* like DDA when it's under NNA, then probably
NNA + VIRAMA + TA.
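
For reference, the two sequences under discussion, as escaped strings
(a sketch only; code points as cited above):

    nnta  = "\u0B23\u0B4D\u0B24"  # NNA + VIRAMA + TA
    nndda = "\u0B23\u0B4D\u0B21"  # NNA + VIRAMA + DDA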

Best regards,

James Kass
.





RE: Request

2003-11-25 Thread jameskass
.
Peter Constable wrote,

 On Behalf
 Of Ritu Malhotra
 
 Could someone kindly help me by providing an exe(Font utility) that
 will not only edit open type fonts(ex: Mangal.ttf)...
 
 Making changes to mangal.ttf or other Microsoft fonts would be in
 violation of the end-user license agreement which you agreed to when you
 installed the software, and is illegal.

Peter is absolutely correct.  And, it's not just Microsoft fonts
which mustn't be altered by the end-user.

Most font developers restrict rights on their fonts.  Obtaining a 
legal copy of a font only grants the user the right to use the font; 
not to make changes.

Best regards,

James Kass
.
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 On Behalf
  Of Ritu Malhotra
 
  Could someone kindly help me by providing an exe(Font utility) that
 will not
  only edit open type fonts(ex: Mangal.ttf)...
 
 Making changes to mangal.ttf or other Microsoft fonts would be in
 violation of the end-user license agreement which you agreed to when you
 installed the software, and is illegal.
 
 If you want a Devanagari font that uses a non-standard encoding, there
 are plenty of them out there that you can use without doing anything
 illegal. See, for instance,
 http://www.sil.org/computing/fonts/LANG/HINDI.HTML.
 
 
 
 Peter
  
 Peter Constable
 Globalization Infrastructure and Font Technologies
 Microsoft Windows Division
 
 
 
 
 
 




RE: Request

2003-11-25 Thread jameskass
.
John Hudson wrote,

 If in doubt, check your license agreement.

Windows users can check the licensing material on many newer fonts 
with a program called TTFEXT.EXE, freely available from Microsoft:

http://www.microsoft.com/typography/property/property.htm

It's too bad that this feature is not included by default with the
font folder.

Best regards,

James Kass
.



RE: Definitions

2003-11-25 Thread jameskass
.
Peter Constable wrote,

 James: 

  Inside a program, for instance...
 
 This is *very* faulty logic. ...

Jeepers!

 ... Variable names exist in source code only,
 and have nothing whatsoever to do with the data actually processed.

Exactly.  Variable names are always internal while data may be
external.

 You're also referring to an assigned character in your example, not a
 PUA codepoint. ...


Since it was supposed to draw a correlation between ASCII-conformant
and Unicode-conformant, an assigned ASCII character was used in the
example.  After all, ASCII didn't have much to offer in the way
of Private Use Areas or unassigned code points.

 A software product could assign every single PUA codepoint to mean some
 kind of formatting instruction, and insert these into the text like
 markup. In that case, a user's PUA characters will be re-interpreted by
 that software as formatting instructions. 

HTML manages to use ASCII characters as formatting mark-up yet
still allows ASCII text to be processed as expected.
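
It manages this because the few ASCII characters HTML reserves for
mark-up have escapes, so arbitrary ASCII text still round-trips.
A minimal sketch (Python's html module, purely for illustration):

    import html
    text = "if a < b && b > c"
    escaped = html.escape(text)            # 'if a &lt; b &amp;&amp; b &gt; c'
    assert html.unescape(escaped) == text  # the original text survives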

Briefly, it's my opinion that applications which claim to support
and comply with Unicode should not 'step on' Unicode text.  Any
loopholes in the 'letter of the law' which allow applications to
mung or reject Unicode text should be plugged.

Best regards,

James Kass
.



Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread jameskass
.
Gary P. Grosso wrote,


 On Win2K, Character Map (charmap.exe) does not show anything 
 beyond the BMP.  I haven't tried this on XP.

Have you tried BabelMap or BabelPad?  Both can show non-BMP...

http://uk.geocities.com/BabelStone1357
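
On the format12 question quoted further down: a hedged sketch, assuming
the fontTools library and a hypothetical font path, for checking whether
a font already carries a 32-bit cmap subtable:

    from fontTools.ttLib import TTFont

    font = TTFont("test.ttf")  # hypothetical path
    for subtable in font["cmap"].tables:
        if subtable.format == 12:
            print("format 12 cmap present; maps beyond the BMP")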

Best regards,

James Kass
.
 Since we're comparing notes on font tools, I recently was
 asked to look over an experimental font which had, among
 other things, characters in the Supplemental Multilingual
 Plane and used CFF format.  (I had to look up what CFF
 format even was.)
 
 PFAEdit was able to load the font.  At least I could see the 
 SMP characters; I didn't attempt any editing, kerning, etc.
 I've always been fairly impressed with PFAEdit, which probably
 deserves a name which reflects the fact that it goes well
 beyond PFA files or even Type 1 fonts.  In fact, I'd like
 to see it ported to Windows.
 
 Font Creator Program couldn't load the font due to the CFF 
 format, which was disappointing, because I like FCP's
 interface and other features, and was hoping to get an up
 close and personal look at some of the glyphs, which seemed
 to have some sort of height anomaly.
 
 On Win2K, Character Map (charmap.exe) does not show anything 
 beyond the BMP.  I haven't tried this on XP.
 
 Gary
 
 At 09:00 AM 11/20/2003 -0500, Mark E. Shoulson wrote:
 I haven't tested this myself, but from a look at the source code, it appears 
 that pfaedit (pfaedit.sourceforge.net) can generate format12 TTFs.  (Open 
 Source, for UNIX).
 
 ~mark
 
 On 11/20/03 03:12, Arcane Jill wrote:
 
 
 Is anyone able to answer this? I for one would really like to know.
 Thanks
 
 
  -Original Message-
  From: Frank Yung-Fong Tang [mailto:[EMAIL PROTECTED]
  Sent: Thursday, November 20, 2003 2:29 AM
  To: John Jenkins
  Cc: [EMAIL PROTECTED]
  Subject: Re: creating a test font w/ CJKV Extension B characters.
 
 
  Does FontLab support generating TTF in format12 (32 bits)?
  Which cheaper solutions could  generating TTF in format12 (32 bits)?
 
 
 ---
 Gary Grosso
 Arbortext, Inc.
 Ann Arbor, MI, USA
 
 


